Design Patterns and Video Games

2D Strategy Game (14): Numpy

Python is a great language, but it is pretty slow, mainly when we process many items in loops. One way to speed up these parts is to use Numpy: its functions can process data efficiently.

This post is part of the 2D Strategy Game series

Using the optimizations I present in this post, the game can handle huge maps, for instance, with one million cells:

As we can see in this video, the game update time remains low (below 20ms).

In the following, we assume that we import Numpy with np alias:

import numpy as np

With this alias, the syntax np.func() is equivalent to numpy.func().

World layers

Create the array: to use the Numpy functions, we need to put our data in Numpy arrays. They can have many dimensions (1D, 2D, 3D, or more) and can hold scalars (integers, floats, bool, etc.).

In the Layer class, we create a Numpy array to store the value of the layer cells. For a world of size (width, height), we can create a 2D Numpy array of 32-bit integers in the following way:

cells = np.zeros([width, height], dtype=np.int32)

The zeros() Numpy function initializes the array with zeros. If we want to initialize with a specific value, for instance defaultValue, we can use the full() function:

cells = np.full([width, height], defaultValue, dtype=np.int32)

Borders: we use a trick to ease the management of layer borders: we add rows and columns all around:

cells = np.full([width+2, height+2], defaultValue, dtype=np.int32)

With this setup, the inner part contains the actual cells, with x indices from 1 to with and y indices from 1 to height. It simplifies a lot the handling of borders, for instance, in the getValue() method:

def getValue(self, coords: Tuple[int, int], 
             direction: Optional[Direction] = None) -> CellValue:
    x, y = coords[0], coords[1]
    if direction:
        if direction == Direction.LEFT:
            return CellValue(self.__cells[x, y + 1])
        if direction == Direction.RIGHT:
            return CellValue(self.__cells[x + 2, y + 1])
        if direction == Direction.TOP:
            return CellValue(self.__cells[x + 1, y])
        if direction == Direction.BOTTOM:
            return CellValue(self.__cells[x + 1, y + 2])
    return CellValue(self.__cells[x + 1, y + 1])

Without this trick, we have to add checks in many places. For example, for the left direction case, without extra rows/columns in the cell array, we must check the x coordinates:

# Code without the border trick
if direction == Direction.LEFT:
    if x < 1:
        return self.__defaultValue
    return CellValue(self.__cells[x - 1, y])

Note that we cast to a CellValue because Numpy cell items are not Python integers but integers specific to Numpy. In our case, cell items are instances of np.int32, not int. These objects behave like Python integers, but operations on them are slow: we better cast them to regular integers (remind that CellValue is an IntEnum, which is like an int but with extra features).

Minimap

We render the minimap as before, where each pixel represents a world cell. However, the computation time with Numpy is much lower: on a test machine, with a world made of one million cells, the new code lasts 0.4s instead of 8s (20x speed up)!

Slicing: for this specific task, we need the cell arrays without the additional rows/columns to handle world borders (an internal trick of the Layer class). We can get these cells with slicing: we add a new cells property in the Layer class that returns this:

@property
def cells(self) -> np.ndarray:
    return self.__cells[1:-1, 1:-1]

The expression [1:-1] is a slice that represents all indices from the second one (index=1) to the second last (index=width-2 or height-2). Hence, the result of array[slice, slice] is also a Numpy array: we can use all Numpy functions on it as if it was a usual array. Slices with Numpy arrays are views: a change on the sliced array also changes the original array (and vise-versa). Furthermore, there is no overhead with regular slices (like the one in this example): the computation time is the same on the original and the sliced array.

Minimap computation: in the renderMinimap() method of the MinimapFrame class, we compute the minimap in the following way:

minimapArray: Optional[np.ndarray] = None
for name, layer in layers:
    colorMap = self.__colors[name]
    colors = colorMap[layer.cells]
    if minimapArray is None:
        minimapArray = colors
    else:
        transparent = minimapArray[..., 3] == 0
        minimapArray[transparent] = colors[transparent]
if minimapArray is not None:
    surface = pygame.surfarray.make_surface(minimapArray[..., 0:3])
    self.__minimapSurface = cast(Surface, surface.convert_alpha())

We iterate through layers as before, from the last to the first one (line 2). Then, we get a color map for the current layer (line 3). It is a Numpy 2D array that contains a 4D vector (red, green, blue, alpha) for each possible cell value. We assume that only the cell values corresponding to the current layer are non-transparent (alpha is not zero).

We use the color map to compute the pixel colors of the cells of the current layer (line 4). If this line is not clear, it is equivalent to:

# colors = colorMap[layer.cells]
width, height = layer.cells.shape
colors = np.zeros([width, height, 4], dtype=np.int32)
for y in range(height):
    for x in range(width):
        r = colorMap[layer.cells[x, y], 0]
        g = colorMap[layer.cells[x, y], 1]
        b = colorMap[layer.cells[x, y], 2]
        a = colorMap[layer.cells[x, y], 3]
        colors[x, y, 0] = r
        colors[x, y, 1] = g
        colors[x, y, 2] = b
        colors[x, y, 3] = a

The running time of colors = colorMap[layer.cells] is dramatically lower than the equivalent Python code!

Back to the minimap computation code, we init the minimap in the first iteration (lines 5-6). Then, in the following iterations, we copy news colors on transparent pixels in the current minimap (lines 7-9). For that operation, we compute a boolean array where each value corresponds to a transparent color (line 8). We use the slice expression [..., 3], meaning we want all first dimensions but only the fourth in the last. Line 8 is equivalent to:

# transparent = minimapArray[..., 3] == 0
width, height, _ = minimapArray.shape
transparent = np.zeros([width, height], dtype=np.bool)
for y in range(height):
    for x in range(width):
        if minimapArray[x, y, 3] == 0:
            transparent[x, y] = True
        else:
            transparent[x, y] = False

Line 9 of the minimap computation code only copies the values for indices where transparent is True. It is equivalent to:

# minimapArray[transparent] = colors[transparent]
width, height, _ = minimapArray.shape
for y in range(height):
    for x in range(width):
        if transparent[x, y]:
            minimapArray[x, y] = colors[x, y]

The last lines of the minimap computation code convert the Numpy array to a Pygame surface. The slice expression [..., 0:3] means we want the values from all the first dimensions, but only the ones of indices zero (included) to three (excluded) in the last dimension.

Minimap partial computation: the previous code computes the minimap for the whole world. When we only change a part of the world, we don't need to recompute everything. Thanks to Numpy, we only need to run the same computation on a slice of the layer cells! In other words, we can replace layer.cells with layer.cells[x-1:x+1,y-1:y+1] for instance, to update the local area around coordinates (x,y).

Note that we can store a slice in a variable using the Numpy function n_(), for instance:

cellsSlice = np.s_[cellMinX:cellMaxX, cellMinY:cellMaxY]

Then, we can use cellsSlice as a slice expression, like layer.cells[cellsSlice].

Layer rendering

I describe in this section the rendering of the impassable layer (mountains, rivers, etc.). The other ones use similar concepts.

Layer renderer: in the render() method of the ImpassableComponent class, we first create an instance of a new class LayerRenderer (see details in attached code):

renderer = self.createRenderer(surface)

The purpose of this class is to compute and store many valuable objects. For instance, we can get the slices of cells to render in the cellsSlice attribute. Using this slice, we can get the array of visible cell values:

cellsSlice = renderer.cellsSlice
cells = self.layer.cells[cellsSlice]

As a result, all computations in the following with cells are limited to the size of the rendered area: whatever the world map size, the processing time is the same.

Main cells: we start the rendering with all non-river cells:

validCells = cells != CellValue.NONE
validCells &= cells != CellValue.IMPASSABLE_RIVER
noise = self.noise[cellsSlice]
for dest, value, cell in renderer.coords(validCells):
    rects = tilesRects[value]
    rectIndex = int(noise[cell]) % len(rects)
    surface.blit(tileset, dest, rects[rectIndex])

We first compute the indices of non-empty cells (line 1). Then, we compute the indices of non-river cells and combine them with the previous ones (line 2). The operator &= is similar to += except that the operation is "logical and". These two first lines are as if we compute for each cell (x,y) the following:

if cells[x,y] != CellValue.NONE \
        and cells[x,y] != CellValue.IMPASSABLE_RIVER:
    validCells[x,y] = True
else:
    validCells[x,y] = False

Then, we get the noise values for the currently visible cells (line 3). It works as before and contains random integer values, except that it is a Numpy array instead of a Python list. We create this array in the layer constructor:

generator = np.random.default_rng(seed=0)
self.noise = generator.integers(0, 100000, size=self.__layer.size)
self.noise.flags.writeable = False

Like Python random generators, we can define a seed and always get the same random values (line 1). We can also tell Numpy that this array is read-only (line 2). Thanks to this option, we can ensure that the value of this array will never change.

Back to the code for rendering non-river cells, we use the coords() generator in the LayerRenderer class (line 4). It generates information for cells for which validCells is True:

cellCoords = np.transpose(np.nonzero(validCells))
for cell in cellCoords:
    cellRelX = int(cell[0])
    cellRelY = int(cell[1])
    value = CellValue(cells[cellRelX, cellRelY])
    destX = cellRelX * tileWidth - viewX % tileWidth
    destY = cellRelY * tileHeight - viewY % tileHeight
    yield (destX, destY), value, (cellRelX, cellRelY)

The first line converts a Numpy array with boolean values to a 2D array with coordinates with True values. For instance, it converts [[True,False],[False,True]] to [[0,0],[1,1]].

Thanks to this filtering, we only process non-empty and non-rivers cells in this loop. If there are only empty cells, the for loop does nothing! Also, note that the overhead (computing the valid cells) is negligible: a Numpy operation on an array of hundreds of values is fast (a few microseconds).

River cells: for these cells, we need the tile border codes as before:

neighbors = self.layer.getAreaNeighbors4(cellsBox)
masks = neighbors == CellValue.IMPASSABLE_RIVER
masks |= neighbors == CellValue.IMPASSABLE_MOUNTAIN
groundNeighbors = self.__ground.getAreaNeighbors4(cellsBox)
masks |= groundNeighbors == CellValue.GROUND_SEA
codes = code4np(masks)

valid = cells == CellValue.IMPASSABLE_RIVER
for dest, value, cell in renderer.coords(valid):
    rect = self.__river_code2rect[codes[cell]]
    surface.blit(tileset, dest, rect)

This time, we don't compute the code cell by cell but run operations one after another on all (visible) cells. We first get a 3D Numpy array such as neighbors[x,y,0] contains the cell value at (x-1,y), neighbors[x,y,1] at (x,y-1), neighbors[x,y,2] at (x,y+1) and neighbors[x,y,3] at (x+1,y). It is computed in the getAreaNeighbors4() of the Layer class:

def getAreaNeighbors4(self, cellsBox: Tuple[int, int, int, int]) -> np.ndarray:
    minX, maxX, minY, maxY = cellsBox
    top = self.__cells[minX + 1:maxX + 1, minY:maxY]
    left = self.__cells[minX:maxX, minY + 1:maxY + 1]
    right = self.__cells[minX + 2:maxX + 2, minY + 1:maxY + 1]
    bottom = self.__cells[minX + 1:maxX + 1, minY + 2:maxY + 2]
    return np.stack((left, top, bottom, right), axis=2)

Note that we do not run any border checks since we added extra rows/columns in the layer cell array! It is also why we shift all coordinates by (1,1), as in the getValue() method. The stack() Numpy function concatenates arrays along a new dimension (line 7). In our case, we want to create a 3D array from four 2D arrays and create a new dimension at the end.

Back to the code for rendering river cells, we compute a mask array as we did previously (lines 2-5). We replace the mask4() function call by Numpy array comparison ==, combine4() by a logical or |, and code4() by code4np(). These operations are the same, except that they process all values in the array instead of one by one. The code4np(masks) returns masks.dot(weights) with weights4 = np.array((1, 2, 4, 8)). The dot() Numpy function performs a weighted sum on the last dimension, e.g. a.dot(b)[x,y] = sum_k(a[x,y,k] * b[x,y,k])

River mouths: for river mouths, we combine the masks and the codes to find which cells to render:

valid = cells == CellValue.NONE
groundCells = self.__ground.cells[cellsSlice]
valid &= groundCells == CellValue.GROUND_SEA
codes = code4np(neighbors == CellValue.IMPASSABLE_RIVER)
valid &= codes != 0
cellMinX, _, cellMinY, _ = renderer.cellsBox
for dest, _, cell in renderer.coords(valid):
    for direction in directions:
        cellX, cellY = cellMinX + cell[0], cellMinY + cell[1]
        value = self.layer.getValue((cellX, cellY), direction)
        if value == CellValue.IMPASSABLE_RIVER:
            rect = self.__riverMouth[direction]
            surface.blit(tileset, dest, rect)

We first build a mask for empty cells in the current layer (line 1) and sea cells in the ground layer (lines 2-3). Then, we need to find cells surrounded by at least one river cell. If we compute codes for rivers (line 4), cells with a non-zero code satisfy this property (line 5)! Then, each cell selected by valid has one or more rivers around, but we don't know which ones. It is the reason why we use a small for loop (lines 7-10). Note that we could also create a different selection for each case or combine cases, which would lead to a lot of code.

Final program

Download code and assets

In the next post, we add units.