 ## 2D Strategy Game (14): Numpy

Python is a great language, but it is pretty slow, mainly when we process many items in loops. One way to speed up these parts is to use Numpy: its functions can process data efficiently.

This post is part of the 2D Strategy Game series

Using the optimizations I present in this post, the game can handle huge maps, for instance, with one million cells:

As we can see in this video, the game update time remains low (below 20ms).

In the following, we assume that we import Numpy with `np` alias:

``import numpy as np``

With this alias, the syntax `np.func()` is equivalent to `numpy.func()`.

## World layers

Create the array: to use the Numpy functions, we need to put our data in Numpy arrays. They can have many dimensions (1D, 2D, 3D, or more) and can hold scalars (integers, floats, bool, etc.).

In the `Layer` class, we create a Numpy array to store the value of the layer cells. For a world of size (width, height), we can create a 2D Numpy array of 32-bit integers in the following way:

``cells = np.zeros([width, height], dtype=np.int32)``

The `zeros()` Numpy function initializes the array with zeros. If we want to initialize with a specific value, for instance `defaultValue`, we can use the `full()` function:

``cells = np.full([width, height], defaultValue, dtype=np.int32)``

Borders: we use a trick to ease the management of layer borders: we add rows and columns all around:

``cells = np.full([width+2, height+2], defaultValue, dtype=np.int32)``

With this setup, the inner part contains the actual cells, with x indices from 1 to with and y indices from 1 to height. It simplifies a lot the handling of borders, for instance, in the `getValue()` method:

``````def getValue(self, coords: Tuple[int, int],
direction: Optional[Direction] = None) -> CellValue:
x, y = coords, coords
if direction:
if direction == Direction.LEFT:
return CellValue(self.__cells[x, y + 1])
if direction == Direction.RIGHT:
return CellValue(self.__cells[x + 2, y + 1])
if direction == Direction.TOP:
return CellValue(self.__cells[x + 1, y])
if direction == Direction.BOTTOM:
return CellValue(self.__cells[x + 1, y + 2])
return CellValue(self.__cells[x + 1, y + 1])``````

Without this trick, we have to add checks in many places. For example, for the left direction case, without extra rows/columns in the cell array, we must check the x coordinates:

``````# Code without the border trick
if direction == Direction.LEFT:
if x < 1:
return self.__defaultValue
return CellValue(self.__cells[x - 1, y])``````

Note that we cast to a `CellValue` because Numpy cell items are not Python integers but integers specific to Numpy. In our case, cell items are instances of `np.int32`, not `int`. These objects behave like Python integers, but operations on them are slow: we better cast them to regular integers (remind that `CellValue` is an `IntEnum`, which is like an `int` but with extra features).

## Minimap

We render the minimap as before, where each pixel represents a world cell. However, the computation time with Numpy is much lower: on a test machine, with a world made of one million cells, the new code lasts 0.4s instead of 8s (20x speed up)!

Slicing: for this specific task, we need the cell arrays without the additional rows/columns to handle world borders (an internal trick of the `Layer` class). We can get these cells with slicing: we add a new `cells` property in the `Layer` class that returns this:

``````@property
def cells(self) -> np.ndarray:
return self.__cells[1:-1, 1:-1]``````

The expression `[1:-1]` is a slice that represents all indices from the second one (index=1) to the second last (index=width-2 or height-2). Hence, the result of `array[slice, slice]` is also a Numpy array: we can use all Numpy functions on it as if it was a usual array. Slices with Numpy arrays are views: a change on the sliced array also changes the original array (and vise-versa). Furthermore, there is no overhead with regular slices (like the one in this example): the computation time is the same on the original and the sliced array.

Minimap computation: in the `renderMinimap()` method of the `MinimapFrame` class, we compute the minimap in the following way:

``````minimapArray: Optional[np.ndarray] = None
for name, layer in layers:
colorMap = self.__colors[name]
colors = colorMap[layer.cells]
if minimapArray is None:
minimapArray = colors
else:
transparent = minimapArray[..., 3] == 0
minimapArray[transparent] = colors[transparent]
if minimapArray is not None:
surface = pygame.surfarray.make_surface(minimapArray[..., 0:3])
self.__minimapSurface = cast(Surface, surface.convert_alpha())``````

We iterate through layers as before, from the last to the first one (line 2). Then, we get a color map for the current layer (line 3). It is a Numpy 2D array that contains a 4D vector (red, green, blue, alpha) for each possible cell value. We assume that only the cell values corresponding to the current layer are non-transparent (alpha is not zero).

We use the color map to compute the pixel colors of the cells of the current layer (line 4). If this line is not clear, it is equivalent to:

``````# colors = colorMap[layer.cells]
width, height = layer.cells.shape
colors = np.zeros([width, height, 4], dtype=np.int32)
for y in range(height):
for x in range(width):
r = colorMap[layer.cells[x, y], 0]
g = colorMap[layer.cells[x, y], 1]
b = colorMap[layer.cells[x, y], 2]
a = colorMap[layer.cells[x, y], 3]
colors[x, y, 0] = r
colors[x, y, 1] = g
colors[x, y, 2] = b
colors[x, y, 3] = a``````

The running time of `colors = colorMap[layer.cells]` is dramatically lower than the equivalent Python code!

Back to the minimap computation code, we init the minimap in the first iteration (lines 5-6). Then, in the following iterations, we copy news colors on transparent pixels in the current minimap (lines 7-9). For that operation, we compute a boolean array where each value corresponds to a transparent color (line 8). We use the slice expression `[..., 3]`, meaning we want all first dimensions but only the fourth in the last. Line 8 is equivalent to:

``````# transparent = minimapArray[..., 3] == 0
width, height, _ = minimapArray.shape
transparent = np.zeros([width, height], dtype=np.bool)
for y in range(height):
for x in range(width):
if minimapArray[x, y, 3] == 0:
transparent[x, y] = True
else:
transparent[x, y] = False``````

Line 9 of the minimap computation code only copies the values for indices where `transparent` is `True`. It is equivalent to:

``````# minimapArray[transparent] = colors[transparent]
width, height, _ = minimapArray.shape
for y in range(height):
for x in range(width):
if transparent[x, y]:
minimapArray[x, y] = colors[x, y]``````

The last lines of the minimap computation code convert the Numpy array to a Pygame surface. The slice expression `[..., 0:3]` means we want the values from all the first dimensions, but only the ones of indices zero (included) to three (excluded) in the last dimension.

Minimap partial computation: the previous code computes the minimap for the whole world. When we only change a part of the world, we don't need to recompute everything. Thanks to Numpy, we only need to run the same computation on a slice of the layer cells! In other words, we can replace `layer.cells` with `layer.cells[x-1:x+1,y-1:y+1]` for instance, to update the local area around coordinates (x,y).

Note that we can store a slice in a variable using the Numpy function `n_()`, for instance:

``cellsSlice = np.s_[cellMinX:cellMaxX, cellMinY:cellMaxY]``

Then, we can use `cellsSlice` as a slice expression, like `layer.cells[cellsSlice]`.

## Layer rendering

I describe in this section the rendering of the impassable layer (mountains, rivers, etc.). The other ones use similar concepts.

Layer renderer: in the `render()` method of the `ImpassableComponent` class, we first create an instance of a new class `LayerRenderer` (see details in attached code):

``renderer = self.createRenderer(surface)``

The purpose of this class is to compute and store many valuable objects. For instance, we can get the slices of cells to render in the `cellsSlice` attribute. Using this slice, we can get the array of visible cell values:

``````cellsSlice = renderer.cellsSlice
cells = self.layer.cells[cellsSlice]``````

As a result, all computations in the following with `cells` are limited to the size of the rendered area: whatever the world map size, the processing time is the same.

Main cells: we start the rendering with all non-river cells:

``````validCells = cells != CellValue.NONE
validCells &= cells != CellValue.IMPASSABLE_RIVER
noise = self.noise[cellsSlice]
for dest, value, cell in renderer.coords(validCells):
rects = tilesRects[value]
rectIndex = int(noise[cell]) % len(rects)
surface.blit(tileset, dest, rects[rectIndex])``````

We first compute the indices of non-empty cells (line 1). Then, we compute the indices of non-river cells and combine them with the previous ones (line 2). The operator `&=` is similar to `+=` except that the operation is "logical and". These two first lines are as if we compute for each cell (x,y) the following:

``````if cells[x,y] != CellValue.NONE \
and cells[x,y] != CellValue.IMPASSABLE_RIVER:
validCells[x,y] = True
else:
validCells[x,y] = False``````

Then, we get the noise values for the currently visible cells (line 3). It works as before and contains random integer values, except that it is a Numpy array instead of a Python list. We create this array in the layer constructor:

``````generator = np.random.default_rng(seed=0)
self.noise = generator.integers(0, 100000, size=self.__layer.size)
self.noise.flags.writeable = False``````

Like Python random generators, we can define a seed and always get the same random values (line 1). We can also tell Numpy that this array is read-only (line 2). Thanks to this option, we can ensure that the value of this array will never change.

Back to the code for rendering non-river cells, we use the `coords()` generator in the `LayerRenderer` class (line 4). It generates information for cells for which `validCells` is `True`:

``````cellCoords = np.transpose(np.nonzero(validCells))
for cell in cellCoords:
cellRelX = int(cell)
cellRelY = int(cell)
value = CellValue(cells[cellRelX, cellRelY])
destX = cellRelX * tileWidth - viewX % tileWidth
destY = cellRelY * tileHeight - viewY % tileHeight
yield (destX, destY), value, (cellRelX, cellRelY)``````

The first line converts a Numpy array with boolean values to a 2D array with coordinates with `True` values. For instance, it converts `[[True,False],[False,True]]` to `[[0,0],[1,1]]`.

Thanks to this filtering, we only process non-empty and non-rivers cells in this loop. If there are only empty cells, the for loop does nothing! Also, note that the overhead (computing the valid cells) is negligible: a Numpy operation on an array of hundreds of values is fast (a few microseconds).

River cells: for these cells, we need the tile border codes as before:

``````neighbors = self.layer.getAreaNeighbors4(cellsBox)
groundNeighbors = self.__ground.getAreaNeighbors4(cellsBox)

valid = cells == CellValue.IMPASSABLE_RIVER
for dest, value, cell in renderer.coords(valid):
rect = self.__river_code2rect[codes[cell]]
surface.blit(tileset, dest, rect)``````

This time, we don't compute the code cell by cell but run operations one after another on all (visible) cells. We first get a 3D Numpy array such as `neighbors[x,y,0]` contains the cell value at (x-1,y), `neighbors[x,y,1]` at (x,y-1), `neighbors[x,y,2]` at (x,y+1) and `neighbors[x,y,3]` at (x+1,y). It is computed in the `getAreaNeighbors4()` of the `Layer` class:

``````def getAreaNeighbors4(self, cellsBox: Tuple[int, int, int, int]) -> np.ndarray:
minX, maxX, minY, maxY = cellsBox
top = self.__cells[minX + 1:maxX + 1, minY:maxY]
left = self.__cells[minX:maxX, minY + 1:maxY + 1]
right = self.__cells[minX + 2:maxX + 2, minY + 1:maxY + 1]
bottom = self.__cells[minX + 1:maxX + 1, minY + 2:maxY + 2]
return np.stack((left, top, bottom, right), axis=2)``````

Note that we do not run any border checks since we added extra rows/columns in the layer cell array! It is also why we shift all coordinates by (1,1), as in the `getValue()` method. The `stack()` Numpy function concatenates arrays along a new dimension (line 7). In our case, we want to create a 3D array from four 2D arrays and create a new dimension at the end.

Back to the code for rendering river cells, we compute a mask array as we did previously (lines 2-5). We replace the `mask4()` function call by Numpy array comparison `==`, `combine4()` by a logical or `|`, and `code4()` by `code4np()`. These operations are the same, except that they process all values in the array instead of one by one. The `code4np(masks)` returns `masks.dot(weights)` with `weights4 = np.array((1, 2, 4, 8))`. The `dot()` Numpy function performs a weighted sum on the last dimension, e.g. `a.dot(b)[x,y] = sum_k(a[x,y,k] * b[x,y,k])`

River mouths: for river mouths, we combine the masks and the codes to find which cells to render:

``````valid = cells == CellValue.NONE
groundCells = self.__ground.cells[cellsSlice]
valid &= groundCells == CellValue.GROUND_SEA
codes = code4np(neighbors == CellValue.IMPASSABLE_RIVER)
valid &= codes != 0
cellMinX, _, cellMinY, _ = renderer.cellsBox
for dest, _, cell in renderer.coords(valid):
for direction in directions:
cellX, cellY = cellMinX + cell, cellMinY + cell
value = self.layer.getValue((cellX, cellY), direction)
if value == CellValue.IMPASSABLE_RIVER:
rect = self.__riverMouth[direction]
surface.blit(tileset, dest, rect)``````

We first build a mask for empty cells in the current layer (line 1) and sea cells in the ground layer (lines 2-3). Then, we need to find cells surrounded by at least one river cell. If we compute codes for rivers (line 4), cells with a non-zero code satisfy this property (line 5)! Then, each cell selected by `valid` has one or more rivers around, but we don't know which ones. It is the reason why we use a small for loop (lines 7-10). Note that we could also create a different selection for each case or combine cases, which would lead to a lot of code.