Python is a great language, but it is pretty slow, mainly when we process many items in loops. One way to speed up these parts is to use Numpy: its functions can process data efficiently.
This post is part of the 2D Strategy Game series
Using the optimizations I present in this post, the game can handle huge maps, for instance, with one million cells:
As we can see in this video, the game update time remains low (below 20ms).
In the following, we assume that we import Numpy with
import numpy as np
With this alias, the syntax
np.func() is equivalent to
Create the array: to use the Numpy functions, we need to put our data in Numpy arrays. They can have many dimensions (1D, 2D, 3D, or more) and can hold scalars (integers, floats, bool, etc.).
Layer class, we create a Numpy array to store the value of the layer cells. For a world of size (width, height), we can create a 2D Numpy array of 32-bit integers in the following way:
cells = np.zeros([width, height], dtype=np.int32)
zeros() Numpy function initializes the array with zeros. If we want to initialize with a specific value, for instance
defaultValue, we can use the
cells = np.full([width, height], defaultValue, dtype=np.int32)
Borders: we use a trick to ease the management of layer borders: we add rows and columns all around:
cells = np.full([width+2, height+2], defaultValue, dtype=np.int32)
With this setup, the inner part contains the actual cells, with x indices from 1 to with and y indices from 1 to height. It simplifies a lot the handling of borders, for instance, in the
def getValue(self, coords: Tuple[int, int], direction: Optional[Direction] = None) -> CellValue: x, y = coords, coords if direction: if direction == Direction.LEFT: return CellValue(self.__cells[x, y + 1]) if direction == Direction.RIGHT: return CellValue(self.__cells[x + 2, y + 1]) if direction == Direction.TOP: return CellValue(self.__cells[x + 1, y]) if direction == Direction.BOTTOM: return CellValue(self.__cells[x + 1, y + 2]) return CellValue(self.__cells[x + 1, y + 1])
Without this trick, we have to add checks in many places. For example, for the left direction case, without extra rows/columns in the cell array, we must check the x coordinates:
# Code without the border trick if direction == Direction.LEFT: if x < 1: return self.__defaultValue return CellValue(self.__cells[x - 1, y])
Note that we cast to a
CellValue because Numpy cell items are not Python integers but integers specific to Numpy. In our case, cell items are instances of
int. These objects behave like Python integers, but operations on them are slow: we better cast them to regular integers (remind that
CellValue is an
IntEnum, which is like an
int but with extra features).
We render the minimap as before, where each pixel represents a world cell. However, the computation time with Numpy is much lower: on a test machine, with a world made of one million cells, the new code lasts 0.4s instead of 8s (20x speed up)!
Slicing: for this specific task, we need the cell arrays without the additional rows/columns to handle world borders (an internal trick of the
Layer class). We can get these cells with slicing: we add a new
cells property in the
Layer class that returns this:
def cells(self) -> np.ndarray: return self.__cells[1:-1, 1:-1]
[1:-1] is a slice that represents all indices from the second one (index=1) to the second last (index=width-2 or height-2). Hence, the result of
array[slice, slice] is also a Numpy array: we can use all Numpy functions on it as if it was a usual array. Slices with Numpy arrays are views: a change on the sliced array also changes the original array (and vise-versa). Furthermore, there is no overhead with regular slices (like the one in this example): the computation time is the same on the original and the sliced array.
Minimap computation: in the
renderMinimap() method of the
MinimapFrame class, we compute the minimap in the following way:
minimapArray: Optional[np.ndarray] = None for name, layer in layers: colorMap = self.__colors[name] colors = colorMap[layer.cells] if minimapArray is None: minimapArray = colors else: transparent = minimapArray[..., 3] == 0 minimapArray[transparent] = colors[transparent] if minimapArray is not None: surface = pygame.surfarray.make_surface(minimapArray[..., 0:3]) self.__minimapSurface = cast(Surface, surface.convert_alpha())
We iterate through layers as before, from the last to the first one (line 2). Then, we get a color map for the current layer (line 3). It is a Numpy 2D array that contains a 4D vector (red, green, blue, alpha) for each possible cell value. We assume that only the cell values corresponding to the current layer are non-transparent (alpha is not zero).
We use the color map to compute the pixel colors of the cells of the current layer (line 4). If this line is not clear, it is equivalent to:
# colors = colorMap[layer.cells] width, height = layer.cells.shape colors = np.zeros([width, height, 4], dtype=np.int32) for y in range(height): for x in range(width): r = colorMap[layer.cells[x, y], 0] g = colorMap[layer.cells[x, y], 1] b = colorMap[layer.cells[x, y], 2] a = colorMap[layer.cells[x, y], 3] colors[x, y, 0] = r colors[x, y, 1] = g colors[x, y, 2] = b colors[x, y, 3] = a
The running time of
colors = colorMap[layer.cells] is dramatically lower than the equivalent Python code!
Back to the minimap computation code, we init the minimap in the first iteration (lines 5-6). Then, in the following iterations, we copy news colors on transparent pixels in the current minimap (lines 7-9). For that operation, we compute a boolean array where each value corresponds to a transparent color (line 8). We use the slice expression
[..., 3], meaning we want all first dimensions but only the fourth in the last. Line 8 is equivalent to:
# transparent = minimapArray[..., 3] == 0 width, height, _ = minimapArray.shape transparent = np.zeros([width, height], dtype=np.bool) for y in range(height): for x in range(width): if minimapArray[x, y, 3] == 0: transparent[x, y] = True else: transparent[x, y] = False
Line 9 of the minimap computation code only copies the values for indices where
True. It is equivalent to:
# minimapArray[transparent] = colors[transparent] width, height, _ = minimapArray.shape for y in range(height): for x in range(width): if transparent[x, y]: minimapArray[x, y] = colors[x, y]
The last lines of the minimap computation code convert the Numpy array to a Pygame surface. The slice expression
[..., 0:3] means we want the values from all the first dimensions, but only the ones of indices zero (included) to three (excluded) in the last dimension.
Minimap partial computation: the previous code computes the minimap for the whole world. When we only change a part of the world, we don't need to recompute everything. Thanks to Numpy, we only need to run the same computation on a slice of the layer cells! In other words, we can replace
layer.cells[x-1:x+1,y-1:y+1] for instance, to update the local area around coordinates (x,y).
Note that we can store a slice in a variable using the Numpy function
n_(), for instance:
cellsSlice = np.s_[cellMinX:cellMaxX, cellMinY:cellMaxY]
Then, we can use
cellsSlice as a slice expression, like
I describe in this section the rendering of the impassable layer (mountains, rivers, etc.). The other ones use similar concepts.
Layer renderer: in the
render() method of the
ImpassableComponent class, we first create an instance of a new class
LayerRenderer (see details in attached code):
renderer = self.createRenderer(surface)
The purpose of this class is to compute and store many valuable objects. For instance, we can get the slices of cells to render in the
cellsSlice attribute. Using this slice, we can get the array of visible cell values:
cellsSlice = renderer.cellsSlice cells = self.layer.cells[cellsSlice]
As a result, all computations in the following with
cells are limited to the size of the rendered area: whatever the world map size, the processing time is the same.
Main cells: we start the rendering with all non-river cells:
validCells = cells != CellValue.NONE validCells &= cells != CellValue.IMPASSABLE_RIVER noise = self.noise[cellsSlice] for dest, value, cell in renderer.coords(validCells): rects = tilesRects[value] rectIndex = int(noise[cell]) % len(rects) surface.blit(tileset, dest, rects[rectIndex])
We first compute the indices of non-empty cells (line 1). Then, we compute the indices of non-river cells and combine them with the previous ones (line 2). The operator
&= is similar to
+= except that the operation is "logical and". These two first lines are as if we compute for each cell (x,y) the following:
if cells[x,y] != CellValue.NONE \ and cells[x,y] != CellValue.IMPASSABLE_RIVER: validCells[x,y] = True else: validCells[x,y] = False
Then, we get the noise values for the currently visible cells (line 3). It works as before and contains random integer values, except that it is a Numpy array instead of a Python list. We create this array in the layer constructor:
generator = np.random.default_rng(seed=0) self.noise = generator.integers(0, 100000, size=self.__layer.size) self.noise.flags.writeable = False
Like Python random generators, we can define a seed and always get the same random values (line 1). We can also tell Numpy that this array is read-only (line 2). Thanks to this option, we can ensure that the value of this array will never change.
Back to the code for rendering non-river cells, we use the
coords() generator in the
LayerRenderer class (line 4). It generates information for cells for which
cellCoords = np.transpose(np.nonzero(validCells)) for cell in cellCoords: cellRelX = int(cell) cellRelY = int(cell) value = CellValue(cells[cellRelX, cellRelY]) destX = cellRelX * tileWidth - viewX % tileWidth destY = cellRelY * tileHeight - viewY % tileHeight yield (destX, destY), value, (cellRelX, cellRelY)
The first line converts a Numpy array with boolean values to a 2D array with coordinates with
True values. For instance, it converts
Thanks to this filtering, we only process non-empty and non-rivers cells in this loop. If there are only empty cells, the for loop does nothing! Also, note that the overhead (computing the valid cells) is negligible: a Numpy operation on an array of hundreds of values is fast (a few microseconds).
River cells: for these cells, we need the tile border codes as before:
neighbors = self.layer.getAreaNeighbors4(cellsBox) masks = neighbors == CellValue.IMPASSABLE_RIVER masks |= neighbors == CellValue.IMPASSABLE_MOUNTAIN groundNeighbors = self.__ground.getAreaNeighbors4(cellsBox) masks |= groundNeighbors == CellValue.GROUND_SEA codes = code4np(masks) valid = cells == CellValue.IMPASSABLE_RIVER for dest, value, cell in renderer.coords(valid): rect = self.__river_code2rect[codes[cell]] surface.blit(tileset, dest, rect)
This time, we don't compute the code cell by cell but run operations one after another on all (visible) cells. We first get a 3D Numpy array such as
neighbors[x,y,0] contains the cell value at (x-1,y),
neighbors[x,y,1] at (x,y-1),
neighbors[x,y,2] at (x,y+1) and
neighbors[x,y,3] at (x+1,y). It is computed in the
getAreaNeighbors4() of the
def getAreaNeighbors4(self, cellsBox: Tuple[int, int, int, int]) -> np.ndarray: minX, maxX, minY, maxY = cellsBox top = self.__cells[minX + 1:maxX + 1, minY:maxY] left = self.__cells[minX:maxX, minY + 1:maxY + 1] right = self.__cells[minX + 2:maxX + 2, minY + 1:maxY + 1] bottom = self.__cells[minX + 1:maxX + 1, minY + 2:maxY + 2] return np.stack((left, top, bottom, right), axis=2)
Note that we do not run any border checks since we added extra rows/columns in the layer cell array! It is also why we shift all coordinates by (1,1), as in the
getValue() method. The
stack() Numpy function concatenates arrays along a new dimension (line 7). In our case, we want to create a 3D array from four 2D arrays and create a new dimension at the end.
Back to the code for rendering river cells, we compute a mask array as we did previously (lines 2-5). We replace the
mask4() function call by Numpy array comparison
combine4() by a logical or
code4np(). These operations are the same, except that they process all values in the array instead of one by one. The
weights4 = np.array((1, 2, 4, 8)). The
dot() Numpy function performs a weighted sum on the last dimension, e.g.
a.dot(b)[x,y] = sum_k(a[x,y,k] * b[x,y,k])
River mouths: for river mouths, we combine the masks and the codes to find which cells to render:
valid = cells == CellValue.NONE groundCells = self.__ground.cells[cellsSlice] valid &= groundCells == CellValue.GROUND_SEA codes = code4np(neighbors == CellValue.IMPASSABLE_RIVER) valid &= codes != 0 cellMinX, _, cellMinY, _ = renderer.cellsBox for dest, _, cell in renderer.coords(valid): for direction in directions: cellX, cellY = cellMinX + cell, cellMinY + cell value = self.layer.getValue((cellX, cellY), direction) if value == CellValue.IMPASSABLE_RIVER: rect = self.__riverMouth[direction] surface.blit(tileset, dest, rect)
We first build a mask for empty cells in the current layer (line 1) and sea cells in the ground layer (lines 2-3). Then, we need to find cells surrounded by at least one river cell. If we compute codes for rivers (line 4), cells with a non-zero code satisfy this property (line 5)! Then, each cell selected by
valid has one or more rivers around, but we don't know which ones. It is the reason why we use a small for loop (lines 7-10). Note that we could also create a different selection for each case or combine cases, which would lead to a lot of code.