Optimising Register Pressure on PowerVR
Like any other GPU architecture, PowerVR has a limited amount of register space to use. Failure to stay within the bounds of available register space could cause register spilling, which may result in sub-optimal performance.
There are several ways to reduce register pressure:
The number of registers used at any time is equal to the number of variables a shader invocation needs to “remember” at that specific moment. Therefore, register pressure can be minimised by keeping global variable usage to a minimum, and also by making sure local variables stay in scope as little as possible.
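The link between variable lifetime and register use can be illustrated with a toy model. This is a minimal sketch, not a PowerVR tool: the `LiveRange` type and `peak_live` function are hypothetical names, and the model assumes every variable occupies exactly one register from its first use to its last use. A real shader compiler allocates registers in more sophisticated ways.

```c
#include <stddef.h>

/* Hypothetical model: a variable is live from instruction index
 * `first` to instruction index `last`, both inclusive. */
typedef struct {
    int first;
    int last;
} LiveRange;

/* Returns the largest number of variables that are live at the same
 * instruction, which is the peak register demand in this toy model. */
static int peak_live(const LiveRange *ranges, size_t count, int num_instructions)
{
    int peak = 0;
    for (int pc = 0; pc < num_instructions; ++pc) {
        int live = 0;
        for (size_t i = 0; i < count; ++i) {
            if (ranges[i].first <= pc && pc <= ranges[i].last) {
                ++live;
            }
        }
        if (live > peak) {
            peak = live;
        }
    }
    return peak;
}

/* Three temporaries kept alive across a whole 6-instruction shader. */
static const LiveRange long_lived[] = { {0, 5}, {0, 5}, {0, 5} };

/* The same three temporaries, each consumed right after it is produced. */
static const LiveRange short_lived[] = { {0, 1}, {2, 3}, {4, 5} };
```

In this model, `peak_live(long_lived, 3, 6)` is 3 while `peak_live(short_lived, 3, 6)` is 1: the same amount of work needs fewer registers when each value goes out of scope as soon as it is no longer needed.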
The PowerVR Rogue architecture has scalar ALUs. Therefore, if the last components are not needed, it is better to work with two- or three-component vectors rather than four-component ones. This also applies to matrix operations: for example, when transforming a three-component vector, or a vector whose w component is 0, it is better to use a 3x3 matrix to save register space. Finally, affine transformations (rotate, scale, translate) usually only require a 4x3 matrix, as the remaining row (or column, depending on the matrix convention) is always (0,0,0,1).
The PowerVR Rogue architecture is particularly good at handling FP16 operations, as this often results in twice as many instructions executed in a single cycle. FP16 math also has the advantage that two variables can be packed into a single FP32 register. Therefore, FP16 is not only faster, but also results in less register pressure.
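The component-count and FP16 points above can be made concrete with some simple arithmetic. The sketch below is illustrative only. `registers_needed` and `transform_affine` are hypothetical helpers, and the register model (one 32-bit register per FP32 scalar, two FP16 scalars per 32-bit register) is a simplifying assumption based on the text, not a description of what the compiler actually does. The affine example assumes the column-vector convention, where the omitted fourth row is (0,0,0,1).

```c
/* Precision of a scalar component in this toy model. */
typedef enum { PREC_FP32, PREC_FP16 } Precision;

/* Assumption: each FP32 scalar takes one 32-bit register, and two FP16
 * scalars share one 32-bit register. */
static unsigned registers_needed(unsigned components, Precision prec)
{
    if (prec == PREC_FP16) {
        return (components + 1u) / 2u; /* round up: two halves per register */
    }
    return components;
}

/* Transforms the point `p` by an affine matrix stored as 3 rows of 4
 * floats (12 values). The fourth row is implicitly (0,0,0,1), so it is
 * never stored or multiplied. */
static void transform_affine(const float *m, const float *p, float *out)
{
    for (int r = 0; r < 3; ++r) {
        out[r] = m[r * 4 + 0] * p[0]
               + m[r * 4 + 1] * p[1]
               + m[r * 4 + 2] * p[2]
               + m[r * 4 + 3]; /* translation term, since w is 1 */
    }
}

/* Example data: a pure translation by (10, 20, 30). */
static const float translate_example[12] = {
    1.0f, 0.0f, 0.0f, 10.0f,
    0.0f, 1.0f, 0.0f, 20.0f,
    0.0f, 0.0f, 1.0f, 30.0f,
};
static const float point_example[3] = { 1.0f, 2.0f, 3.0f };
```

Under these assumptions, a full 4x4 FP32 matrix costs 16 registers, the 12-value affine form costs 12, and a 3x3 matrix costs 9. A four-component FP16 vector costs 2 registers instead of 4. Calling `transform_affine(translate_example, point_example, out)` yields (11, 22, 33), the same result a 4x4 matrix with a (0,0,0,1) row would give.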
Note
Minimising the use of branching is recommended. On most GPU architectures, branching is costly because GPUs are designed to handle parallel workloads. Branching not only consumes extra cycles, but also increases register pressure.