Balancing Workloads on PowerVR to Eliminate Bottlenecks

Analysing workload distribution across GPU processing capabilities can help identify and eliminate performance bottlenecks

Application developers often run into performance bottlenecks, and it can be challenging to eliminate them without reducing application quality. These bottlenecks are often the result of excessively heavy use of one of the GPU's processing capabilities. For example, an application might use very ALU-heavy shaders. While some parts of the GPU are then fully utilised, other parts may remain under-utilised or almost idle.

One solution is to balance the GPU workloads so that every part of the GPU is reasonably well-utilised. This means shifting work from an overloaded part of the GPU to an under-utilised one. The right balance depends on the GPU's capabilities, so low-end GPUs may require different optimisations from high-end GPUs.

The following resources are distinguishable on PowerVR:

  • ALU (shader processing load)
  • texturing load
  • ISP load
  • renderer active
  • tiler active.

Using PVRTune or PVRMonitor, it is possible to observe the usage values for these resources. If one or more resources are found to be a bottleneck while others are under-utilised, it is worth investigating possible optimisation strategies.
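
As a rough illustration of that check, the following Python sketch sorts a set of utilisation readings into saturated and under-utilised resources. The counter names, the 0.0 to 1.0 scale, and both thresholds are assumptions made for this example; they are not PVRTune's or PVRMonitor's actual output format, and sensible thresholds will vary by application and device.

```python
# Hypothetical thresholds on a 0.0-1.0 utilisation scale (assumptions, tune per project).
HIGH = 0.85  # at or above this, treat the resource as a likely bottleneck
LOW = 0.40   # at or below this, treat the resource as having spare capacity


def classify(counters):
    """Split utilisation readings into likely bottlenecks and spare capacity."""
    bottlenecks = sorted(name for name, value in counters.items() if value >= HIGH)
    spare = sorted(name for name, value in counters.items() if value <= LOW)
    return bottlenecks, spare


# Illustrative readings only; the names mirror the resource list above.
sample = {
    "alu": 0.95,
    "texturing": 0.20,
    "isp": 0.35,
    "renderer_active": 0.97,
    "tiler_active": 0.15,
}

bottlenecks, spare = classify(sample)
print("likely bottlenecks:", bottlenecks)
print("spare capacity:", spare)
```

With readings like these, the ALU is saturated while texturing has headroom, which suggests looking at strategies that move work from the ALU to the texturing hardware.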

Here are some examples of optimisation strategies:

  • ALU utilisation can be traded for texturing load if some of the equations are precalculated, and the result is stored in a lookup table (LUT).
  • Texturing load can be traded for ALU utilisation by replacing texture fetches with procedural texture functions.
  • The number of shader invocations can be reduced by using depth and stencil testing to reject some pixels. This reduces the texturing load or ALU utilisation depending on the bottleneck.
  • ALU-based alpha testing can be traded for a depth prepass and depth testing. This doubles draw call and geometry costs, but it can significantly reduce ALU utilisation and register pressure.
  • Alpha testing and noise functions are usually used in combination to achieve a level-of-detail transition effect. This is often quite ALU-heavy. ALU can be traded for ISP in this scenario by running a stencil prepass.
  • Visual fidelity can be increased either by running more complex shaders and using higher-resolution textures, or by increasing the polygon count. In cases where renderer utilisation is already close to its maximum, it may be sensible to add more polygon complexity rather than fragment-heavy effects.
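
The first strategy in the list, trading ALU utilisation for texturing load with a lookup table, can be sketched on the CPU side as follows. This is a minimal illustration, not a shader implementation: `expensive` is a hypothetical stand-in for an ALU-heavy expression, the 256-entry table size is an assumption, and `sample_lut` only mimics the linear filtering that a texture unit would perform on a 1D lookup texture.

```python
import math

LUT_SIZE = 256  # assumed table resolution; a real LUT size depends on the function


def expensive(x):
    """Hypothetical stand-in for an ALU-heavy shader expression."""
    return math.pow(max(x, 0.0), 2.2)


# Precompute once, as one would bake a 1D lookup texture offline or at load time.
LUT = [expensive(i / (LUT_SIZE - 1)) for i in range(LUT_SIZE)]


def sample_lut(x):
    """Linearly filtered lookup for x in [0, 1], mimicking a filtered texture fetch."""
    x = min(max(x, 0.0), 1.0)           # clamp, like a clamp-to-edge address mode
    pos = x * (LUT_SIZE - 1)
    i0 = int(pos)
    i1 = min(i0 + 1, LUT_SIZE - 1)
    t = pos - i0                        # fractional position between the two texels
    return LUT[i0] * (1.0 - t) + LUT[i1] * t


# Compare the table lookup against the direct calculation over the input range.
max_err = max(abs(sample_lut(i / 1000) - expensive(i / 1000)) for i in range(1001))
print(f"max abs error: {max_err:.6f}")
```

Checking the approximation error in this way helps decide whether a given table resolution preserves enough quality before moving the calculation out of the shader. The trade only pays off when texturing has spare capacity; if texturing is the bottleneck, the reverse substitution described in the second bullet is the one to consider.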