On-Chip Memory Performance

Making efficient use of PowerVR's on-chip memory is vital to getting the best performance.

Every PowerVR Rogue (and later) architecture GPU contains some amount of on-chip memory, typically 256 bits per pixel on high-end GPUs and 128 bits per pixel on low-end GPUs. The depth buffer is stored separately and does not count toward this budget. The on-chip memory is used to accelerate parts of the per-fragment fixed-function pipeline, such as alpha blending, depth testing, and stencil testing. It can also be used programmatically through the Pixel Local Storage extension in OpenGL ES and through Vulkan's subpass functionality.

Important: Tailoring algorithms around this high-performance on-chip memory is key to achieving maximum performance on PowerVR.

While on-chip memory is fast and should be used as much as possible, the following should be considered:

  • On-chip memory has a finite amount of bandwidth.
  • Bits used for storage cannot be used elsewhere, such as for register space.

The Developer Technology team tested the performance of the on-chip memory by modifying the Deferred Shading example from the PowerVR SDK to use increasing amounts of on-chip memory. The tests were run on a PowerVR GX6250 GPU. The results are shown in the table below.

Table 1. On-chip memory performance results on PowerVR GX6250

Configuration      GX6250 time per frame (ms)
96 bits + D32      20
128 bits + D32     21
160 bits + D32     23
192 bits + D32     24
224 bits + D32     28
256 bits + D32     29
288 bits + D32     39

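The much larger 10 ms jump at the last step, from 256 to 288 bits, strongly suggests that the GX6250 has 256 bits of on-chip memory per pixel, presumably because the extra data no longer fits on chip. The steady rise in frame time as on-chip memory usage increases indicates that this memory has a finite amount of bandwidth.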
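
The reasoning above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original test, that takes the frame times from Table 1 and computes the increase at each 32-bit step. The function name step_deltas is hypothetical and chosen only for illustration.

```python
# Frame times copied from Table 1: (on-chip bits per pixel, ms per frame).
results = [(96, 20), (128, 21), (160, 23), (192, 24),
           (224, 28), (256, 29), (288, 39)]


def step_deltas(rows):
    """Return (from_bits, to_bits, delta_ms) for each consecutive pair of rows."""
    return [(a_bits, b_bits, b_ms - a_ms)
            for (a_bits, a_ms), (b_bits, b_ms) in zip(rows, rows[1:])]


deltas = step_deltas(results)

# The step with the largest increase marks where the on-chip budget is likely exceeded.
largest = max(deltas, key=lambda d: d[2])

for from_bits, to_bits, delta_ms in deltas:
    print(f"{from_bits:>3} -> {to_bits:>3} bits: +{delta_ms} ms")
print(f"Largest jump: {largest[0]} -> {largest[1]} bits (+{largest[2]} ms)")
```

Every step up to 256 bits costs between 1 and 4 ms, while the step from 256 to 288 bits costs 10 ms, which is the discontinuity the text refers to.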

Therefore, it is advisable to keep G-buffers as small as possible, and within the limits of the target GPU.
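
As an illustration of this advice, a G-buffer layout can be checked against a per-pixel budget before it is implemented. The Python sketch below is an assumption-laden example rather than an SDK tool: the helper names, the two example layouts, and the 256-bit budget (taken from the GX6250 measurement above) are all chosen for illustration. The depth attachment is left out of the sum because, as stated earlier, it does not count toward the on-chip memory.

```python
# Bits per pixel for a few common colour formats (standard sizes of these formats).
FORMAT_BITS = {"RGBA8": 32, "RGB10A2": 32, "R32F": 32, "RGBA16F": 64}


def gbuffer_bits(attachments):
    """Sum the per-pixel size, in bits, of the colour attachments in a layout."""
    return sum(FORMAT_BITS[fmt] for fmt in attachments)


def fits_on_chip(attachments, budget_bits):
    """True if the layout's per-pixel size is within the given on-chip budget."""
    return gbuffer_bits(attachments) <= budget_bits


# Assumed budget: 256 bits per pixel, as the GX6250 results above suggest.
BUDGET_BITS = 256

# Hypothetical layouts, for illustration only.
compact_layout = ["RGBA8", "RGB10A2", "R32F"]
wide_layout = ["RGBA16F", "RGBA16F", "RGBA16F", "RGBA16F", "RGBA8"]

for name, layout in [("compact", compact_layout), ("wide", wide_layout)]:
    bits = gbuffer_bits(layout)
    verdict = "fits" if fits_on_chip(layout, BUDGET_BITS) else "exceeds budget"
    print(f"{name}: {bits} bits per pixel, {verdict}")
```

Under these assumptions the compact layout uses 96 bits per pixel and fits, while the wide layout uses 288 bits per pixel and would exceed the budget. The budget should be set to match the actual target GPU.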