On-Chip Memory Performance

Every PowerVR Rogue (and later) architecture GPU contains some amount of on-chip memory, typically 256 bits per pixel on high-end GPUs and 128 bits per pixel on low-end GPUs. The depth buffer is stored separately and does not count toward this figure. The on-chip memory accelerates parts of the per-fragment fixed-function pipeline, such as alpha blending, depth testing, and stencil testing. It can also be used programmatically through the Pixel Local Storage extension in OpenGL ES and through Vulkan's subpass functionality.

Note: Tailoring algorithms around this high-performance on-chip memory is key to achieving maximum performance on PowerVR.

While on-chip memory is fast and should be used as much as possible, the following should be considered:

  • On-chip memory has a finite amount of bandwidth;

  • Bits used for storage cannot be used elsewhere, such as for register space.
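As a rough illustration of the second point, the sketch below adds up the per-pixel size of a G-buffer layout and compares it with an on-chip budget. The format table, the function names, and the example layout are assumptions made for illustration; they are not taken from the PowerVR SDK. Depth is left out of the sum because the depth buffer is stored separately.

```python
# Per-pixel size, in bits, of a few common colour attachment formats.
FORMAT_BITS = {
    "RGBA8": 32,
    "RGB10_A2": 32,
    "RG16F": 32,
    "R32F": 32,
    "RGBA16F": 64,
}


def gbuffer_bits(attachments):
    """Return the total per-pixel size of the colour attachments (depth excluded)."""
    return sum(FORMAT_BITS[name] for name in attachments)


def fits_on_chip(attachments, budget_bits):
    """Return True if the layout fits within the given per-pixel budget."""
    return gbuffer_bits(attachments) <= budget_bits


# Hypothetical layout: albedo, normals, HDR emissive, linear depth copy.
layout = ["RGBA8", "RGBA8", "RGBA16F", "R32F"]

print(gbuffer_bits(layout))       # 32 + 32 + 64 + 32 = 160
print(fits_on_chip(layout, 256))  # True:  within a 256-bit budget
print(fits_on_chip(layout, 128))  # False: exceeds a 128-bit budget
```

A check like this only covers capacity. A layout that fits still consumes on-chip bandwidth, so a smaller layout is generally preferable.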

The Developer Technology team tested the performance of the on-chip memory by modifying the Deferred Shading example from the PowerVR SDK to use increasing amounts of on-chip memory. The test was run on a PowerVR GX6250 GPU. The results are shown below:

Configuration     GX6250 time/frame (ms)
96bit + D32       20
128bit + D32      21
160bit + D32      23
192bit + D32      24
224bit + D32      28
256bit + D32      29
288bit + D32      39

The 10 ms jump at the last step, far larger than any earlier increment, strongly suggests that the GX6250 has 256 bits of on-chip memory, since the 288-bit configuration presumably no longer fits. The gradual rise in frame time as on-chip memory usage grows indicates that this memory has a finite amount of bandwidth. It is therefore advisable to keep G-buffers as small as possible, and within the limits of the target GPU.
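The reasoning above can be checked with a little arithmetic on the measured values. The following sketch computes the extra frame time added by each 32-bit step and picks out the largest one. The numbers come from the results table; the variable and function names are chosen here for illustration.

```python
# (G-buffer bits excluding D32, frame time in ms) from the GX6250 results.
results = [
    (96, 20),
    (128, 21),
    (160, 23),
    (192, 24),
    (224, 28),
    (256, 29),
    (288, 39),
]


def step_costs(rows):
    """Return (bits, extra ms) for each step relative to the previous row."""
    return [(b1, t1 - t0) for (_, t0), (b1, t1) in zip(rows, rows[1:])]


costs = step_costs(results)
largest = max(costs, key=lambda step: step[1])

for bits, extra_ms in costs:
    print(f"{bits} bits: +{extra_ms} ms")

print("largest step:", largest)  # (288, 10)
```

Every step up to 256 bits adds between 1 ms and 4 ms, while the step to 288 bits adds 10 ms, which is the discontinuity the text attributes to exceeding the on-chip capacity.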