On-Chip Memory Performance

Every PowerVR Rogue (and later) architecture GPU contains some amount of on-chip memory, typically 256 bits per pixel on high-end GPUs and 128 bits per pixel on low-end GPUs. The depth buffer is stored separately and does not count toward this figure. This memory is used to accelerate parts of the per-fragment fixed-function pipeline, such as alpha blending, depth testing, and stencil testing. It can also be used programmatically via the Pixel Local Storage extension in OpenGL ES and via Vulkan’s subpass functionality.

Note

Tailoring algorithms to make good use of the fast on-chip memory is key to achieving maximum performance on PowerVR.

While on-chip memory is fast and should be used as much as possible, the following should be considered:

  • On-chip memory has a finite amount of bandwidth;

  • Bits used for storage cannot be used elsewhere, such as for register space.

The Developer Technology team tested on-chip memory performance by modifying the Deferred Shading example from the PowerVR SDK to use increasing amounts of on-chip memory, running on a PowerVR GX6250 GPU. The results are shown below:

Configuration      GX6250 time/frame (ms)
96bit + D32        20
128bit + D32       21
160bit + D32       23
192bit + D32       24
224bit + D32       28
256bit + D32       29
288bit + D32       39
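
The step-by-step cost in these measurements can be worked out with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the results above, while the names `frame_time_ms` and `step_deltas` are made up for this example.

```python
# Frame times (ms) measured on the GX6250, keyed by on-chip bits used.
# The D32 depth buffer is stored separately, so it is not part of the key.
frame_time_ms = {96: 20, 128: 21, 160: 23, 192: 24, 224: 28, 256: 29, 288: 39}


def step_deltas(times):
    """Return (from_bits, to_bits, delta_ms) for each consecutive configuration."""
    bits = sorted(times)
    return [(lo, hi, times[hi] - times[lo]) for lo, hi in zip(bits, bits[1:])]


deltas = step_deltas(frame_time_ms)
for lo, hi, delta in deltas:
    print(f"{lo:>3} -> {hi:>3} bits: +{delta} ms")

# The step with the largest cost marks where usage stops fitting comfortably.
largest = max(deltas, key=lambda step: step[2])
print("largest step:", largest)  # largest step: (256, 288, 10)
```

Each 32-bit increase up to 256 bits costs between 1 ms and 4 ms, whereas the step from 256 to 288 bits costs 10 ms.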

The much larger 10 ms jump at the last step (from 29 ms at 256 bits to 39 ms at 288 bits) strongly suggests that the GX6250 has 256 bits of on-chip memory, since a 288-bit configuration would no longer fit. The gradual increase in frame time as on-chip memory usage goes up indicates that this memory has a finite amount of bandwidth. Therefore, it is advisable to keep G-buffers as small as possible and within the limits of the target GPU.
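
One way to apply this advice is to add up the size of a planned G-buffer layout before committing to it. The sketch below is a minimal illustration, not an SDK tool: the helper names and the example layout are hypothetical, and the 256-bit and 128-bit budgets are simply the typical figures quoted earlier, so the real limit should be checked for the target GPU. Depth is left out of the sum because it is stored separately.

```python
# Bits per pixel of a few common colour attachment formats.
FORMAT_BITS = {
    "RGBA8": 32,
    "RGB10_A2": 32,
    "RG16F": 32,
    "R32F": 32,
    "RGBA16F": 64,
}


def gbuffer_bits(attachments):
    """Total on-chip bits used by the colour attachments (depth excluded)."""
    return sum(FORMAT_BITS[fmt] for fmt in attachments)


def fits_on_chip(attachments, budget_bits):
    """True if the layout stays within the assumed on-chip budget."""
    return gbuffer_bits(attachments) <= budget_bits


# Hypothetical layout: albedo, normals, material parameters, HDR light accumulation.
layout = ["RGBA8", "RGB10_A2", "RGBA8", "RGBA16F"]

print(gbuffer_bits(layout))        # 160
print(fits_on_chip(layout, 256))   # True  (assumed high-end budget)
print(fits_on_chip(layout, 128))   # False (assumed low-end budget)
```

Even when a layout fits, the measurements above suggest that a smaller one will usually be faster, so the check is a lower bar rather than a target.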