Using Multiple Render Targets Efficiently

Multiple Render Targets (MRTs) are available in a variety of APIs and are supported on PowerVR Rogue hardware. MRTs allow developers to render to several render target textures in a single pass. These textures can then be used as inputs to other shaders, applied to a 3D model, presented to the screen, and so on. A common use case for MRTs is deferred shading, whereby the data needed for the lighting calculations (for example albedo, normals, and depth) is stored in multiple render targets and then used to light the scene after the geometry has been drawn.

Tile-based architectures such as PowerVR hardware can efficiently use render targets by storing per-pixel render target data for a single tile (32 x 32 pixels) entirely in on-chip memory, also known as Pixel Local Storage (PLS). This has the advantage of significantly reducing system memory access compared to immediate mode renderers.

On most PowerVR devices, the recommended maximum size per pixel for render target data is 128 bits plus a depth attachment. On some graphics cores, the per-pixel storage available for PLS is increased to 256 bits plus a depth attachment.

It is highly recommended that applications do not exceed the available per-pixel storage, as doing so causes the render target data to spill into system memory. This is extremely expensive: whenever a shader accesses the data stored in the render target, it must be read from system memory for each fragment. This essentially negates one of the main benefits of tile-based rendering and costs a large amount of memory bandwidth and performance.
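The budget check described above can be sketched as simple arithmetic. The snippet below is only an illustration, not a PowerVR API: the function names and the `FORMAT_BITS` table are hypothetical, the bit widths are the standard sizes of these formats, and the 128-bit default and 32 x 32 tile size are taken from the figures quoted in this section. The depth attachment is not counted, since the recommendation is stated as "plus a depth attachment".

```python
# Standard storage size, in bits per pixel, of some common colour formats.
FORMAT_BITS = {
    "RGBA8": 32,
    "RGB10A2": 32,
    "RG11B10": 32,
    "RGBA16F": 64,
    "RGBA32F": 128,
}


def mrt_bits_per_pixel(formats):
    """Sum the per-pixel size of all colour attachments (depth excluded)."""
    return sum(FORMAT_BITS[name] for name in formats)


def fits_on_chip(formats, budget_bits=128):
    """Return True if the attachments stay within the per-pixel budget.

    budget_bits defaults to the 128-bit recommendation; pass 256 for
    cores that provide the larger budget.
    """
    return mrt_bits_per_pixel(formats) <= budget_bits


def tile_bytes(bits_per_pixel, tile_width=32, tile_height=32):
    """Colour storage implied for one tile, derived from the figures above."""
    return tile_width * tile_height * bits_per_pixel // 8


# A hypothetical G-buffer layout: four 32-bit attachments use exactly 128 bits.
gbuffer = ["RGBA8", "RGBA8", "RGB10A2", "RG11B10"]
print(mrt_bits_per_pixel(gbuffer), fits_on_chip(gbuffer))

# Two RGBA16F attachments plus one RGBA8 need 160 bits and would spill
# on a core with a 128-bit budget.
heavy = ["RGBA16F", "RGBA16F", "RGBA8"]
print(mrt_bits_per_pixel(heavy), fits_on_chip(heavy))
```

Running a check like this at pipeline-creation time, with the budget chosen per target device, is one way to catch a layout that would spill before it reaches the hardware.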

Exceeding the per-pixel storage is also likely to reduce Unified Shader Cluster (USC) occupancy. The maximum number of active threads (shaders) executing in parallel per USC is then severely limited, which reduces efficiency and performance.

On PowerVR hardware, applications can use a variety of render target formats. If the per-pixel render target data fits into on-chip memory, then all texture accesses are handled by the on-chip memory bus, and all formats therefore provide the same performance. This is because no transactions between system memory and the chip are required to load and store the data.

In addition to memory transaction and performance considerations, when render targets spill into system memory, not all render target formats are supported at full rate over the system memory bus. Transfer rates may therefore be reduced further, depending on the format and the Texture Processing Unit (TPU) available in the graphics core.

The transfer rates are as follows:

  • RGBA8 can be accessed at full rate.

  • RGB10A2 can be accessed at almost full rate.

  • RG11B10 can be accessed at half rate.

  • RGBA16F can be accessed at half rate.

  • RGBA32F can be accessed at quarter rate (no bilinear filtering).

RGBM and RGBdiv8

Both the RGBM and RGBdiv8 texture formats require the developer to implement encoding and decoding in the shader, as these formats are not natively supported by the hardware. This costs additional USC cycles, so an application that is USC-limited should not employ these formats.

These formats do have the advantage of being cheap in terms of memory bandwidth, as they cost the same bandwidth as RGBA8. Therefore, if an application is bound by memory bandwidth, it may be useful to explore these formats.
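To make the extra shader work concrete, the following is a minimal sketch of one common RGBM scheme, written in Python purely to show the arithmetic; in an application the same maths would live in shader code. The function names and the `RGBM_RANGE` constant are assumptions made for this illustration (the range is a tunable chosen by the application, not a hardware value), and quantisation of the four channels to 8 bits on storage is omitted.

```python
import math

# Assumed maximum HDR value the encoding can represent; an application
# tunable for this sketch, not a hardware constant.
RGBM_RANGE = 6.0


def rgbm_encode(r, g, b, max_range=RGBM_RANGE):
    """Pack a non-negative HDR colour into four values in [0, 1]."""
    # Scale into [0, 1]; anything brighter than max_range is clamped.
    r, g, b = (min(c / max_range, 1.0) for c in (r, g, b))
    # The shared multiplier is the largest component, kept above zero.
    m = max(r, g, b, 1e-6)
    # Round the multiplier up to the next 8-bit step so that the
    # divided colour components never exceed 1.0.
    m = math.ceil(m * 255.0) / 255.0
    return (r / m, g / m, b / m, m)


def rgbm_decode(r, g, b, m, max_range=RGBM_RANGE):
    """Recover the HDR colour from the encoded RGB and multiplier M."""
    scale = m * max_range
    return (r * scale, g * scale, b * scale)


encoded = rgbm_encode(1.5, 0.75, 3.0)
decoded = rgbm_decode(*encoded)
```

The encode step (a maximum, a rounding, and three divides) and the decode step (two multiplies per component) are the additional USC cycles referred to above; the result still occupies only four 8-bit channels.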