Using Multiple Render Targets Efficiently

Multiple Render Targets (MRTs) are available in a variety of APIs and are supported on PowerVR Rogue hardware. MRTs allow developers to render to multiple render target textures at once. These textures can then be used as inputs to other shaders, applied to a 3D model, presented to the screen, and so on. A common use case for MRTs is deferred shading, where the per-pixel surface attributes needed for lighting (for example, albedo and normals) are written to multiple render targets and then used to light the scene after the geometry has been drawn.

Tile-based architectures such as PowerVR can use render targets efficiently by keeping the per-pixel render target data for a single tile (32 x 32 pixels) entirely in on-chip memory, known as Pixel Local Storage (PLS). This significantly reduces system memory accesses compared to immediate-mode renderers.

On most PowerVR devices, the recommended maximum combined size of all render targets is 128 bits per pixel, plus a depth attachment. On some graphics cores, the per-pixel memory available for PLS may be increased to 256 bits, plus a depth attachment.
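As a rough worked example, the budget can be checked by summing the bits per pixel of every colour attachment. The sketch below is an illustration only: the bit widths are the nominal sizes of each format, `RG11B10` is conservatively counted as 64 bits because the format table in this section states that it is stored internally as `RGBA16F`, and the function names and the example G-buffer layout are hypothetical.

```python
# Nominal bits per pixel for the colour formats discussed in this section.
# RG11B10 is counted as 64 bits (assumption): the format table states that it
# is stored internally as RGBA16F, so this errs on the side of caution.
FORMAT_BITS = {
    "RGBA8": 32,
    "RGB10A2": 32,
    "RG11B10": 64,
    "RGBA16F": 64,
    "RGBA32F": 128,
}

TILE_WIDTH = TILE_HEIGHT = 32  # tile size quoted for PowerVR Rogue


def mrt_bits_per_pixel(formats):
    """Sum the per-pixel size of all colour attachments.

    The depth attachment is excluded because the budget is quoted
    "plus a depth attachment".
    """
    return sum(FORMAT_BITS[f] for f in formats)


def fits_on_chip(formats, budget_bits=128):
    """Return True if the attachments stay within the per-pixel budget.

    Pass budget_bits=256 for graphics cores with the larger PLS allowance.
    """
    return mrt_bits_per_pixel(formats) <= budget_bits


def colour_bytes_per_tile(formats):
    """Bytes of colour data held for one 32 x 32 tile."""
    return mrt_bits_per_pixel(formats) * TILE_WIDTH * TILE_HEIGHT // 8


# Hypothetical deferred-shading G-buffer: albedo, normals, material, light accumulation.
EXAMPLE_GBUFFER = ["RGBA8", "RGB10A2", "RGBA8", "RGBA16F"]
```

With these assumptions, the hypothetical G-buffer totals 160 bits per pixel, which exceeds the 128-bit budget but fits within 256 bits, while four `RGBA8` targets total exactly 128 bits, or 16 KiB of colour data per 32 x 32 tile.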

It is highly recommended that applications do not exceed the available per-pixel storage, as this causes the render target data to spill into system memory. This is extremely expensive: whenever a shader accesses the data stored in the render target, that data must be read from system memory for each fragment. This negates one of the main benefits of tile-based rendering and costs large amounts of memory bandwidth and performance.

Exceeding the per-pixel storage is also likely to reduce Unified Shader Cluster (USC) occupancy. This severely limits the maximum number of shader threads that can execute in parallel on each USC, reducing efficiency and performance.

On PowerVR hardware, applications can use a variety of render target formats. If the per-pixel render target data fits in on-chip memory, all render target accesses are served by on-chip memory, so every format provides the same performance. This is because no transactions between system memory and the chip are required to load and store the data.

In addition to these memory transaction and performance costs, when render targets spill into system memory, not all render target formats are supported at full rate over the system memory bus. Transfer rates may therefore be reduced further, depending on the format and the Texture Processing Unit (TPU) available in the graphics core.

The transfer rates are as follows:

RGBA8 can be accessed at full rate.
RGB10A2 can be accessed at almost full rate.
RG11B10 can be accessed at half rate.
RGBA16F can be accessed at half rate.
RGBA32F can be accessed at quarter rate (no bilinear filtering).
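One rough way to reason about these rates is to count how many full-rate `RGBA8` accesses each spilled read is worth. The sketch below assumes that cost scales with the inverse of the listed rate, and treats `RGB10A2` as full rate because the text only describes it as "almost full rate". It is a back-of-the-envelope model with a hypothetical helper name, not a measurement of any particular TPU.

```python
# Access rate over the system memory bus once render targets have spilled,
# relative to RGBA8 (1.0 = full rate), taken from the list above.
# RGB10A2 is described only as "almost full rate"; 1.0 is an approximation.
SPILL_RATE = {
    "RGBA8": 1.0,
    "RGB10A2": 1.0,
    "RG11B10": 0.5,
    "RGBA16F": 0.5,
    "RGBA32F": 0.25,
}


def spill_read_cost(formats):
    """Relative cost of reading each spilled attachment once per fragment.

    The result is in units of one full-rate RGBA8 access: a half-rate
    format counts as 2 and a quarter-rate format counts as 4.
    """
    return sum(1.0 / SPILL_RATE[f] for f in formats)
```

Under this model, a single spilled `RGBA32F` target costs as much to read as four spilled `RGBA8` targets, so the penalty of spilling compounds with the choice of wider formats.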

Recommended HDR texture formats

There are several texture formats available that can be used to store HDR texture data. Each format has its benefits and drawbacks. The following table details several HDR-suitable texture formats that are currently available for use.

| Texture Format | Bandwidth Cost | USC Cost | Filtering | Precision | Alpha |
|---|---|---|---|---|---|
| `RGB10A2` | Same as `RGBA8` | None | Hardware accelerated, slightly slower than `RGBA8` | RGB channels have greater precision than `RGBA8`, at the cost of alpha precision | Supports alpha, but only four unique values |
| `RGBA16F` | 2x `RGBA8` | None | Hardware accelerated, but performs at half the rate of `RGBA8` | Far greater precision than `RGBA8`: 2^16 values per channel | Supports alpha |
| `RG11B10` | 2x `RGBA8` (internally stored as `RGBA16F`) | None | Hardware accelerated, but performs at half the rate of `RGBA8` | Same as `RGBA16F` | Does not have an alpha channel |
| `RGBA32F` | 4x `RGBA8` | None | Hardware accelerated, but performs at quarter the rate of `RGBA8` and only supports nearest sampling | Vastly greater precision than any other format: 2^32 values per channel | Supports alpha |
| `RGBM` (`RGBA8`) | Same as `RGBA8` | Moderate USC cost for encoding / decoding the data | Hardware does not natively support filtering on this format | Encoding extends the range of values the RGB channels can represent compared to standard `RGBA8` | No alpha: the alpha channel is sacrificed to extend the RGB range |
| `RGBdiv8` (`RGBA8`) | Same as `RGBA8` | Slightly more complex than `RGBM` to encode / decode the data | Hardware does not natively support filtering on this format | Encoding extends the range of values the RGB channels can represent compared to standard `RGBA8` | No alpha: the alpha channel is sacrificed to extend the RGB range |

The appropriate HDR texture format will depend on several factors such as available memory bandwidth, precision (quality), alpha support, and so on.

If an HDR texture format that is natively supported by the hardware is required, use either RGB10A2 or RGBA16F (which costs twice the bandwidth). These formats provide a good balance between quality, performance (filtering), and memory bandwidth usage.
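To put the bandwidth column of the table above into numbers, the sketch below estimates the memory traffic of a single render target that is written to and later read back from system memory. The 1920 x 1080 resolution, 60 frames per second, one full write plus one full read per frame, and the helper names are illustrative assumptions, not figures from this document.

```python
# Bytes per pixel, following the "Bandwidth Cost" column of the table
# (RGBA8 = 4 bytes, "2x RGBA8" = 8 bytes, "4x RGBA8" = 16 bytes).
BYTES_PER_PIXEL = {
    "RGBA8": 4,
    "RGB10A2": 4,
    "RGBM": 4,
    "RGBdiv8": 4,
    "RG11B10": 8,
    "RGBA16F": 8,
    "RGBA32F": 16,
}


def frame_bytes(fmt, width, height):
    """Size in bytes of one full-screen render target in the given format."""
    return BYTES_PER_PIXEL[fmt] * width * height


def bytes_per_second(fmt, width, height, fps, accesses_per_frame=2):
    """Approximate traffic if the target is fully written and read back each frame.

    accesses_per_frame=2 (one write, one read) is an illustrative assumption.
    """
    return frame_bytes(fmt, width, height) * accesses_per_frame * fps
```

Under these assumptions, a 1920 x 1080 `RGB10A2` target is 8,294,400 bytes per frame, or roughly 1 GB/s at 60 fps, and `RGBA16F` doubles that figure.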

`RGBM` and `RGBdiv8`

Both RGBM and RGBdiv8 texture formats require the developer to implement encoding and decoding in the shader, as these formats are not natively supported by the hardware. This costs additional USC cycles, so a USC-limited application should avoid these formats.
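The sketch below shows one common RGBM formulation, written in Python rather than shader code so the arithmetic is easy to follow; a fragment shader would perform the same per-pixel operations, which is where the extra USC cost comes from. The maximum range of 6.0 and the helper names are assumptions, the 8-bit rounding emulates storage in an `RGBA8` target, and `RGBdiv8`, which uses a different encoding, is not shown.

```python
import math

RGBM_RANGE = 6.0  # assumed maximum encodable value; implementations choose their own


def to_unorm8(x):
    """Emulate storing a value in an 8-bit UNORM channel (clamp, then round)."""
    return round(min(max(x, 0.0), 1.0) * 255.0) / 255.0


def encode_rgbm(r, g, b, max_range=RGBM_RANGE):
    """Encode a linear HDR colour into four [0, 1] channels (R, G, B, M)."""
    # The shared multiplier M is derived from the brightest channel.
    m = min(max(r, g, b, 1e-6) / max_range, 1.0)
    # Round M up to the next 8-bit step so that RGB / (M * range) stays <= 1.
    m = math.ceil(m * 255.0) / 255.0
    scale = 1.0 / (m * max_range)
    return (to_unorm8(r * scale), to_unorm8(g * scale), to_unorm8(b * scale), m)


def decode_rgbm(rgbm, max_range=RGBM_RANGE):
    """Decode (R, G, B, M) back to a linear HDR colour."""
    r, g, b, m = rgbm
    return (r * m * max_range, g * m * max_range, b * m * max_range)
```

Encoding and then decoding the colour `(1.5, 0.75, 0.25)` recovers it to within about 0.002 per channel, while values brighter than the chosen range are clipped to it (for example, an input of 10.0 decodes to 6.0).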

These formats do have the advantage that they cost the same memory bandwidth as RGBA8. Therefore, if an application is bound by memory bandwidth, it may be worth exploring these formats.
