Using Multiple Render Targets Efficiently#
Multiple Render Targets (MRTs) are available in a variety of APIs, and are supported on PowerVR Rogue hardware. MRTs allow developers to render to multiple render target textures at once. These textures can then be used as inputs to other shaders, applied to a 3D model, presented to the screen and so on. A common use case for MRTs is deferred shading, whereby the data needed for the lighting calculations is stored in multiple render targets, and then used to light the scene after the geometry has been drawn.
Tile-based architectures such as PowerVR hardware can efficiently use render targets by storing per-pixel render target data for a single tile (32 x 32 pixels) entirely in on-chip memory, also known as Pixel Local Storage (PLS). This has the advantage of significantly reducing system memory access compared to immediate mode renderers.
On most PowerVR devices, the recommended maximum size per pixel for a render target is 128 bits plus a depth attachment. On some graphics cores, the amount of available memory for PLS may be increased to 256 bits plus a depth attachment.
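As a rough illustration of what these limits mean per tile, the following sketch multiplies the per-pixel budget by the 32 x 32 tile size quoted above. The function name is hypothetical, the depth attachment is excluded, and the way the hardware actually allocates on-chip memory is an implementation detail not covered here.

```python
TILE_WIDTH = 32   # pixels, tile size quoted in the text above
TILE_HEIGHT = 32


def tile_colour_bytes(bits_per_pixel):
    """Bytes of colour data held for one tile (depth attachment excluded)."""
    return TILE_WIDTH * TILE_HEIGHT * bits_per_pixel // 8


print(tile_colour_bytes(128))  # 16384 bytes (16 KiB)
print(tile_colour_bytes(256))  # 32768 bytes (32 KiB)
```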
It is highly recommended that applications do not exceed the amount of available per-pixel storage, as this will result in the render target data being spilled out into system memory. This is extremely expensive, as the render target data will need to be read from system memory for each fragment when a shader accesses the data stored in the render target. This essentially negates one of the main benefits of tile-based rendering, and costs huge amounts of memory bandwidth and performance.
Exceeding the per-pixel storage will also likely result in reduced Unified Shader Cluster (USC) occupancy: the maximum number of active threads (shaders) executing in parallel per USC will be severely limited, resulting in reduced efficiency and performance.
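A simple way to catch an oversized layout early is to add up the bits per pixel of every colour attachment and compare the total with the budget. The sketch below assumes that each format occupies its nominal size on chip (for example 32 bits for RGBA8), which may not hold on every graphics core. The function names and the example layouts are hypothetical.

```python
# Nominal bits per pixel for each colour format (assumed on-chip footprint).
FORMAT_BITS = {
    "RGBA8": 32,
    "RGB10A2": 32,
    "RGBA16F": 64,
    "RGBA32F": 128,
}


def mrt_bits_per_pixel(attachments):
    """Total colour bits per pixel for a list of attachment format names."""
    return sum(FORMAT_BITS[name] for name in attachments)


def fits_on_chip(attachments, budget_bits=128):
    """True if the layout stays within the per-pixel budget (depth excluded)."""
    return mrt_bits_per_pixel(attachments) <= budget_bits


compact = ["RGBA8", "RGBA8", "RGB10A2", "RGBA8"]  # 128 bits in total
heavy = ["RGBA16F", "RGBA16F", "RGBA8"]           # 160 bits in total

print(fits_on_chip(compact))                 # True
print(fits_on_chip(heavy))                   # False with a 128-bit budget
print(fits_on_chip(heavy, budget_bits=256))  # True with a 256-bit budget
```

Running a check like this at load time, with the budget set for the target graphics core, makes it harder for a later change to the layout to introduce spilling unnoticed.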
On PowerVR hardware, applications can use a variety of render target formats. If the per-pixel render target data can fit into on-chip memory, then all texture accesses are handled by the on-chip memory bus, and therefore all formats provide the same performance. This is because no transactions between system memory and the chip are required to load and store the data.
In addition to memory transaction and performance considerations, when render targets spill into system memory, not all render target formats will be supported at full rate over the system memory bus. Therefore, transfer rates may be further reduced depending on the format and the Texture Processing Unit (TPU) available in the graphics core.
The transfer rates are as follows:

- RGBA8 can be accessed at full rate.
- RGB10A2 can be accessed at almost full rate.
- RG11B10 can be accessed at half rate.
- RGBA16F can be accessed at half rate.
- RGBA32F can be accessed at quarter rate (no bilinear filtering).
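To compare formats once spilling occurs, the rates above can be turned into a relative access time. This is a simplified sketch: it treats the access time as the inverse of the quoted rate and ignores caching and other effects. RGB10A2 is left out because the text gives no figure for "almost full rate". The names are hypothetical.

```python
# Relative access rates over the system memory bus, taken from the list above.
# RGB10A2 is described only as "almost full rate", so it is omitted rather
# than given an invented number.
ACCESS_RATE = {
    "RGBA8": 1.0,
    "RG11B10": 0.5,
    "RGBA16F": 0.5,
    "RGBA32F": 0.25,
}


def relative_access_cost(fmt):
    """Time per access relative to RGBA8 (1.0 means the same as RGBA8)."""
    return 1.0 / ACCESS_RATE[fmt]


for fmt in ACCESS_RATE:
    print(f"{fmt}: {relative_access_cost(fmt):.0f}x the access time of RGBA8")
```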
Recommended HDR texture formats#
There are several texture formats available that can be used to store HDR texture data. Each format has its benefits and drawbacks. The following table details several HDR-suitable texture formats that are currently available for use.
| Texture Format | Bandwidth Cost | USC Cost | Filtering | Precision | Alpha |
|---|---|---|---|---|---|
| RGB10A2 | Same as RGBA8 | None | Hardware accelerated, slightly slower than RGBA8 | RGB channels have greater precision than RGBA8 | Supports alpha (only four unique values) |
| RGBA16F | 2x RGBA8 | None | Hardware accelerated but performs at half the rate of RGBA8 | Far greater precision than RGBA8 | Supports alpha |
| RG11B10 | 2x RGBA8 | None | Hardware accelerated but performs at half the rate of RGBA8 | Same as RGBA16F | Does not have an alpha channel |
| RGBA32F | 4x RGBA8 | None | Hardware accelerated but performs at quarter the rate of RGBA8 | Vastly greater precision than any other format (2^32 values per channel) | Supports alpha |
| RGBM | Same as RGBA8 | Moderate USC cost for encoding / decoding the data | Hardware does not natively support filtering on this format | Encoding algorithm improves the range of values that can be represented by the RGB channels compared to standard RGBA8 | No alpha (sacrificed to provide improved RGB range) |
| RGBdiv8 | Same as RGBA8 | Slightly more complex than RGBM | Hardware does not natively support filtering on this format | Encoding algorithm improves the range of values that can be represented by the RGB channels compared to standard RGBA8 | No alpha (sacrificed to provide improved RGB range) |
The appropriate HDR texture format will depend on several factors such as available memory bandwidth, precision (quality), alpha support, and so on.
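To put the bandwidth factor in concrete terms, the sketch below computes how many bytes are moved for one full read or write of a texture in several of the natively supported formats. It uses the nominal bits per pixel of each format. The 1920 x 1080 resolution and the function names are illustrative assumptions only.

```python
# Nominal bits per pixel for some natively supported formats.
BITS_PER_PIXEL = {
    "RGBA8": 32,
    "RGB10A2": 32,
    "RGBA16F": 64,
    "RGBA32F": 128,
}


def texture_bytes(width, height, fmt):
    """Bytes moved for one full read or write of an uncompressed texture."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


def cost_vs_rgba8(fmt):
    """Bandwidth cost as a multiple of RGBA8."""
    return BITS_PER_PIXEL[fmt] / BITS_PER_PIXEL["RGBA8"]


for fmt in BITS_PER_PIXEL:
    size = texture_bytes(1920, 1080, fmt)
    print(f"{fmt}: {size} bytes ({cost_vs_rgba8(fmt):.0f}x RGBA8)")
```

Multiplying these figures by the number of times the texture is written and sampled per frame, and by the frame rate, gives a first estimate of the bandwidth a given format choice adds.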
For HDR texture formats which are natively supported by the hardware, use either RGB10A2 or RGBA16F (which has increased bandwidth). These formats provide a good balance between quality, performance (filtering), and memory bandwidth usage.
RGBM and RGBdiv8#
Both RGBM and RGBdiv8 texture formats require the developer to implement encoding and decoding functionality in the shader, as these formats are not natively supported by the hardware. This costs additional USC cycles, so if an application is USC limited it should not employ these formats.
These formats do have the advantage of a low memory bandwidth cost, the same as RGBA8. Therefore, if an application is bound by memory bandwidth, it may be useful to explore these formats.
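To show the kind of work the shader has to do, here is a minimal reference sketch of one commonly used RGBM formulation, written in Python so that it can be run and checked on the CPU. It is not taken from the PowerVR documentation: the maximum range of 6.0, the function names and the rounding choices are all assumptions, and a real implementation would live in shader code. RGBdiv8 is not shown.

```python
import math

RGBM_RANGE = 6.0  # assumed maximum HDR value; not specified by the source


def rgbm_encode(r, g, b, max_range=RGBM_RANGE):
    """Pack an HDR colour into four 8-bit values (R, G, B, M)."""
    # Scale into [0, 1] relative to the chosen range, then clamp.
    r, g, b = (min(max(c / max_range, 0.0), 1.0) for c in (r, g, b))
    # The shared multiplier M is the largest channel, rounded up to the next
    # 8-bit step so that the divided channels never exceed 1.0.
    m = max(r, g, b, 1e-6)
    m = math.ceil(m * 255.0) / 255.0
    rgb8 = tuple(round(c / m * 255.0) for c in (r, g, b))
    return rgb8 + (round(m * 255.0),)


def rgbm_decode(r8, g8, b8, m8, max_range=RGBM_RANGE):
    """Recover an approximate HDR colour from the packed 8-bit values."""
    m = m8 / 255.0
    return tuple(c / 255.0 * m * max_range for c in (r8, g8, b8))


packed = rgbm_encode(3.0, 1.5, 0.75)
print(packed)                # four integers in the range 0..255
print(rgbm_decode(*packed))  # close to (3.0, 1.5, 0.75)
```

The multiplier is stored in the channel that would otherwise hold alpha, which is why the table lists these formats as having no alpha. Because the hardware does not natively filter this format, the values need to be decoded in the shader before they are blended or interpolated.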