Render Pass Optimisations

The PowerVR SDK is designed to work with any conformant OpenGL ES or Vulkan implementation. Most of the optimisation guidance provided is sensible for any platform, but some of it may be critical for PowerVR platforms. The recommended optimisations will not normally be detrimental to performance on other platforms, but they may not actually improve it.

This section details strategies for optimisations relating to efficiently using multi-subpass render passes (Vulkan) or multi-pass rendering (OpenGL ES). All of these optimisations are suitable for any platform that supports them, but their effect on PowerVR architectures makes them crucial to use whenever possible.

These optimisations are expected to benefit any platform or, at worst, to have no effect. However, tile-based architectures (used by some mobile GPUs, including PowerVR) and unified memory architectures (practically all mobile devices) are expected to benefit hugely.

Starting the Render Pass

In Vulkan and PVRVk, when creating a RenderPass object, specify a LoadOp and a StoreOp for each of its attachments.

The LoadOp answers the question: “when a render pass starts, what should be done with the existing contents of the framebuffer being rendered to?”

There are three options here:

Clear (VK_ATTACHMENT_LOAD_OP_CLEAR in Vulkan) means “forget what’s in there, use this colour”. This is usually the recommended operation.
Don’t Care (VK_ATTACHMENT_LOAD_OP_DONT_CARE) means “the entire scene will be rendered anyway, so it doesn’t matter, don’t load it”.
Preserve (VK_ATTACHMENT_LOAD_OP_LOAD) means “the scene will be rendered incrementally, using whatever is already in the framebuffer, so its contents need to be preserved”.

Clear and Don’t Care may sound different, but it is important to realise that their effect is practically the same as far as the important parts of performance go. They both allow the driver to ignore what is in the frame buffer. In the case of Clear, the driver will just be using the clear colour instead of the contents of the frame buffer. Don’t Care is similar, but also tells the driver that no specific colour is required.

Never use Preserve unless absolutely certain it is needed, as it introduces an entire round-trip to main memory. Its bandwidth cost cannot be overstated. If Preserve appears to be required, double-check the application design first.
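To put a rough number on that round trip, the sketch below estimates the extra read traffic caused by loading attachments at the start of every frame. The resolution, the 4 bytes per pixel assumed for both colour and depth/stencil, and the frame rate are illustrative assumptions rather than measurements, and any framebuffer compression is ignored. The function names are hypothetical.

```cpp
#include <cstdint>

// Bytes read from main memory when one attachment is loaded at the start
// of a render pass (assumes an uncompressed attachment, one read per pixel).
constexpr std::uint64_t loadBytesPerPass(std::uint64_t width,
                                         std::uint64_t height,
                                         std::uint64_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// Bytes read per second if that load happens once per frame.
constexpr std::uint64_t loadBytesPerSecond(std::uint64_t bytesPerPass,
                                           std::uint64_t framesPerSecond)
{
    return bytesPerPass * framesPerSecond;
}

// Assumed example: 1920x1080, 32-bit colour, 32-bit depth/stencil, 60 fps.
constexpr std::uint64_t kColourBytes = loadBytesPerPass(1920, 1080, 4);
constexpr std::uint64_t kDepthBytes  = loadBytesPerPass(1920, 1080, 4);
constexpr std::uint64_t kLoadPerSecond =
    loadBytesPerSecond(kColourBytes + kDepthBytes, 60);
```

Under these assumptions, loading both attachments costs roughly 1 GB of reads per second that Clear or Don’t Care would avoid entirely.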

In OpenGL ES, the situation is very similar. Calling glClear at the start of a frame, or glInvalidateFramebuffer on the depth/stencil attachments before swapping, allows the driver to discard the contents of the framebuffer / depth buffer before the next frame.

The specific flags depend on usage, but the baseline should be as follows:

Recommendations for LoadOp

Clear for depth/stencil, using the maximum depth value and whatever value the stencil needs to be.
Clear for colour, if any part of the screen may not be rendered.
Don’t Care, if it is guaranteed that every single pixel on the screen will be rendered to. It would be almost the same to always set Clear in every case, but it does not hurt to be pedantic and set Don’t Care where it is suitable. Never set Don’t Care and leave pixels on screen that have not been specifically overwritten, as the result is undefined and there may be artifacts or flickering.

Correspondingly, for OpenGL ES:

glClear both colour and depth at the start of the frame.

Recommendations for StoreOp

The StoreOp is much the same, but it should be even more obvious. In nearly every case, it is necessary to:

Store the colour so that it can be displayed on screen.
Discard the depth and stencil as their work is done.

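Correspondingly, for OpenGL ES, before calling eglSwapBuffers:

Do not do anything special for colour (whether it is preserved after the swap is governed by the EGL_SWAP_BEHAVIOR attribute of the EGL surface).
Call glInvalidateFramebuffer (OpenGL ES 3.0) or glDiscardFramebufferEXT (EXT_discard_framebuffer) on any FBOs whose contents are no longer needed, and on all depth/stencil attachments.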

In short:

Colour usually needs to be cleared on load, unless the contents of the frame buffer need to be explicitly read. A need to load the colour is very commonly a hint that subpasses/pixel local storage should be used instead if possible.
Colour usually needs to be stored at the end of the frame, in order to be presented.
Depth and stencil almost always need to be cleared to max value at the start of the pass.
Depth and stencil almost never need to be stored at the end of the pass, as they are not needed once rendering has finished.
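The summary above can be written down as a small decision helper. This is only a sketch of the guidance in this section: the enums are stand-ins for Vulkan's VK_ATTACHMENT_LOAD_OP_* and VK_ATTACHMENT_STORE_OP_* values (Load corresponds to what this section calls Preserve), and the function and parameter names are hypothetical.

```cpp
// Stand-ins for VK_ATTACHMENT_LOAD_OP_{CLEAR,DONT_CARE,LOAD} and
// VK_ATTACHMENT_STORE_OP_{STORE,DONT_CARE}; not the real Vulkan types.
enum class LoadOp { Clear, DontCare, Load };
enum class StoreOp { Store, DontCare };

struct AttachmentOps {
    LoadOp load;
    StoreOp store;
};

// Colour attachment:
//  - Load only when the previous contents are genuinely needed.
//  - Don't Care only when every pixel is guaranteed to be overwritten.
//  - Otherwise Clear.
//  - Store when the result is presented or read after the pass.
inline AttachmentOps colourOps(bool needsPreviousContents,
                               bool everyPixelOverwritten,
                               bool usedAfterPass)
{
    LoadOp load = LoadOp::Clear;
    if (needsPreviousContents) {
        load = LoadOp::Load;
    } else if (everyPixelOverwritten) {
        load = LoadOp::DontCare;
    }
    return { load, usedAfterPass ? StoreOp::Store : StoreOp::DontCare };
}

// Depth/stencil attachment: clear at the start, and discard at the end
// unless something after the pass really reads it.
inline AttachmentOps depthStencilOps(bool usedAfterPass = false)
{
    return { LoadOp::Clear,
             usedAfterPass ? StoreOp::Store : StoreOp::DontCare };
}
```

For a typical on-screen pass, `colourOps(false, false, true)` gives Clear/Store and `depthStencilOps()` gives Clear/Don’t Care, which matches the baseline recommended here.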

Subpasses/Pixel Local Storage

Subpasses are one of those optimisations that applications should be designed around. Use them if at all possible, explore them if remotely possible, and rewrite applications to take advantage of them. One of the first questions to ask when designing a multi-pass application is: “Can region-local subpasses be used with it?”

Conceptually, a subpass is a run through the graphics pipeline (from vertex shader -> … -> framebuffer output) whose output will be an input for a later step. For instance, rendering the G-Buffer in deferred shading can be a subpass.

This is similar to rendering to a texture of screen size in one run and sampling the corresponding texel at the same position as the rendered pixel on the next pass.

If this is designed properly, this allows the implementation to do a powerful optimisation on tiled architectures. The output of the fragment shader of the first subpass is not stored at all to main memory as it is known that it will not need to be displayed. Instead it is kept on very fast on-chip memory (register files) and accessed again from the fragment shader of the next subpass.

For example, in Deferred Shading, the G-Buffer contents can be kept on-chip to be used in the lighting pass. This can have great performance benefits in mobile architectures, as they are commonly bandwidth limited.

The caveat is that for this to happen, each pixel must only use the information from the corresponding input pixel. It cannot read from arbitrary locations, and it cannot sample the previous subpass's output as a regular texture at all.
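The constraint is easier to see in a CPU-side toy model. The sketch below is not how the hardware or driver is implemented; it only mimics the data flow for one tile. The tile size, the GBufferTexel layout and all function names are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kTilePixels = 32 * 32;  // assumed tile size

// Hypothetical, heavily simplified G-buffer contents for one pixel.
struct GBufferTexel {
    float albedo;
    float nDotL;
};

using TileGBuffer = std::array<GBufferTexel, kTilePixels>;
using TileColour  = std::array<float, kTilePixels>;

// Subpass 0: fill the G-buffer for every pixel of the tile.
inline void gbufferSubpass(TileGBuffer& tile)
{
    for (std::size_t i = 0; i < kTilePixels; ++i) {
        tile[i] = { 0.5f, (i % 2 == 0) ? 0.0f : 1.0f };
    }
}

// Subpass 1: each output pixel reads only the G-buffer texel at the same
// index. Reading in[i + 1] or any other location would break the
// pixel-local rule that lets the data stay on-chip.
inline void lightingSubpass(const TileGBuffer& in, TileColour& out)
{
    for (std::size_t i = 0; i < kTilePixels; ++i) {
        out[i] = in[i].albedo * in[i].nDotL;
    }
}

// Shades one tile and returns the number of bytes that would reach main
// memory in this model: only the final colour, never the G-buffer.
inline std::size_t shadeTile(TileColour& framebufferTile)
{
    TileGBuffer onChip{};  // exists only while this tile is processed
    gbufferSubpass(onChip);
    lightingSubpass(onChip, framebufferTile);
    return kTilePixels * sizeof(float);
}
```

In this model the G-buffer is a local variable that disappears when the tile is finished, which is the behaviour the subpass mechanism aims for on tiled hardware.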

For Vulkan, in order to collapse subpasses in this way:

Render into the images in one subpass.
Use these images as input attachments in the other subpass.
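Create those attachments with the transient attachment usage flag (VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT) and back them with lazily allocated memory (VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT).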

For OpenGL ES, the same effect is achieved by enabling the GL_EXT_shader_pixel_local_storage extension. Additionally, the shaders must be written to explicitly take advantage of it.

In short, use subpass folding wherever suitable. With multiple passes, see if they are suitable for subpass optimisation. For both of these cases, see the DeferredShading example.