Subpasses/Pixel Local Storage

Subpasses and pixel local storage should be used wherever possible in an application

Subpasses are one of those optimisations that applications should be designed around. Use them if at all possible, explore them if remotely possible, and rewrite applications to take advantage of them. One of the first questions that should be asked when doing a multi-pass application is: "Can region-local subpasses be used with it?"

Conceptually, a subpass is a run through the graphics pipeline (from vertex shader->… -> framebuffer output) whose output will be an input for a later step. For instance, rendering the G-Buffer in deferred shading can be a subpass.

This is similar to rendering to a texture of screen size in one run and sampling the corresponding texel at the same position as the rendered pixel on the next pass.

If this is designed properly, this allows the implementation to do a powerful optimisation on tiled architectures. The output of the fragment shader of the first subpass is not stored at all to main memory as it is known that it will not need to be displayed. Instead it is kept on very fast on-chip memory (register files) and accessed again from the fragment shader of the next subpass.

For example, in Deferred Shading, the G-Buffer contents can be kept on-chip to be used in the lighting pass. This can have great performance benefits in mobile architectures, as they are commonly bandwidth limited.

The caveat is that for this to happen, each pixel must only use the information from the corresponding input pixel. It cannot sample from arbitrary locations, and it cannot sample at all from the previous contents.

For Vulkan, in order to collapse subpasses in this way:

  • Render into the images in one subpass.
  • Use these images as input attachments in the other subpass.
  • Use Transient and Lazily Allocated flags for those attachments.

For OpenGL ES, the same effect is done with enabling the GL_PIXEL_LOCAL_STORAGE extension. Additionally, the shaders must have been written to explicitly take advantage of it.

In short, use subpass folding wherever suitable. With multiple passes, see if they are suitable for subpass optimisation. For both of these cases, see the DeferredShading example.