How Vulkan Achieves Better Efficiency

Although Vulkan has the potential for high efficiency, it only delivers that efficiency when an application is designed around the API. An application that was optimised for an earlier API and ported without taking advantage of the opportunities Vulkan offers might not see any performance benefit, and may even run slower.

Validation and Errors

Vulkan performs very little runtime error checking in order to remove overhead. In many cases, if an application makes an error, the driver will not notice, which may lead to undefined behaviour, including program termination. This may seem drastic, but outside of development and debugging there are very few error checks that a release build of an application needs. In an API that always validates, end users pay the cost of that validation on every call, even when the application is functioning exactly as intended. Vulkan development instead relies on tooling and debugging layers to identify application errors before they ever reach a consumer device.

As Vulkan is an explicit API permitting direct control over how GPUs work, a Vulkan driver performs essentially no validation or error checking of its own. The application has full control and full responsibility, so any error in using Vulkan is liable to result in a crash. The Vulkan SDK provides standard validation layers that can be enabled during development to help developers verify that their applications use the API correctly.

These validation layers work with the DEBUG_REPORT extension (VK_EXT_debug_report) to provide the application or the user with validation feedback. When a validation layer is enabled, it reads the vk_layer_settings.txt file to determine its behaviour. An application can also register callback functions via the DEBUG_REPORT extension to be notified when specified validation events occur. Application callbacks are invoked regardless of the settings in vk_layer_settings.txt.
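The layering idea described above can be illustrated with a toy model in plain C++. This is only a sketch and makes no Vulkan calls: the names Dispatch, driverDraw, and enableValidationLayer are hypothetical, and the real layer mechanism is more involved. It assumes that a "driver" entry point trusts its arguments completely, and that a layer is simply a wrapper that checks arguments and reports problems through an application callback before forwarding the call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for a driver entry point and a debug callback.
using DrawFn = std::function<int(const std::vector<int>&, std::size_t)>;
using DebugCallback = std::function<void(const std::string&)>;

// Table of entry points the application calls through.
struct Dispatch {
    DrawFn draw;
};

// Toy "driver": performs no checking at all. Passing a count larger than
// vertices.size() would be undefined behaviour, as with a real driver.
int driverDraw(const std::vector<int>& vertices, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += vertices[i];
    }
    return sum;
}

// Release configuration: calls go straight to the driver.
Dispatch makeDriverDispatch() {
    return Dispatch{driverDraw};
}

// Development configuration: wrap the next entry point with a check.
// Assumption of this sketch: an invalid call is reported and then dropped
// (returning -1) so that the example itself stays well defined.
Dispatch enableValidationLayer(Dispatch next, DebugCallback report) {
    assert(report && "the toy layer needs a callback to report to");
    return Dispatch{[next, report](const std::vector<int>& vertices,
                                   std::size_t count) -> int {
        if (count > vertices.size()) {
            report("draw: count exceeds the number of vertices supplied");
            return -1;
        }
        return next.draw(vertices, count);
    }};
}
```

In this model the release build never pays for the check because the wrapper is simply not installed, which is the property the validation layers are meant to provide.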

Hazard Tracking and Synchronisation

OpenGL ES implicitly tracks resource usage and synchronisation on behalf of an application, with concessions for explicit management in more recent versions via fence sync objects, queries, and memory barriers. Even with these explicit operations, most synchronisation still happens inside the driver, which is expensive to manage. Because the OpenGL ES driver has to do this for every application, finding an efficient path requires complex heuristics.

The Vulkan API leaves hazard tracking and synchronisation entirely up to the developer - they decide how things are executed and in what order, as well as how resources are managed. Developers know how they plan to use resources and synchronise work, which gives the opportunity for a much lower overhead than a general-purpose driver can achieve. The trade-off is that getting synchronisation right becomes the developer's responsibility, and it is harder to do.

Pipeline State Objects

Changing GPU state can be a costly operation, complicated by the fact that most state changes require modifying shader code. PowerVR GPUs, for example, support programmable blending; this means that blend state has to be patched into the fragment shader. Other GPUs have similar issues with other pieces of fixed-function state. It may also be necessary to compile or translate state that does not affect the shader directly.

An application should provide most of the relevant information well in advance of draw time, baking it into Pipeline State Objects (PSOs) outside of the main render loop. PSOs are responsible for the largest saving of CPU work during draw command generation, because all the validation, compilation, and translation of API state to GPU code is done once, when the PSO is created.
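As a rough illustration of baking state ahead of time, the following plain C++ sketch models the idea without using the Vulkan API. All names here (PipelineDesc, Pipeline, Device, recordFrame) are hypothetical, and the "translation" step is reduced to string concatenation. The assumption is that creating a pipeline is the expensive step and that binding one during the render loop is only a handle lookup.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of the state an application wants to bake.
struct PipelineDesc {
    std::string fragmentShader;
    bool blendEnabled;
    bool depthTestEnabled;
};

// Result of baking: state already translated into "GPU code".
struct Pipeline {
    std::string gpuProgram;
};

struct Device {
    int pipelinesBaked = 0;  // counts how often the expensive step runs

    // Expensive step, meant to run outside the render loop. Blend state is
    // patched into the shader here, once, rather than at draw time.
    Pipeline createPipeline(const PipelineDesc& desc) {
        ++pipelinesBaked;
        std::string program = desc.fragmentShader;
        if (desc.blendEnabled) {
            program += " | blend";
        }
        if (desc.depthTestEnabled) {
            program += " | depth_test";
        }
        return Pipeline{program};
    }
};

// Per-frame work: binding a baked pipeline needs no validation,
// compilation, or translation.
std::vector<std::string> recordFrame(const std::vector<const Pipeline*>& drawOrder) {
    std::vector<std::string> commands;
    for (const Pipeline* pipeline : drawOrder) {
        assert(pipeline != nullptr);
        commands.push_back("bind[" + pipeline->gpuProgram + "]");
        commands.push_back("draw");
    }
    return commands;
}
```

However many frames are recorded with recordFrame, pipelinesBaked stays at the number of createPipeline calls made during setup, which is the saving PSOs are intended to deliver.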

Command Buffer Reuse

The command submission model used by Vulkan requires draw calls to first be recorded into command buffers and then submitted. This is in contrast to APIs such as OpenGL ES, where a draw command appears to the application to be executed immediately. Command buffers are dedicated objects in Vulkan and are not normally discarded once submitted, which allows them to be reused. With the overall overhead of Vulkan being so much smaller, command generation can become a significant part of the remaining CPU cost, so it is good practice to skip it altogether by reusing command buffers that have already been recorded.

Figure 1: Command Buffer Usage

The resources referenced by a command buffer can be modified when not in use by the GPU, allowing for a large amount of dynamism in the scene. For example, the provided Library demo uses two command buffers for the entire scene. Scene-to-scene, the camera transformation and fadeout values are simply updated via data stored in Uniform Buffers, referenced by the command buffers. Having two command buffers allows for one to be rendered by the GPU while the other's Uniform Buffers are being updated by the CPU.
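The two-command-buffer pattern described above can be modelled in plain C++ as follows. This is a simplified sketch rather than the Library demo's code: Renderer, UniformBuffer, CommandBuffer, and FrameResult are hypothetical names, and the "GPU" runs synchronously, so the wait that a real application would need before writing to a buffer still in use is only noted in a comment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Per-frame data that changes; the command buffers only reference it.
struct UniformBuffer {
    float cameraX = 0.0f;
    float fade = 1.0f;
};

// A recorded command buffer: which uniform slot it reads, how many draws.
struct CommandBuffer {
    std::size_t uniformSlot = 0;
    int drawCount = 0;
    bool recorded = false;
};

// What the toy "GPU" observed when executing a command buffer.
struct FrameResult {
    float cameraX;
    float fade;
    int draws;
};

class Renderer {
public:
    // Record both command buffers once, up front.
    void recordAll(int drawCount) {
        for (std::size_t slot = 0; slot < commandBuffers_.size(); ++slot) {
            commandBuffers_[slot].uniformSlot = slot;
            commandBuffers_[slot].drawCount = drawCount;
            commandBuffers_[slot].recorded = true;
            ++recordings_;
        }
    }

    // Per frame: update uniform data only, then resubmit the same buffer.
    FrameResult renderFrame(std::size_t frame, float cameraX, float fade) {
        const std::size_t slot = frame % commandBuffers_.size();
        const CommandBuffer& cb = commandBuffers_[slot];
        assert(cb.recorded && "call recordAll() before rendering");
        // A real application would first wait until the GPU has finished
        // with this slot before overwriting its uniform data.
        uniforms_[cb.uniformSlot].cameraX = cameraX;
        uniforms_[cb.uniformSlot].fade = fade;
        return submit(cb);
    }

    int recordings() const { return recordings_; }

private:
    // Toy "GPU": reads the referenced uniforms at execution time.
    FrameResult submit(const CommandBuffer& cb) const {
        const UniformBuffer& u = uniforms_[cb.uniformSlot];
        return FrameResult{u.cameraX, u.fade, cb.drawCount};
    }

    std::array<UniformBuffer, 2> uniforms_;
    std::array<CommandBuffer, 2> commandBuffers_;
    int recordings_ = 0;
};
```

The point of the model is that recordings() stays at two no matter how many frames are rendered, while each frame still sees fresh camera and fade values.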

Because most applications are becoming more data-driven and more of the work can be specified on the GPU, most command buffers can potentially be reused many times.

Multithreading

It is generally accepted that multiple cores running at a lower frequency, or each with a smaller workload, will run cooler and consume less power than a single core running all of the work at its maximum. Thanks to efficient multithreading, Vulkan allows applications to spread their workload across more cores and take advantage of this, leading to a cooler chip that consumes less power. More information on how multithreading works can be found in CPU Bottlenecks.
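One way to picture spreading command generation across cores is the plain C++ sketch below. It does not use the Vulkan API: CommandList, recordInParallel, and submitInOrder are hypothetical names. The assumption is that each thread records into its own command list, so no locking is needed during recording, and that the main thread then gathers the lists in a fixed order for a single submission.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

using CommandList = std::vector<std::string>;

// Each worker records a contiguous slice of the scene's objects into its
// own list. Lists are never shared between threads, so no locks are needed.
std::vector<CommandList> recordInParallel(std::size_t objectCount,
                                          std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<CommandList> lists(threadCount);
    std::vector<std::thread> workers;
    workers.reserve(threadCount);
    const std::size_t chunk = (objectCount + threadCount - 1) / threadCount;

    for (std::size_t t = 0; t < threadCount; ++t) {
        workers.emplace_back([&lists, t, chunk, objectCount] {
            const std::size_t begin = std::min(t * chunk, objectCount);
            const std::size_t end = std::min(begin + chunk, objectCount);
            for (std::size_t i = begin; i < end; ++i) {
                lists[t].push_back("draw object " + std::to_string(i));
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return lists;
}

// The main thread gathers the per-thread lists in a deterministic order,
// standing in for a single queue submission.
CommandList submitInOrder(const std::vector<CommandList>& lists) {
    CommandList submission;
    for (const CommandList& list : lists) {
        submission.insert(submission.end(), list.begin(), list.end());
    }
    return submission;
}
```

Giving every thread its own list is the design choice that matters here: the threads never contend for a shared structure while recording, and ordering is restored only at submission time.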

Though the majority of the efficiency gains relate to the CPU, Vulkan also provides some GPU efficiency improvements. Architecture Positive provides more details about various aspects of the API that result in improved performance and efficiency.