CPU Bottlenecks

Modern CPUs have multiple cores, so getting the best performance out of one means putting the additional cores to work with multiple threads. Leaving the other cores unused limits performance to that of a single core, which in turn limits efficiency. The problem of the GPU waiting on the CPU because of an inefficient API has already been discussed. Even with high overhead, that problem could be somewhat mitigated by spreading the workload across multiple threads.

For example, as illustrated by the Gnome Horde demo in High Efficiency on Mobile, CPU usage under OpenGL ES spikes much higher than under Vulkan when generating the same content. The API continually maxes out either core 0 or core 1 and suffers a performance penalty as a result, despite there being four Intel CPU cores available on the Nexus Player. Though it appears that only two cores are idle, in reality three are idle at any given moment, because the OS moves the thread between cores in an attempt to keep the chip cool. In this case, OpenGL ES is performing excess work and is unable to distribute that work to cores that could otherwise help.

Older APIs do not scale effectively across multiple software threads; at most they manage resource streaming on another thread, with diminishing returns if any other operations are run there. Many high-performance applications and engines resort to keeping the rendering thread as lean as possible and handling all the logic on other threads, which tends to result in them generating their own command buffers.
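The workaround described above can be sketched roughly as follows. This is a minimal illustration, not code from any real engine: `RenderCommand`, `SoftwareCommandBuffer` and `simulateFrame` are hypothetical names, and the replay loop only counts commands where a real engine would issue API calls. Logic threads queue commands, and a single thread standing in for the rendering thread drains and replays them.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical engine-side command; not part of OpenGL ES or any real engine.
struct RenderCommand {
    int meshId;
};

// Hypothetical software command buffer shared between the logic threads
// and the single rendering thread.
class SoftwareCommandBuffer {
public:
    // Called from logic threads: append one command under a lock.
    void push(RenderCommand cmd) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(cmd);
    }

    // Called from the rendering thread: take everything queued so far.
    std::vector<RenderCommand> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<RenderCommand> out;
        out.swap(pending_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<RenderCommand> pending_;
};

// Starts logicThreads producers, each queuing commandsPerThread commands,
// then replays them on the calling thread, which stands in for the
// rendering thread. Returns the number of commands replayed.
std::size_t simulateFrame(unsigned logicThreads, int commandsPerThread) {
    assert(commandsPerThread >= 0);
    SoftwareCommandBuffer buffer;
    std::vector<std::thread> producers;
    for (unsigned t = 0; t < logicThreads; ++t) {
        producers.emplace_back([&buffer, t, commandsPerThread] {
            for (int i = 0; i < commandsPerThread; ++i) {
                buffer.push(RenderCommand{static_cast<int>(t) * commandsPerThread + i});
            }
        });
    }
    for (std::thread& p : producers) {
        p.join();
    }

    std::size_t replayed = 0;
    for (const RenderCommand& cmd : buffer.drain()) {
        (void)cmd;  // A real engine would issue the API call for cmd here.
        ++replayed;
    }
    return replayed;
}
```

Note that the replay step still runs on one thread, so the API calls themselves remain serialized; only the logic that decides what to draw has been moved off the rendering thread.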

Using all the cores

Distributing the rendering thread's workload across multiple threads leads to improved CPU performance. This is best illustrated by PowerVR's Gnome Horde demo, available as part of the PowerVR SDK and Tools package. 15 seconds into the demo, it begins creating and destroying a large number of figures per frame: on the order of 150k new draw calls per second, plus 250k reused ones. In OpenGL ES there is not much that can be done to improve on that, as the only core being used is already maxed out. Vulkan, however, makes use of all the available cores by redistributing the workload across a number of threads.
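The idea of splitting draw-call recording across threads can be sketched as below. This is an illustrative model only and does not use the Vulkan API or the Gnome Horde source: `DrawCommand`, `CommandList`, `recordInParallel` and `submitInOrder` are hypothetical names. Each worker records into its own list, so recording needs no locking, and the lists are then concatenated in a fixed order to stand in for a single submission.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded draw call; not a Vulkan type.
struct DrawCommand {
    int objectId;
};

// Hypothetical stand-in for a per-thread command buffer.
using CommandList = std::vector<DrawCommand>;

// Records draw commands for objects [0, objectCount) on workerCount threads.
// Each worker writes only to its own list, so recording needs no locks.
std::vector<CommandList> recordInParallel(int objectCount, unsigned workerCount) {
    assert(objectCount >= 0 && workerCount > 0);
    std::vector<CommandList> lists(workerCount);
    std::vector<std::thread> workers;
    workers.reserve(workerCount);

    const int count = static_cast<int>(workerCount);
    const int chunk = (objectCount + count - 1) / count;  // ceiling division
    for (int w = 0; w < count; ++w) {
        const int begin = std::min(objectCount, w * chunk);
        const int end = std::min(objectCount, begin + chunk);
        workers.emplace_back([&lists, w, begin, end] {
            for (int id = begin; id < end; ++id) {
                lists[static_cast<std::size_t>(w)].push_back(DrawCommand{id});
            }
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return lists;
}

// Concatenates the per-thread lists in worker order, standing in for
// handing all recorded work to the GPU in one submission.
CommandList submitInOrder(const std::vector<CommandList>& lists) {
    CommandList submitted;
    for (const CommandList& list : lists) {
        submitted.insert(submitted.end(), list.begin(), list.end());
    }
    return submitted;
}
```

For example, `submitInOrder(recordInParallel(10, 4))` yields the ten commands in object order, with the recording cost shared by four threads rather than carried by one.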

Figure 1: Vulkan CPU vs OpenGL ES CPU