By Ben Schwarz, Greening of Streaming President
In a world increasingly concerned with sustainability, the energy demands of digital infrastructure are coming under scrutiny, especially in streaming, where encoding and decoding dominate processing costs. At a recent Greening of Streaming meeting, G&L's Martin Schmalohr provided a deep dive into the comparative energy profiles of CPU-based versus hardware-accelerated (VPU) encoding workflows, drawing on his recent R&D work. The results point decisively toward a greener future when hardware is used optimally.
To ensure meaningful and repeatable results, Martin obtained the measurements from G&L’s Audio Video Processing Unit (AVPU), which supports both CPU and hardware acceleration. This unit is housed in a Supermicro 1U server featuring an Ampere Altra Max ARM CPU and up to 10 NETINT Quadra T1U VPUs. These VPUs, NVMe-form-factor ASICs, handle video encoding, decoding, and scaling in hardware. Software used includes FFmpeg (custom-compiled for Quadra support), libx264/x265, libsvtav1, and Ubuntu containers orchestrated by Docker. The setup is representative of a VoD transcoding environment.
Fifteen diverse 20-second test clips were used, each representing a unique scene type, ranging from high-motion events, such as marathons, to low-motion animations. These were encoded using three widely used codecs: H.264 (AVC), H.265 (HEVC), and AV1.
Each codec was tested using both CPU-based libraries (libx264/x265, libsvtav1) and hardware acceleration via Quadra VPU implementations. Rate control modes included constant bitrate (CBR), capped variable bitrate (VBR), and constant rate factor (CRF).
CPU encoding struggled to keep up: H.265 and AV1 on CPUs often failed to reach real-time encoding speeds (especially for 1080p50 content), even with multiple cores. AV1, in particular, consumed substantial resources without hitting real-time thresholds, likely due to constraints in the hyper-threading implementation and the corresponding CPU allocation.
In contrast, the VPU (hardware-accelerated) setup maintained real-time speeds across almost all scenarios, even at higher quality settings. A single Quadra card typically delivers up to 70 fps at under 75% load while encoding 15 HD clips at 1080p in parallel. That is effectively double or triple the throughput of CPU-only workflows, which consume up to 35 ARM cores.
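To make the "real-time" threshold concrete, here is a minimal sketch of the check, assuming real time simply means that the encoder produces frames at least as fast as the source frame rate. The function names are illustrative and not part of G&L's tooling, and the 35 fps CPU figure is a hypothetical value chosen only for contrast.

```python
def realtime_factor(encoded_fps: float, source_fps: float) -> float:
    """Return encoding speed as a multiple of the source frame rate."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return encoded_fps / source_fps


def is_realtime(encoded_fps: float, source_fps: float) -> bool:
    """True when the encoder keeps up with (or outruns) the source."""
    return realtime_factor(encoded_fps, source_fps) >= 1.0


# 70 fps is the figure quoted in the article for a Quadra card; 1080p50 is 50 fps.
vpu_factor = realtime_factor(70.0, 50.0)

# Hypothetical CPU result, for illustration only: 35 fps on the same content.
cpu_ok = is_realtime(35.0, 50.0)
```

With these inputs the VPU runs at roughly 1.4 times real time, while the hypothetical CPU encode falls short of the threshold.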
Energy use was logged every two seconds via OpenBMC, focusing exclusively on incremental energy (excluding idle server load, ~240W). CPU-encoding 20-second clips consumed 15–20 Wh, roughly equivalent to the energy required to brew a standard cup of coffee. With ASIC-based encoding, this dropped dramatically to 2–5 Wh per clip, using a single Quadra card at 66% load (with 15 parallel HD encodes).
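The incremental-energy figure can be sketched as follows, assuming each power reading is held constant over its two-second interval and the idle baseline is a flat 240 W. How the readings are pulled from OpenBMC is not covered here, and the sample values are hypothetical, not measurements from the study.

```python
from typing import Sequence

IDLE_POWER_W = 240.0       # approximate idle server load quoted in the article
SAMPLE_INTERVAL_S = 2.0    # logging interval quoted in the article


def incremental_energy_wh(
    power_samples_w: Sequence[float],
    idle_power_w: float = IDLE_POWER_W,
    interval_s: float = SAMPLE_INTERVAL_S,
) -> float:
    """Integrate power above idle over time and return watt-hours.

    Assumption: each sample represents the average power for one interval
    (a simple rectangle rule). Readings below idle are clamped to zero.
    """
    joules = sum(max(p - idle_power_w, 0.0) * interval_s for p in power_samples_w)
    return joules / 3600.0  # 1 Wh = 3600 J


# Hypothetical trace: ten readings of 600 W, i.e. 20 seconds at 360 W above idle.
example_wh = incremental_energy_wh([600.0] * 10)
```

For this made-up trace the result is 2.0 Wh (360 W for 20 s is 7,200 J). Subtracting the baseline before integrating keeps the comparison focused on what the encode itself adds, which is the approach the article describes.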
When encoding was spread over 10 Quadra cards, energy consumption increased to 5–10 Wh, even though load per card was only ~6%. This highlights a key insight: hardware energy efficiency scales best when utilisation is high.
Video quality, assessed using VMAF (Netflix’s perceptual quality metric), showed no significant overall difference between CPU and VPU encodes at the same bitrate and codec profile, confirming that energy savings don’t come at the cost of fidelity. One caveat applies: codec parameters are not identical across implementations, especially between CPU-based and ASIC-based compression, so some codec-specific fine-tuning is required. Interestingly, rate control was less stable for libsvtav1, which occasionally deviated from the target bitrate, emphasising the maturity gap between some CPU libraries and hardware-accelerated encoding implementations.
Future work with Greening of Streaming will explore whether encode optimisation can affect decoding power requirements, even marginally, across different device classes (Smart TV, STB, mobile devices). This matters all the more as the number of end-user devices grows. The investigation will be worthwhile even if we reach the expected answer of no.
Questions also remain about embedded carbon, component lifespan, and whether programmable GPUs can compete with ASICs, such as the Quadra, in terms of efficiency.
Another simple test to complete these findings will be to define the relationship between the VPU’s added energy efficiency and its usage: is there a sweet spot where it delivers most of the energy savings below full load, or do we need to get as close to 100% VPU load as possible? Is the relationship linear or does it follow a typical curve?
Martin & G&L’s work underlines a vital message for streaming infrastructure: hardware acceleration isn’t just about speed, it’s about sustainability. To unlock its full potential, the system’s architecture must be fine-tuned for the specific video processing task and run hot and heavy. In encoding, as in many things, efficiency loves company.
Interested in VPU-powered encoding, energy metrics, or related topics? Please don’t hesitate to reach out.