High Performance Video Encoding: G&L Systemhaus Shows How Specialised Hardware Can Triple Energy Efficiency

Written by Ben Schwarz | 09.07.2025 08:06:58

By Ben Schwarz, Greening of Streaming President

Introduction

In a world increasingly concerned with sustainability, the energy demands of digital infrastructure are coming under scrutiny, especially in streaming, where encoding and decoding dominate processing costs. At a recent Greening of Streaming meeting, G&L's Martin Schmalohr provided a deep dive into the comparative energy profiles of CPU-based versus hardware-accelerated (VPU) encoding workflows, drawing on his recent R&D work. The results point decisively toward a greener future when hardware is used optimally.

Test Environment: A Balanced, Comparable Setup

To ensure meaningful and repeatable results, Martin obtained the measurements from G&L’s Audio Video Processing Unit (AVPU), which supports both CPU-based and hardware-accelerated processing. This unit is housed in a Supermicro 1U server featuring an Ampere Altra Max ARM CPU and up to 10 NETINT Quadra T1U VPUs. These VPUs, NVMe-form-factor ASICs, handle video encoding, decoding, and scaling in hardware. The software stack comprises FFmpeg (custom-compiled for Quadra support), libx264, libx265, libsvtav1, and Ubuntu containers orchestrated by Docker. The setup is representative of a VoD transcoding environment.

Figure: G&L's AVPU in a single-CPU Supermicro server. The measurements were taken on the same setup in a dual-CPU Gigabyte server.

Content and Codecs: Simulating Real Workloads

Fifteen diverse 20-second test clips were used, each representing a unique scene type, ranging from high-motion events, such as marathons, to low-motion animations. These were encoded using three widely used codecs:

H.264 (AVC): Older but still dominant
H.265 (HEVC): Offers better compression at the cost of more computation
AV1: The latest in efficiency, but also the most computationally demanding

Each codec was tested using both CPU-based libraries (libx264/x265, libsvtav1) and hardware acceleration via Quadra VPU implementations. Rate control modes included constant bitrate (CBR), capped variable bitrate (VBR), and constant rate factor (CRF).
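The three rate control modes can be sketched as FFmpeg argument lists for libx264. This is a minimal illustration, not the configuration used in the tests: the article does not give the exact settings, so the helper name `x264_rate_control_args`, the bitrate, the CRF value, and the buffer sizes are all assumptions. Capped VBR is shown here as CRF with a `-maxrate` ceiling, which is one common way to express it.

```python
from typing import List


def x264_rate_control_args(mode: str, bitrate_k: int = 5000, crf: int = 23) -> List[str]:
    """Build libx264 encoder arguments for one rate control mode.

    All numeric defaults are placeholders chosen for illustration only.
    """
    rate = f"{bitrate_k}k"
    buf = f"{2 * bitrate_k}k"  # assumed buffer size: two seconds at the target rate
    if mode == "crf":
        # Constant rate factor: quality-targeted, bitrate floats freely.
        return ["-c:v", "libx264", "-crf", str(crf)]
    if mode == "capped_vbr":
        # Quality-targeted, but with a ceiling on the bitrate.
        return ["-c:v", "libx264", "-crf", str(crf), "-maxrate", rate, "-bufsize", buf]
    if mode == "cbr":
        # Approximate constant bitrate: target, floor, and ceiling all equal.
        return ["-c:v", "libx264", "-b:v", rate, "-minrate", rate,
                "-maxrate", rate, "-bufsize", buf]
    raise ValueError(f"unknown rate control mode: {mode}")


# Hypothetical file names; the command is only assembled, not executed.
command = ["ffmpeg", "-i", "input.mp4", *x264_rate_control_args("capped_vbr"), "output.mp4"]
```

The hardware encoders expose their own parameter sets, so an equivalent mapping for the Quadra path would need to be taken from the vendor's documentation.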

Encoding Speed: Real-Time Achievable Only With Hardware

CPU encoding struggled to keep up: H.265 and AV1 on CPUs often failed to reach real-time encoding speeds (especially for 1080p50 content), even with multiple cores. AV1, in particular, consumed high resources without hitting real-time thresholds, likely due to constraints in hyper-threading implementation and respective CPU allocation.

In contrast, the VPU (hardware-accelerated) setup maintained real-time speeds across almost all scenarios, even at higher quality settings. A single Quadra card typically delivers up to 70 fps at under 75% load while encoding 15 HD (1080p) clips in parallel. That is two to three times the throughput of the CPU-only workflows, which consumed up to 35 ARM cores.
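To make "real-time" concrete: an encode keeps up with playback when its frame rate is at least the source frame rate. The sketch below expresses that as a ratio. The function name and the timings are hypothetical and are not measurements from the tests.

```python
def realtime_factor(frames_encoded: int, wall_clock_s: float, source_fps: float) -> float:
    """Return encode speed relative to playback speed.

    A value of 1.0 or more means the encoder keeps up with real time.
    """
    if wall_clock_s <= 0 or source_fps <= 0:
        raise ValueError("wall_clock_s and source_fps must be positive")
    encode_fps = frames_encoded / wall_clock_s
    return encode_fps / source_fps


# A 20-second clip at 50 fps contains 1000 frames.
# Illustrative timings only: 40 s of wall-clock time versus 10 s.
slow = realtime_factor(1000, 40.0, 50.0)  # 25 fps encode speed, below real time
fast = realtime_factor(1000, 10.0, 50.0)  # 100 fps encode speed, above real time
```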

Energy Consumption: Encoding at the Cost of Coffee

Energy use was logged every two seconds via OpenBMC, focusing exclusively on incremental energy (excluding the idle server load of ~240 W). CPU encoding of the 20-second clips consumed 15–20 Wh, roughly equivalent to the energy required to brew a standard cup of coffee. With ASIC-based encoding, this dropped dramatically to 2–5 Wh per clip, using a single Quadra card at 66% load (with 15 parallel HD encodes).
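The incremental-energy idea can be sketched as follows: subtract the idle baseline from each power reading, multiply by the sampling interval, and convert joules to watt-hours. This is an assumed reconstruction, not the script used in the tests. The function name, the clamping of readings below idle to zero, and the sample values are all hypothetical.

```python
from typing import Sequence


def incremental_energy_wh(power_samples_w: Sequence[float],
                          interval_s: float = 2.0,
                          idle_w: float = 240.0) -> float:
    """Sum the energy drawn above the idle baseline, in watt-hours.

    Each sample is treated as constant over one sampling interval.
    Readings below the idle baseline are clamped to zero (an assumption).
    """
    joules = sum(max(p - idle_w, 0.0) * interval_s for p in power_samples_w)
    return joules / 3600.0  # 1 Wh = 3600 J


# Illustrative log: ten readings of 600 W, i.e. 360 W above idle for 20 s.
energy = incremental_energy_wh([600.0] * 10)
```

With these made-up readings, 360 W above idle for 20 seconds is 7200 J, or 2 Wh.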

When encoding was spread over 10 Quadra cards, energy consumption increased to 5–10 Wh, while the load per card was only ~6%. This highlights a key insight: hardware energy efficiency scales best when utilisation is high.

Quality and Control: No Major Tradeoffs

Video quality, assessed using VMAF (Netflix’s perceptual quality metric), showed no significant overall difference between CPU and VPU encodes at the same bitrate and codec profile, confirming that energy savings don’t come at the cost of fidelity. One caveat applies: codec parameters are not identical across implementations, especially between CPU-based and ASIC-based compression, so some codec-specific fine-tuning is required. Interestingly, rate control was less stable for libsvtav1, which occasionally deviated from the target bitrate, emphasising the maturity gap between some CPU libraries and hardware-accelerated encoding implementations.
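One simple way to quantify a rate control deviation like this is to compare the average bitrate of the output file with the target. The sketch below is only an illustration: the function name and the example file size are hypothetical, and it measures the container-level average, which includes any audio and muxing overhead.

```python
def bitrate_deviation_pct(file_size_bytes: int, duration_s: float, target_kbps: float) -> float:
    """Return the deviation of the average bitrate from the target, in percent.

    Positive values mean the encode overshot the target.
    """
    if duration_s <= 0 or target_kbps <= 0:
        raise ValueError("duration_s and target_kbps must be positive")
    actual_kbps = file_size_bytes * 8 / duration_s / 1000.0
    return (actual_kbps - target_kbps) / target_kbps * 100.0


# Illustrative values: a 20-second clip of 13,750,000 bytes against a 5000 kbps target
# averages 5500 kbps, an overshoot of about 10%.
deviation = bitrate_deviation_pct(13_750_000, 20.0, 5000.0)
```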

Optimisation Takeaways

In the tested setup, NETINT ASIC acceleration is 2–3 times more energy-efficient than ARM CPU encoding. This is just a first data point, and such a metric requires validation with additional data points from various hardware setups.
H.264 shows less sensitivity to acceleration; this could be due to decades of optimisation.
AV1 benefits most from hardware, but may require tuning to maintain consistent rate control.
Parallelisation is essential: encoding multiple streams in tandem on a VPU amplifies energy savings.
Underutilised hardware loses its efficiency edge; 10 lightly used cards consume more energy than a single well-loaded one.

Next Frontier: Decoding, Systemic Integration, and Optimal VPU Load

Future work with Greening of Streaming will explore whether encode optimisation affects decoding power requirements on different devices (smart TVs, set-top boxes, mobile devices), even marginally. This matters all the more as the number of end-user devices grows. The investigation will be worthwhile even if it returns the expected answer of "no".

Questions also remain about embodied carbon, component lifespan, and whether programmable GPUs can compete with ASICs, such as the Quadra, in terms of efficiency.

Another simple test to complete these findings will be to define the relationship between the VPU’s added energy efficiency and its usage: is there a sweet spot where it delivers most of the energy savings below full load, or do we need to get as close to 100% VPU load as possible? Is the relationship linear or does it follow a typical curve?

Conclusion

Martin & G&L’s work underlines a vital message for streaming infrastructure: hardware acceleration isn’t just about speed, it’s about sustainability. To unlock its full potential, the system’s architecture must be fine-tuned for the specific video processing task and run hot and heavy. In encoding, as in many things, efficiency loves company.

Start the Conversation

Interested in VPU-powered encoding, energy metrics, or related topics? Please don’t hesitate to reach out.
