An overview of the currently used subtitle formats
Subtitling of broadcast video content is partly mandatory for public broadcasters in Europe. In North America it is essentially mandatory (FCC/ADA in the USA, CRTC in Canada, IFT in Mexico), and this also extends to online offerings.
In the USA, this has meant that subtitle playback has to work on all end devices and that sufficient tools are available for creating, encoding and distributing subtitles, both for broadcast and online.
In most cases, the CEA-608 standard is used. In legacy SD video, the caption data is transmitted in binary form in line 21 of the vertical blanking interval, with only 2 bytes per field. Captions are therefore spread over many consecutive frames, which makes post-processing complicated.
Text formatting options and the character set (Western characters only) are limited, and the number of simultaneous languages is limited to a maximum of four (in practice, two).
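To make the byte-pair constraint concrete, here is a minimal Python sketch of how text might be packed into CEA-608 byte pairs and unpacked again. It assumes only the basic character set: each byte holds a 7-bit character plus an odd-parity bit, and pairs whose first byte is 0x10 to 0x1F are control codes, which this sketch does not decode. Positioning, pop-on/roll-up commands and the special and extended character sets are left out, and the helper names (add_odd_parity, encode_text, decode_pair) are hypothetical.

```python
# Minimal sketch: packing text into CEA-608 byte pairs (basic character set only).
# Assumption: control codes (first byte 0x10-0x1F), special and extended
# characters are not handled; all function names are illustrative.

# Basic-set code points where CEA-608 differs from ASCII.
BASIC_SET_EXCEPTIONS = {
    0x2A: "á", 0x5C: "é", 0x5E: "í", 0x5F: "ó", 0x60: "ú",
    0x7B: "ç", 0x7C: "÷", 0x7D: "Ñ", 0x7E: "ñ", 0x7F: "\u25a0",
}


def has_odd_parity(b: int) -> bool:
    """True if the byte (7 data bits + parity bit) has an odd number of 1 bits."""
    return bin(b & 0xFF).count("1") % 2 == 1


def add_odd_parity(c: int) -> int:
    """Set bit 7 so that the total number of 1 bits is odd."""
    c &= 0x7F
    return c | 0x80 if bin(c).count("1") % 2 == 0 else c


def encode_text(text: str) -> list[tuple[int, int]]:
    """Split text into byte pairs, one pair per line-21 transmission.

    Intended for ASCII letters, digits and spaces; characters that CEA-608
    remaps (e.g. '*') are not translated here.
    """
    codes = [ord(ch) for ch in text]
    if len(codes) % 2:
        codes.append(0x00)  # pad the last pair with a null byte
    return [(add_odd_parity(codes[i]), add_odd_parity(codes[i + 1]))
            for i in range(0, len(codes), 2)]


def decode_pair(b1: int, b2: int) -> str:
    """Decode one byte pair into printable characters ('' for nulls/controls)."""
    if not (has_odd_parity(b1) and has_odd_parity(b2)):
        raise ValueError("parity error")
    c1, c2 = b1 & 0x7F, b2 & 0x7F
    if 0x10 <= c1 <= 0x1F:
        return ""  # control code or special character: out of scope here
    return "".join(BASIC_SET_EXCEPTIONS.get(c, chr(c)) for c in (c1, c2) if c >= 0x20)


pairs = encode_text("HELLO")
print(len(pairs))  # five characters need three byte pairs
print("".join(decode_pair(a, b) for a, b in pairs))
```

Even a short word occupies several consecutive transmissions, which is exactly why reassembling captions in post-processing is tedious.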
In streaming formats, CEA-608 is mostly carried in SEI messages in the H.264/H.265 bitstream (ATSC A/72 Parts 1 to 3, https://www.atsc.org/standard/a72-parts-1-2-and-3/) and is referenced, for example, in HLS (https://tools.ietf.org/html/rfc8216#section-18.104.22.168) and DASH (DASH-IF IOP v3.3, Section 22.214.171.124, https://dashif.org/identifiers/subtitleclosed-captioning/).
QuickTime supports CEA-608 as closed-caption (CC) tracks, and MXF carries it in SMPTE 436M VBI/VANC tracks.
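A rough Python sketch of how CEA-608 byte pairs can be pulled out of such an SEI message follows. It assumes the payload of a user_data_registered_itu_t_t35 SEI message has already been extracted from the NAL unit with emulation-prevention bytes removed, and that it uses the ATSC "GA94" cc_data layout (country code 0xB5, provider code 0x0031, user_data_type_code 0x03). The function name and the sample bytes are hypothetical.

```python
# Minimal sketch: extracting caption byte pairs from an ATSC "GA94" cc_data
# structure in an H.264/H.265 user_data_registered_itu_t_t35 SEI payload.
# Assumption: `payload` is the SEI payload only (no NAL/SEI headers) with
# emulation-prevention bytes already removed.

CC_TYPE_NAMES = {0: "608 field 1", 1: "608 field 2", 2: "708 data", 3: "708 start"}


def parse_ga94_cc_data(payload: bytes) -> list[tuple[int, int, int]]:
    """Return (cc_type, byte1, byte2) for every valid cc_data entry."""
    if len(payload) < 10:
        return []
    country_code = payload[0]
    provider_code = int.from_bytes(payload[1:3], "big")
    user_identifier = payload[3:7]
    user_data_type_code = payload[7]
    if (country_code, provider_code, user_identifier, user_data_type_code) != (
        0xB5, 0x0031, b"GA94", 0x03
    ):
        return []  # not ATSC closed-caption data
    flags = payload[8]
    if not flags & 0x40:  # process_cc_data_flag not set
        return []
    cc_count = flags & 0x1F
    # payload[9] is em_data; the 3-byte cc_data entries follow.
    entries = []
    pos = 10
    for _ in range(cc_count):
        if pos + 3 > len(payload):
            break  # truncated payload
        marker, b1, b2 = payload[pos], payload[pos + 1], payload[pos + 2]
        cc_valid = (marker >> 2) & 0x01
        cc_type = marker & 0x03
        if cc_valid:
            entries.append((cc_type, b1, b2))
        pos += 3
    return entries


sample = bytes([0xB5, 0x00, 0x31]) + b"GA94" + bytes(
    [0x03, 0xC2, 0xFF, 0xFC, 0xC8, 0x45, 0xFD, 0x80, 0x80, 0xFF]
)
for cc_type, b1, b2 in parse_ga94_cc_data(sample):
    print(CC_TYPE_NAMES[cc_type], hex(b1), hex(b2))
```

Only entries with cc_type 0 (field 1) and 1 (field 2) carry CEA-608 pairs; types 2 and 3 belong to CEA-708 (DTVCC) packets travelling in the same structure.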
CEA-708, the newer format, supports a much larger character set and considerably more formatting options. It is backwards compatible with CEA-608: a CEA-708 caption stream can also carry the CEA-608 data.
However, CEA-708 is rarely used; the lowest common denominator for subtitles in the US and Canada is still CEA-608, and it will probably remain so for a long time.
In Europe, viewers have known subtitles as a teletext service (ETSI EN 300 706 V1.2.1 (2003)) since the 1970s, and teletext is the established distribution technology. As with CEA-608, the subtitle data is carried in the vertical blanking interval of the video signal, but at a higher information density: each teletext packet carries 40 bytes of display data, and several VBI lines per field can be used.
For HD video signals, teletext carriage is standardised in SMPTE ST 2031 and OP-47. Teletext plays no role as a distribution format in streaming, but it is frequently used as a contribution format when encoding live streams.
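As an illustration of how teletext data is organised, the Python sketch below decodes the two Hamming 8/4-protected address bytes (magazine and packet/row number) and the odd-parity display bytes of a single packet. It assumes the clock run-in and framing code have already been stripped, so the input is the 42-byte packet, and that bytes are assembled with the first transmitted bit as the LSB. It only extracts the data bits and does not perform the single-bit error correction a real decoder applies; for row 0 (the page header) the first data bytes carry Hamming-coded page address and control bits, which the sketch does not interpret. Function names are hypothetical.

```python
# Minimal sketch: decoding one teletext packet (ETSI EN 300 706).
# Assumptions: clock run-in and framing code already removed; bytes assembled
# with the first transmitted bit as the LSB; Hamming 8/4 error correction is
# NOT performed, the data bits are simply extracted.

# Hamming 8/4 code words for the values 0 to 15, used here to build a sample.
HAMMING_8_4 = [0x15, 0x02, 0x49, 0x5E, 0x64, 0x73, 0x38, 0x2F,
               0xD0, 0xC7, 0x8C, 0x9B, 0xA1, 0xB6, 0xFD, 0xEA]


def unham_8_4(b: int) -> int:
    """Extract the four data bits (bits 1, 3, 5, 7) of a Hamming 8/4 byte."""
    return ((b >> 1) & 0x01) | ((b >> 2) & 0x02) | ((b >> 3) & 0x04) | ((b >> 4) & 0x08)


def with_odd_parity(c: int) -> int:
    """Set bit 7 so that the byte has an odd number of 1 bits."""
    return c | 0x80 if bin(c).count("1") % 2 == 0 else c


def decode_packet(packet: bytes) -> tuple[int, int, str]:
    """Return (magazine, packet/row number, text) for a 42-byte packet."""
    if len(packet) != 42:
        raise ValueError("expected 2 address bytes + 40 data bytes")
    n1, n2 = unham_8_4(packet[0]), unham_8_4(packet[1])
    magazine = (n1 & 0x07) or 8   # magazine 8 is coded as 0
    row = (n2 << 1) | (n1 >> 3)   # packet number 0 to 31
    # Strip the parity bit and map control codes (< 0x20) to spaces. National
    # option characters are treated as plain ASCII in this sketch.
    text = "".join(chr(b & 0x7F) if (b & 0x7F) >= 0x20 else " " for b in packet[2:])
    return magazine, row, text


# Example: magazine 8, row 22, text "Hello".
row_number = 22
address = bytes([HAMMING_8_4[(8 & 0x07) | ((row_number & 0x01) << 3)],
                 HAMMING_8_4[row_number >> 1]])
packet = address + bytes(with_odd_parity(ord(c)) for c in "Hello".ljust(40))
print(decode_packet(packet))
```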
In addition to teletext, the DVB Subtitles format (ETSI EN 300 743) is used as a distribution format in digital television broadcasting. Unlike teletext, CEA-608 and CEA-708, DVB subtitles are transmitted as bitmaps rather than as text.
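Because DVB subtitles are bitmaps, a decoder works through a sequence of binary segments (page and region composition, colour lookup tables, object data) rather than reading text. A minimal Python sketch of walking the segment headers of a subtitle PES data field might look like this. It assumes the PES header has already been removed; the segment type values follow EN 300 743, while the function name and the sample bytes are hypothetical.

```python
# Minimal sketch: listing the segments in a DVB subtitle PES data field
# (ETSI EN 300 743). Assumption: the PES header is already stripped, so the
# data starts with data_identifier 0x20 and subtitle_stream_id 0x00.

SEGMENT_TYPES = {
    0x10: "page composition",
    0x11: "region composition",
    0x12: "CLUT definition",
    0x13: "object data",
    0x14: "display definition",
    0x80: "end of display set",
}


def list_segments(pes_data: bytes) -> list[tuple[str, int, int]]:
    """Return (segment name, page_id, segment_length) for each segment."""
    if len(pes_data) < 2 or pes_data[0] != 0x20 or pes_data[1] != 0x00:
        raise ValueError("not a DVB subtitle PES data field")
    segments = []
    pos = 2
    # Each segment: sync_byte 0x0F, segment_type, page_id (16 bit),
    # segment_length (16 bit), then segment_length bytes of data.
    while pos + 6 <= len(pes_data) and pes_data[pos] == 0x0F:
        seg_type = pes_data[pos + 1]
        page_id = int.from_bytes(pes_data[pos + 2:pos + 4], "big")
        length = int.from_bytes(pes_data[pos + 4:pos + 6], "big")
        name = SEGMENT_TYPES.get(seg_type, f"unknown 0x{seg_type:02X}")
        segments.append((name, page_id, length))
        pos += 6 + length
    # After the last segment, a 0xFF end_of_PES_data_field_marker is expected.
    return segments


sample = bytes([0x20, 0x00,
                0x0F, 0x10, 0x00, 0x01, 0x00, 0x02, 0x00, 0x0F,  # page composition
                0x0F, 0x80, 0x00, 0x01, 0x00, 0x00,              # end of display set
                0xFF])
for segment in list_segments(sample):
    print(segment)
```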
Back to subtitle formats for streaming: WebVTT is currently the common format for HTTP Live Streaming. WebVTT (https://www.w3.org/TR/webvtt1/) is a text format inspired by the SubRip (.srt) format and can be supplemented with CSS-like styling information. It is specified by the W3C and is natively supported by many web browsers. However, many developers complain that WebVTT parsers do not support all of the features defined in the specification.
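To show how close the format is to SubRip in practice, here is a small Python sketch that parses the timing line of a WebVTT cue, including optional cue settings such as align:start. Unlike SRT, WebVTT uses a period before the milliseconds and makes the hours field optional. The sketch handles only the timing line; a complete parser would also need to handle the header, NOTE, STYLE and REGION blocks and the cue payload markup. Function names and the example cue are illustrative.

```python
import re

# WebVTT timestamp: optional hours (2+ digits), then mm:ss.ttt.
TIMESTAMP = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
TIMING_LINE = re.compile(rf"^{TIMESTAMP}[ \t]+-->[ \t]+{TIMESTAMP}(.*)$")


def to_seconds(h, m, s, ms) -> float:
    """Combine the regex groups of one timestamp into seconds."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_timing_line(line: str) -> tuple[float, float, dict[str, str]]:
    """Return (start, end, settings) for a WebVTT cue timing line."""
    match = TIMING_LINE.match(line)
    if not match:
        raise ValueError(f"not a cue timing line: {line!r}")
    g = match.groups()
    start, end = to_seconds(*g[0:4]), to_seconds(*g[4:8])
    # Cue settings are whitespace-separated name:value pairs.
    settings = dict(item.split(":", 1) for item in g[8].split() if ":" in item)
    return start, end, settings


print(parse_timing_line("00:01.000 --> 01:02:03.500 align:start line:90%"))
```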
For HTTP Live Streaming, the subtitle text is divided into individual time segments, just like the video and audio streams. Each segment must contain all the text to be displayed during its time span. Synchronisation with the video and audio streams is achieved via a metadata header in the WebVTT file that maps the local cue times to MPEG-TS PTS values (X-TIMESTAMP-MAP, https://tools.ietf.org/html/rfc8216#section-3.5).
WebVTT in HTTP Live Streaming is supported by many players, such as Google's ExoPlayer for Android, hls.js, JW Player, THEOplayer and Bitmovin Player, and is simply the lowest common denominator for subtitles in live streaming.
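The X-TIMESTAMP-MAP mapping itself is simple arithmetic: a cue's time on the MPEG-TS clock is the MPEGTS value plus the cue's offset from LOCAL, expressed in 90 kHz ticks. A minimal Python sketch, assuming a header of the form X-TIMESTAMP-MAP=MPEGTS:<ticks>,LOCAL:<timestamp> (the attribute order may vary) and handling 33-bit PTS wrap-around only with a final modulo; the function names are hypothetical.

```python
import re

PTS_CLOCK_HZ = 90_000
PTS_WRAP = 1 << 33  # MPEG-TS PTS values are 33 bits wide


def parse_vtt_timestamp(ts: str) -> float:
    """Convert a WebVTT timestamp ((hh:)mm:ss.ttt) to seconds."""
    parts = ts.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def parse_timestamp_map(header: str) -> tuple[int, float]:
    """Return (mpegts_ticks, local_seconds) from an X-TIMESTAMP-MAP line."""
    mpegts = re.search(r"MPEGTS:(\d+)", header)
    local = re.search(r"LOCAL:([\d:.]+)", header)
    if not (mpegts and local):
        raise ValueError("incomplete X-TIMESTAMP-MAP header")
    return int(mpegts.group(1)), parse_vtt_timestamp(local.group(1))


def cue_time_to_pts(cue_seconds: float, header: str) -> int:
    """Map a cue time from the WebVTT segment onto the 90 kHz MPEG-TS clock."""
    mpegts, local = parse_timestamp_map(header)
    ticks = mpegts + round((cue_seconds - local) * PTS_CLOCK_HZ)
    return ticks % PTS_WRAP


header = "X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000"
print(cue_time_to_pts(parse_vtt_timestamp("00:00:02.500"), header))
```

A cue at 2.5 s in a segment whose LOCAL time 0 corresponds to PTS 900000 lands at PTS 1125000, i.e. 12.5 s on the transport stream clock.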
TTML (https://www.w3.org/TR/ttml1/) is another subtitle standard, also from the W3C, and is based on XML. One may well wonder why the W3C has standardised two such different formats.
In DASH, the subtitle formats TTML and EBU-TT-D (EBU Tech 3380, https://tech.ebu.ch/docs/tech/tech3380.pdf), a sub-profile of TTML, are frequently encountered. For DVB-DASH (ETSI TS 103 285, 2017 (1)) and HbbTV 2.0, EBU-TT-D is intended to be carried in ISOBMFF (fMP4) fragments according to ISO/IEC 14496-30:2014 (see also EBU Tech 3381).
In the DASH manifest (MPD), subtitle tracks are signalled with their language, other metadata and their own codec identifier.
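For comparison with WebVTT, the Python sketch below reads a tiny TTML document of the kind EBU-TT-D uses (media time expressions in hh:mm:ss.sss) with the standard library's ElementTree and extracts the timed paragraphs. It is deliberately simplified: it assumes begin/end sit directly on each p element in clock-time form, and it ignores timing on div/body/span, offset times, frames, tick rates, styling and regions, all of which real TTML allows. The sample document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TT_NS = "{http://www.w3.org/ns/ttml}"

SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="de">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">Hallo<br/>Welt</p>
      <p begin="00:00:04.000" end="00:00:06.000"><span>Zweite</span> Zeile</p>
    </div>
  </body>
</tt>"""


def clock_to_seconds(value: str) -> float:
    """Convert an hh:mm:ss(.fff) clock-time expression to seconds."""
    h, m, s = value.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def element_text(elem: ET.Element) -> str:
    """Collect the text of a <p> or <span>, turning <br/> into line breaks."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == TT_NS + "br":
            parts.append("\n")
        else:
            parts.append(element_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def extract_cues(document: str) -> list[tuple[float, float, str]]:
    """Return (begin, end, text) for every <p> element in the document."""
    root = ET.fromstring(document)
    return [
        (clock_to_seconds(p.get("begin")), clock_to_seconds(p.get("end")), element_text(p))
        for p in root.iter(TT_NS + "p")
    ]


for cue in extract_cues(SAMPLE):
    print(cue)
```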
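As a hedged illustration of this signalling, the Python sketch below parses a hypothetical, heavily trimmed MPD and lists the text adaptation sets with their language, role and codec string. It assumes subtitles are carried in ISOBMFF, where "stpp" is the sample entry for XML (TTML-based) subtitles and "wvtt" the one for WebVTT (ISO/IEC 14496-30). The MPD content, IDs and bandwidth values are invented, and a valid MPD needs further mandatory attributes that are omitted here.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, trimmed MPD: not a complete, schema-valid manifest.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet contentType="video" mimeType="video/mp4">
      <Representation id="v1" codecs="avc1.64001f" bandwidth="3000000"/>
    </AdaptationSet>
    <AdaptationSet contentType="text" mimeType="application/mp4" lang="de">
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle"/>
      <Representation id="sub_de" codecs="stpp" bandwidth="2000"/>
    </AdaptationSet>
    <AdaptationSet contentType="text" mimeType="application/mp4" lang="en">
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="caption"/>
      <Representation id="sub_en" codecs="wvtt" bandwidth="2000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_text_tracks(mpd_xml: str) -> list[dict[str, str]]:
    """Return id, language, role and codecs for each text Representation."""
    root = ET.fromstring(mpd_xml)
    tracks = []
    for aset in root.iterfind(".//mpd:AdaptationSet", MPD_NS):
        if aset.get("contentType") != "text":
            continue
        role = aset.find("mpd:Role", MPD_NS)
        for rep in aset.iterfind("mpd:Representation", MPD_NS):
            tracks.append({
                "id": rep.get("id"),
                "lang": aset.get("lang", "und"),
                "role": role.get("value") if role is not None else "",
                # codecs may be set on the Representation or the AdaptationSet
                "codecs": rep.get("codecs") or aset.get("codecs", ""),
            })
    return tracks


for track in list_text_tracks(SAMPLE_MPD):
    print(track)
```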
SMPTE specifies its own TTML-based subtitle format, SMPTE-TT (SMPTE ST 2052-1:2013, https://ieeexplore.ieee.org/document/7291854/?arnumber=7291854), which is likewise a TTML profile.
IMSC (https://www.w3.org/TR/ttml-imsc1.0.1), yet another format, is intended to put an end to the proliferation of TTML derivatives, and that is a good thing. As a sub-profile of TTML, IMSC includes the feature set of EBU-TT-D. In theory, this means that an IMSC parser can also read and render EBU-TT-D documents. Apple supports IMSC1 as the preferred subtitle format in fragmented MP4 (https://developer.apple.com/documentation/http_live_streaming/hls_authoring_specification_for_apple_devices).
As with EBU-TT-D, Apple requires IMSC1 to be stored in ISOBMFF (fMP4) fragments. IMSC therefore acts as a bridging format for CMAF between HTTP Live Streaming and DASH, which is what makes it so interesting: in theory, the same fMP4 subtitle fragments can be used for both HTTP Live Streaming and DASH (DVB-DASH/HbbTV 2.0). The SMPTE 2052-10 and SMPTE 2052-11 specifications are intended to help convert the ageing CEA-608 format into TTML-based formats such as IMSC1.
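To make "stored in ISOBMFF fragments" more tangible: a subtitle fragment is a moof box followed by an mdat box, and for TTML-based formats such as IMSC1 the sample in the mdat is the XML document itself. The Python sketch below walks the top-level boxes of such a fragment. It is a simplification: the hand-built fragment uses an empty moof placeholder instead of a real mfhd/traf structure and assumes one sample per fragment; the function names are hypothetical.

```python
import struct


def iter_boxes(data: bytes):
    """Yield (box_type, payload) for each top-level ISOBMFF box."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit largesize follows the box type
            if pos + 16 > len(data):
                raise ValueError("truncated largesize box")
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - pos
        if size < header or pos + size > len(data):
            raise ValueError("malformed box")
        yield box_type.decode("ascii"), data[pos + header:pos + size]
        pos += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Hypothetical fragment: empty moof placeholder + mdat holding one TTML sample.
ttml = b'<tt xmlns="http://www.w3.org/ns/ttml"><body/></tt>'
fragment = make_box("moof", b"") + make_box("mdat", ttml)

for box_type, payload in iter_boxes(fragment):
    print(box_type, len(payload))
```

Because the subtitle sample is just bytes inside standard boxes, the same fragment can, in principle, be referenced from both an HLS playlist and a DASH manifest.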
Nevertheless, whichever of these subtitle formats is used for distribution, the live sources are still mostly teletext or CEA-608. Subtitle specifications for IP contribution, especially as a sidecar to SMPTE ST 2110, are being worked on, e.g. EBU-TT Part 3 (EBU Tech 3370, EBU-TT Part 3 Live Subtitling). Ready-made OTT encoders with EBU-TT Part 3 ingest do not (yet) exist.
One can only hope that the different subtitle formats will be consolidated across contribution, authoring and distribution. Side note: Final Cut Pro X now supports CEA-608 authoring ;-)
Further links on the topic: