HTTP Live Streaming (HLS) delivers video over plain HTTP. As streaming shifted away from proprietary solutions like Flash and RTMP, protocols adopted HTTP for its reliability, and modern HTML5 players now rely on it for streaming.
But HTTP wasn't built for live streaming. It focuses on reliable delivery, not speed. As streaming platforms push for real-time interaction, this creates a fundamental problem: How do we achieve low latency while keeping HTTP's advantages?
HLS breaks video into small HTTP-delivered chunks that play in any HTML5 player. Your browser loads these chunks like any other web content - through the same networks that deliver websites and images. This makes HLS reliable and scalable, but it also introduces delay as chunks move through the delivery chain.
The protocol's design serves two key goals: broad device support and reliable playback. It succeeds at both, but at the cost of added latency, often 20-30 seconds between capture and playback. Unlike real-time protocols, HLS prioritizes consistent playback over immediate delivery.
HLS introduces latency because it breaks video into small chunks (segments) that must be fully downloaded before playback can start. Imagine waiting for a full bucket of water before pouring instead of letting it flow continuously. This segmented approach introduces multiple delays.
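To put rough numbers on it (illustrative, since player defaults vary): with 6-second segments and a player that buffers three segments before starting playback, the buffer alone accounts for about 3 × 6 s = 18 s, before encoding, packaging, and CDN delivery add their share. That is how traditional HLS lands in the 20-30 second range.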
HLS moves video through a structured pipeline, each step building on the last. Think of it like an assembly line where raw video transforms into downloadable chunks.
The first step is encoding. The encoder receives raw live video as a continuous stream of frames and compresses this feed in real time, turning bulky video data into efficient H.264 or HEVC formats.
Next comes segmentation. Compressed video is split into playable chunks, typically as MPEG-2 Transport Stream (*.ts) files. Each segment contains a few seconds of video and audio, packaged together in a format players can understand. These segments form the basic building blocks of HLS delivery.
Manifest files track these segments. The master manifest (*.m3u8) lists different quality versions of your stream. Child manifests detail the segments within each quality level. As new segments are created, these manifests update to reflect the latest content. It's a constant process of listing, updating, and expiring old segments.
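As a sketch (the filenames, bitrates, and sequence numbers here are illustrative), a master manifest and one of its child manifests look roughly like this:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
```

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:120
#EXTINF:6.0,
segment120.ts
#EXTINF:6.0,
segment121.ts
#EXTINF:6.0,
segment122.ts
```

The #EXT-X-MEDIA-SEQUENCE value ticks upward as old segments expire from the playlist - the listing, updating, and expiring cycle in action.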
Players work through this chain from the top down: they start with the master manifest, choose a quality level, fetch the child manifest, and finally download segments. Each step requires its own HTTP request.
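In practice, the startup sequence looks something like this (paths illustrative):

```
GET /live/master.m3u8           → pick a rendition based on available bandwidth
GET /live/720p/index.m3u8       → read the current segment list
GET /live/720p/segment121.ts    → download, buffer, decode
GET /live/720p/segment122.ts    → keep the buffer filled
```

Each round trip adds its own slice of delay, which is why the request chain itself contributes to HLS latency.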
Different streaming protocols tackle the latency challenge through distinct approaches. Each makes specific trade-offs between speed, scale, and complexity.
Before comparing HLS with the alternatives, it's worth noting that FastPix's video API delivers over HLS for exactly this reason: broad compatibility across devices makes it a reliable choice for adaptive bitrate streaming.
WebRTC takes the direct approach. By establishing peer-to-peer connections over User Datagram Protocol (UDP), it achieves sub-second latency. Video flows directly between participants without server-side packaging or buffering.
This works great for video calls but struggles at scale. Each new viewer needs its own connection, which quickly overwhelms servers as the audience grows. Maintaining quality across varying network conditions is also challenging, since peer-to-peer connections are sensitive to bandwidth fluctuations.
For interactive experiences like video conferencing and online gaming, WebRTC is the clear choice. It enables real-time data channels with a latency of 200-500ms.
DASH mirrors many HLS concepts but adds flexibility. Its design supports various media formats and delivery methods. Recent DASH-LL (DASH – Low Latency) implementations achieve 3-6 second latency through chunk-based delivery and HTTP/2 push.
HTTP/2 push is a feature that allows the server to send multiple pieces of content to the player proactively without waiting for the player to request each one. This means that when a player connects to the server, it can receive not just the initial video segment but also additional segments or resources that it is likely to need soon.
For broadcast-scale streaming with millions of viewers, DASH with CMAF works best, utilizing existing HTTP infrastructure and supporting standard DRM.
CMAF is a media format that simplifies video delivery, while DASH is a streaming protocol for adaptive bitrate streaming. When used together, CMAF provides a standardized way to package media content, which DASH can then stream efficiently.
SRT prioritizes reliable delivery. It adds error correction and congestion control on top of UDP, producing robust streams that handle network problems gracefully. This makes it a strong fit for sending video between production points or to distribution servers.
However, limited player support restricts its use for direct viewer delivery because not all media players are compatible with SRT. For live video contributions, SRT shines by handling unstable networks with excellent security, achieving a latency of 400ms-1s.
The choice between protocols depends on your specific needs. Each excels in particular scenarios: WebRTC for sub-second interactivity, DASH with CMAF for broadcast scale, SRT for contribution over unreliable networks, and HLS when device reach and reliability matter most.
HLS latency affects stream quality beyond simple delay. When latency increases, video segments pile up in the buffer, using more device memory and processing power. This forces players to choose between smooth playback and staying close to live content.
With higher latency, the player's adaptive bitrate (ABR) algorithm becomes less effective. By the time the algorithm detects network congestion and switches to a lower bitrate, the viewer has already buffered several high-bitrate segments. This leads to more severe quality drops instead of smooth transitions.
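As an illustrative example: a player holding 18 seconds of 5 Mbps video has already committed roughly 5 Mbps × 18 s ≈ 11 MB of high-bitrate content. When congestion hits, all of it must either play out or be discarded and re-downloaded before the lower-bitrate switch takes effect.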
High latency also affects error recovery. When network errors occur, players must reload the manifest, download new segments, and rebuild the buffer. With a 30-second latency, viewers miss significant content during recovery. In comparison, low-latency streams can recover faster and skip less content.
Traditional HLS wasn't designed for live streaming. Its 20-30 second latency became a major limitation as live streaming grew. Platforms needed faster delivery for sports, gaming, and interactive content.
Early attempts to reduce latency focused on shorter segments, reducing segment duration from 6 seconds to 2 seconds. However, this led to new challenges, such as increased server load and players struggling with frequent requests, while the fundamental HLS architecture still imposed delays.
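The arithmetic shows why this approach hit a ceiling: a player that buffers three segments still waits roughly 3 × 2 s = 6 s on the buffer alone, plus encoding and delivery time. Shorter segments trimmed latency but couldn't get near real time.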
CDN providers attempted to optimize edge delivery and reduce cache times, which helped but didn't solve the core issue: HLS's segment-based design required multiple segments in the buffer before playback could start.
Apple launched LL-HLS in 2019 as a significant upgrade to the HLS protocol. This enhancement redefined how HLS manages live streaming, aiming for sub-2-second latency while preserving the scalability and reliability that HLS is known for.
LL-HLS brought three main features: partial segment delivery, blocking playlist reloads, and preload hints.
These changes maintain backward compatibility. Older players can still play LL-HLS streams using traditional methods, while newer players use the low-latency features.
The core innovation in LL-HLS is partial segment delivery. Instead of waiting for 6-second segments, the encoder creates small chunks called "parts" - typically 200-500ms each. Players can start playback as soon as they receive the first few parts.
The manifest structure changes to support this:
```
#EXT-X-PART:DURATION=0.33334,URI="filePart1.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart2.mp4"
```
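In a full low-latency playlist, these part entries sit alongside tags that advertise the server's low-latency capabilities. A minimal sketch, with illustrative values (PART-HOLD-BACK is typically about three part durations):

```
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXTINF:2.0,
fileSequence100.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart1.mp4"
```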
Blocking playlist reload transforms manifest updates. Instead of polling on a timer, the player requests the next playlist update in advance over a persistent HTTP connection, and the server holds that request open until a new part is available. This eliminates polling delay and reduces server load.
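The spec implements this with delivery directives: query parameters that name the media sequence number and part the player is waiting for (path illustrative):

```
GET /live/720p/index.m3u8?_HLS_msn=121&_HLS_part=2
```

The server blocks this request until part 2 of segment 121 is published, then responds immediately with the updated playlist.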
Players can request the next part before it's complete, reducing gaps between parts during playback. The preload hints feature helps players prepare for upcoming segments:
```
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="nextPart.mp4"
```
Managing HLS latency involves careful attention to both encoding and delivery methods. Focusing on synchronization and optimizing settings can improve the responsiveness of your stream.
Want to make video streaming easier? Whatever your latency needs may be, FastPix gets it done. Our video API supports on-demand video and live streaming, making it easy to deliver great content. With features like in-video AI and a customizable player, you can create a viewing experience your audience will enjoy.
Our media studio also simplifies content management. Whether you're streaming live events or sharing videos on demand, we’ve got the tools to help you get it done. Let’s turn your video ideas into reality.
Sign up today and start streaming with FastPix!
Latency in HLS streaming mainly results from segment size, buffering, and network conditions. HLS typically uses chunked segments that introduce delays, as each segment must load before playing. Reducing latency is crucial, especially for live sports, gaming, and interactive streams, where real-time engagement is key. Minimizing latency ensures viewers experience content close to real-time, enhancing engagement and reducing lag-induced frustrations.
Key techniques include using Low-Latency HLS (LL-HLS), which allows smaller, faster segments and reduces latency to as low as 2-5 seconds. LL-HLS relies on smaller GOPs (group of pictures) and CMAF (Common Media Application Format) to speed up data delivery. Implementing adaptive bitrate streaming and leveraging edge computing or CDN services can further reduce latency by serving data from locations closer to viewers.
LL-HLS optimizes traditional HLS by enabling real-time segment transmission, allowing segments to be partially delivered, and reducing buffering. While standard HLS segments introduce delays, LL-HLS segments are encoded and transmitted in smaller parts, significantly reducing the delay to just a few seconds. This change makes LL-HLS more suitable for interactive or time-sensitive applications like auctions, Q&As, or gaming.
Network infrastructure, such as CDNs and edge computing, plays a critical role. CDNs reduce latency by caching content on servers closer to the end users, minimizing data travel. Edge computing places data processing closer to the source, reducing latency in interactive streams. For broadcasters, optimizing network routes and using dedicated streaming servers can further improve latency.
A practical latency target for HLS is around 2-5 seconds for a near real-time experience. To maintain this, monitor streaming metrics, optimize encoder settings, and adjust buffer sizes as needed. Regularly testing network conditions and ensuring adaptive bitrate streaming is enabled can also help maintain consistent low latency for different viewing environments.