Parsing ADTS Audio Frames with Python: A Developer's Guide

What Is ADTS?

ADTS (Audio Data Transport Stream) is a format for streaming AAC audio without a container. Each AAC frame in an ADTS stream is prefixed with a 7-byte (or 9-byte with CRC) synchronization header containing metadata about the frame. This self-framing structure allows decoders to synchronize to a stream at any point — critical for broadcast and HTTP streaming applications.

If you've ever processed raw .aac files, worked with HLS audio segments, or extracted audio from MPEG-TS streams, you've almost certainly encountered ADTS. This guide walks through the ADTS header structure and shows how to parse it programmatically in Python.

ADTS Header Structure

An ADTS header is either 7 bytes (without CRC) or 9 bytes (with CRC). The fields, packed into the header from most-significant bit to least-significant, are:

Field	Bits	Description
Sync word	12	Always 0xFFF — used to find frame boundaries
ID	1	0 = MPEG-4, 1 = MPEG-2
Layer	2	Always 00
Protection absent	1	1 = no CRC, 0 = CRC present
Profile	2	AAC profile, stored as the MPEG-4 Audio Object Type minus 1 (0=Main, 1=LC, 2=SSR, 3=LTP in MPEG-4, reserved in MPEG-2)
Sampling freq index	4	Index into sampling frequency table
Private bit	1	Set freely by encoder
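Channel config	3	Channel configuration index (1-6 = that many channels, 7 = 8 channels, 0 = layout signalled inside the audio data)
Originality/copy flags	4	One bit each: original/copy, home, copyright ID bit, copyright ID start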
Frame length	13	Total size of ADTS frame (header + data) in bytes
Buffer fullness	11	0x7FF = VBR stream
Number of AAC frames	2	Number of raw frames in this ADTS frame minus 1
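
To make the bit layout concrete, the sketch below treats the 7 header bytes as one 56-bit big-endian integer and peels off each field in table order. The function name decode_adts_header and the field names are illustrative choices rather than part of any library, and the example header was built by hand (MPEG-4, no CRC, AAC-LC, 44.1 kHz, stereo, 16-byte frame).

```python
def decode_adts_header(header: bytes) -> dict:
    """Decode the first 7 bytes of an ADTS header into named fields."""
    if len(header) < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")

    # Interpret the 7 bytes as one 56-bit integer, most-significant bit first
    bits = int.from_bytes(header[:7], "big")

    # (name, width in bits), in the same order as the table above
    layout = [
        ("syncword", 12), ("id", 1), ("layer", 2), ("protection_absent", 1),
        ("profile", 2), ("sampling_freq_index", 4), ("private_bit", 1),
        ("channel_config", 3), ("flags", 4), ("frame_length", 13),
        ("buffer_fullness", 11), ("num_frames_minus_1", 2),
    ]

    fields = {}
    remaining = 56  # bits not yet consumed
    for name, width in layout:
        remaining -= width
        fields[name] = (bits >> remaining) & ((1 << width) - 1)
    return fields


# Hand-built example header (an assumption for illustration, not real file data)
example = bytes([0xFF, 0xF1, 0x50, 0x80, 0x02, 0x1F, 0xFC])
fields = decode_adts_header(example)
print(fields)
```

The field widths add up to 56 bits, which is why the loop ends with nothing left over. A 9-byte header simply has a 16-bit CRC after these 7 bytes.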

Sampling Frequency Index Table

The 4-bit sampling frequency index maps to standard sample rates:

0: 96000 Hz
1: 88200 Hz
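2: 64000 Hz
3: 48000 Hz
4: 44100 Hz
5: 32000 Hz
6: 24000 Hz
7: 22050 Hz
8: 16000 Hz
9: 12000 Hz
10: 11025 Hz
11: 8000 Hz
12: 7350 Hz

Indices 13 to 15 do not correspond to a sample rate in ADTS.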

Python Parser: Reading ADTS Frames

Below is a minimal Python function that reads an ADTS stream from a file and returns a list of parsed frame metadata, one dictionary per frame:

SAMPLE_RATES = [
    96000, 88200, 64000, 48000, 44100, 32000,
    24000, 22050, 16000, 12000, 11025, 8000, 7350
]

def parse_adts_frames(filepath):
    with open(filepath, 'rb') as f:
        data = f.read()

    offset = 0
    frames = []

    while offset + 7 <= len(data):
        # Check sync word (12 bits = 0xFFF)
        if data[offset] != 0xFF or (data[offset+1] & 0xF0) != 0xF0:
            offset += 1
            continue

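        # Header bytes 1-5 carry every field parsed here
        b1, b2, b3, b4, b5 = data[offset+1:offset+6]

        mpeg_id       = (b1 & 0x08) >> 3
        protection    = (b1 & 0x01)   # protection_absent: 1 = no CRC
        profile       = (b2 & 0xC0) >> 6
        freq_idx      = (b2 & 0x3C) >> 2
        channel_conf  = ((b2 & 0x01) << 2) | ((b3 & 0xC0) >> 6)
        frame_length  = ((b3 & 0x03) << 11) | (b4 << 3) | ((b5 & 0xE0) >> 5)

        # The header is 7 bytes, or 9 when a CRC is present. A length shorter
        # than the header, or one that runs past the end of the data, means a
        # false sync word or a truncated frame: skip one byte and rescan.
        header_size = 7 if protection else 9
        if frame_length < header_size or offset + frame_length > len(data):
            offset += 1
            continue

        sample_rate = SAMPLE_RATES[freq_idx] if freq_idx < len(SAMPLE_RATES) else None

        frames.append({
            'offset': offset,
            'mpeg_version': 2 if mpeg_id else 4,
            'has_crc': not bool(protection),
            'profile': profile + 1,  # MPEG-4 Audio Object Type (2 = AAC-LC)
            'sample_rate': sample_rate,
            # Config 7 means 8 channels; 0 means the layout is signalled in-band
            'channels': 8 if channel_conf == 7 else channel_conf,
            'frame_length': frame_length,
        })

        offset += frame_length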

    return frames
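
parse_adts_frames reads the whole file into memory, which is fine for short clips but less so for long recordings or live streams. As an alternative, here is a minimal sketch of a generator that reads one frame at a time from any binary stream. The name iter_adts_frames is hypothetical, and the sketch assumes the stream starts on a frame boundary: it stops at the first position that does not parse instead of rescanning for the next sync word.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_adts_frames(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, frame_bytes) for each ADTS frame in a binary stream."""
    offset = 0
    while True:
        header = stream.read(7)
        if len(header) < 7:
            return  # end of stream, or a trailing partial header

        # Sync word must be 0xFFF and the two layer bits must be zero
        if header[0] != 0xFF or (header[1] & 0xF6) != 0xF0:
            return  # lost sync; a fuller parser would rescan here

        # 13-bit frame length spans bytes 3, 4 and 5 of the header
        frame_length = ((header[3] & 0x03) << 11) | (header[4] << 3) | (header[5] >> 5)
        if frame_length < 7:
            return  # invalid length

        body = stream.read(frame_length - 7)
        if len(body) < frame_length - 7:
            return  # truncated final frame

        yield offset, header + body
        offset += frame_length


# Two synthetic 16-byte frames: a hand-built 7-byte header plus 9 zero bytes
header = bytes([0xFF, 0xF1, 0x50, 0x80, 0x02, 0x1F, 0xFC])
stream = io.BytesIO((header + bytes(9)) * 2)
offsets = [off for off, _ in iter_adts_frames(stream)]
print(offsets)
```

With a real file, the same generator can be driven by open(path, 'rb') in place of the io.BytesIO object.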

Using the Parser

Once you have the frame list, you can extract useful information about any ADTS file:

frames = parse_adts_frames('audio.aac')
print(f"Total frames: {len(frames)}")
if frames:  # an empty list means no valid ADTS frame was found
    first = frames[0]
    print(f"Sample rate: {first['sample_rate']} Hz")
    print(f"Channels: {first['channels']}")
    print(f"MPEG version: {first['mpeg_version']}")
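
The frame list also supports rough stream-level estimates. The sketch below derives an approximate duration and average bitrate from the 'sample_rate' and 'frame_length' keys produced by the parser. It rests on two assumptions that the parser does not verify: every ADTS frame holds exactly one raw AAC block, and each block decodes to 1024 samples. The helper name summarize_frames is hypothetical.

```python
# Assumption: one raw data block of 1024 samples per ADTS frame
SAMPLES_PER_FRAME = 1024


def summarize_frames(frames):
    """Estimate duration (seconds) and average bitrate (bits/s) of a frame list."""
    if not frames:
        return None
    sample_rate = frames[0]['sample_rate']
    if not sample_rate:
        return None  # reserved sampling frequency index

    duration = len(frames) * SAMPLES_PER_FRAME / sample_rate
    total_bytes = sum(f['frame_length'] for f in frames)  # headers included
    return {
        'duration_s': duration,
        'avg_bitrate_bps': total_bytes * 8 / duration,
    }


# Synthetic input: 375 frames of 300 bytes each at 48 kHz
summary = summarize_frames([{'sample_rate': 48000, 'frame_length': 300}] * 375)
print(summary)
```

Because frame_length includes the ADTS header, the bitrate figure slightly overstates the audio payload rate.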

Common Pitfalls

False sync words: The byte sequence 0xFF 0xF? can appear in audio data. Always validate by checking that the frame length makes sense and that the next frame also starts with a sync word.
CRC bytes: If protection_absent is 0, there are 2 additional CRC bytes after the 7-byte header before the audio data begins.
Zero frame length: A frame_length of 0 is invalid and usually indicates a corrupt or truncated file.
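
The checks described above can be sketched as two small helpers. is_plausible_frame accepts a candidate offset only if the frame length is sane and the following frame also starts with a sync word. frame_payload skips the 2 CRC bytes when they are present. Both names are hypothetical, and the look-ahead is a heuristic, not a guarantee: a false sync word can still pass it by coincidence.

```python
def _frame_length(data: bytes, offset: int) -> int:
    """Extract the 13-bit frame length from the header at `offset`."""
    return (((data[offset + 3] & 0x03) << 11)
            | (data[offset + 4] << 3)
            | (data[offset + 5] >> 5))


def _header_size(data: bytes, offset: int) -> int:
    """7 bytes when protection_absent is 1, otherwise 9 (header plus CRC)."""
    return 7 if data[offset + 1] & 0x01 else 9


def is_plausible_frame(data: bytes, offset: int) -> bool:
    """Heuristic check that an ADTS frame really starts at `offset`."""
    if offset + 7 > len(data):
        return False
    # Sync word must be 0xFFF and the two layer bits must be zero
    if data[offset] != 0xFF or (data[offset + 1] & 0xF6) != 0xF0:
        return False
    frame_length = _frame_length(data, offset)
    if frame_length < _header_size(data, offset):
        return False
    end = offset + frame_length
    if end > len(data):
        return False  # frame would run past the end of the buffer
    if end + 1 >= len(data):
        return True  # fewer than two bytes follow, so nothing to check
    # The next frame must also begin with a sync word
    return data[end] == 0xFF and (data[end + 1] & 0xF6) == 0xF0


def frame_payload(data: bytes, offset: int) -> bytes:
    """Return the audio data of the frame at `offset`, without header or CRC."""
    start = offset + _header_size(data, offset)
    return data[start:offset + _frame_length(data, offset)]


# Hand-built test data: the first frame's payload contains a fake header
header = bytes([0xFF, 0xF1, 0x50, 0x80, 0x02, 0x1F, 0xFC])
tricky = header + header + bytes(2) + header + bytes(9)
print(is_plausible_frame(tricky, 0), is_plausible_frame(tricky, 7))
```

In the example, offset 7 looks like a header and even carries a valid length, but the byte that would start the next frame is not a sync word, so the candidate is rejected.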

Libraries to Consider

For production use, consider established libraries rather than hand-rolled parsers:

FFmpeg / PyAV: Robust, battle-tested AAC/ADTS decoding with Python bindings
mutagen: Python library for reading audio metadata including AAC files
libavformat (C): The underlying library FFmpeg uses, available via ctypes if needed

Hand-rolling a parser is a great learning exercise, but for anything handling untrusted or diverse media files, use a well-tested library that handles edge cases gracefully.