How to design an audio playback system


Designing an audio playback path requires an understanding of audio formats, audio postprocessing, and audio hardware.


Audio system engineering combines signal processing, codecs, amplifier design, and acoustic design. It is a highly interdisciplinary field that requires system-level thinking and design.


The goal of this article is to introduce the basics of how an audio playback system operates, give an overview of its architecture, and outline the main design areas.

Audio System Overview

Audio Pipeline

Content application -> audio service (background service) -> audio driver (kernel) -> audio hardware -> speaker.

A content application generates the audio source; Spotify is one example.

An audio service writes the audio from the content application down to the low-level audio drivers.

An audio driver sends the digital audio stream over the wire to the audio hardware.

The audio hardware converts the digital audio stream into an analog signal, which is played out by the speaker.

Audio Format

Uncompressed Audio: LPCM

Bit depth: 16 or 32 bits

Sampling rate: 44.1 kHz, 48 kHz, and 96 kHz
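
As a rough illustration of what these parameters mean for bandwidth and storage, the raw LPCM data rate is simply sample rate times bit depth times channel count. The sketch below assumes a stereo stream; the function names are hypothetical and chosen only for this example.

```python
def lpcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Return the raw LPCM bit rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


def lpcm_bytes_for_duration(sample_rate_hz: int, bit_depth: int,
                            channels: int, seconds: int) -> int:
    """Return the number of bytes needed to hold `seconds` of LPCM audio."""
    return lpcm_bit_rate(sample_rate_hz, bit_depth, channels) * seconds // 8


# 16-bit stereo at 44.1 kHz
cd_rate = lpcm_bit_rate(44_100, 16, 2)
# 32-bit stereo at 48 kHz
hi_res_rate = lpcm_bit_rate(48_000, 32, 2)
# one minute of 16-bit stereo at 44.1 kHz
one_minute_bytes = lpcm_bytes_for_duration(44_100, 16, 2, 60)
```

Doubling either the bit depth or the sampling rate doubles the data that every stage of the pipeline has to move.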

Audio Hardware

  • Amplifier
    • Class-D amplifiers are commonly used
  • Speaker drivers (example specification)
    • Full range
    • 10 W rated output
    • 4 ohm coil impedance
  • DSP
    • Can be either external hardware or internal to the application processor
    • Note: a simple speaker design does not need a dedicated DSP.

Audio Postprocessing

Limiter: it dynamically attenuates very intense audio peaks so that they stay below a set ceiling, reducing the risk of over-driving the speaker.

Compressor (i.e. dynamic range compression): it dynamically compresses audio peaks that exceed a predefined level to reduce the distortion generated by the speaker at high output power.
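
The limiter and compressor can both be described by a static gain curve: below a threshold the signal passes unchanged, and above it the excess level is divided by a ratio. The sketch below is a minimal, sample-by-sample version under simplifying assumptions: it has no attack or release smoothing, which a real product would need, and the threshold and ratio values are arbitrary examples. A limiter is modeled here as the same curve with an infinite ratio.

```python
import math


def compress_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static gain curve: levels above the threshold are scaled down by `ratio`."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def process_sample(x: float, threshold_db: float = -6.0, ratio: float = 4.0) -> float:
    """Apply the static curve to one sample in the range [-1.0, 1.0]."""
    magnitude = abs(x)
    if magnitude == 0.0:
        return 0.0
    level_db = 20.0 * math.log10(magnitude)
    out_db = compress_db(level_db, threshold_db, ratio)
    # Restore the original sign after converting back from dB
    return math.copysign(10.0 ** (out_db / 20.0), x)


# Compressor: a full-scale peak is reduced, a quiet sample is untouched
compressed_peak = process_sample(1.0, threshold_db=-6.0, ratio=4.0)
quiet_sample = process_sample(0.1, threshold_db=-6.0, ratio=4.0)

# Limiter: an infinite ratio pins every peak at the threshold
limited_peak = process_sample(1.0, threshold_db=-6.0, ratio=math.inf)
```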

Equalization (sound effects): it is used to shape the sound character of the system. Audio engineers adjust the amplitude at the frequencies of interest to achieve the desired sound effect.
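
One common way to adjust the amplitude around a single frequency is a peaking biquad filter. The sketch below uses the widely circulated "Audio EQ Cookbook" peaking-filter formulas; the 1 kHz center frequency, +6 dB gain, and Q of 1 are arbitrary example values, and the function names are hypothetical. It only computes coefficients and checks the resulting gain, and it does not filter a stream.

```python
import cmath
import math


def peaking_eq_coeffs(fs_hz: float, f0_hz: float, gain_db: float, q: float):
    """Return normalized biquad coefficients (b, a) for a peaking EQ band."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    # Normalize so that a[0] == 1
    return [c / a[0] for c in b], [c / a[0] for c in a]


def magnitude_db(b, a, fs_hz: float, f_hz: float) -> float:
    """Evaluate the filter's gain in dB at frequency f_hz."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


b, a = peaking_eq_coeffs(fs_hz=48_000, f0_hz=1_000, gain_db=6.0, q=1.0)
gain_at_center = magnitude_db(b, a, 48_000, 1_000)  # close to +6 dB
gain_at_dc = magnitude_db(b, a, 48_000, 0)          # close to 0 dB
```

A full equalizer would typically chain several such bands, one per frequency of interest.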

Cross-over filter: if multiple speakers are used (e.g. a 2-way system containing a woofer (20 Hz to 2 kHz) and a tweeter (2 kHz to 20 kHz)), a cross-over filter is used to split the audio stream into two frequency bands, one for the woofer and one for the tweeter.
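
To make the band split concrete, here is a deliberately simple sketch: a one-pole low-pass feeds the woofer, and the tweeter receives whatever is left over, so the two bands always sum back to the input. This is an illustrative simplification, since production cross-overs generally use steeper, higher-order filters. The 2 kHz split point follows the example above, and `split_two_way` is a hypothetical name.

```python
import math


def split_two_way(samples, fs_hz: float, crossover_hz: float = 2_000.0):
    """Split samples into (woofer, tweeter) bands with a one-pole low-pass."""
    # Smoothing coefficient of a one-pole low-pass at the crossover frequency
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / fs_hz)
    low_state = 0.0
    woofer, tweeter = [], []
    for x in samples:
        low_state += alpha * (x - low_state)  # low-pass output
        woofer.append(low_state)
        tweeter.append(x - low_state)         # complementary high band
    return woofer, tweeter


# A constant (0 Hz) input should end up almost entirely in the woofer band
dc_input = [1.0] * 1000
woofer_out, tweeter_out = split_two_way(dc_input, fs_hz=48_000)
```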

System Interface


I2S: a 4-wire bus interface that is commonly used for audio data transfer. The same bus can also carry other protocols, such as PCM (stereo channels) and TDM (up to 16 channels of audio).
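
When sizing the bus, a useful first check is the bit clock it must run at, which is the sample rate times the slot width times the number of slots per frame. The sketch below assumes 32-bit slots and a 48 kHz sample rate purely as example values; `bit_clock_hz` is a hypothetical helper.

```python
def bit_clock_hz(sample_rate_hz: int, bits_per_slot: int, slots_per_frame: int) -> int:
    """Return the serial bit clock needed to carry one frame per sample period."""
    return sample_rate_hz * bits_per_slot * slots_per_frame


# Stereo: two 32-bit slots per frame at 48 kHz
stereo_bclk = bit_clock_hz(48_000, 32, 2)

# TDM: sixteen 32-bit slots per frame at 48 kHz
tdm16_bclk = bit_clock_hz(48_000, 32, 16)
```

The jump from roughly 3 MHz to roughly 25 MHz is why TDM routing needs more attention to signal integrity than a plain stereo link.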


I2C: a two-wire interface that is widely used for sending control and command signals.

Audio System Clock

All audio subsystems MUST be referenced to the same clock source, because independent clock sources inherently drift relative to each other. Drift between two clock sources introduces acoustic misalignment, which results in unwanted sound distortions such as beats when two speakers are each driven from a separate clock.
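
A back-of-the-envelope calculation shows how quickly drift becomes a problem. The sketch below assumes a 50 ppm frequency offset between two clocks, which is only an illustrative figure, and the helper names are hypothetical.

```python
import math


def sample_slip_per_second(sample_rate_hz: float, offset_ppm: float) -> float:
    """Samples gained or lost per second between two clocks offset by offset_ppm."""
    return sample_rate_hz * offset_ppm / 1e6


def seconds_until_slip(sample_rate_hz: float, offset_ppm: float,
                       samples: float = 1.0) -> float:
    """Time until the two streams are `samples` apart."""
    slip = sample_slip_per_second(sample_rate_hz, offset_ppm)
    return math.inf if slip == 0 else samples / slip


def beat_frequency_hz(tone_hz: float, offset_ppm: float) -> float:
    """Beat frequency heard when both speakers play the same tone."""
    return tone_hz * offset_ppm / 1e6


slip = sample_slip_per_second(48_000, 50)        # about 2.4 samples per second
first_slip_s = seconds_until_slip(48_000, 50)    # under half a second
beat_hz = beat_frequency_hz(1_000, 50)           # about 0.05 Hz, one beat every 20 s
```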

How is common clocking achieved?

The clock source inside an SoC is generated by a crystal oscillator that is highly precise and stable, which is desirable for data transfer and timing. This clock is used as the master clock for all subsystems within the SoC, one of which is the audio subsystem. We must ensure that audio hardware such as the CODEC, DSP, amplifiers, and digital microphones is referenced to this master clock.

MCLK from the I2S interface can be used as this external reference clock output. Alternatively, a dedicated clock output pin that pinmuxes the internal system clock to an output pin can be used.

What if there is an external digital audio source that needs to be mixed with the internal audio playback?

Ideally, all audio equipment is synchronized to a common master clock. However, external digital audio is sampled by an external clock source, which will have a frequency offset relative to the internal master clock reference. Mixing two audio streams whose sample rates differ causes misalignment that results in audible errors. Hence, any audio system that crosses clock domains (i.e. uses two or more separate clocks) needs an asynchronous sample rate converter (ASRC) to address this issue.
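
The core idea of rate conversion can be sketched with linear interpolation: each output sample is read from a fractional position in the input stream, and that position advances by the ratio of the two rates. This is a toy model under stated assumptions. A real ASRC continuously estimates the ratio between the two clocks and uses much higher-quality interpolation filters; here the ratio is fixed and `resample_linear` is a hypothetical name.

```python
def resample_linear(samples, ratio: float):
    """Resample by linear interpolation.

    ratio = input_rate / output_rate, i.e. how far the read position
    advances through the input for every output sample produced.
    """
    out = []
    n = 0
    last = len(samples) - 1
    while n * ratio <= last:
        pos = n * ratio
        i = int(pos)
        frac = pos - i
        # Hold the final sample when there is no right-hand neighbor
        nxt = samples[i + 1] if i < last else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        n += 1
    return out


# Output clock running at twice the input rate: one new sample between each pair
upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], ratio=0.5)

# Matched clocks: the stream passes through unchanged
unchanged = resample_linear([0.0, 1.0, 2.0, 3.0], ratio=1.0)
```

For two nominally equal rates that differ by a few tens of ppm, the ratio would sit very close to 1.0 and would be tracked over time rather than fixed.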

AI Speaker Design Example

The following example is a generic voice command speaker.

AI Speaker Audio System

Block Description and Design

Speaker Amplifiers

We choose a Class-D amplifier with an integrated I2S audio interface to simplify the design. Note that I2S is an audio interface that supports stereo channels, hence one I2S interface can be used to drive two speaker amplifiers.


The DSP is chosen here to synchronize speaker audio playback with microphone audio capture for voice processing. Inside this DSP, postprocessing algorithms such as a compressor, EQ, etc. can be applied per product requirements.

For voice processing, acoustic echo cancellation is generally used along with a noise reduction algorithm. To select the right DSP for the audio processing, one needs to estimate the memory requirements (KB), the computational horsepower (DMIPS), and the DSP architecture (e.g. HiFi 3). For simple audio processing, a generic microcontroller with DSP instructions and a floating point unit can be used as well (e.g. ARM Cortex-M4F).

  • Note: the DSP block is for illustration purposes. Many digital system processors have audio ports that support PDM microphones and I2S interfaces, and audio postprocessing can run in software or on an internal DSP/coprocessor.
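
The DSP sizing step described above can be started with a rough budget. The sketch below assumes a time-domain NLMS adaptive filter for echo cancellation, costing about two multiply-accumulates per tap per sample (one for filtering, one for the coefficient update) and storing one coefficient plus one history value per tap. These are assumptions made for illustration only: the 16 kHz voice rate and 128 ms echo tail are example values, a MAC count is only a proxy for a DMIPS figure, and real voice pipelines often use more efficient frequency-domain methods.

```python
def nlms_aec_budget(sample_rate_hz: int, echo_tail_ms: int, bytes_per_value: int = 4):
    """Return (taps, macs_per_second, memory_bytes) for a time-domain NLMS filter."""
    # Filter length needed to cover the echo tail
    taps = sample_rate_hz * echo_tail_ms // 1000
    # Roughly one MAC per tap for filtering plus one per tap for adaptation
    macs_per_second = 2 * taps * sample_rate_hz
    # One coefficient and one history sample per tap
    memory_bytes = 2 * taps * bytes_per_value
    return taps, macs_per_second, memory_bytes


taps, macs_per_second, memory_bytes = nlms_aec_budget(16_000, 128)
memory_kb = memory_bytes / 1024
```

Numbers like these give a starting point for comparing candidate parts before measuring the actual algorithms on hardware.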

System Clock

As we can see, the common reference clock is generated from the SoC system clock and fed to the DSP, and the DSP feeds it to the amplifiers and microphones. This achieves synchronized audio.

Summary and conclusion

  • Understand the audio pipeline of an audio playback system.
  • Learn about the functionality of different audio hardware.
  • Learn that a common reference clock is a MUST to synchronize the audio system and reduce audio errors.
  • Go over the design steps for a voice command speaker.

To build a high-quality audio system, we must use the right parts for the right audio processing. Clocking is often the most common source of problems seen early in an audio design.