How to select a digital microphone for audio capture applications.

Created: April/2020
Last Updated: 5/13/2020

Introduction

MEMS microphones have become popular enough to appear in most of today's consumer electronics, such as smartphones, smart speakers, tablets, and laptops. The credit goes to the MEMS device's high reliability, low cost, small footprint, and simple output interface, either analog or digital, for hardware integration. This tutorial focuses on the most widely used microphones in embedded systems: digital PDM MEMS microphones.

For a design and system integration engineer, meeting the performance criteria while keeping the design simple is essential to product success.

Background

Micro-electromechanical systems (MEMS) is a process technology that miniaturizes both electrical and mechanical parts using the same silicon fabrication process found in integrated circuits such as microprocessors, hence achieving a small footprint. A digital MEMS microphone integrates sensing, conversion, and digital control and output.

A microphone is a type of sensor that provides an acoustic measurement of the environment in the form of an electrical output. In a digital microphone, an integrated ADC converts the electrical signal (voltage waveform) to a digital signal (bits).

Overview

As good practice, a microphone with high dynamic range, flat frequency response, and high linearity (minimal distortion) is chosen for audio capture applications such as audio recording, voice calls, and voice recognition.

To achieve the high audio quality these applications generally need, the following sensor characteristics are evaluated against the design requirements: frequency response, dynamic range, linearity, and resolution.

Definition

dB SPL: sound pressure level (SPL) is the analog representation of acoustic loudness at the sensor input.

dB FS: decibels relative to full scale, the digital representation of acoustic loudness at the sensor output. Its maximum value is 0 dBFS.

Dynamic range: difference between upper and lower limits of sensor input expressed in decibels (dB).

Frequency response: Measured amplitude vs. frequency of the sensor.

Sensitivity: It's the sensor output expressed in dBFS when sensor input is excited at 94 dB SPL.

SNR: the signal-to-noise ratio (SNR) of a microphone is the difference between 94 dB SPL and the noise floor (self-noise) of the sensor.

- Note: self-noise of a microphone is usually expressed as equivalent input noise (EIN) on the datasheet.

Acoustic overload point (AOP): the input sound pressure level that produces 10% THD with a 1 kHz test tone. It usually corresponds to the clipping point of the sensor output and is an effect of nonlinearity.

Equivalent input noise (EIN): the inherent electronic noise of the microphone circuit.

- Note: this noise does not depend on the acoustic environment (i.e., the minimum detectable signal will be the EIN value found on the datasheet regardless of test conditions, even in a vacuum).

Resolution: the ADC conversion capability, expressed in bits.

- Note: resolution indicates the minimum distance between two distinguishable measurements.

Linearity: a measure of how far the sensor's input-to-output relationship deviates from a straight line.

- Note: the sensor output becomes more nonlinear as the input level approaches its maximum, and this nonlinearity causes distortion that is characterized as total harmonic distortion (THD).

Total Harmonic Distortion (THD): a measure of the sensor's nonlinearity that accounts for unwanted harmonics.
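
The dB SPL, dB FS, and sensitivity definitions above are linked by a fixed offset as long as the microphone operates linearly (below the AOP). The following Python sketch illustrates that mapping. The function names and the -26 dBFS example sensitivity are illustrative assumptions, not values taken from a particular datasheet.

```python
REFERENCE_SPL_DB = 94.0  # sensitivity is defined at a 94 dB SPL input


def spl_to_dbfs(level_db_spl, sensitivity_dbfs):
    """Digital output level (dBFS) for an acoustic input (dB SPL).

    Assumes the microphone is operating in its linear region.
    """
    return sensitivity_dbfs + (level_db_spl - REFERENCE_SPL_DB)


def dbfs_to_spl(level_dbfs, sensitivity_dbfs):
    """Acoustic input level (dB SPL) that produces a given output (dBFS)."""
    return REFERENCE_SPL_DB + (level_dbfs - sensitivity_dbfs)


# Hypothetical part with a sensitivity of -26 dBFS
print(spl_to_dbfs(94.0, -26.0))  # -26.0
print(dbfs_to_spl(0.0, -26.0))   # 120.0
```

With this assumed sensitivity, digital full scale (0 dBFS) corresponds to a 120 dB SPL input, which is where the output would start to clip.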

Detailed design

To start any design, we need to understand the product requirements and then specify the corresponding engineering requirements. For illustration, we are building a far field assistant TV.

Note: far field in this context means that the TV has a built-in microphone for picking up voice commands at a distance.

Example product requirements:

Support far field voice for TV
Panel Size 50 inches
Support 2.0 speakers.
- 40 W speakers
Support optical audio out
Support voice pick up angle of 360 degrees.

Example engineering acoustic requirements:

Sensor Input Requirements
- Directivity: Omni-directional
- Frequency Range: 100 to 8000 Hz (Voice Band)
- Acoustic Dynamic range : > 90 dB
- Acoustic overload point: > 120 dB SPL
- SNR: 64 dB
- Acoustic EIN: < 30 dB SPL
- Flatness of Frequency: +/-3 dB relative to 1 kHz
- Part Tolerance: +/- 1dB of sensitivity
- Distortion (THD): < 3% from 100 Hz to 200 Hz & <1% from 200 Hz to 8 kHz @ 100 dB SPL
Sensor Output Requirements
- Digital Dynamic range: > 90 dB
- Sensitivity: -26 dBFS
- Resolution: 16 bits
- Sampling rate: 16 kHz
- Digital Full Scale Output: 0 dBFS
- Digital EIN: < -90 dBFS
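
One way to put the requirement list above to work is to encode a few of its limits and screen candidate parts against them. The sketch below is only an illustration under stated assumptions: the MicSpec fields, the unmet_requirements helper, the choice to treat the limits as inclusive, and the candidate values are all hypothetical and do not come from any datasheet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MicSpec:
    """Hypothetical subset of datasheet values for one candidate part."""
    aop_db_spl: float
    snr_db: float
    sensitivity_dbfs: float


def unmet_requirements(part: MicSpec) -> List[str]:
    """Return the names of the example requirements the part fails."""
    failures = []
    ein_db_spl = 94.0 - part.snr_db  # self-noise referred to the input
    if part.aop_db_spl < 120.0:
        failures.append("AOP")
    if part.snr_db < 64.0:
        failures.append("SNR")
    if part.aop_db_spl - ein_db_spl < 90.0:
        failures.append("dynamic range")
    if abs(part.sensitivity_dbfs - (-26.0)) > 1.0:  # +/- 1 dB part tolerance
        failures.append("sensitivity")
    return failures


# Made-up candidates, for illustration only
print(unmet_requirements(MicSpec(120.0, 65.0, -26.0)))  # []
print(unmet_requirements(MicSpec(116.0, 62.0, -26.0)))  # ['AOP', 'SNR', 'dynamic range']
```

Frequency response flatness and THD are left out of this sketch because they are curves rather than single numbers and are better checked against measured data.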

Example microphone manufacturer part #

Minimum part: Invensense INMP 521
Recommended part: Invensense INMP 621

Q&A

How did we choose dynamic range?

In the case of the TV, we measured a maximum TV volume of 100 dB SPL at the built-in microphone and use it as the acoustic upper limit, with 30 dB SPL (a quiet bedroom at night) as the lower limit. Hence the application dynamic range is set to 70 dB.

How did we choose Acoustic overload point?

A high AOP is useful for capturing audio in loud settings. We typically choose a microphone with an acoustic overload point at least 20 dB higher than the measured upper limit of the application, to allow some safety margin before excess distortion is generated at high input levels. Hence, an AOP of 120 dB SPL or greater is selected for the microphone.
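
The dynamic range and AOP answers above reduce to two small calculations. Here is a minimal Python sketch using the numbers from this example; the function names and the default margin parameter are assumptions made for illustration.

```python
def application_dynamic_range_db(loudest_db_spl, quietest_db_spl):
    """Span between the loudest and quietest levels the product must handle."""
    return loudest_db_spl - quietest_db_spl


def minimum_aop_db_spl(loudest_db_spl, margin_db=20.0):
    """Lowest acceptable AOP: application upper limit plus a safety margin."""
    return loudest_db_spl + margin_db


print(application_dynamic_range_db(100.0, 30.0))  # 70.0
print(minimum_aop_db_spl(100.0))                  # 120.0
```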

How did we choose SNR?

SNR tells us the minimum detectable signal of the microphone. The lower limit of the acoustic range in this design is 30 dB SPL; therefore, the SNR is simply 94 dB SPL minus 30 dB SPL, which is 64 dB. Generally, a high SNR is desirable.
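
As a rough sketch of the SNR reasoning above, the required SNR follows from the quietest level of interest, and the same SNR places the noise floor on the digital side once a sensitivity is chosen. The function names are assumptions; the -26 dBFS sensitivity is the value from the example requirements.

```python
REFERENCE_SPL_DB = 94.0  # SNR is specified relative to a 94 dB SPL signal


def required_snr_db(quietest_db_spl):
    """SNR needed so that the self-noise sits at the quietest level of interest."""
    return REFERENCE_SPL_DB - quietest_db_spl


def digital_noise_floor_dbfs(sensitivity_dbfs, snr_db):
    """Self-noise expressed at the digital output (digital EIN)."""
    return sensitivity_dbfs - snr_db


snr = required_snr_db(30.0)
print(snr)                                   # 64.0
print(digital_noise_floor_dbfs(-26.0, snr))  # -90.0
```

The -90 dBFS result lines up with the digital EIN limit in the sensor output requirements.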

How did we choose resolution of the sensor?

A higher-resolution sensor resolves finer steps in the measurement, and vice versa. The minimum sound pressure level that we need to detect is 30 dB SPL. To do that, one needs to find the resolution that covers the full digital dynamic range, which equals 90 dB divided by 6 dB/bit, or 15 bits. However, the ADC most commonly used in MEMS devices is a sigma-delta ADC, with resolutions of 16 bits and greater, so the minimum resolution that we use is 16 bits.

- Note: the digital dynamic range is generally 6 to 10 dB higher than the acoustic dynamic range due to ADC nonlinearity and modulator limits. Always use the digital dynamic range for the resolution calculation; extra bits might be needed!
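
The bit-count estimate above can be sketched in Python as follows. The helper name is an assumption, and the 6 dB/bit figure is the rule of thumb used in the text (the exact value, 20*log10(2), is about 6.02 dB).

```python
import math

DB_PER_BIT = 6.0  # rule of thumb from the text; 20*log10(2) is about 6.02


def minimum_bits(digital_dynamic_range_db):
    """Smallest whole number of bits that covers the given dynamic range."""
    return math.ceil(digital_dynamic_range_db / DB_PER_BIT)


print(minimum_bits(90.0))  # 15
print(minimum_bits(96.0))  # 16, if the digital range were 6 dB larger
```

Rounding up matters here: any dynamic range slightly above a multiple of 6 dB already needs the next bit.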

How did we choose sensitivity?

Sensitivity is the mapping of a 94 dB SPL sensor input to the sensor output, expressed in the digital unit dBFS. For the same SNR, a lower-sensitivity part has a higher dynamic range at the expense of a quieter audio recording. However, this is not an issue, since digital gain can be applied in software during the audio preprocessing stage to amplify the audio.

A manufacturer generally offers multiple variants with the same SNR but different sensitivity ratings. A good designer will select the lowest-sensitivity part number to achieve a higher dynamic range and apply digital gain to boost the audio to the recommended level. This is only true for digital MEMS microphones.
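
The digital gain mentioned above could look something like the following sketch in the preprocessing stage. It assumes the decimated audio is available as 16-bit signed PCM samples in a Python list; the function name and the sample values are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain_db(samples, gain_db):
    """Scale 16-bit PCM samples by gain_db, clipping to the int16 range."""
    gain = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude ratio
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]


print(apply_gain_db([1000, -1000, 20000], 20.0))  # [10000, -10000, 32767]
```

Digital gain raises the signal and the noise floor by the same amount, so it makes the recording louder without changing the SNR, and samples that would exceed full scale are clipped.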

How did we choose sampling rate of the sensor?

The Nyquist theorem states that the sampling rate needs to be at least 2 times the highest frequency of interest in order to fully recover the sensor input signal without aliasing. In this case, doubling the highest voice-band frequency, 8 kHz, gives a 16 kHz sampling rate.
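
To make the sampling-rate argument above concrete, the sketch below computes the minimum rate and shows where a tone above half the sampling rate would fold back if it were not filtered out. The function names and the 10 kHz example tone are assumptions for illustration.

```python
def minimum_sampling_rate_hz(highest_frequency_hz):
    """Nyquist criterion: sample at no less than twice the highest frequency."""
    return 2 * highest_frequency_hz


def aliased_frequency_hz(tone_hz, sampling_rate_hz):
    """Apparent frequency of a tone after sampling (folded into 0..fs/2)."""
    nearest_multiple = round(tone_hz / sampling_rate_hz)
    return abs(tone_hz - nearest_multiple * sampling_rate_hz)


print(minimum_sampling_rate_hz(8000))       # 16000
print(aliased_frequency_hz(7000, 16000))    # 7000 (in band, unchanged)
print(aliased_frequency_hz(10000, 16000))   # 6000 (folds into the voice band)
```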

How did we specify flatness, tolerance, and distortion?

In general, these characteristics are by-products of manufacturing and are constantly improving. These values are guidelines based on empirical test data and experience. However, a good designer always does a competitive analysis of cost and performance among different MEMS manufacturers and selects the one with the best trade-offs, following the guidelines laid out in this design tutorial.

Summary

A good audio capture product uses a microphone that combines high dynamic range, flat frequency response, and high linearity (low distortion). An electrical design and system integration engineer must approach the design with simplicity and effectiveness in mind.

We went through an exercise of choosing a digital PDM MEMS microphone for a smart far field assistant TV instead of a traditional analog microphone, and saw that a digital microphone is sufficient for our needs; a high-fidelity analog part (e.g. with lower self-noise and power consumption) is not needed.

Lastly, well-thought-out product requirements and engineering requirements are a MUST before starting any design; otherwise the device may have to be redesigned at a later stage if the product runs into quality issues.

Further Reading

Contact email: xiaoshi@hwe.design

YouTube Channel: Official YouTube

Discord channel: https://discord.com/invite/an5y499P8Z

Medium Blog: https://medium.com/@xiaoshi_4553