The Need for a Squeeze: Making Big Music Fit into Small Devices
October 14, 2016


As consumers' listening habits shift toward ever smaller and more compact playback devices, linear and time-invariant processing alone (such as sophisticated equalization) is not necessarily sufficient to meet the desired and often conflicting requirements of audibility, low distortion, tonal balance, bass response, loudness, etc. DSP system design for small loudspeakers is inherently a compromise, but cutting-edge digital technologies can raise the performance of a micro-speaker considerably.
Read on to find out why dynamic range processing is practically a must-have when optimizing the sound of a mobile phone or any other miniature sound device, as well as to get an introduction to some basic parameters for dynamic range compression.
The dynamic range of an audio signal
The dynamic range of an audio signal is the ratio between its loudest and quietest parts. This is of course a simplification, since a piece of music typically contains some silence, which would imply an infinite dynamic range. But our discussion about dynamic range has to start somewhere, and a loose definition, such as the average ratio between the levels of the loudest and quietest parts of the signal, will suffice for now. The dynamic range of a digital signal is limited by the number of bits used: a 16-bit recording, for example, has a maximum dynamic range of about 96 dB (roughly 6 dB per bit).
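The 96 dB figure can be checked with a few lines of arithmetic. The sketch below assumes the common textbook definition (the ratio of full scale to one quantization step, expressed in dB); the function name is made up for illustration.

```python
import math

def quantization_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB."""
    return 20.0 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits}-bit: {quantization_dynamic_range_db(bits):.1f} dB")
# 8-bit: 48.2 dB
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```

Each extra bit doubles the number of amplitude steps, which adds about 6 dB.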

Figure 1: Example of two different audio signals which exhibit a large and a small dynamic range, respectively. In the upper signal there is generally a larger level variation in the amplitude envelope, i.e., a larger dynamic range.
Every loudspeaker has an upper limit to how loud it can play, how much heat it can tolerate, and how far its cone can move. When talking about the dynamic range of a loudspeaker, it makes sense to consider the ratio of the highest level the loudspeaker can safely reproduce to the lowest level a listener can perceive. For a given listener and environment the lower limit of hearing stays constant, so the dynamic range is determined by the maximum playback level of the loudspeaker.
The human ear can handle an enormous range of sound pressures. The lower threshold of hearing is highly frequency-dependent. Since miniature loudspeakers are often used in relatively noisy environments, the threshold of perception rises due to a phenomenon called simultaneous masking, in which the presence of an unwanted sound (the background noise) prevents perception of a desired sound.
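To get a feel for how quickly background noise eats into the usable range, here is a rough back-of-the-envelope sketch. It makes a deliberately crude assumption that is not part of the discussion above: the noise is treated as masking everything below its own level, ignoring the frequency dependence of both hearing and masking. The function name and all numbers are hypothetical.

```python
def usable_dynamic_range_db(max_playback_spl_db: float,
                            hearing_threshold_spl_db: float,
                            noise_spl_db: float) -> float:
    """Loudest safe playback level minus the quietest perceivable level."""
    # Crude assumption: noise masks everything below its own level.
    effective_floor_db = max(hearing_threshold_spl_db, noise_spl_db)
    return max_playback_spl_db - effective_floor_db

# Hypothetical small speaker that can safely reach 85 dB SPL.
print(usable_dynamic_range_db(85.0, 10.0, 30.0))  # quiet room: 55.0
print(usable_dynamic_range_db(85.0, 10.0, 70.0))  # busy street: 15.0
```

Under these assumed numbers, the same loudspeaker has only a fraction of its quiet-room range available on a noisy street.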
The mismatch problem
When music is produced, sound engineers and mastering engineers take great care to ensure that the finished result sounds good on as wide a range of sound systems as possible, from large sound systems to headphones. The tools they use are equalizers, compressors, limiters, and enhancers.
Over the last five or so decades there has been a distinct trend towards reduced dynamics, the so-called “loudness wars.” The amount of dynamics still varies a lot between tracks, and in many, if not most, cases the dynamic range of the music is larger than what is optimal for a miniature loudspeaker. And we should be thankful for that: if all music were optimized for portable devices, it would sound terrible for those of us who listen with headphones or a HiFi system. Unfortunately, however, a large dynamic range in the music quickly becomes a problem when the volume is reduced and background noise is present.

Figure 2: In a high-noise environment, an audio signal with a large dynamic range will be largely masked by the noise, even at maximum volume.
The problem of dynamic range mismatch is now abundantly clear, but what should be done to correct it? Ideally every track should be re-mixed and re-mastered, with the studio monitors replaced by the specific mobile phone. Background noise should also be present so that masking effects can be accounted for already at the mixing and mastering stage. But, as nice as it would be, this is clearly not an option.
The solution: adaptive dynamic range processing
A real-world solution would be to instead employ adaptive dynamic-range processing to optimize each music signal subject to the constraints of the mobile-phone loudspeaker. The aim is not to add color to the sound or use the dynamics processing as a creative tool, but to maintain audibility in various environmental conditions, with all types of music, and for different volume settings.
A purist might argue that we shouldn’t alter the music and change its character to something different than what the producers intended. We agree, but who would prefer hearing only parts of a song because most of it disappears into the background noise? Or having a volume control that’s unusable because the sound drowns in background noise as soon as the volume is reduced even one step down from max? The vast majority of people are willing to sacrifice a small amount of purity for a large gain in audibility.

Figure 3: The same audio signal as in the previous figure, before (upper plot) and after (lower plot) dynamic range compression.
Fundamentally, dynamic-range processing in its simplest form is just a volume control: a multiplication by a gain factor. Long-term adjustment of the gain could be performed manually by a sound engineer who reduces the dynamics by lowering the gain when the signal, as they perceive it, is too loud, and raising it when it’s too weak.

Figure 4: The two basic structures for implementing dynamic range controllers. The processing in the sidechain has been simplified to a single gain computation block for clarity.
In the mobile-phone context, it’s more common that the engineer is relieved of this duty and replaced by a digital dynamic-range controller. The dynamic-range controller analyzes certain features of the signal (typically the RMS or peak level of the input signal) and constantly strives to meet a certain output target level for each input level. This target level is given by the static relationship between the input and output (see Figure 5).
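As a rough illustration of such a controller, the sketch below measures the RMS level of each block of samples, looks up the target level, and multiplies the block by the resulting gain. It is a minimal sketch and not Dirac's implementation: the feed-forward layout, the block size, the one-pole gain smoothing, and all names (`rms_db`, `feed_forward_controller`, `target_level_db`) are assumptions made for this example.

```python
import math

def rms_db(block):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    mean_square = sum(x * x for x in block) / len(block)
    return 10.0 * math.log10(max(mean_square, 1e-12))  # floor avoids log(0)

def feed_forward_controller(samples, target_level_db,
                            block_size=256, smoothing=0.9):
    """Block-wise feed-forward dynamic-range controller (illustrative only).

    target_level_db is a caller-supplied function mapping the measured
    input level (dB) to the desired output level (dB).
    """
    out = []
    gain_db = 0.0
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        level_db = rms_db(block)                     # sidechain: measure input
        wanted_gain_db = target_level_db(level_db) - level_db
        # One-pole smoothing so the gain does not jump between blocks.
        gain_db = smoothing * gain_db + (1.0 - smoothing) * wanted_gain_db
        gain = 10.0 ** (gain_db / 20.0)              # dB -> linear factor
        out.extend(x * gain for x in block)          # signal path: multiply
    return out

# Usage: a quiet test tone pulled toward a fixed -20 dB target.
tone = [0.01 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4096)]
louder = feed_forward_controller(tone, lambda level_db: -20.0)
```

A fixed target like the one in the usage example would also amplify silence and noise without limit, so a practical target function would vary with the input level.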

Figure 5: Static transfer characteristic of a hard-knee compressor.
This transfer characteristic is called compression because (you guessed it) the dynamic range of the input is compressed to become smaller. For input levels above the threshold, any increase in input level produces only a fraction of that increase at the output. In the example above the compression ratio is 1:3, so a 3 dB rise at the input yields a 1 dB rise at the output. The make-up gain is used to compensate for what would otherwise be an overall gain reduction. The net effect is that low-level signals (below the threshold) are boosted more than high-level signals.
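The static characteristic of a hard-knee compressor can be written as a small function of the input level in dB. The sketch below uses the 1:3 slope from the example (expressed as `ratio=3.0`); the threshold of -30 dB and the make-up gain of 10 dB are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def compressor_output_db(level_in_db: float,
                         threshold_db: float = -30.0,
                         ratio: float = 3.0,
                         makeup_db: float = 10.0) -> float:
    """Hard-knee static curve: output level (dB) for an input level (dB)."""
    if level_in_db <= threshold_db:
        out_db = level_in_db  # below threshold: 1 dB in gives 1 dB out
    else:
        # Above threshold: the excess over the threshold is divided by ratio.
        out_db = threshold_db + (level_in_db - threshold_db) / ratio
    return out_db + makeup_db

for level in (-60.0, -30.0, 0.0):
    print(level, "->", compressor_output_db(level))
# -60.0 -> -50.0
# -30.0 -> -20.0
# 0.0 -> -10.0
```

With these assumed settings, a 60 dB input range (-60 to 0 dB) is squeezed into a 40 dB output range (-50 to -10 dB): the quiet passage is raised by 10 dB while the loudest one is lowered by 10 dB.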
As simple as the basic concept is, there are many perceptual aspects involved in making radical changes to the music while maintaining its basic character. We want the processing to make a dramatic difference, yet not be too obvious.
There are several additional small-speaker sound-optimization criteria besides maintaining signal audibility. These include loudness maximization, avoiding over-amplification of noise, and ensuring that the loudspeaker is safe from overheating and over-excursion. Other forms of dynamic-range processing, such as limiting and expansion, are used to meet these criteria. But that would be a topic for another day.
Want to learn more about Dirac’s processing solutions for small and miniature loudspeakers and mobile audio solutions? Keep following this blog or contact us for a demo.
– Mattias Arlbrant, Senior Research Engineer at Dirac Research