Creating the Perfect Sound System with 3D Sound Reproduction


Over the past few decades, a lot of research has been devoted to figuring out how to record a sound event and exactly reproduce the original sound field in a different location. So far, this has not been achievable in a consumer environment with a reasonable number of loudspeakers.

However, we have recently seen a lot of innovation in spatial sound reproduction, with applications ranging from TV and cinema to games and VR. For instance, in cinema applications, new 3D sound formats use what is known as object-based audio. This means that instead of storing sound as a fixed number of channels that must be mapped to a specific loudspeaker layout, the audio is stored as a variable number of audio objects. The audio objects contain the audio data and metadata that describe the direction of the sound in space, plus much more. These objects are then “rendered” by a processor on the available speakers, so the speaker layout becomes more flexible.

Such innovation, and the possibilities it presents, is exciting. But when it comes to achieving pure music listening for the home environment, do the best of today’s formats and sound systems represent the utopia of good sound? Or are there still untapped possibilities for even better experiences? To start thinking about this question it is necessary to take a systems approach and consider how music is recorded in the first place, and how it can be rendered on a multi-channel sound system. Follow me as I think “out loud” through this question step by step. 

Step 1: Create ambience for music enjoyment

First things first. The most important thing you can do to enjoy good sound, even in the future when you have the perfect sound system, is to ensure you have a nice listening environment. You need a comfortable sofa. You need quiet. You need a calm and relaxed mind, and perhaps a good whiskey in your hand. And, of course, you need a selection of your favorite music which has just come out in the new format of the perfect sound system.

Step 2: Record reality

When you put a single microphone in a room, the microphone performs an incredible information reduction by discarding all the information about where the sound is coming from and just adding everything together so that a 1-dimensional signal results. When recording spatial sound today, multiple directional microphones are often employed in an unwieldy setup. For the sound system of the future, however, a bit more sophistication is needed. 

Using the DSP technology that’s available today, it is possible to build a 3D microphone with a convenient form factor that retains a lot of information about where sound is coming from in a sound field. A great application would be the recording of whole ensembles and ambient sounds. Ambisonics is a similar technology that has been around for ages, but it doesn’t benefit from today’s digital technology, and I’d argue that a more robust, easier-to-use, and better-performing solution ought to be possible.
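
To make “retaining information about where sound is coming from” a bit more concrete, here is a minimal sketch of classic first-order Ambisonic (B-format) encoding of a single mono source. It is only an illustration of the principle, not a description of any particular 3D microphone. The function name and the dictionary layout are my own assumptions; the gain formulas are the traditional B-format ones, with azimuth measured counter-clockwise from straight ahead.

```python
import math

def encode_b_format(samples, azimuth_deg, elevation_deg):
    """Encode a mono signal into traditional first-order B-format (W, X, Y, Z).

    Hypothetical helper for illustration only.
    azimuth_deg: 0 = front, positive = towards the left.
    elevation_deg: 0 = horizontal plane, positive = up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)

    # One gain per channel; the direction lives in the ratios between them.
    g_w = 1.0 / math.sqrt(2.0)          # omnidirectional component
    g_x = math.cos(az) * math.cos(el)   # front-back component
    g_y = math.sin(az) * math.cos(el)   # left-right component
    g_z = math.sin(el)                  # up-down component

    return {
        "W": [g_w * s for s in samples],
        "X": [g_x * s for s in samples],
        "Y": [g_y * s for s in samples],
        "Z": [g_z * s for s in samples],
    }

# A source directly to the left ends up almost entirely in W and Y.
channels = encode_b_format([1.0, 0.5], azimuth_deg=90.0, elevation_deg=0.0)
```

A single omnidirectional microphone would keep only something like the W channel. The extra channels are what preserve the direction.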

When recording an individual instrument today, it is common to put one or several microphones close to the instrument and then mix the signals together into one or two channels. The problem is that all instruments have highly complex radiation patterns and the “sound” of an instrument is determined by how it interacts with its acoustical surroundings. This “sound” is easily lost by the close-miking technique. Even if early reflections are present in the recording, information about where these reflections are coming from is lost.

So, for the ideal sound system, another procedure could be as follows: several directional microphones pick up the sound from the instrument in different directions, but these signals are not mixed together. Instead, all information is used to render the instrument in the listening room. More about sound rendering below.

Step 3: Objectify the sound stage

If the recordings capture the direction of sounds, then, of course, the audio format used to store the recording must retain this information. This is most conveniently accomplished by storing the audio as several audio objects. Each audio object contains waveform data plus metadata which describes, for example, when the sound is playing, from where, and how loud. Audio objects can be divided into different types. A 3D microphone would need a specific audio object type that’s related to it. Such an audio object could, for example, contain several audio waveforms representing sound coming from different directions. 
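
As a rough sketch of what such objects might look like in code, the example below models a simple point-source object and a 3D-microphone object that carries several directional waveforms. All class and field names here are hypothetical and chosen for illustration; they are not taken from any existing object-audio format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class AudioObject:
    """Hypothetical point-source object: waveform data plus metadata."""
    samples: List[float]     # waveform data
    sample_rate: int         # samples per second
    start_time_s: float      # when the sound starts playing
    azimuth_deg: float       # from where (horizontal angle)
    elevation_deg: float     # from where (vertical angle)
    gain_db: float = 0.0     # how loud, relative to full scale

    def linear_gain(self) -> float:
        # Convert the decibel gain to a linear amplitude factor.
        return 10.0 ** (self.gain_db / 20.0)

    def duration_s(self) -> float:
        return len(self.samples) / self.sample_rate

@dataclass
class DirectionalCaptureObject:
    """Hypothetical object type for a 3D microphone recording.

    Maps (azimuth_deg, elevation_deg) to the waveform captured
    from that direction.
    """
    sample_rate: int
    start_time_s: float
    waveforms: Dict[Tuple[float, float], List[float]]

violin = AudioObject(
    samples=[0.0] * 48000, sample_rate=48000,
    start_time_s=1.5, azimuth_deg=30.0, elevation_deg=0.0, gain_db=-6.0,
)
```

Keeping the metadata next to the waveform, instead of baking it into a channel layout, is what lets a renderer decide later how to place each sound.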

Storing audio as objects makes it much easier to render it in an optimal way for different sound systems. If a stereo system is used, the full 3D sound field is downmixed to two channels by the rendering engine. The object-based data stream could contain metadata describing how this should be done so that the mixing engineer has control over the process. If the listener has a multi-channel sound system, the rendering engine can make optimal use of all the speakers available. 
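
Here is a minimal sketch of what the stereo case could look like. It assumes speakers at ±30° and a simple constant-power (sine/cosine) pan law, and it represents each object as a plain (samples, azimuth, gain) tuple. These are my simplifying assumptions for illustration, not how any particular rendering engine works; a real renderer would also handle elevation, timing, and any downmix metadata supplied by the mixing engineer.

```python
import math

SPEAKER_HALF_ANGLE_DEG = 30.0  # assumed stereo layout: speakers at +/-30 degrees

def pan_gains(azimuth_deg):
    """Constant-power pan gains (left, right) for a source azimuth.

    Positive azimuth is towards the left speaker. Directions outside
    the speaker span are clamped to the nearest speaker.
    """
    az = max(-SPEAKER_HALF_ANGLE_DEG, min(SPEAKER_HALF_ANGLE_DEG, azimuth_deg))
    # Map +30..-30 degrees onto a pan angle of 0..pi/2.
    theta = (1.0 - az / SPEAKER_HALF_ANGLE_DEG) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)

def render_stereo(objects, num_samples):
    """Downmix (samples, azimuth_deg, linear_gain) tuples to two channels."""
    left = [0.0] * num_samples
    right = [0.0] * num_samples
    for samples, azimuth_deg, gain in objects:
        g_left, g_right = pan_gains(azimuth_deg)
        for i, s in enumerate(samples[:num_samples]):
            left[i] += gain * g_left * s
            right[i] += gain * g_right * s
    return left, right

# One object hard left, one in the centre (a phantom source).
left, right = render_stereo(
    [([1.0, 1.0], 30.0, 0.5), ([1.0, 1.0], 0.0, 1.0)],
    num_samples=2,
)
```

The centre object here is reproduced as a phantom source, with equal signal in both speakers. The same object list could be handed to a different renderer for a multi-channel layout without touching the recording itself.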

If the audio object format were furthermore an open format, the entire industry could use its creative powers to develop rendering solutions. Perhaps the listener is wearing a VR headset with headphones and head-tracking. In this case, it is a big benefit to use audio objects because in 3D sound rendering for headphones there is no need to rely on phantom sources (a technique where you pan sound to positions between two loudspeakers).

If the object audio stream is being broadcast to a mobile device, then to save bandwidth the device could report its audio rendering capabilities and available bandwidth, so that a suitably downmixed version of the audio could be sent to it.
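
A toy sketch of that negotiation might look like the following. Everything in it is hypothetical: the function name, the list of renditions, and especially the assumed bitrate of 64 kbit/s per channel, which is a placeholder and not a figure from any real codec or service.

```python
# Hypothetical renditions the sender can produce, largest first:
# (channel count, label)
RENDITIONS = [(6, "5.1 downmix"), (2, "stereo downmix"), (1, "mono downmix")]

def choose_rendition(device_channels, bandwidth_kbps, kbps_per_channel=64):
    """Pick the richest rendition the device can both play and receive.

    device_channels: how many output channels the device says it can render.
    bandwidth_kbps: the bandwidth the device reports as available.
    kbps_per_channel: assumed cost of one channel (placeholder value).
    Returns the rendition label, or None if not even mono fits.
    """
    affordable_channels = bandwidth_kbps // kbps_per_channel
    usable_channels = min(device_channels, affordable_channels)
    for channels, label in RENDITIONS:
        if usable_channels >= channels:
            return label
    return None

# A stereo-capable phone on a constrained link only gets mono.
choice = choose_rendition(device_channels=2, bandwidth_kbps=100)
```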

Modern cars often have high-end, multichannel sound systems that would benefit from object-based source material as well.

Step 4: Determine the sound rendering scenario

As I discussed in a previous blog post, a stereo sound system is theoretically suited to emulate a sound scenario where the front wall between the speakers opens up to allow you to “listen in” to the recording sound environment. With surround sound, you can emulate the “you are there” scenario where you are transported to a different sound environment. A third possibility, in addition to the “listen in to the recording environment” and the “you are there” scenarios, is to emulate the sensation that instruments are actually playing in your own listening room. 

As discussed above, instruments have different radiation patterns and interact with the acoustic environment in a unique way. Loudspeakers that could take a multichannel recording of an instrument and emulate the directivity pattern of the instrument would offer great artistic possibilities. For such a speaker, any room interaction would not be a coloration of the sound; it would be as natural as having the instrument playing in your room. For a regular sound signal, the adjustable directivity could be used to adjust the amount of room sound and make the speaker interact optimally with the listening room.
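
To illustrate what “adjustable directivity” can mean in the simplest case, the sketch below evaluates the standard family of first-order directivity patterns, g(θ) = a + (1 − a)·cos θ. Real instruments and loudspeaker arrays have far more complex, frequency-dependent patterns, so this is only a toy model, and the function name is my own.

```python
import math

def first_order_gain(angle_deg, a):
    """Gain of a first-order directivity pattern at a given off-axis angle.

    a = 1.0 gives an omnidirectional pattern, a = 0.5 a cardioid,
    and a = 0.0 a figure-eight (dipole).
    """
    return a + (1.0 - a) * math.cos(math.radians(angle_deg))

# Moving from omni towards cardioid sends less sound to the rear of the
# speaker, which is one simple way to reduce how much the room is excited.
for a in (1.0, 0.75, 0.5):
    rear = first_order_gain(180.0, a)
    print(f"a={a:.2f}: on-axis gain 1.00, rear gain {rear:.2f}")
```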

I would say the goal should be to produce a believable sound event, and not necessarily a perfect reproduction of the original sound field. An artistic mix of the “you are there,” “artist playing in my living room,” and “acoustic opening in front wall” scenarios would probably give very enjoyable experiences.

Concluding Remarks

To summarize, I think the future will bring interesting possibilities for innovation in sound reproduction. By taking a systems approach, we can improve the spatial aspects of sound reproduction in particular. Several innovative technologies are already available and ready to use or to be developed further, such as digitally optimized 3D microphones, digitally optimized loudspeaker arrays for radiation pattern control, and Dirac Unison technology for exact control of the low-frequency sound field in a room.

For further reading, see my previous blog posts (How to Make Headphones Stereo-Compatible and Under the Hood of the Stereophonic System: Phantom Sources), where I elaborate on a few technical obstacles with stereo sound systems, such as the coloration that comes with the use of phantom image sources and the challenge of reproducing room sound and reverberation in stereo.

- Viktor Gunnarsson, Senior Research Engineer at Dirac Research