In HiFi, Virtual Reality Might Be Better Than Reality


As I write this, I'm listening to a recording of Joss Stone. Her voice sounds completely natural, hovering in the air just a few meters in front of me, placed distinctly at the center of my sound system and remaining there regardless of how I move my head. I can almost touch the ambience of the recording. The low-frequency extension is great, and the room modes are extremely well controlled. The listening room is remarkably well treated, with just the right amount of air and sense of space, and without the annoyance of comb filtering or spectral coloration. It’s treated so well that I don’t need digital room correction. This is an experience you can’t get without a HiFi and room treatment budget of at least $100,000 USD.

The funny thing is this: I’m getting this experience with a pair of headphones. And the sound system I’m referring to? It’s a virtual one.

We’ve succeeded in creating a virtual HiFi system using a new approach to head-related transfer functions (denoted dynamic HRTFs), a head tracker, a new 3D reverberation engine, and some other processing techniques. I will explain in more detail below, but first let’s consider the problem being addressed.


The problem of lackluster headphone sound

At Dirac, we have spent more than 15 years combating the detrimental effects the listening room has on your HiFi experience, and we have come up with a number of solutions, including Dirac Live and, more recently, Dirac Unison. Step by step, we have removed more and more of the acoustical problems through the use of increasingly advanced digital signal processing. We can’t break physics, however. Some room acoustic and speaker-related problems are not possible to solve in practice.

Another problem we have always wanted to solve has to do with headphone listening. Headphones are different from anything else we listen with. Without headphones, whatever you listen to, both ears always hear some part of every sound. Headphone listening, on the other hand, is inherently unnatural. With headphones, your ears are blocked, so the sound is prevented from entering both ears in a natural way. You see, stereo music is recorded in such a way as to create a sound image that spans from about -30 to +30 degrees in front of you. When an instrument is hard-panned to the right, it is only played through the right speaker, and the sound is 30 degrees to the right of you if the speaker is placed at that angle. With headphones, the sound sits at 90 degrees and, what is more, it never reaches your left ear at all, which would not happen with any real sound at 90 degrees. This is why it can sound so annoying, and why you get listening fatigue much sooner with headphones than with a HiFi system.

One solution is to use cross-feed. That is, you send a little bit of that right signal to your left ear as well. But if you do this the obvious way, by adding an attenuated version of the right signal to the left signal and vice versa, you will also introduce artifacts such as comb filtering as soon as the mix contains more than just that one hard-panned instrument.
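To make the comb-filtering point concrete, here is a minimal Python sketch. It assumes something the text above does not state: that the cross-fed copy is delayed as well as attenuated, as is commonly done to mimic the longer path around the head. The gain of 0.5 and the delay of 0.3 ms are illustrative values only, and the function names are made up for this example. For a center-panned source (identical in both channels), each ear then receives the signal plus a delayed copy of itself, which is the classic recipe for a comb filter.

```python
import cmath
import math


def crossfeed_magnitude(freq_hz, gain=0.5, delay_s=0.0003):
    """Magnitude response at one ear for a center-panned source.

    The ear hears the signal plus an attenuated, delayed copy of itself:
    H(f) = 1 + gain * exp(-j * 2 * pi * f * delay).
    Both gain and delay_s are assumed, illustrative values.
    """
    return abs(1 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))


def first_notch_hz(delay_s=0.0003):
    """Frequency of the first comb-filter notch: 1 / (2 * delay)."""
    return 1.0 / (2.0 * delay_s)


if __name__ == "__main__":
    # Print the response in dB at a few frequencies, including the first notch.
    for f in (0.0, 500.0, 1000.0, first_notch_hz(), 3333.0):
        level_db = 20 * math.log10(crossfeed_magnitude(f))
        print(f"{f:8.1f} Hz -> {level_db:+.2f} dB")
```

With these assumed values the response swings between a peak of 1.5 and a notch of 0.5 (relative to the unprocessed signal), and the pattern repeats up the spectrum. That periodic coloration is the artifact in question.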

What if the solution has nothing to do with the reality of the sound, but could instead be achieved with a virtual HiFi system? A perfect listening room and ideal loudspeakers. Wouldn’t that be cool? But how could we achieve that? After all, in order to listen to music we would need some form of transducer, which invariably has some coloration.

Here was our thinking: If we can manage to design the ideal custom listening space and speakers in the virtual domain, well, then we have also succeeded in creating the ultimate room correction solution—replacing the real room with a virtual room, decked out to provide the best possible listening environment!

And so we set out to do just that.


Designing HiFi in the virtual domain

In order to simulate a sound coming from a particular direction, such as a loudspeaker playing at 30 degrees to the right of a listener, we needed to precisely simulate how that sound would propagate through the listening room and enter the listener’s ears, and then play back that signal through a set of finely optimized, high-quality headphones. In essence, we had to model the direct wave, the early reflections, diffuse reverberation, and the impact made on the sound by the body, head, and ears. This has taken some time, but now we're ready to show the world the results—and they are exciting!
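As a rough illustration of the ingredients listed above, the following Python sketch assembles a single-channel room impulse response from a direct wave, a few early reflections, and an exponentially decaying noise tail standing in for diffuse reverberation. It is not our engine: every number (reflection times and gains, the 0.25 s decay time, the tail level) and every function name is a hypothetical choice for this example, and the influence of the body, head, and ears is left out entirely.

```python
import math
import random


def build_impulse_response(sample_rate=48000, length_s=0.3,
                           reflections=((0.007, 0.5), (0.011, 0.35), (0.017, 0.25)),
                           rt60_s=0.25, tail_start_s=0.03, tail_gain=0.05, seed=0):
    """Toy room impulse response: direct wave + early reflections + diffuse tail.

    All default parameter values are illustrative assumptions.
    """
    n = int(round(sample_rate * length_s))
    ir = [0.0] * n

    # Direct wave: a unit impulse at time zero.
    ir[0] = 1.0

    # Early reflections: discrete delayed, attenuated copies (delay in s, gain).
    for delay_s, gain in reflections:
        ir[int(round(delay_s * sample_rate))] += gain

    # Diffuse reverberation: Gaussian noise whose amplitude drops by 60 dB
    # (a factor of 1000) over rt60_s seconds.
    rng = random.Random(seed)
    decay = math.log(1000.0) / rt60_s
    start = int(round(tail_start_s * sample_rate))
    for i in range(start, n):
        t = i / sample_rate - tail_start_s
        ir[i] += tail_gain * rng.gauss(0.0, 1.0) * math.exp(-decay * t)
    return ir


def convolve(signal, ir):
    """Direct-form convolution; slow, but dependency-free."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out
```

Convolving a dry signal with such a response places it "in" the toy room. A binaural version would need one response per ear, each additionally shaped by the listener's head-related filtering.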


Dirac Unison xc90

We reused some of the lessons learned from recreating the acoustics of the Gothenburg Concert Hall in the Bowers & Wilkins sound system for the Volvo XC90. That task seemed impossible at first, but by extracting key parameters from the room acoustics and applying our most advanced multi-speaker optimization to remove as much of the car cabin’s room acoustics as possible, we made it happen. (By the way, this achievement was just awarded Best Audio System for cars over £25,000 at the Car Tech Awards.) Based on this, we then developed a very flexible 3D reverberation engine that can create ideal room acoustics in the virtual domain based on a few key room acoustic parameters. These room acoustics can be reproduced by using measurements from a real room but, more interestingly, we can make a virtual room that sounds the way we really want a listening room to behave acoustically.

Another key innovation we used to make a virtual listening room possible is the accurate rendering of the head-related transfer function (HRTF). When a sound arrives from a certain direction, it is affected by the head, the ears, and the body as it enters our ear canals. Altogether, this effect is called the head-related transfer function. We had to get every detail right. For example, in real life, when you hear something coming from a certain direction you instinctively turn your head in that direction. Normally, you just turn your head while your torso remains still. However, conventional HRTF models assume that the torso and head move in tandem. These are static HRTFs. Although there exist a number of public HRTF databases, as of yet none of them correctly models the dynamic movements of the head in relation to the torso. To fix this, we created a measurement system that allows us to capture the dynamic HRTFs resulting from isolated head movements while the sound object remains in a fixed position. The measurement system also included a number of features to make it robust and able to attain the extremely high level of precision that we needed. As usual, the devil is in the details.

It turns out that both localization and sound quality improve significantly when we take into account the exact position of the head relative to the torso. We needed to measure the dynamic HRTFs with high precision and high resolution spanning the entire 360-degree sphere. It’s a challenge to create a measurement system of that precision. Just collecting all this data with high angular resolution could take forever unless the system were really robust and well designed. Moreover, in order to find out how individual aspects of the HRTFs affect localization and sound quality (your ears and mine are a bit different, which could mean a universal HRTF database wouldn’t be good enough for most people), we needed to measure a large number of individuals. In other words, the system needed to be able to capture all this data quite quickly, and then extract common and uncommon aspects of people’s HRTFs.


When we used dynamic HRTFs instead of the static ones that have been prevalent until now, along with a high-precision, noise-free and error-free measurement system, we had two main findings:

  1. The dynamic head movements play an important role in determining localization with high accuracy and without sound quality degradations.
  2. Individualizing HRTFs, except for one key parameter that measures the head size, played a smaller role than we expected with regard to sound quality and correct localization.

The latter finding was surprising, as the current belief in the industry is that individual HRTFs are necessary for attaining high-quality positional audio with headphones. Our conclusion is that with the right approach, that’s really not the case. Sure, a minimal improvement can be obtained, but it is so minuscule that it's hardly noticeable. 

Figure 1, below, shows how the HRTF changes when the listener rotates their head (azimuth change) while keeping their body still and listening to a sound source at a fixed position. As you can see, the effect of the rotation in relation to the torso is strong, which means that we must model it correctly or else we are going to color the sound negatively, and make it harder for the brain to localize the sound correctly. Investigations of the elevation and roll yield similar conclusions. The HRTFs shown here are averaged over ±10 degrees in both the elevation and roll dimensions.

Figure 1: The HRTF changes as the listener rotates their head (azimuth change) keeping their body still and listening to a sound source at a fixed position.

With a head-tracker, we can track the relative movements of the head, and then, knowing the position of the sound source (a virtual loudspeaker in this case), apply the correct HRTF. This gives the direction of the direct wave and, coupled with carefully modeled early reflections and reverberation generated by the 3D reverberation engine, we can finally simulate a very realistic loudspeaker setup and room acoustics using headphones. Of course, the final issue is the imperfection of the headphones themselves, and the resonances in the ear canal. Fortunately, Dirac HD Sound technology optimizes the transfer function of a pair of headphones, and makes it possible to remove most of its acoustical imperfections. 
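The lookup step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it handles azimuth alone, the table keys and function names are hypothetical, and the selection is a simple nearest-neighbour match rather than whatever interpolation a real renderer would use. What it shows is the difference between a static lookup, which only knows where the source is relative to the head, and a dynamic one, which also knows how far the head is turned on the torso.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def source_relative_to_head(source_az, torso_yaw, head_yaw_on_torso):
    """Azimuth of the virtual loudspeaker as seen from the head.

    All angles are in degrees, positive to the right. source_az and
    torso_yaw are room angles; head_yaw_on_torso is the head rotation
    relative to the torso, as a head tracker might report it.
    """
    return wrap_degrees(source_az - torso_yaw - head_yaw_on_torso)


def nearest_key(table, source_rel_head, head_yaw_on_torso):
    """Pick the measured (source angle, head-on-torso angle) pair that is
    closest to the query. A static HRTF set would have only the first key."""
    return min(
        table,
        key=lambda k: abs(wrap_degrees(k[0] - source_rel_head))
        + abs(wrap_degrees(k[1] - head_yaw_on_torso)),
    )


if __name__ == "__main__":
    # Hypothetical table: the values stand in for measured filter pairs.
    table = {(0, 0): "facing source, head straight",
             (0, 30): "facing source, head turned 30 deg on torso",
             (30, 0): "source 30 deg right, head straight"}
    # Speaker at +30 degrees, torso facing forward, head turned 28 degrees.
    rel = source_relative_to_head(30.0, 0.0, 28.0)
    print(rel, "->", table[nearest_key(table, rel, 28.0)])
```

In this example the listener has turned their head almost directly toward the right loudspeaker while the torso stays put. A static set would return the same filters as if the whole body had turned, whereas the dynamic lookup selects the entry measured with the head rotated on the torso.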

When we combine all of this, we are able to achieve the results which have allowed me to listen to Joss Stone in an extremely well designed virtual listening room with a head-tracker mounted to the top of my headphones. When I take off my headphones and compare the sound to that which I obtain from a pair of studio monitors, I realize that virtual reality has just allowed me to experience a far more pleasing HiFi reproduction. For critical listening, like in the recording studio, I now firmly believe that virtual reality has a serious advantage over “real” reality. Or why not use the exact same technology to create a virtual home cinema system on par with, or even better than, the most exclusive high-end surround systems in the world? Who knows what the future of HiFi will look like at the dawn of VR. 

Oh, and if you’d like to experience this virtual HiFi system for yourself, please drop by and see us at CES this January, at our suites at the Venetian.

- Mathias Johansson, CEO at Dirac Research