Saturday, 4 May 2013

Web Audio Progress

Our Web Audio implementation is making great progress. This is mainly due to the efforts of Ehsan Akhgari, who is, astoundingly, cranking out one or two features per day. Paul Adenot and I are spending hours every day just reviewing his code. I think this pace is partly due to Ehsan and me laying down some pretty good infrastructure at the outset.

Our current goal is to have a basically complete implementation for Firefox 24, which branches from trunk in about eight weeks. There are a few things we need to do to get there:

  • Complete the feature set. At this point that mainly means adding all the node types that aren't implemented yet: MediaStreamAudioDestinationNode, MediaStreamAudioSourceNode, MediaElementAudioSourceNode, ConvolverNode, OscillatorNode, and WaveShaperNode. The first three are all related and shouldn't be too hard, since we designed Web Audio from the start to share infrastructure with MediaStreams (which are already integrated into media elements). Internally, a Web Audio node is just a special kind of MediaStream. We still need to implement HRTF and soundfield panning modes for PannerNode. We need to implement OfflineAudioContext. For some of the audio algorithms that aren't very well specified, we're borrowing code from Blink. This is suboptimal, but there's ongoing discussion about what level of detail we should specify the audio algorithms at.
  • Work on latency. Right now audio output has pretty bad latency, especially on Android, FirefoxOS and Windows; the biggest problems are intrinsic issues with the platform APIs we're using. On older versions of Android and on Windows XP it may be a lost cause, because good APIs simply aren't available. For Windows Vista and up we're writing a new audio output backend using WASAPI. On FirefoxOS we may rip out the Android code we're using and replace it with PulseAudio. There is additional work to do to better integrate the MediaStreamGraph (which drives MediaStream and AudioNode processing) with our libcubeb audio backends for lower latency and better tracking of the audio clock. This latency work is desperately needed for WebRTC as well as Web Audio.
  • Work on throughput. Right now we're focusing on having a good clean design and functional correctness. For example, all communication and synchronization with the MediaStreamGraph real-time processing thread is asynchronous, using message passing. Updates to Web Audio and MediaStream graphs are batched so all changes performed by a script happen atomically on the real-time thread. But we haven't done any profiling, tuning, or optimization of the actual processing code. In particular we'll clearly need SIMD implementations of basic audio primitives such as mixing to get near-maximum performance, especially on mobile.
  • Test and fix bugs, needless to say.
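To make the batching described in the throughput item a bit more concrete, here is a minimal Python sketch. It is not the Gecko code (which is C++), and every name in it (`GraphState`, `MainThreadProxy`, `end_of_task`, `audio_thread`) is made up for illustration. It only shows the general shape: the script-facing side records changes without touching shared state, sends them as one message when the script task ends, and the processing thread applies each message as a unit.

```python
import queue
import threading


class GraphState:
    """Stands in for the node graph owned by the processing thread (hypothetical)."""

    def __init__(self):
        self.params = {}
        self.snapshots = []  # copy of params taken after each applied batch


class MainThreadProxy:
    """Collects the updates made during one script task and sends them as one batch."""

    def __init__(self, outbox):
        self._outbox = outbox
        self._pending = []

    def set_param(self, node, name, value):
        # No locks and no shared state: the change is only recorded here.
        self._pending.append((node, name, value))

    def end_of_task(self):
        # One message per script task, so the processing thread never
        # observes only half of a script's changes.
        if self._pending:
            self._outbox.put(self._pending)
            self._pending = []


def audio_thread(inbox, state):
    # Simplification: a real real-time thread would not block waiting for
    # messages; it would drain whatever is queued before rendering each block.
    while True:
        batch = inbox.get()
        if batch is None:  # shutdown sentinel
            return
        for node, name, value in batch:
            state.params[(node, name)] = value
        state.snapshots.append(dict(state.params))


def run_demo():
    inbox = queue.Queue()
    state = GraphState()
    worker = threading.Thread(target=audio_thread, args=(inbox, state))
    worker.start()

    proxy = MainThreadProxy(inbox)
    # First script task: two changes that must land together.
    proxy.set_param("gain1", "gain", 0.5)
    proxy.set_param("panner1", "x", -1.0)
    proxy.end_of_task()
    # Second script task: one further change.
    proxy.set_param("gain1", "gain", 0.25)
    proxy.end_of_task()

    inbox.put(None)
    worker.join()
    return state.snapshots
```

Calling `run_demo()` returns one snapshot per script task; the first already contains both the gain and the panner change, which is the atomicity property the bullet describes.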

Contributors welcome! The Web Audio bug (779297) has a lot of dependencies to choose from :-).

One interesting issue that Jean-Marc Valin brought up recently is the prospect of a loudness war on the Web. Some areas, such as Sony's PlayStation products and some broadcast TV regions, are trying to mitigate the loudness war by standardizing acceptable dB levels for all content. It might be a good idea to do this on the Web too, and have browsers automatically limit the volume of content that exceeds those levels. We're still thinking about whether and how this should be done, and talking about it on public-audio.
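Nothing has been decided here, but as a rough illustration of what "limit the volume of content that exceeds those levels" could mean, here is a small Python sketch. Everything in it is an assumption made for the example: the function names, the use of plain RMS level in dBFS (broadcast loudness standards use more elaborate perceptual measures), the -20 dBFS default ceiling, and the `user_override` flag standing in for a user's ability to opt out.

```python
import math


def rms_dbfs(samples):
    """RMS level of samples in the range [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)


def limiting_gain(samples, ceiling_dbfs):
    """Linear gain that brings the content down to the ceiling; 1.0 if already below it."""
    level = rms_dbfs(samples)
    if level <= ceiling_dbfs:
        return 1.0
    return 10 ** ((ceiling_dbfs - level) / 20)


def apply_default_limit(samples, ceiling_dbfs=-20.0, user_override=False):
    """Attenuate content above the (hypothetical) ceiling unless the user opted out."""
    if user_override:
        return list(samples)
    gain = limiting_gain(samples, ceiling_dbfs)
    return [s * gain for s in samples]
```

Under these assumptions, content that is already quieter than the ceiling passes through unchanged, louder content is scaled down to the ceiling, and nothing is ever boosted.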


  1. "It might be a good idea to do this on the Web too, and have browsers automatically limit the volume of content that exceeds those levels."

    This strikes me as a very bad idea. The Web isn't a curated platform like PlayStation, and shouldn't make value calls. What if a particular user is looking for, or wants to make, an app that intentionally loudens quieter content? Sorry, can't do that on Firefox?

    Scratch the whole thing and let's move on to something more productive.

    1. We make value calls all the time. We often try to design features to constrain apps so that users get better security, privacy etc.

      Under any proposal the user would have the ability to make content as loud as they want. It's just that by default apps would not be able to exceed standard loudness levels.

    2. That clarification makes a lot of sense. It'll prevent advertisers from blowing out people's speakers to attract attention with high (onLoad) volume levels. So long as the user has volume control for adjustment after loading, you have the right idea imo.

  2. Good to hear there's progress. Manually doing FFT calculations that imo the browser should do makes me a sad panda. Please implement an analyser node like WebKit does and make the world a better place.

    1. We already implemented AnalyserNode.

      Having said that, we believe Web apps should be able to implement their own FFTs just as efficiently as native code, and work like asm.js, Parallel.js, and future extensions of that work will make that possible.

  3. FFT should scale magically with SSE2 and above.