Thursday, 14 September 2017

Some Opinions On The History Of Web Audio

People complain that Web Audio provides implementations of numerous canned processing features, but they very often don't do exactly what you want, and working around those limitations by writing your own audio processing code in JS is difficult or impossible.

This was an obvious pitfall from the moment the Web Audio API was proposed by Chris Rogers (at Google, at that time). I personally fought pretty hard in the Audio WG for an API that would be based on JS audio processing (with allowance for popular effects to be replaced with browser-implemented modules). I invested enough to write a draft spec for my alternative and implement a lot of that spec in Firefox, including Worker-based JS sample processing.

My efforts went nowhere for several reasons. My views on making JS sample manipulation a priority were not shared by the Audio WG. Here's my very first response to Chris Rogers' reveal of the Web Audio draft; you can read the resulting discussion there. The main arguments against prioritizing JS sample processing were that JS sample manipulation would be too slow, and JS GC (or other non-realtime behaviour) would make audio too glitchy. Furthermore, audio professionals like Chris Rogers assured me they had identified a set of primitives that would suffice for most use cases. Since most of the Audio WG were audio professionals and I wasn't, I didn't have much defense against "audio professionals say..." arguments.

The Web Audio API proceeded mostly unchanged because there wasn't anyone other than me trying to make significant changes. After an initial burst of interest Apple's WG participation declined dramatically, perhaps because they were getting Chris Rogers' Webkit implementation "for free" and had nothing to gain from further discussion. I begged Microsoft people to get involved but they never did; in this and other areas they were (are?) apparently content for Mozilla and Google to spend energy to thrash out a decent spec that they later implement.

However, the main reason that Web Audio was eventually standardized without major changes is because Google and Apple shipped it long before the spec was done. They shipped it with a "webkit" prefix, but they evangelized it to developers who of course started using it, and so pretty soon Mozilla had to cave.

Ironically, soon after Web Audio won, the "extensible Web" become a hot buzzword. Web Audio had a TAG review at which it was clear Web Audio was pretty much the antithesis of "extensible Web", but by then it was too late to do anything about it.

What could I have done better? I probably should have reduced the scope of my spec proposal to exclude MediaStream/HTMLMediaElement integration. But I don't think that, or anything else I can think of, would have changed the outcome.


  1. A valid question which I feel would have been illuminating is the one I asked in my post's title: "Who is this spec designed for? Who are we targeting?" I asked in more in a... cynical, desperate manner, but it's a question I legitimately think should be asked of every spec that floats by W3C.

    As I mention, game developers usually have solutions like FMOD and Wwise which already have different specifications of such effects and won't benefit from nodes, and the professional audio market probably won't like the cheap-o built-in things either: the market there relies on a large set of complex effects and plugins.

    I'd love to hear Chris Rogers' thoughts on who Web Audio was designed for.

    1. I don't know what Chris had in mind but I guess he had in mind the same audience that used the Core Audio builtin effects ("Audio Unit Services and Audio Processing Graph Services" I guess?).

    2. But yes, you're quite right.

      One thing to keep in mind is that in 2009 when Chris started his work the idea of compiling C/C++ code (like FMOD) to some format (JS/asm.js/WebAssembly) where you would use Web APIs was not in the picture. So there was no thought of developers using FMOD directly, and CHris probably thought "people who use FMOD" would be an audience for Web Audio because they were going to have to rewrite their apps anyway so changes in the behavior of the nodes would be palatable.