I'm at linux.conf.au at the moment (until Wednesday) and yesterday I attended the browsers miniconf. It went well, better than I expected. I had a slot to talk about the MediaStreams Processing API proposal to enable advanced audio effects (and much more!) in browsers, which has been my main project for the last several months (see my earlier post here). I worked frantically up to the last minute to create demos of some of the most interesting features of the API, and to get my implementation into a state where it can run the demos. By the grace of God I was successful :-). Even more graciously, the audio in the conference room worked and even played my stereo effects properly!
I have made available experimental Windows and Mac Firefox builds with most of the MediaStreams Processing API supported. (But the Mac builds are completely untested!) The demos are here. Please try them out! I hope people view the source, modify the demos and play with the API to see what can be done. Comments on the API should go to me or to the W3C Audio Working Group.
I must apologise for the uninspired visual design and extraordinarily naive audio processing algorithms. Audio professionals who view the source of my worker code will just laugh --- and hopefully be inspired to write better replacements :-). Making that easy for anyone to do is one of my goals.
Some of the things I like about this API:
- First-class support for JS-based processing. In particular, JS processing off the main thread, using Workers (see the worker sketch after this list). This lets people build whatever effects they want and get reasonable performance. Soon we'll have something like Intel's River Trail in browsers and then JS users will be able to get incredible performance.
- Leverages MediaStreams. Ongoing work on WebRTC and elsewhere is introducing MediaStreams as an abstraction of real-time media, and linking them to sources and sinks to form a media graph. I don't think we need another real-time media graph in the Web platform.
- Allows processing of various media types. MediaStreams currently carry both audio and video tracks. At the moment the API only supports processing of audio, because we don't have graphics APIs available in Workers to enable effective video processing, but that will change. Applications will definitely want to process video in real time (e.g. QR code recognition, motion detection and other "augmented reality" applications). Soon we'll want Kinect depth data and other kinds of real-time sensor data.
- First-class synchronization. Some sources and effects have unbounded latency. We want to make sure we maintain A/V sync in the face of latency or dynamic graph changes. This should be automatic so authors don't have to worry about it.
- Support for streams with different audio sample rates and channel configurations in the same graph. This is important for efficient processing when you have a mix of rates and some of them are low. (All inputs to a ProcessedMediaStream are automatically resampled to the same rate and number of channels to simplify effect implementations.)
- No explicit graph or context object. It's not needed; streams connect directly to sources, effects and sinks (see the first sketch below).
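To make the "no context object" point concrete, here's a minimal main-thread sketch of how a graph might be wired up. The names I use (captureStream(), createProcessor(), assigning a stream as a media element's source) are my shorthand for the proposal, not a verified transcription of it; the spec and the demos' source are the authoritative reference.

```js
// Sketch only: wiring an effect into a media graph with no context object.
// captureStream() / createProcessor() are illustrative names based on my
// reading of the proposal; check the spec and the demo sources.
var audio = document.getElementById("music");   // an <audio> element as the source
var worker = new Worker("gain-worker.js");      // effect code runs in a Worker

var input = audio.captureStream();              // the element's output as a MediaStream
var processed = input.createProcessor(worker);  // a ProcessedMediaStream fed by 'input'

// Play the processed stream through another media element (assuming a media
// element can take a stream as its source, as the demos do).
var out = new Audio();
out.src = processed;
out.play();
```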
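And a worker-side sketch of a trivial gain effect, to show what "JS processing off the main thread" looks like. Again, the event and field names (onprocessmedia, inputs, audioSamples, writeAudio) are illustrative guesses at the proposal's shape rather than a verified API listing; inputs are assumed to arrive already resampled to a common rate and channel count, as described above.

```js
// gain-worker.js -- sketch of a worker-side effect. Event and field names
// (onprocessmedia, inputs, audioSamples, writeAudio) are illustrative; the
// spec and the demo workers are the real reference.
var gain = 0.5;

onprocessmedia = function (event) {
  // Every input has already been resampled to the same sample rate and
  // channel count, so the effect just transforms raw float samples.
  var samples = event.inputs[0].audioSamples;
  var output = new Float32Array(samples.length);
  for (var i = 0; i < samples.length; ++i) {
    output[i] = samples[i] * gain;     // naive gain: scale every sample
  }
  event.writeAudio(output);
};
```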
Most of the features in the proposed spec are implemented. Notable limitations: