Wednesday 17 July 2013
Avoiding Copies In Web APIs
The Web platform has matured to the point where avoiding data copying is often an important optimization to reduce running time and memory usage. Avoiding data copies often requires careful API design. The Web has two facilities that are very useful for writing such APIs: Blobs and ArrayBuffer neutering.
A Blob is a chunk of generic immutable data, optionally with a MIME type. Its contents are not directly byte-addressable, enabling browsers to easily offer multiple implementations of Blob under the hood. For example, in Gecko we support Blobs backed by in-memory buffers, Blobs backed by files, and Blobs that reference lists of other Blobs they are the concatenation of. Because Blob contents are immutable, it's easy for multithreaded or multiprocess browser implementations to safely share Blob contents in any situation without copying (including, for example, Blobs being passed between Workers via structured cloning). Any Web API dealing with generic data chunks that don't need to be byte-addressable or easily mutable should probably use Blobs, or at least support Blobs as an option.
ArrayBuffer is part of the Typed Arrays spec; it represents main-memory data that is directly addressable and mutable. Because preventing in-memory data races is an important principle of the Web platform, ArrayBuffers cannot be directly shared between Workers. Therefore we added to the Web the ability to transfer ownership of an ArrayBuffer's data between workers (and the main thread) via the Transferables abstraction. An ArrayBuffer is normally represented as a small JS object containing an internal pointer to a (usually much larger) buffer containing its elements. When an ArrayBuffer is transferred from one thread to another, the ArrayBuffer on the origin thread is neutered --- i.e. its length becomes zero; its element buffer is detached and ownership transferred to the destination thread; and on the destination thread a new ArrayBuffer object is created wrapping the transferred element buffer. The contents of the element buffer are not copied.
ArrayBuffer neutering is potentially useful for more than just transferring ownership between Workers. Some Web API methods take ArrayBuffers or typed arrays as parameters, and have to copy the data internally for later use. For example, WebGLRenderingContext.bufferData has to validate its ArrayBuffer parameter's contents and copy the data for later use --- otherwise the caller could modify the ArrayBuffer after the bufferData call has returned, and defeat any validation that was performed. If it's helpful for performance, we can add a version of bufferData that neuters its ArrayBuffer parameter and takes ownership of the data instead of copying it.
Web Audio is an interesting case. Web Audio AudioBuffers contain ArrayBuffers containing audio samples. AudioBuffer samples are made available to an audio processing thread. The question is, what happens if those sample ArrayBuffers are modified by the main thread while the audio processing thread is using them? If the memory is naively shared, the answer is "data races", which are unpopular with many people (including me). One way to fix this without adding copying is to allow an AudioBuffer to neuter its ArrayBuffers, taking ownership of their element buffers and making them immutable so no copying is needed but there's also no possibility of data races.
When designing new Web APIs, if copy avoidance is important, we should keep ArrayBuffer neutering and Blobs in mind as possible solutions.