Wednesday 28 September 2011
For several years Gecko used a C++ wrapper around cairo as its cross-platform rendering API. We eventually found that the cairo API led to some inevitable performance problems and designed a new API, Azure, to address these problems. Joe Drew and Bas Schouten have already discussed Azure a bit. I want to mention some of the specific lessons we learned about graphics API design.
Cairo uses a "stateful context" model much like PostScript or Quartz 2D. To draw content, you make individual API calls to set up various bits of state in a context object, followed by another API call to actually perform drawing. For example, to stroke a shape with a dashed line, you would typically set the color, set the line width, set the dashing style, start a new path, emit path segments, and finally draw the stroke --- all as separate API calls. In cairo, even drawing an image requires the caller to set the source surface, emit a rectangle path, and fill --- at least 6 API calls.
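To make that call pattern concrete, here is a toy sketch in C++. It is not cairo: ToyContext and its methods are hypothetical stand-ins that only record state and count calls. The comments name the real cairo functions each step roughly corresponds to.

```cpp
#include <utility>
#include <vector>

struct Color { double r, g, b, a; };
struct Point { double x, y; };

// Hypothetical stateful context. It does no rendering; it stores the
// "current" state and counts how many API calls the caller made.
class ToyContext {
public:
  void SetColor(Color c)              { ++mCalls; mColor = c; }
  void SetLineWidth(double w)         { ++mCalls; mLineWidth = w; }
  void SetDash(std::vector<double> d) { ++mCalls; mDash = std::move(d); }
  void NewPath()                      { ++mCalls; mPath.clear(); }
  void MoveTo(Point p)                { ++mCalls; mPath.push_back(p); }
  void LineTo(Point p)                { ++mCalls; mPath.push_back(p); }
  // Consumes the current path using whatever state is currently set.
  void Stroke()                       { ++mCalls; ++mStrokes; mPath.clear(); }

  int Calls() const   { return mCalls; }
  int Strokes() const { return mStrokes; }

private:
  Color mColor{0.0, 0.0, 0.0, 1.0};
  double mLineWidth = 1.0;
  std::vector<double> mDash;
  std::vector<Point> mPath;
  int mCalls = 0;
  int mStrokes = 0;
};

// One dashed line costs seven separate calls in this model.
inline void StrokeDashedLine(ToyContext& ctx, Point from, Point to) {
  ctx.SetColor({0.0, 0.0, 0.0, 1.0});  // cf. cairo_set_source_rgba
  ctx.SetLineWidth(2.0);               // cf. cairo_set_line_width
  ctx.SetDash({4.0, 2.0});             // cf. cairo_set_dash
  ctx.NewPath();                       // cf. cairo_new_path
  ctx.MoveTo(from);                    // cf. cairo_move_to
  ctx.LineTo(to);                      // cf. cairo_line_to
  ctx.Stroke();                        // cf. cairo_stroke
}
```

If the next shape used the same color, width and dash pattern, the first three calls could be skipped. That is the "state reuse" benefit, and it only helps when consecutive draws really do share state.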
This design has some advantages. In typical applications you can set up some state (current color etc) and reuse it across many drawing operations. The API can consist of many logically independent operations with small numbers of parameters, instead of a few giant operations taking dozens of parameters. If your application never needs to use non-default values of certain drawing parameters, you can completely ignore them.
Unfortunately we found that with GPU-accelerated rendering on intensive HTML <canvas> benchmarks, we were doing so many drawing operations per second that the overhead of making many cairo API calls was becoming a significant drag on performance. (In some configurations we can do over 100,000 image draws per second; a single malloc per draw call becomes significant in the profile.) We could have improved that situation by adding new cairo APIs matching the <canvas> 2D APIs, e.g. a cairo_draw_image API. However, other problems with stateful contexts and cairo's implementation led us down a different path.
Cairo collects state in its cairo_t context objects --- in its "gstate" layer --- and each drawing call passes all relevant state down to a "surface backend" object to do the drawing. Essentially it maps its "stateful context" API to a "stateless" backend API. This mapping adds intrinsic overhead --- floating point to fixed-point coordinate conversion, sometimes memory allocation to store path data, and generally the overhead of storing and retrieving state. This is an especially big deal when the underlying platform API we're wrapping with cairo is itself a stateful API, such as Quartz 2D; in that case each stateless backend drawing call performs several platform API calls to reset all the state every time we draw. Cairo forces us to go from stateful to stateless and back to stateful as we move down the rendering stack.
HTML 2D <canvas> is a "stateful context" API much like cairo's, so cairo was actually a pretty good fit for <canvas>. In the rest of the browser we use graphics APIs somewhat differently. When rendering CSS, we typically have to reset all the context state every time we draw. For example, every time we draw a border we have to set the current color to the CSS border color, set the current line width to the CSS border width, set the line dashing style to the CSS border style, etc. Effectively we treat our graphics API as stateless. Cairo's accumulation of state in its context before calling into the stateless backend to draw is just unnecessary overhead.
Another consideration is that the state of a 2D <canvas> can't be represented directly in cairo; for example, cairo has no concept of global alpha or shadows. Therefore our <canvas> implementation has to maintain its own state tracking in parallel with cairo's internal state tracking. Given that, tracking all state in our <canvas> code (above the graphics API) is not much extra work.
Given all this, it made sense to make our cross-platform drawing API stateless, so we created Azure.
Almost all the operations on an Azure DrawTarget (the nearest equivalent to a drawing context) do actual drawing and take most relevant state as parameters. The only state carried by a DrawTarget is the destination surface itself (of course) plus a current transform and a current clip stack. We let the transform and clip state remain in the DrawTarget because those are the only pieces of state not constantly reset by CSS rendering. Our CSS rendering needs to always render under some given transform and clip, and we don't want all our rendering code to have to pass those around everywhere.
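A minimal sketch of that split between retained state and per-call parameters might look like the following. SketchDrawTarget is a hypothetical class written for illustration; its method names and signatures are assumptions and should not be read as the actual Azure interface. It records fills instead of rendering them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect   { double x, y, w, h; };
struct Color  { double r, g, b, a; };
// Affine transform; defaults to the identity matrix.
struct Matrix { double xx = 1, yx = 0, xy = 0, yy = 1, x0 = 0, y0 = 0; };

// What one fill "saw" at the time it was issued.
struct FillRecord {
  Rect rect;
  Color color;
  Matrix transform;
  std::size_t clipDepth;
};

class SketchDrawTarget {
public:
  // The only retained state: current transform and clip stack.
  void SetTransform(const Matrix& m) { mTransform = m; }
  void PushClipRect(const Rect& r)   { mClips.push_back(r); }
  void PopClip() {
    assert(!mClips.empty());  // popping with no clip pushed is a caller bug
    mClips.pop_back();
  }

  // A drawing call: everything else (here, just the color) is a parameter.
  void FillRect(const Rect& r, const Color& c) {
    mLog.push_back(FillRecord{r, c, mTransform, mClips.size()});
  }

  const std::vector<FillRecord>& Log() const { return mLog; }

private:
  Matrix mTransform;
  std::vector<Rect> mClips;
  std::vector<FillRecord> mLog;
};
```

Rendering code several layers deep can call FillRect without being handed the transform or clip, while the color travels with each call and never lingers in the target.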
So far we're only using Azure for 2D <canvas> drawing. For optimal canvas performance we ensure that every canvas operation, including global alpha and shadows, is directly supported in Azure. The only Azure backend shipping right now is the Direct2D backend. Direct2D is mostly stateless so it's a good fit for Azure, although we've designed Azure carefully so that stateful backends like cairo or Quartz 2D will work pretty well too. (Mapping from stateless down to stateful is pretty easy and can be optimized to avoid resetting state that's constant from draw to draw.) <canvas> performance with Azure/Direct2D is much improved.
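The parenthetical about mapping stateless down to stateful can be illustrated with a small caching layer. Everything here is hypothetical: StatefulPlatform stands in for a stateful platform API, and CachingBackend is one possible way to skip redundant state changes, not a description of any real Azure backend.

```cpp
struct Color { double r, g, b, a; };

inline bool operator==(const Color& a, const Color& b) {
  return a.r == b.r && a.g == b.g && a.b == b.b && a.a == b.a;
}

// Hypothetical stateful platform API: color is set once, then used by draws.
class StatefulPlatform {
public:
  void SetFillColor(const Color& c) { ++mStateCalls; mColor = c; }
  void FillRect(double, double, double, double) { ++mDrawCalls; }

  int StateCalls() const { return mStateCalls; }
  int DrawCalls() const  { return mDrawCalls; }

private:
  Color mColor{0.0, 0.0, 0.0, 1.0};
  int mStateCalls = 0;
  int mDrawCalls = 0;
};

// Stateless front end: the color is a parameter on every draw call.
// The backend remembers the last color it pushed to the platform and
// only issues a state change when the requested color differs.
class CachingBackend {
public:
  explicit CachingBackend(StatefulPlatform& p) : mPlatform(p) {}

  void FillRect(double x, double y, double w, double h, const Color& c) {
    if (!mHaveColor || !(mLastColor == c)) {
      mPlatform.SetFillColor(c);
      mLastColor = c;
      mHaveColor = true;
    }
    mPlatform.FillRect(x, y, w, h);
  }

private:
  StatefulPlatform& mPlatform;
  Color mLastColor{0.0, 0.0, 0.0, 0.0};
  bool mHaveColor = false;
};
```

Three fills in which the first two share a color cost two platform state changes rather than three. The caller never has to reason about what state happens to be current.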
We have given up some of the advantages of stateful context APIs mentioned above --- Azure's probably a little less convenient to use than cairo. To some extent that just doesn't matter; writing more verbose API calls to get a significant performance improvement is completely worthwhile for us. But we can mitigate the worst aspects of a parameter-heavy API anyway. We group parameters together into structs like "StrokeOptions" and "FillOptions", and use C++ to assign sensible default values to the fields; this means that callers can continue to ignore features where they only want the defaults, and callers can reuse state values across drawing calls.
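As a rough sketch of the options-struct idea: the struct below is hypothetical (the name SketchStrokeOptions, its fields and its default values are all assumptions made for illustration, not Azure's actual StrokeOptions). StrokeLine just records what it was asked to draw.

```cpp
#include <vector>

struct Point { double x, y; };
struct Color { double r, g, b, a; };

enum class CapStyle  { Butt, Round, Square };
enum class JoinStyle { Miter, Round, Bevel };

// Hypothetical options struct. Default member initializers give every
// field a sensible value, so callers only touch what they care about.
struct SketchStrokeOptions {
  double lineWidth = 1.0;
  double miterLimit = 10.0;
  CapStyle cap = CapStyle::Butt;
  JoinStyle join = JoinStyle::Miter;
  std::vector<double> dashPattern;  // empty means a solid line
  double dashOffset = 0.0;
};

struct StrokeRecord {
  Point from, to;
  Color color;
  SketchStrokeOptions options;
};

// Stateless stroke call. The defaulted last argument lets callers that
// want plain solid lines omit the options entirely.
inline StrokeRecord StrokeLine(
    Point from, Point to, Color color,
    const SketchStrokeOptions& options = SketchStrokeOptions()) {
  return StrokeRecord{from, to, color, options};
}
```

A caller that wants dashed lines can build one options value, for example by setting lineWidth to 2.0 and dashPattern to {4.0, 2.0}, and pass that same object to any number of StrokeLine calls. This recovers the reuse that a stateful context offered, without hiding the state inside the drawing target.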
One of the key insights here is that Gecko is a framework, not an application, and APIs suitable for applications to use directly may be less suitable for frameworks. When you're writing drawing code for an application you know the content you're going to draw and you can write code that takes advantage of "state locality" --- e.g. setting the current color and then drawing several shapes with that color. For most of what Gecko draws, we can't do that because we don't statically know what a Web page will draw. Also, because we're a framework, we have fewer calls to graphics APIs in our code than an application might. For example, we have only one piece of code that draws all CSS gradients; we might have fewer than half a dozen places in our entire codebase that actually create gradient patterns. Therefore making the API calls shorter or more convenient offers us very little value.
Aside: People rightly ask why we didn't just modify cairo to fit our needs better. Of course we considered it, since we've done a lot of work to improve cairo for our needs over the years. We'd need to make massive architectural and API changes to cairo, which would probably be at least as much work as writing Azure from scratch --- quite likely more work given that cairo promises a stable API, so we'd have to maintain our stateless API alongside cairo's stateful one. Cairo's API stability guarantee often gets in the way of us making improvements, and it is of little use to us since we ship our own copy of cairo on most platforms; getting away from the cairo API will be helpful in that respect.
I don't think this should be considered a failure of cairo's design. Cairo was originally designed to be used directly by applications, over a very low-level backend (XRender), with much lower performance targets on very different workloads. Using it in a framework, wrapping very capable and high-level platform APIs, and expecting 100K image draws per second is far beyond those design parameters.