Friday, 30 September 2011

Shifts In Promoting The Open Web

Historically Mozilla spent quite a bit of energy promoting use of the "open Web" over proprietary platforms and non-standard browser extensions (IE6). This is still needed, but the landscape has shifted and I think our emphasis needs to change with it.

The platforms we need to worry about have changed a lot. Instead of WPF, Silverlight and Flash, the proprietary stacks competing for developers are now iOS and Android. Accordingly, the features the Web needs to catch up on are mobile-focused. We need to be knocking down the barriers that make mobile app developers write native apps instead of Web apps (and we are!) and we need to be promoting the development and use of Web apps instead of native mobile apps. Demos that only work on desktop browsers are less important.

The open Web also has some interesting new platform competitors: platforms that build on (embrace and extend?) Web standards but take control of application delivery to create a vendor-controlled platform. The Chrome app store and the upcoming Windows 8 Metro app store are examples of this. I was very disappointed to see offline GMail and Google Calendar restricted to being Chrome apps. Even though Angry Birds works great in Firefox, its Chrome branding alone probably makes people think it only works in Chrome. To counter this we have to make sure that browser competition stays strong and offer developers browser-neutral Web app stores. Mozilla is working on these, of course :-). We also need to send a clear message that browser-specific app stores run counter to the open Web.

A less obvious form of platform competition is application developers targeting a single browser or browser engine. Google is explicitly telling its developers to target Chrome-only at first and support other browsers as an afterthought. That's understandable, but still disturbing. Also disturbing is that many mobile sites only target Webkit (sometimes implicitly by relying on Webkit bugs, more often explicitly by relying on -webkit-prefixed features). Many mobile developers, even developers at good places like Google, are reluctant to change this behavior. This is a huge problem for the open Web. We need an open-Web-standards campaign targeted at mobile Web developers. We need to be clear that apps working in only a single browser engine, whichever engine that is, run counter to the open Web.

It's unfortunate that, of the major browser vendors, only Mozilla (and maybe Opera) has no vested interest in the success of a non-open-Web platform. I'm glad I work here.

One of the great things about the Web right now is the explosion of new features and standards for Web developers. However, we need to carefully distinguish good open standards from "open-washed" single-vendor initiatives. Not every proposed standard is good for the Web, even if it comes with an open-source implementation. Maciej Stachowiak points out a few Google projects --- VP8, SPDY, Pepper, and Native Client --- that, while they may be good ideas, fall short of true open standards to varying degrees. (The lack of a good spec for VP8 is an issue that we at Mozilla can and probably should address ourselves.) There are also cases where, even though a good multi-vendor spec exists and some Web developers want the feature, it is not good for the Web and should be resisted. So I think when we promote the open Web we need to be very discerning about which specs we promote. Just because someone pushed out a draft spec with "CSS" (or "HTML" or "Web") in the name, and shipped a prefixed implementation, doesn't mean that that spec is, or should be, part of the open Web. People need to ask:

  • Is this feature good for the Web?
  • Is there a thorough draft spec that doesn't require reverse-engineering an existing implementation?
  • Are there multiple implementations?
  • Is the spec actively edited, with feedback from multiple vendors and Web authors taken into account?

It's a challenging time. It's an exciting time. Despite the threats I mentioned, it's great to see massive investment in improvements in open Web technology. It's great to see Microsoft moving away from Silverlight towards a standards-based platform. We've won some battles, but the war for open Web standards is not over and we need to keep fighting it, on the right fronts.

Wednesday, 28 September 2011

Graphics API Design

For several years Gecko used a C++ wrapper around cairo as its cross-platform rendering API. We eventually found that the cairo API led to some inevitable performance problems and designed a new API, Azure, to address these problems. Joe Drew and Bas Schouten have already discussed Azure a bit. I want to mention some of the specific lessons we learned about graphics API design.

Stateful Contexts

Cairo uses a "stateful context" model much like Postscript or Quartz 2D. To draw content, you make individual API calls to set up various bits of state in a context object, followed by another API call to actually perform drawing. For example, to stroke a shape with a dashed line, you would typically set the color, set the line width, set the dashing style, start a new path, emit path segments, and finally draw the stroke --- all as separate API calls. In cairo, even drawing an image requires the caller to set the source surface, emit a rectangle path, and fill --- at least 6 API calls.
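To make that concrete, here is roughly what those two operations look like against cairo's C API (these are real cairo entry points; the coordinates and style values are invented for illustration):

    #include <cairo.h>

    // Stroking a dashed line: each attribute is a separate call that mutates
    // state stored in the cairo_t context before the final call draws.
    void DrawDashedLine(cairo_t* cr) {
      const double dashes[] = { 4.0, 2.0 };      // on/off lengths (illustrative)
      cairo_set_source_rgb(cr, 0.0, 0.0, 1.0);   // 1. set the color
      cairo_set_line_width(cr, 2.0);             // 2. set the line width
      cairo_set_dash(cr, dashes, 2, 0.0);        // 3. set the dashing style
      cairo_new_path(cr);                        // 4. start a new path
      cairo_move_to(cr, 10.0, 10.0);             // 5. emit path segments
      cairo_line_to(cr, 90.0, 10.0);
      cairo_stroke(cr);                          // 6. finally draw the stroke
    }

    // Even "draw this image at (x, y)" decomposes into state setup plus a fill.
    void DrawImage(cairo_t* cr, cairo_surface_t* image,
                   double x, double y, double w, double h) {
      cairo_save(cr);                            // preserve the caller's state
      cairo_set_source_surface(cr, image, x, y); // set the source surface
      cairo_new_path(cr);
      cairo_rectangle(cr, x, y, w, h);           // emit a rectangle path
      cairo_fill(cr);                            // fill
      cairo_restore(cr);
    }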

This design has some advantages. In typical applications you can set up some state (current color etc) and reuse it across many drawing operations. The API can consist of many logically independent operations with small numbers of parameters, instead of a few giant operations taking dozens of parameters. If your application never needs to use non-default values of certain drawing parameters, you can completely ignore them.

Downsides

Unfortunately we found that with GPU-accelerated rendering on intensive HTML <canvas> benchmarks, we were doing so many drawing operations per second that the overhead of making many cairo API calls was becoming a significant drag on performance. (In some configurations we can do over 100,000 image draws per second; a single malloc per draw call becomes significant in the profile.) We could have improved that situation by adding new cairo APIs matching the <canvas> 2D APIs, e.g. a cairo_draw_image API. However, other problems with stateful contexts and cairo's implementation led us down a different path.
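Such an addition might have looked something like this (purely hypothetical --- cairo has no such function):

    // A hypothetical single-call image draw, collapsing the set-source /
    // rectangle / fill sequence above into one entry point.
    void cairo_draw_image(cairo_t* cr, cairo_surface_t* image,
                          double x, double y, double w, double h);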

Cairo collects state in its cairo_t context objects --- in its "gstate" layer --- and each drawing call passes all relevant state down to a "surface backend" object to do the drawing. Essentially it maps its "stateful context" API to a "stateless" backend API. This mapping adds intrinsic overhead --- floating point to fixed-point coordinate conversion, sometimes memory allocation to store path data, and generally the overhead of storing and retrieving state. This is an especially big deal when the underlying platform API we're wrapping with cairo is itself a stateful API, such as Quartz 2D; in that case each stateless backend drawing call performs several platform API calls to reset all the state every time we draw. Cairo forces us to go from stateful to stateless and back to stateful as we move down the rendering stack.

HTML 2D <canvas> is a "stateful context" API much like cairo's, so cairo was actually a pretty good fit for <canvas>. In the rest of the browser we use graphics APIs somewhat differently. When rendering CSS, we typically have to reset all the context state every time we draw. For example, every time we draw a border we have to set the current color to the CSS border color, set the current line width to the CSS border width, set the line dashing style to the CSS border style, etc. Effectively we treat our graphics API as stateless. Cairo's accumulation of state in its context before calling into the stateless backend to draw is just unnecessary overhead.

Another consideration is that the state of a 2D <canvas> can't be represented directly in cairo; for example, cairo has no concept of global alpha or shadows. Therefore our <canvas> implementation has to maintain its own state tracking in parallel with cairo's internal state tracking. Given that, tracking all state in our <canvas> code (above the graphics API) is not much extra work.

Given all this, it made sense to make our cross-platform drawing API stateless, so we created Azure.

Azure

Almost all the operations on an Azure DrawTarget (the nearest equivalent to a drawing context) do actual drawing and take most relevant state as parameters. The only state carried by a DrawTarget is the destination surface itself (of course) plus a current transform and a current clip stack. We let the transform and clip state remain in the DrawTarget because those are the only pieces of state not constantly reset by CSS rendering. Our CSS rendering needs to always render under some given transform and clip, and we don't want all our rendering code to have to pass those around everywhere.
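To make the shape of this concrete, here's a small hypothetical C++ sketch of a stateless drawing interface in the spirit of Azure --- the type and method names are illustrative, not Azure's actual headers:

    // A hypothetical stateless drawing interface in the spirit of Azure's
    // DrawTarget. Names and signatures are illustrative only.
    struct Matrix { float a = 1, b = 0, c = 0, d = 1, tx = 0, ty = 0; };
    struct Rect   { float x = 0, y = 0, width = 0, height = 0; };
    struct Color  { float r = 0, g = 0, b = 0, a = 1; };
    class SourceSurface;  // an image or texture to draw from

    class DrawTarget {
    public:
      // The only persistent state: the current transform and the clip stack.
      virtual void SetTransform(const Matrix& aTransform) = 0;
      virtual void PushClipRect(const Rect& aRect) = 0;
      virtual void PopClip() = 0;

      // Drawing operations take all other relevant state as parameters;
      // nothing carries over from one call to the next.
      virtual void FillRect(const Rect& aRect, const Color& aColor) = 0;
      virtual void DrawSurface(SourceSurface* aSurface,
                               const Rect& aDest, const Rect& aSource) = 0;

      virtual ~DrawTarget() = default;
    };

A Direct2D-like stateless backend can mostly pass these parameters straight through, while a cairo- or Quartz-like stateful backend implements each call by setting platform state and then drawing.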

So far we're only using Azure for 2D <canvas> drawing. For optimal canvas performance we ensure that every canvas operation, including global alpha and shadows, is directly supported in Azure. The only Azure backend shipping right now is the Direct2D backend. Direct2D is mostly stateless so it's a good fit for Azure, although we've designed Azure carefully so that stateful backends like cairo or Quartz 2D will work pretty well too. (Mapping from stateless down to stateful is pretty easy and can be optimized to avoid resetting state that's constant from draw to draw.) <canvas> performance with Azure/Direct2D is much improved.
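Here's a sketch of that optimization (hypothetical code, not our actual backend): the stateful backend caches the state it last pushed to the platform and skips redundant resets when consecutive draws share state:

    struct Color { float r = 0, g = 0, b = 0, a = 1; };
    inline bool operator==(const Color& x, const Color& y) {
      return x.r == y.r && x.g == y.g && x.b == y.b && x.a == y.a;
    }

    // A hypothetical backend wrapping a stateful platform API (think
    // Quartz 2D). It remembers the state it last set on the platform
    // context, so state that's constant from draw to draw isn't
    // redundantly reset on every call.
    class StatefulBackend {
    public:
      void StrokeLine(float x1, float y1, float x2, float y2,
                      const Color& aColor, float aLineWidth) {
        if (!mStateValid || !(aColor == mLastColor)) {
          SetPlatformColor(aColor);          // e.g. CGContextSetRGBStrokeColor
          mLastColor = aColor;
        }
        if (!mStateValid || aLineWidth != mLastLineWidth) {
          SetPlatformLineWidth(aLineWidth);  // e.g. CGContextSetLineWidth
          mLastLineWidth = aLineWidth;
        }
        mStateValid = true;
        PlatformStrokeLine(x1, y1, x2, y2);  // draw using the current state
      }

    private:
      // These would call into the real platform API; stubbed out here.
      void SetPlatformColor(const Color&) {}
      void SetPlatformLineWidth(float) {}
      void PlatformStrokeLine(float, float, float, float) {}

      Color mLastColor;
      float mLastLineWidth = 0;
      bool mStateValid = false;
    };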

We have given up some of the advantages of stateful context APIs mentioned above --- Azure's probably a little less convenient to use than cairo. To some extent that just doesn't matter; writing more verbose API calls to get a significant performance improvement is completely worthwhile for us. But we can mitigate the worst aspects of a parameter-heavy API anyway. We group parameters together into structs like "StrokeOptions" and "FillOptions", and use C++ to assign sensible default values to the fields; this means that callers can continue to ignore features where they only want the defaults, and callers can reuse state values across drawing calls.
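For example, here's a hypothetical sketch (not Azure's real definitions) of how default member values in options structs let callers ignore features they don't use and reuse configured state across calls:

    // Hypothetical options structs in the style the post describes.
    // Every field has a sensible default, so callers only mention the
    // fields they actually care about.
    struct Rect  { float x = 0, y = 0, width = 0, height = 0; };
    struct Color { float r = 0, g = 0, b = 0, a = 1; };

    struct StrokeOptions {
      float mLineWidth = 1.0f;
      const float* mDashPattern = nullptr;  // no dashing by default
      int mDashLength = 0;
      float mDashOffset = 0.0f;
    };

    struct DrawOptions {
      float mAlpha = 1.0f;                  // global alpha
      // blend mode, antialiasing mode, etc. would also live here
    };

    // Both option groups can be defaulted, so simple callers stay simple.
    void StrokeRect(const Rect& aRect, const Color& aColor,
                    const StrokeOptions& aStroke = StrokeOptions(),
                    const DrawOptions& aDraw = DrawOptions()) {
      // ...package the parameters and call into the backend...
    }

    void Example(const Rect& border, const Rect& outline, const Color& color) {
      StrokeRect(border, color);         // all defaults

      StrokeOptions thick;               // configure once...
      thick.mLineWidth = 4.0f;
      StrokeRect(border, color, thick);  // ...then reuse across many draws
      StrokeRect(outline, color, thick);
    }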

Insights

One of the key insights here is that Gecko is a framework, not an application, and APIs suitable for applications to use directly may be less suitable for frameworks. When you're writing drawing code for an application you know the content you're going to draw and you can write code that takes advantage of "state locality" --- e.g. setting the current color and then drawing several shapes with that color. For most of what Gecko draws, we can't do that because we don't statically know what a Web page will draw. Also, because we're a framework, we have fewer calls to graphics APIs in our code than an application might. For example, we have only one piece of code that draws all CSS gradients; we might have fewer than half a dozen places in our entire codebase that actually create gradient patterns. Therefore making the API calls shorter or more convenient offers us very little value.

Aside

People rightly ask why we didn't just modify cairo to fit our needs better. Of course we considered it, since we've done a lot of work over the years to improve cairo for our needs. But we'd need to make massive architectural and API changes to cairo, which would probably be at least as much work as writing Azure from scratch --- quite likely more, given that cairo promises a stable API, so we'd have to maintain our stateless API alongside cairo's stateful one. Cairo's API stability guarantee often gets in the way of our making improvements, and it's of little use to us since we ship our own copy of cairo on most platforms; getting away from the cairo API will be helpful in that respect.

I don't think this should be considered a failure of cairo's design. Cairo was originally designed to be used directly by applications, over a very low-level backend (XRender), with much lower performance targets on very different workloads. Using it in a framework, wrapping very capable and high-level platform APIs, and expecting 100K image draws per second is far beyond those design parameters.

Thursday, 22 September 2011

Risks Of Exposing Web Page Pixel Data To Web Applications

Some Web applications require access to the pixel data of rendered Web pages, e.g.

  • A 3D bookreader application that draws arbitrary Web pages into WebGL textures (from there, the pixel data of the pages can be extracted directly or using timing attacks)
  • An interactive virtual environment that wants to render Web content onto 2D surfaces in the environment via WebGL
  • A visual effect using 2D canvas that wants to draw a Web page into the canvas and cut it up into shards that move around under animation
  • A screensharing application that sends the contents of Web pages over a video stream to help with support issues
  • A bug-reporting tool that wants to grab the rendering of a Web page to capture in a bug report

There are some pretty big security implications here. The biggest problem is cross-origin information leakage. For example, an attack page could load a page from another origin in an IFRAME; capturing the rendering of the attack page will then also capture the other origin's content and allow it to be returned to the attack server. The same goes for cross-origin images and other resources. To close this hole, we'd need to track the origins of data during painting and detect and/or block the painting of cross-origin data, which would add considerable complexity to the paint path and probably be error-prone.

Another problem is <input type="file">. In many implementations the file input control renders the complete path of the file, or at least more than just the file name; capturing the pixel data of the page would leak information that we intentionally conceal from Web pages.

Theme drawing is another problem. By capturing the rendering of themed form controls, a page could determine what system theme the user is using. This isn't a big problem by itself but it would contribute to fingerprinting.

Update: Commenters point out another problem I forgot to mention --- CSS history sniffing. Access to rendered pixel data makes it easy to determine whether a link has been visited.

Any solution for the use-cases listed above needs to prevent these problems. In Gecko we have the drawWindow API, which lets you render the contents of arbitrary windows into a canvas and so addresses all of the above use-cases, but it's only available to privileged content such as Firefox extensions. We've considered making it available to untrusted apps in some form, but the above issues have prevented that.

However, a little-known fact is that in Gecko we do have a limited way to render HTML content to a 2D or 3D canvas with access to the pixel data of the results! You can construct an SVG image with a <foreignObject> containing arbitrary HTML, draw it to the canvas, and (if the image is same-origin with the page) call getImageData on the results. This approach avoids the problems above because the content of SVG images is extremely restricted in Gecko. The biggest restriction is that in Gecko, SVG images can only reference resources in the same document or loaded from data: URIs! Basically an SVG image has to be stand-alone. This prevents any kind of cross-origin attack. Issues with file controls or other interactive features are prevented because it's impossible for users to direct events to or otherwise interact with the contents of SVG images. Script can't run in SVG images, nor can script access the DOM of SVG images. Theme drawing will be disabled in SVG images.

Unfortunately this limited solution doesn't address most of the use-cases above. I don't have any good answers for those; this is a really hard problem.

Monday, 19 September 2011

Doing The Same Thing Over And Over Again And Expecting Different Results

The quote "The definition of insanity is doing the same thing over and over again and expecting different results" is popular, and widely (though probably incorrectly) attributed to Einstein. Unfortunately it's often used to support a common and dangerous fallacy: that if you're trying something and it's not working, you must not be using the best strategy and must try something else.

It's easy to see the fallacy using a thought experiment. Imagine you must regularly play a lottery which offers red tickets and black tickets, and red tickets have twice the probability of winning compared to black tickets. So you choose red tickets, but you don't win. Would Einstein tell you to start choosing black tickets instead? Hopefully not!
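To spell out the arithmetic with invented numbers: suppose a black ticket wins with probability p and a red ticket with probability 2p. The draws are independent and the odds are fixed in advance, so a losing streak carries no information about the next draw: P(red wins next | you've lost n times) = 2p, which is still double P(black wins next | you've lost n times) = p. Switching to black after losing simply halves your chance on every future draw.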

When the best available strategy has a low probability of success, sticking to that strategy requires discipline, wisdom and courage. This is one of the themes of The Lord Of The Rings :-). You see people fail by panicking and flailing, abandoning their best hope to try something, anything, and essentially self-destructing.

Of course, one must also avoid stubbornly sticking to a strategy which is not the best available.

Overall it's probably best not to use aphorisms in serious discussions :-).

Saturday, 10 September 2011

France Vs Japan

Our family went to see the France vs Japan Rugby World Cup game at North Harbour Stadium tonight. I knew it would be fun but I didn't expect the game to be so exciting! France is ranked far above Japan (Japan have never come close to beating France, or any other team in their league) but with 20 minutes to go France was only leading 25-21 and the Japanese were attacking with gusto with the crowd roaring them on. We really believed they could win --- it was fantastic.

In the end, France ran away with it by scoring a few easy tries in the last ten minutes. It was, nevertheless, a wonderful game. It was great to see a lot of Japanese and French supporters, as well as a lot of locals supporting one or another of the teams. I got the feeling there were more supporters of Japan, but the French supporters were more vocal. Having La Marseillaise as the anthem certainly helps!

The atmosphere at the game, back in town on the way home, and also at Mt Eden last night where we watched the opening night fireworks, has been excellent. There's a strong festive mood in the city. Even on our bus home, there were random strangers wrapped in French and Japanese regalia shaking hands with each other. It's all a bit surreal and wonderful.

Friday, 9 September 2011

So It Begins

The Rugby World Cup starts today. The weather's great and it should be a wonderful six weeks. As previously noted, I have pledged to enjoy the entire event even if the All Blacks don't win --- and I don't think they will. The knockout format means that even if they would beat any given opponent 75% of the time, they have only about a 42% chance (0.75³ ≈ 0.42) of winning all three knockout rounds. People seem to have a hard time distinguishing "is the most likely team to win" from "is probably going to win".

I think New Zealanders are coming to terms with it, though. In 2007 the media was far more full of outrage and "doom and gloom" than the actual people I know --- heck, there was strong public support for retaining the coaching staff, which the media found unfathomable. "Rugby-mad Kiwis despair" is too good a story, especially for overseas audiences, I guess.

Still, I hope that sermons on "being content in all circumstances" are forthcoming from pulpits across the country over the next month :-).

I'll be at the stadium watching France vs Japan tomorrow, and then I'm off to California for a Mozilla meeting. Fun times.