Sunday, 7 October 2007

If I Did It

I think Gecko's frame model is a big mistake.

For those just tuning in, Gecko builds a tree of nsIFrame objects for each document. Each frame represents one CSS box, a rectangular area which renders some content. A single DOM node can generate multiple CSS boxes when it breaks. For example, an inline element or text node can break across multiple lines, creating multiple boxes, and hence multiple nsIFrames in Gecko. The frames for a single DOM node are linked into a doubly-linked list, the "continuation chain".

Continuations are the root of a lot of problems. Changes in layout require changes to the frame tree; frames must be created and destroyed, pointers updated, etc. This requires a lot of code, and that code is fragile: it has produced lots of bugs leading to memory-safety violations. It also consumes quite a lot of memory. For example, suppose you have K nested spans wrapping a large chunk of text, which get laid out across N lines. We create K*N frames, and each one's about 50 bytes. Creating and destroying these frames is probably slow, though it's hard to measure the performance cost of the continuation model.
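To make the K*N arithmetic concrete, here is a back-of-the-envelope sketch. The function names and the sample values (5 spans, 1000 lines) are made up for illustration; the 50-byte figure is the rough per-frame size mentioned above, not a measurement.

```python
FRAME_BYTES = 50  # rough per-frame size quoted in the post; an estimate, not measured


def continuation_frames(nested_spans: int, lines: int) -> int:
    """Each of the K nested spans gets one continuation frame per line it crosses."""
    return nested_spans * lines


def continuation_bytes(nested_spans: int, lines: int,
                       frame_bytes: int = FRAME_BYTES) -> int:
    """Approximate memory spent on those span frames alone."""
    return continuation_frames(nested_spans, lines) * frame_bytes


# Hypothetical document: 5 nested spans around text that wraps onto 1000 lines.
print(continuation_frames(5, 1000))  # 5000
print(continuation_bytes(5, 1000))   # 250000
```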

Continuations are, in principle, good for complicated layout like pagination, columns and advanced layouts like "fill text into this box here, and then when that's full, this other box over here". But over the years, as we've worked out the details, it's become clear that in a CSS context there are grave problems. For example, with vertical breaking and elements with a specified CSS height whose children overflow that height, you run into situations where an element's children have more continuations than the element itself, and so you have no natural parent for the children's continuations. All the options are bad; lately in Gecko fantasai has created "overflow continuations", ghostly additional continuations for the element just to host its overflowing children. It works surprisingly well but nevertheless, the continuation model is leading us down an increasingly complex and disturbing road. And we will need to add still more complexity to fully handle breaking of positioned elements.

Let's take a step back and ask the question, "what's the minimal information that we really need to compute and store during layout, that will let us efficiently paint any section of a page (or handle events or compute offsets requested by script)?" For discussion's sake let's focus on blocks containing inlines.

Here's one answer: all we really need are the line breaks and some information about the y geometry of each line. A line break can be stored as a pair (DOM node, (optional) text offset). For each line we could store the y-offset of its baseline, its ascent and descent, and occasionally an "overflow area" containing the area rendered by all its descendant content if that sticks out beyond the normal line bounds.
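A minimal sketch of what such per-line records might look like, assuming a string ID stands in for a DOM node and the overflow area is reduced to a vertical range. `LineBreak`, `LineInfo` and `lines_to_paint` are hypothetical names for illustration, not Gecko APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LineBreak:
    node_id: str                       # hypothetical stand-in for a DOM node reference
    text_offset: Optional[int] = None  # set only when the break falls inside a text node


@dataclass(frozen=True)
class LineInfo:
    start: LineBreak
    baseline_y: float
    ascent: float
    descent: float
    # Simplification: only the (top, bottom) of the overflow area is kept here.
    overflow_y: Optional[Tuple[float, float]] = None

    def vertical_extent(self) -> Tuple[float, float]:
        """Vertical range this line may paint into."""
        if self.overflow_y is not None:
            return self.overflow_y
        return (self.baseline_y - self.ascent, self.baseline_y + self.descent)


def lines_to_paint(lines: List[LineInfo], dirty_top: float,
                   dirty_bottom: float) -> List[LineInfo]:
    """Return the lines whose extent overlaps the half-open range [dirty_top, dirty_bottom)."""
    hits = []
    for line in lines:
        top, bottom = line.vertical_extent()
        if top < dirty_bottom and bottom > dirty_top:
            hits.append(line)
    return hits
```

With records like these, finding what to repaint for a given vertical strip needs only the line list, with no per-inline objects involved.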

In most cases we wouldn't need any kind of layout object for the inlines themselves! From the line information we can compute everything we need easily enough, except for complex stuff like inline-block/inline-table and replaced elements. This could really slash memory usage. It could also speed up layout and even simplify code. (By the way, WebKit doesn't use continuations like ours, which is a good move, but it does have one render-object per element, so I think it can be improved on.)

Of course lots of code (e.g. painting) needs to inspect the geometry of actual CSS boxes. I'd provide for that by exposing a "box iterator" API that lets clients walk a hypothetical CSS box tree.
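As a toy illustration of the box iterator idea, the sketch below computes the boxes for one inline span on demand from stored line-start offsets, rather than keeping a frame per line. Everything here is an assumption made for the example: a single text node, fixed-width characters, a fixed line height, and the names `Box` and `iter_span_boxes`.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Box:
    line: int
    x: float
    y: float
    width: float
    height: float


def iter_span_boxes(span_start: int, span_end: int, line_starts: List[int],
                    text_length: int, char_width: float = 8.0,
                    line_height: float = 16.0) -> Iterator[Box]:
    """Yield one box per line touched by the span [span_start, span_end).

    line_starts holds the text offset at which each line begins; nothing
    else about the span is stored between calls.
    """
    for i, start in enumerate(line_starts):
        end = line_starts[i + 1] if i + 1 < len(line_starts) else text_length
        lo = max(span_start, start)
        hi = min(span_end, end)
        if lo < hi:
            yield Box(line=i,
                      x=(lo - start) * char_width,
                      y=i * line_height,
                      width=(hi - lo) * char_width,
                      height=line_height)


# A 30-character text broken into three 10-character lines, with a span
# covering characters 5..25: the span produces three boxes, one per line.
for box in iter_span_boxes(5, 25, [0, 10, 20], 30):
    print(box)
```

A client such as painting would consume these boxes and discard them, so the per-line cost exists only while something is actually walking the tree.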

One nifty thing this approach would allow is fairly easy implementation of vertical writing: the box iterator could transform its output boxes to the correct orientation. Relative positioning could also be done in the iterator.

Vertical breaking (e.g. columns and pages) could be handled in a similar way, perhaps, by storing the DOM offsets of vertical breaks. So if you have a block that breaks across pages you only have one layout object for the block, but when you ask for the CSS box subtree for the block, you specify the DOM boundaries of the page you're interested in and you get just the boxes for the content in that page.
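A rough sketch of that query, assuming both line breaks and page breaks are stored as text offsets into a single text node. `lines_on_page` and its parameters are hypothetical names, and real content would need (DOM node, offset) pairs rather than bare integers.

```python
from typing import List, Tuple


def lines_on_page(line_starts: List[int], page_starts: List[int],
                  page: int, text_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) text offsets of the lines that fall on one page.

    page_starts holds the text offset at which each page begins, so the
    block itself needs only one layout object however many pages it spans.
    """
    page_lo = page_starts[page]
    page_hi = page_starts[page + 1] if page + 1 < len(page_starts) else text_length
    result = []
    for i, start in enumerate(line_starts):
        end = line_starts[i + 1] if i + 1 < len(line_starts) else text_length
        if page_lo <= start < page_hi:
            result.append((start, end))
    return result


# Five 10-character lines split over three pages (2 lines, 2 lines, 1 line).
print(lines_on_page([0, 10, 20, 30, 40], [0, 20, 40], page=1, text_length=50))
```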

This isn't a real proposal. There are huge questions about how all the rest of layout could be expressed in this model, huge questions about what happens to frame construction, how to cache things so that painting isn't slow, and so on. There are even bigger questions about how, even if this were designed, it could be implemented incrementally in Gecko or any other context. It's mainly my daydreaming. But I think there is a point here worth taking, which is that there's probably considerable headroom to improve the design of Gecko, and WebKit too.


  1. Keep the "big ideas" coming. They're interesting to read, and could really spur some innovative ideas.

  2. Absolutely there's room to avoid a rendering object per DOM object -- Netscape 1 had to run well on slow Pentium II machines, loading pages over modem links, rendering progressively, so ebina's layout engine flattened markup into something like a display list. When I added the "DOM level 0" and JS to pre-alpha Netscape 2 in 1995, I had to cons up a content model more abstract than ebina's low-level element list. But (and this went away with the DOM standard as it evolved) it was possible to avoid over-reifying according to some big fat model.
    I'm glad you are encouraging, or provoking, this kind of thinking. Layering in specifications and architecture models can look like a powerful tool at first, but it often backfires because all abstractions leak, and Murphy was an optimist. Sam Ruby gave a fun talk about some hazards of layering. I remember the late '80s and early '90s, when Van Jacobson at LBL was squashing TCP/IP layering, violating two or three parts of the sacred-cow ISO seven-layer model, to win all performance bets about whether software protocol implementations could scale to 100Mbit Ethernet.
    A variation on such layering diseases has struck modern browsers' rendering pipelines, but we can find a cure -- and when we do, the performance wins will be amazing.

  3. Please, please, please can we have text at any angle?
    Even games on the Spectrum had text at an angle. I am sick of using PNGs or Flash to make 'cool' interfaces.
    Can we have non-rectangular divs, please? Circles, polygons etc., all with border, margin and padding worked out correctly. Can we then angle those divs?
    Can we have a kind of 3D space in which divs are located, so I could spin a div in 3D. (Have you seen the album browser on the new iPod? 3D divs with images and text in them)
    Or could I do this with SVG? Can you make an SVG object non rectangular? So I could make a circular SVG object, then create a sphere projection in the SVG and paste some of my DIVs onto it? Or some images? And this would be rendered in the middle of my text so the text flows around the circle SVG object?
    Can I have non-rectangular Flash objects, please?
    If we are going for a new layout engine redesign, let's try to get it to do something more useful than mimic a newspaper (oh hell, IE can't even write text in columns). I hate it when I see better GUIs on my PlayStation 1 than I can make on the web.
    So instead of a Y pos, let's have an X,Y,Z pos. Or maybe an X,Y,Z,time pos (then we can animate things).
    A native physics engine for animations. We all live in the real world where things drop and bounce and break. They slide, topple and stick to things. Why is the internet so different? Why don't new stories drop onto the story pile? If an item is selected, can I give it some energy to make it bounce and wiggle? Can I grab an image and throw it through a paragraph of text, scattering the letters? Then run time in reverse, reversing the destruction? Can I make text liquid and flow like water?
    I keep thinking we should leverage Blender into the core engine. Blender knows animated 3D, Firefox knows static 2D (a la the 1980s); let's combine the two :-)
    I have this feeling that kids today are laughing at the old skool HTML. One of them will invent something more fun, something amazingly better than HTML. Something that won't even compare to an N64 but will still be 1,000,000 times better than HTML. And I bet it will only run in Silverlight. Ugh.

  4. Robert O'Callahan, 9 October 2007 11:40

    You can do a lot of things in SVG in Firefox right now, including rotated text. You can't flow text around irregular objects though.

  5. SVG?
    What's wrong with HTML? Just hack in some CSS angle: 90'
    Why should I use complicated SVG?
    Wouldn't that enable cool stuff for 99% of people writing web pages?
    Have some standard shapes - star, rect, circle, oval etc...
    I could use that to create a fan of tabs around my oval web page.
    Let's make things simpler. Please. SVG? What? Is that really simpler? I have to resort to something that complex to write text at an angle?!

  6. monkey boy, SVG is just an XML dialect; it's not necessarily much more complex than adding new HTML elements would be. In an ideal world at least :-)
    See some examples: