Among other things, I've been working away on the new textframe code. I've (re)written the layout part around the new gfxTextRun abstraction and things fit together pretty much as I expected except for a couple of issues.
One issue is that CSS :first-line and :first-letter styling doesn't map well onto the gfxTextRun concept. gfxTextRun assumes that the glyphs used for a chunk of text are largely independent of the formatting of the text into lines. I worked around this by aggressively recreating textruns when :first-line and :first-letter are in use. Ugly but effective and simple.
Another issue that came up was that we really want textruns to span multiple DOM text nodes, if those nodes are visually consecutive and use the same font. This allows chunks of shaped text to be styled differently (e.g., as a link) without breaking shaping between chunks. Even for Latin text, it allows kerning and ligatures to work across styling boundaries. It also means that a set of text nodes adjacent in the DOM will render identically to a single text node, which is an important property to have. So I had to rework everything to allow a gfxTextRun to span DOM nodes. This added some complexity but I think it's worth it. It will make textrun setup slower, but it will make other things faster because we'll be able to create fewer and bigger textruns, which means fewer expensive calls down into Pango and other APIs.
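To make the "span multiple DOM text nodes" idea concrete, here is a minimal sketch of the bookkeeping involved: concatenate the text of adjacent nodes into one buffer for shaping, and remember which slice of that buffer each node owns. The names MergedText, NodeRange and MergeTextNodes are hypothetical and are not part of the real gfxTextRun code; plain std::string stands in for whatever string type the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: the slice of the merged buffer that one DOM text node owns.
struct NodeRange {
    std::size_t start;
    std::size_t length;
};

// Hypothetical: the text handed to the shaper, plus one range per source node.
struct MergedText {
    std::string text;
    std::vector<NodeRange> ranges;
};

// Concatenate the text of visually consecutive, same-font nodes so that the
// shaper sees one string and can kern/ligate across node boundaries.
MergedText MergeTextNodes(const std::vector<std::string>& nodeTexts) {
    MergedText merged;
    for (const std::string& nodeText : nodeTexts) {
        // Each node starts where the buffer currently ends.
        merged.ranges.push_back({merged.text.size(), nodeText.size()});
        merged.text += nodeText;
    }
    return merged;
}
```

With this mapping, styling a node (say, as a link) only needs its range within the shared run; the shaped glyphs themselves are untouched.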
The next step is for me to implement gfxPangoTextRun. This is requiring me to delve into Pango and figure out how to use its APIs. A key concern is to minimize the memory usage of live textruns. I think I have a scheme that will use four bytes per character in most common cases. We'll see how it goes. One nice benefit of textrun objects is that those four bytes directly encode glyph indices and advance widths for the text, and we can also keep cairo_scaled_font_t pointers in the textrun, so painting text becomes very simple: little more than a single pair of calls, cairo_set_scaled_font() followed by cairo_show_glyphs(). Repeated rendering of the same text should be about as fast as it possibly can be from our end ... any remaining problems can be blamed on cairo :-).
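As an illustration of how four bytes per character could hold both a glyph index and an advance width, here is one possible packing. The bit layout is purely an assumption for the sketch (1 flag bit, 15 bits of advance, 16 bits of glyph index), and the CompressedGlyph name and its methods are hypothetical. Characters that don't fit the simple form would store an index into some side table instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit per-character record. Assumed layout:
//   bit 31      : 1 = simple glyph stored inline, 0 = index into a side table
//   bits 16..30 : advance width in device units (0..32767)
//   bits 0..15  : glyph index
class CompressedGlyph {
public:
    static constexpr uint32_t kSimpleFlag = 0x80000000u;
    static constexpr uint32_t kMaxAdvance = 0x7FFFu;
    static constexpr uint32_t kMaxGlyphIndex = 0xFFFFu;

    // True if the advance and glyph index both fit in the inline form.
    static bool FitsSimple(uint32_t advance, uint32_t glyphIndex) {
        return advance <= kMaxAdvance && glyphIndex <= kMaxGlyphIndex;
    }

    static CompressedGlyph MakeSimple(uint32_t advance, uint32_t glyphIndex) {
        assert(FitsSimple(advance, glyphIndex));
        return CompressedGlyph(kSimpleFlag | (advance << 16) | glyphIndex);
    }

    // For the uncommon cases: store an index into a separate details table.
    static CompressedGlyph MakeComplex(uint32_t detailsIndex) {
        assert((detailsIndex & kSimpleFlag) == 0);
        return CompressedGlyph(detailsIndex);
    }

    bool IsSimpleGlyph() const { return (mBits & kSimpleFlag) != 0; }

    uint32_t Advance() const {
        assert(IsSimpleGlyph());
        return (mBits >> 16) & kMaxAdvance;
    }

    uint32_t GlyphIndex() const {
        assert(IsSimpleGlyph());
        return mBits & kMaxGlyphIndex;
    }

    uint32_t DetailsIndex() const {
        assert(!IsSimpleGlyph());
        return mBits;
    }

private:
    explicit CompressedGlyph(uint32_t bits) : mBits(bits) {}
    uint32_t mBits;
};

static_assert(sizeof(CompressedGlyph) == 4, "one record must be four bytes");
```

Painting a run of simple glyphs then amounts to walking the array, accumulating advances into glyph positions, and handing the result to cairo.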
One remaining issue I haven't figured out yet is the problem of line endings changing the shape of characters in the textrun. For example, what happens if a font has a "tt" ligature and we hyphenate "cutting"? Also, since you can't kern across line breaks, the advance width of a character could be different if there's a line break after it. Furthermore, OpenType supports glyphs with different shapes when they occur at the beginning or end of lines (although I don't think Pango does this yet). Currently gfxTextRuns are nearly immutable, which is very nice; the only visible mutable state they can have is cached spacing data. But I think we'll have to solve the line-ending problem by breaking that immutability and adding line ending state. Implementing this with Pango seems tricky though; for example, we'll have to somehow un-ligate ligatures. I really don't want to have to recreate the entire layout, or keep enough data around in case we need to do so. For now I think I will just check at textrun creation time whether there are any break opportunities inside "problematic" text (say between two non-space characters) and only keep around the information to recreate the layout if there is at least one such opportunity.
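The creation-time check could look roughly like the sketch below. It takes the run's text and a per-character flag saying whether a line break is allowed before that character, and reports whether any allowed break falls between two non-space characters. The function name, the canBreakBefore representation, and the simplistic IsSpace test are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplistic whitespace test, good enough for a sketch.
static bool IsSpace(char16_t ch) {
    return ch == u' ' || ch == u'\t' || ch == u'\n';
}

// Returns true if some break opportunity sits between two non-space
// characters, i.e. a break there could split a ligature or change kerning,
// so the data needed to redo the layout should be kept around.
// canBreakBefore[i] means "a line break is allowed just before text[i]".
bool HasProblematicBreak(const std::u16string& text,
                         const std::vector<bool>& canBreakBefore) {
    assert(canBreakBefore.size() == text.size());
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (canBreakBefore[i] && !IsSpace(text[i - 1]) && !IsSpace(text[i])) {
            return true;
        }
    }
    return false;
}
```

For "cutting" with a hyphenation point between the two t's this returns true; for ordinary text that only breaks after spaces it returns false, and the extra data can be dropped.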
There is a related issue: how do you draw text where the style changes in the middle of a ligature? E.g. a<a href="google.com">f</a>fluent where the font has a ligature for "ffl". I think we'll be doing what everybody else seems to do, a cheesy hack where you use clipping to cut the ffl glyph into three equal horizontal segments. (But you can't use this approach to avoid the un-ligating I mentioned. For example if you applied it to hyphenate "af-fluent", you're likely to have a pixel or two from the top of the first f appearing on the other side of the hyphen, or something equally disgusting.)
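The clipping hack boils down to a bit of arithmetic: divide the ligature glyph's advance evenly among the characters it covers, and paint the glyph once per style with the clip set to that character's slice. A minimal sketch follows, assuming left-to-right text and using the hypothetical names ClipRange and LigatureSegment. The resulting range, combined with the line's vertical extent, could be passed to cairo_rectangle() and cairo_clip() before drawing the glyph.

```cpp
#include <cassert>

// Hypothetical: horizontal extent of the clip for one character of a ligature.
struct ClipRange {
    double left;
    double right;
};

// Split a ligature glyph's advance into charCount equal slices and return
// the slice belonging to character charIndex (0-based). Assumes LTR text.
ClipRange LigatureSegment(double glyphX, double advance,
                          int charCount, int charIndex) {
    assert(charCount > 0);
    assert(charIndex >= 0 && charIndex < charCount);
    const double sliceWidth = advance / charCount;
    return {glyphX + sliceWidth * charIndex,
            glyphX + sliceWidth * (charIndex + 1)};
}
```

For the "ffl" case above, the link's style would be painted with the slice for index 0 and the surrounding style with the slices for indices 1 and 2.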