Thursday 24 May 2007
The new textframe is looking pretty good. I'm dogfooding it, and so are a few other people. The list of regressions is fairly short, so we should be able to turn it on as soon as the tree opens for alpha6.
In the meantime I'm thinking about the glyph bounds problem. This is the problem of certain characters in certain fonts having unusually large glyphs that fall outside the "font box" for the glyph. That is, the glyph extends above the font's specified ascent, below the font's specified descent, to the left of the glyph origin ("left bearing"), or to the right of the glyph advance ("right bearing"). When a text frame contains such glyphs, we need to carefully compute the "overflow area" (the union of the extents of the positioned glyphs) when we lay out the text. That lets us efficiently redraw the correct area if the text moves or changes. Since time immemorial we have assumed that no glyphs overflow their font boxes, which is plain wrong, and getting more wrong as fonts get fancier...
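To make the "overflow area" concrete, here is a minimal sketch, assuming each glyph's ink box is already known relative to its own origin on the baseline (which is exactly the expensive part discussed next). Rect, overflow_area, and the y-down coordinate convention (baseline at y = 0, ascent at y = -ascent) are hypothetical illustrations, not Gecko code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    # y grows downward, as in screen coordinates; the baseline is y = 0.
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: "Rect") -> "Rect":
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))


def overflow_area(ascent, descent, glyphs):
    """Union of the run's font box and the ink boxes of its positioned glyphs.

    glyphs is a sequence of (advance, ink) pairs, where ink is the glyph's
    ink box relative to its own origin on the baseline.
    """
    pen = 0.0
    area = None
    for advance, ink in glyphs:
        # Shift the ink box to where the glyph actually lands in the run.
        placed = Rect(pen + ink.x0, ink.y0, pen + ink.x1, ink.y1)
        area = placed if area is None else area.union(placed)
        pen += advance
    # The font box spans the total advance horizontally, ascent to descent vertically.
    font_box = Rect(0.0, -ascent, pen, descent)
    return font_box if area is None else font_box.union(area)
```

If every glyph stays inside its font box, the result is just the font box of the run, which is what the old assumption hands back for free.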
The hard part (as so often in the world of browser text) is performance. For example, the Mac implementation of cairo_glyph_extents gets the glyph extents by retrieving the glyph outline curves from ATS and computing their bounding boxes with grim mathematics. There's no way in the world we could apply that to each glyph, even with caching. The other Mac APIs aren't any better, I'm told, and the other platforms aren't looking good performance-wise either.
One possible approach is to exploit the fact that we normally only care about glyph bounds when they overflow the font box. If we could cheaply examine the font and determine, for each glyph, whether it's guaranteed not to overflow the font box, we'd be fast on most fonts and pages, where these checks will succeed and we won't need exact bounds for the glyphs. Looking at the OpenType/TrueType font format, the 'glyf' table contains, for each glyph, the minimum and maximum x and y coordinates of its outline, in font units. But this data does not take hinting into account, and hinting could make the glyph extend beyond that box.
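Pulling the per-glyph box out of the font is cheap once the tables are in hand. A minimal sketch, assuming the raw 'glyf' and 'loca' table bytes have already been extracted from the font file, and that the glyph count (from 'maxp') and head.indexToLocFormat are known. The function names are hypothetical; the byte layouts follow the TrueType/OpenType table formats.

```python
import struct
from typing import List, Optional, Tuple


def parse_loca(loca_table: bytes, num_glyphs: int, long_format: bool) -> List[int]:
    """Decode 'loca' into num_glyphs + 1 byte offsets into the 'glyf' table."""
    if long_format:
        # head.indexToLocFormat == 1: uint32 offsets stored directly.
        return list(struct.unpack_from(f">{num_glyphs + 1}I", loca_table))
    # head.indexToLocFormat == 0: uint16 values holding offset / 2.
    return [2 * v for v in struct.unpack_from(f">{num_glyphs + 1}H", loca_table)]


def glyph_bbox(glyf: bytes, loca: List[int],
               glyph_id: int) -> Optional[Tuple[int, int, int, int]]:
    """Return (xMin, yMin, xMax, yMax) in font units, or None for an empty glyph."""
    start, end = loca[glyph_id], loca[glyph_id + 1]
    if start == end:
        return None  # no outline at all (e.g. a space), so nothing can overflow
    # Glyph header: int16 numberOfContours, then int16 xMin, yMin, xMax, yMax,
    # all big-endian. The box is present for simple and composite glyphs alike.
    _num_contours, x_min, y_min, x_max, y_max = struct.unpack_from(">hhhhh", glyf, start)
    return x_min, y_min, x_max, y_max
```

Only the 10-byte header of each glyph is touched; the outline data that follows is never parsed.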
Perhaps we can be a little conservative and assume that hinting will make the glyph ascend at most one more pixel and descend at most one more pixel. We could also assume that hinting will not increase the left bearing and will not increase the glyph's right bearing (because its hinted advance should increase if the glyph width does). Of course these are just heuristics, and it would be a good idea to test them against a large collection of fonts to see if they hold. But if they hold, we should be able to use the 'glyf' data to quickly rule out overflow for a large number of glyphs.
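The heuristic can be written as a conservative per-glyph test. A minimal sketch, assuming the 'glyf' box, the unhinted advance width, and the font's ascent and descent (taken here as positive magnitudes, e.g. from 'hhea'; which metrics Gecko would actually use is an assumption) are all in font units, and that ppem is the rendering size in pixels per em. The one-pixel vertical slack and zero horizontal slack are exactly the unverified guesses described in the text; the function name is hypothetical.

```python
from typing import Tuple


def safely_inside_font_box(bbox: Tuple[int, int, int, int], advance: int,
                           ascent: int, descent: int,
                           units_per_em: int, ppem: float,
                           hint_slack_px: float = 1.0) -> bool:
    """Return True only if the heuristic rules out overflow after hinting.

    A False result means "can't tell cheaply", not "definitely overflows";
    the caller would then fall back to exact (expensive) glyph extents.
    """
    x_min, y_min, x_max, y_max = bbox
    slack = hint_slack_px * units_per_em / ppem  # one device pixel, in font units
    if y_max + slack > ascent:    # hinting might push the top above the ascent
        return False
    if -y_min + slack > descent:  # hinting might push the bottom below the descent
        return False
    if x_min < 0:                 # ink left of the origin; assumed hinting won't grow it
        return False
    if x_max > advance:           # ink past the advance; assumed hinting won't grow it
        return False
    return True
```

Because the slack is fixed in pixels, it costs more font units at small sizes: with 2048 units per em, one pixel is 128 units at 16 ppem but 256 units at 8 ppem, so a glyph that passes at 16px can fail at 8px.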
There is one more problem: when we actually compute the bounding box approximation for a string of glyphs, our glyph advances may include kerning. The font box definition above assumed no kerning. This probably won't worry us in practice because we generally measure substrings that end with spaces or have no characters following them, so there is no kerning across the end of the substring.
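A small sketch of why only kerning across the end of the substring matters, assuming kerning is modeled as a single adjustment added after each glyph's advance (a simplification of how real shapers apply it). Every glyph is assumed to have already passed the per-glyph font box test; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def glyph_pen_positions(advances: Sequence[float],
                        kerns: Sequence[float]) -> Tuple[List[float], float]:
    """Return each glyph's origin and the run's total (kerned) advance.

    kerns[i] is applied after glyph i; the last entry is the kerning against
    whatever character follows the substring.
    """
    pens: List[float] = []
    pen = 0.0
    for adv, kern in zip(advances, kerns):
        pens.append(pen)
        pen += adv + kern
    return pens, pen


def right_overflow(advances: Sequence[float], kerns: Sequence[float],
                   ink_right: Sequence[float]) -> float:
    """How far the ink extends past the run's logical width (0.0 if it doesn't).

    ink_right[i] is glyph i's xMax relative to its own origin, assumed to
    satisfy ink_right[i] <= advances[i].
    """
    pens, width = glyph_pen_positions(advances, kerns)
    if not pens:
        return 0.0
    rightmost = max(p + r for p, r in zip(pens, ink_right))
    return max(0.0, rightmost - width)
```

Negative kerning between two glyphs inside the substring makes the first glyph overlap the second, but it stays within the run's box. A negative kern on the last glyph shrinks the run's width and lets that glyph's ink poke out on the right.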
Grabbing and analyzing these tables could still be expensive. I wish there were a better way, but I can't see one right now. Possibly the way to go, if it were practical, would be to abandon "overflow areas" in Gecko, but then we would repaint a lot more than we have to when documents change dynamically.
One remaining card we can play is to identify common fonts that have no overflowing glyphs and hardcode that knowledge into Gecko to avoid looking up tables. That would be ugly, but expedient.
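The expedient version amounts to a fast path in front of the table lookup. A minimal sketch, where the font names in the set are hypothetical placeholders (which real fonts qualify would have to be established by scanning their 'glyf' tables offline) and glyph_may_overflow stands in for whatever per-glyph check is used otherwise.

```python
from typing import Callable, FrozenSet

# Hypothetical placeholders, not real measurements of any shipping font.
FONTS_WITHOUT_OVERFLOW: FrozenSet[str] = frozenset({"Example Sans", "Example Serif"})


def needs_glyph_bounds(font_name: str,
                       glyph_may_overflow: Callable[[int], bool],
                       glyph_id: int) -> bool:
    """Return True if exact bounds must be computed for this glyph."""
    if font_name in FONTS_WITHOUT_OVERFLOW:
        return False  # hardcoded knowledge: skip loading the font tables entirely
    return glyph_may_overflow(glyph_id)  # fall back to the per-glyph table check
```

The ugliness is that the set silently goes stale if a font with the same name ships with different glyphs.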