Tuesday 11 September 2007
Trust No One
When is an ASCII space (0x20) not a word separator?
When it's followed by a combining mark (e.g., COMBINING ACUTE ACCENT, a.k.a. Unicode character U+0301).
According to ATSUI, anyway. Uniscribe disagrees and refuses to combine marks with space characters, although it will allow the combination if you stick a ZWJ (U+200D) in between. Gah!
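To make the difference concrete, here is a minimal Python sketch of a word splitter that treats a space as a separator only when it is not followed by a combining mark. It imitates the interpretation attributed to ATSUI above; it is not ATSUI's or Uniscribe's actual algorithm, and the helper names `is_combining_mark` and `split_words` are made up for illustration.

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me are the Unicode "mark" categories.
    return unicodedata.category(ch).startswith("M")

def split_words(text: str) -> list[str]:
    """Split on U+0020, except where the space carries a combining mark."""
    words = []
    current = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == " " and not (nxt and is_combining_mark(nxt)):
            # An ordinary space: close the current word, if any.
            if current:
                words.append("".join(current))
                current = []
        else:
            # Either a non-space, or a space acting as the base of a mark.
            current.append(ch)
    if current:
        words.append("".join(current))
    return words

sample = "a \u0301b c"          # 'a', space + COMBINING ACUTE ACCENT, 'b', space, 'c'
naive = sample.split(" ")       # treats every space as a separator
aware = split_words(sample)     # keeps the space that carries the accent
```

With this input the naive split yields three pieces, while the mark-aware split yields two, because the first space is the base character for the accent.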
We've also discovered that ATSUI's font fallback machinery often likes to choose different fonts for the mark and the character it combines with. Madness!
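One way to avoid splitting a base character and its mark across fonts is to do fallback per cluster rather than per character. The Python sketch below assumes a purely hypothetical font model, an ordered list of `(name, coverage_set)` pairs; real font fallback in ATSUI or anywhere else is considerably more involved, and none of these function names come from a real API.

```python
import unicodedata

def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def pick_font(cluster: str, fonts):
    """Return the first font whose coverage includes the whole cluster."""
    for name, coverage in fonts:
        if all(ch in coverage for ch in cluster):
            return name
    return None  # no single font covers base and marks together

def font_runs(text: str, fonts):
    """Split text into (font, substring) runs without breaking clusters."""
    runs = []
    for cl in clusters(text):
        font = pick_font(cl, fonts)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + cl)
        else:
            runs.append((font, cl))
    return runs

# Hypothetical coverage: font "A" has 'e' but no accent, font "B" has both.
fonts = [("A", set("e")), ("B", set("e\u0301"))]
runs = font_runs("e\u0301", fonts)
```

A per-character fallback would pick "A" for the base and "B" for the mark here; the per-cluster version picks "B" for both, so the mark and its base come from the same font.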
This is life working on Web browsers: the environment is so complex that any assumption you make will be violated sooner or later.
Comments
Anyway, no-one uses combining characters, except perhaps in other character sets like ANSEL.
Grabbing a pre-combined glyph (or a base-glyph and a combining-glyph) from some other font might help ensure that the resulting character looks good on its own, but it practically guarantees that it will look awkward and eye-catching among the surrounding text.
Saying "no-one will do this in real life" doesn't work; authors do all kinds of crazy stuff and we have to handle it as best we can.
Screwtape: the thing is, if the glyphs are from different fonts, how can the mark be correctly positioned? I assume each font contains tables showing where marks should be attached to base glyphs.