Tuesday 17 June 2008
A while ago I wrote about the deep computer science problems of strings. Now I need to talk about another poorly understood data structure: rectangles. There are a couple of issues.
x1,y1,x2,y2 vs x,y,width,height Windows uses the former, most other libraries that I know of use the latter. If you mostly work with coordinates relative to the rectangle origin, then you'll probably prefer width+height, but otherwise I actually think Windows got this right! A lot of rectangle operations, such as union, intersection, and point containment are simpler in the x2+y2 format.
Union of empty rectangles Should the 'union' operation ignore empty rectangles or include them in the calculations? For example, what is the union of (100,100)-(100,200) with (100,100)-(200,100)? It depends on the use-case and is quite subtle. When the rectangle represents a set of pixels, for example a damage area that needs to be repainted, then you want 'union' to ignore empty rectangles, so the result in my example could be any empty rectangle. But when the rectangle represents the bounds of a set of points, then you want to return a non-empty rectangle in this case --- (100,100)-(200,200). (The difference between a pixel and a point is that a pixel has area, usually a 1x1 square, but a point is a mathematical point of zero size.) An example of this latter usage is when you're computing the bounds of a set of CSS boxes in order to determine how big a container must be to include them all.
So we want two forms of 'union' operator, or perhaps better still we should have two entirely different data structures for these two tasks. For pixel-set rectangles, there is only one logical "empty rectangle" --- all rectangles with zero width or height should be considered equal. For point-set rectangles, even if the width is zero the height and top-left can still be significant, and vice versa for height. This suggests we should separate the concepts.
The problem with having two data structures is that very often we need to convert one to the other --- to find the set of pixels that cover a point-set rectangle. (Off the top of my head, I can't think of situations where we need to convert pixel-sets to point-sets.) It would also be rather confusing for developers since I'm not aware of any libraries that make this distinction. Also, in Gecko, rectangles like the "overflow areas" are used to mean both pixel-sets and point-sets. For now, we'll just stick with different 'union' operators.