Friday 15 October 2010
The most important data structure in Gecko layout is the frame tree. Roughly speaking each frame object corresponds to a CSS box. The lifetimes of these frame objects are explicitly managed, and the structures get fairly complicated, so we sometimes have bugs where there is a dangling pointer to a frame object after it has been deleted. These bugs often used to be exploitable security bugs, since making a virtual call through a pointer to a deleted object can often be used to take control of the process using heap spraying. We fixed these bugs as fast as we could, but even after fixing all the ones we know about, experience suggests there are additional dangling frame pointer bugs we don't know about yet. So last year we implemented a mitigation technique that makes it impossible to exploit almost all bugs involving dangling pointers to frames. We call this technique frame poisoning. It incorporates some ideas that I first saw presented in a paper Memory Safety Without Runtime Checks or Garbage Collection. Most of the work was done by Zack Weinberg, who is now a PhD student working with Collin Jackson at CMU West.
Exploiting dangling pointers is essentially about exploiting type errors. An attacker writes an attacker-controlled data value to some memory location L in a deallocated object, typically by causing a new object to be allocated there. This data value is invalid for the type T the program expects when it accesses L through a pointer to the deallocated object. For example, the program may expect L to be a pointer to a C++ vtable, but the attacker has overwritten L with an integer --- which when interpreted as a vtable pointer, lets the attacker trigger arbitrary behavior.
Our approach is to prevent the first phase of the attack by making it impossible to overwrite fields of deallocated objects with values of the wrong type. We do this by ensuring that whenever the memory used by a deallocated frame object is reallocated to a new object, the new object must always be exactly the same type as the old object and at exactly the same address. Thus whenever code writes a value to a field of the new object, the value must be valid for the type T the program expects when it access that location through a pointer to the new object *or* the old object. Thus, dangling-frame-pointer attacks cannot get started.
This is implemented by keeping a "free list" of deallocated frames, one freelist per frame type. When a frame is deallocated, its memory is added to the freelist for its type. When a new frame is allocated, we check the freelist for its type and reuse memory from that freelist if the freelist is not empty, otherwise we allocate new memory for the frame.
One complication is that we actually keep one set of freelists per document (i.e., Web page). When a document goes away, all frames are destroyed and all frame memory is returned to the general application heap. This is safe because almost all persistent pointers to frames are from data structures associated with that document, data structures which are torn down when the document goes away. Thus it is very unlikely we will see bugs involving dangling pointers to frames whose document has gone away (experience bears this out).
An additional line of defense is that when frames are deallocated we fill the memory with a "poison value" --- a nonsense value which, when interpreted as a pointer, always points into a large region of invalid memory. This ensures that if the program loads a value from the memory of a deallocated object and dereferences that value as a pointer, it will almost certainly crash in a way that cannot be exploited. In practice this means when someone finds a bug involving a dangling frame pointer, the browser usually crashes immediately before doing anything an attacker would find interesting. "Frame poisoning crashes" are easy to identify because they're accesses through a pointer that is (or is close to) the "frame poison value". For more details see TestPoisonArea.
We didn't use NULL (zero) for the poison value because a lot of code tests to see if a pointer is non-null, and if so dereferences it. If that code operates on a deallocated object, we want it to crash immediately, not carry on trying to use the deallocated object in some other way.
We shipped frame poisoning in Firefox 3.6. In the end there was no significant performance impact, although we had to be careful because some of our earlier attempts did hurt performance! In particular, we optimize by not writing the poison value over deallocated frame objects when the entire document is going away.
Like all mitigations, frame poisoning is not ideal. We still fix dangling frame pointer bugs on trunk as quickly as we can. However, we know of no way to bypass frame poisoning to exploit dangling frame pointer bugs, so we may choose to not fix certain bugs on a maintenance branch where we believe frame poisoning blocks exploitation and a proper fix would be very destabilizing.