Friday, 15 October 2010

Mitigating Dangling Pointer Bugs Using Frame Poisoning

The most important data structure in Gecko layout is the frame tree. Roughly speaking, each frame object corresponds to a CSS box. The lifetimes of these frame objects are explicitly managed, and the structures get fairly complicated, so we sometimes have bugs where a dangling pointer to a frame object remains after the object has been deleted. These bugs often used to be exploitable security bugs, since making a virtual call through a pointer to a deleted object can often be used to take control of the process using heap spraying. We fixed these bugs as fast as we could, but even after fixing all the ones we know about, experience suggests there are additional dangling frame pointer bugs we don't know about yet. So last year we implemented a mitigation technique that makes it impossible to exploit almost all bugs involving dangling pointers to frames. We call this technique frame poisoning. It incorporates some ideas that I first saw presented in the paper "Memory Safety Without Runtime Checks or Garbage Collection". Most of the work was done by Zack Weinberg, who is now a PhD student working with Collin Jackson at CMU West.

Exploiting dangling pointers is essentially about exploiting type errors. An attacker writes an attacker-controlled data value to some memory location L in a deallocated object, typically by causing a new object to be allocated there. This data value is invalid for the type T the program expects when it accesses L through a pointer to the deallocated object. For example, the program may expect L to be a pointer to a C++ vtable, but the attacker has overwritten L with an integer --- which when interpreted as a vtable pointer, lets the attacker trigger arbitrary behavior.

Our approach is to prevent the first phase of the attack by making it impossible to overwrite fields of deallocated objects with values of the wrong type. We do this by ensuring that whenever the memory used by a deallocated frame object is reallocated to a new object, the new object must always be exactly the same type as the old object and at exactly the same address. Thus whenever code writes a value to a field of the new object, the value must be valid for the type T the program expects when it accesses that location through a pointer to the new object *or* the old object. As a result, dangling-frame-pointer attacks cannot get started.

This is implemented by keeping a "freelist" of deallocated frames, one freelist per frame type. When a frame is deallocated, its memory is added to the freelist for its type. When a new frame is allocated, we check the freelist for its type and reuse memory from that freelist if it is not empty; otherwise we allocate new memory for the frame.
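
To make the allocation rule concrete, here is a minimal sketch of a per-type freelist. It is not the Gecko code: FrameArena, BlockFrame, InlineFrame and every member name below are hypothetical, and the sketch assumes frames are handed out as raw storage that the caller constructs into.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for two frame classes.
struct BlockFrame  { void* firstField; int lineCount; };
struct InlineFrame { void* firstField; double baseline; };

// Hypothetical arena keeping one freelist per frame type.
class FrameArena {
public:
  FrameArena() = default;
  FrameArena(const FrameArena&) = delete;
  FrameArena& operator=(const FrameArena&) = delete;

  // Everything goes back to the general heap only when the arena itself dies.
  ~FrameArena() {
    for (void* p : mAllBlocks) std::free(p);
  }

  // Returns raw storage for a T. A slot freed by a T is reused first, so a
  // recycled slot always holds the same type at the same address.
  template <typename T>
  void* Allocate() {
    std::vector<void*>& freeList = mFreeLists[std::type_index(typeid(T))];
    if (!freeList.empty()) {
      void* p = freeList.back();
      freeList.pop_back();
      return p;
    }
    void* p = std::malloc(sizeof(T));
    if (!p) std::abort();  // the sketch does not try to recover from OOM
    mAllBlocks.push_back(p);
    return p;
  }

  // Puts the slot on the freelist for T. It is never handed to another type.
  template <typename T>
  void Free(void* p) {
    assert(p != nullptr);
    mFreeLists[std::type_index(typeid(T))].push_back(p);
  }

private:
  std::unordered_map<std::type_index, std::vector<void*>> mFreeLists;
  std::vector<void*> mAllBlocks;
};
```

In this sketch a slot freed by a BlockFrame can only ever come back from Allocate<BlockFrame>(); a request for an InlineFrame gets different memory even when the BlockFrame freelist is non-empty.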

One complication is that we actually keep one set of freelists per document (i.e., Web page). When a document goes away, all frames are destroyed and all frame memory is returned to the general application heap. This is safe because almost all persistent pointers to frames are from data structures associated with that document, data structures which are torn down when the document goes away. Thus it is very unlikely we will see bugs involving dangling pointers to frames whose document has gone away (experience bears this out).

An additional line of defense is that when frames are deallocated we fill the memory with a "poison value" --- a nonsense value which, when interpreted as a pointer, always points into a large region of invalid memory. This ensures that if the program loads a value from the memory of a deallocated object and dereferences that value as a pointer, it will almost certainly crash in a way that cannot be exploited. In practice this means when someone finds a bug involving a dangling frame pointer, the browser usually crashes immediately before doing anything an attacker would find interesting. "Frame poisoning crashes" are easy to identify because they're accesses through a pointer that is (or is close to) the "frame poison value". For more details see TestPoisonArea.
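
A rough sketch of the poisoning step follows. The constant kPoison is a placeholder chosen for illustration only, not the value Gecko uses, and PoisonMemory and LooksPoisoned are hypothetical helper names. A real poison value has to be chosen per platform so that it lands in memory that can never be mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Placeholder poison word (an assumption for this sketch, not Gecko's value).
// On a 32-bit build the cast keeps only the low 32 bits.
constexpr std::uintptr_t kPoison =
    static_cast<std::uintptr_t>(0xF0DEAFFFF0DEAFFFull);

// Overwrites a deallocated object with repeated pointer-sized poison words,
// so that any pointer field later read from it holds the poison value.
inline void PoisonMemory(void* p, std::size_t size) {
  assert(p != nullptr || size == 0);
  unsigned char* bytes = static_cast<unsigned char*>(p);
  std::size_t offset = 0;
  for (; offset + sizeof(kPoison) <= size; offset += sizeof(kPoison)) {
    std::memcpy(bytes + offset, &kPoison, sizeof(kPoison));
  }
  // Fill any tail shorter than one word with the leading bytes of the word.
  if (offset < size) {
    std::memcpy(bytes + offset, &kPoison, size - offset);
  }
}

// Reads one pointer-sized field and reports whether it holds the poison word.
inline bool LooksPoisoned(const void* field) {
  std::uintptr_t value;
  std::memcpy(&value, field, sizeof(value));
  return value == kPoison;
}
```

A crash handler could use a check like LooksPoisoned, widened to "close to the poison value", to flag a crash as a likely frame poisoning crash.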

We didn't use NULL (zero) for the poison value because a lot of code tests to see if a pointer is non-null, and if so dereferences it. If that code operates on a deallocated object, we want it to crash immediately, not carry on trying to use the deallocated object in some other way.
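
The difference can be modelled without actually crashing anything. The sketch below is illustrative only: kNullFill, kPoison, Outcome and SimulateGuardedAccess are hypothetical names, and it assumes the poison value points into unmapped memory so that a dereference would fault.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kNullFill = 0;
// Placeholder poison word (an assumption for this sketch, not Gecko's value).
constexpr std::uintptr_t kPoison =
    static_cast<std::uintptr_t>(0xF0DEAFFFF0DEAFFFull);

enum class Outcome { SilentlyContinues, FaultsOnDereference };

// Models the common pattern `if (frame->next) frame->next->DoSomething();`
// applied to a field read out of a deallocated frame.
inline Outcome SimulateGuardedAccess(std::uintptr_t storedBits) {
  if (storedBits == 0) {
    // The null check fails, the dereference is skipped, and the code keeps
    // running with the dead object.
    return Outcome::SilentlyContinues;
  }
  // The null check passes, so the code dereferences the stored value, which
  // is assumed here to point into unmapped memory.
  return Outcome::FaultsOnDereference;
}

// Usage: zero-filled memory hides the bug, poisoned memory stops it at once.
inline void DemonstrateDifference() {
  assert(SimulateGuardedAccess(kNullFill) == Outcome::SilentlyContinues);
  assert(SimulateGuardedAccess(kPoison) == Outcome::FaultsOnDereference);
}
```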

We shipped frame poisoning in Firefox 3.6. In the end there was no significant performance impact, although we had to be careful because some of our earlier attempts did hurt performance! In particular, we optimize by not writing the poison value over deallocated frame objects when the entire document is going away.
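
The teardown optimization can be sketched as a flag checked on the free path. PoisoningArena, BeginTeardown and kPoisonByte are hypothetical names, and a single fill byte stands in for the real poison value to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Placeholder fill byte (an assumption for this sketch).
constexpr unsigned char kPoisonByte = 0xF0;

// Hypothetical arena that skips poisoning once its document is going away.
class PoisoningArena {
public:
  // Called when the whole document is being destroyed. After this, all of
  // the arena's memory is about to be released anyway.
  void BeginTeardown() { mTearingDown = true; }

  void Free(void* p, std::size_t size) {
    assert(p != nullptr);
    if (!mTearingDown) {
      // Normal case: the slot may be reached later through a dangling
      // pointer, so overwrite it before putting it on the freelist.
      std::memset(p, kPoisonByte, size);
    }
    mFreeList.push_back(p);
  }

private:
  bool mTearingDown = false;
  std::vector<void*> mFreeList;
};
```

Skipping the write is only reasonable under the assumption stated earlier in the post, that pointers into a document's frames rarely outlive the document.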

Like all mitigations, frame poisoning is not ideal. We still fix dangling frame pointer bugs on trunk as quickly as we can. However, we know of no way to bypass frame poisoning to exploit dangling frame pointer bugs, so we may choose to not fix certain bugs on a maintenance branch where we believe frame poisoning blocks exploitation and a proper fix would be very destabilizing.



10 comments:

  1. Pseudonymous Coward, 16 October 2010 05:05

    Sounds like solid computer science. Keep it up, Mozilla!

  2. James Napolitano, 16 October 2010 07:17

    If this was implemented back in Gecko 1.9.2, why is this only being described now? (It was mentioned in some meeting notes, but I don't remember seeing any real explanation of what it was). Was there need to check if this really behaved as planned before letting potential attackers know how it worked?

  3. Robert O'Callahan, 16 October 2010 10:23

    Yeah, something like that. There was no real plan. For a while we didn't want to talk about it until we were confident in it, and then we kinda forgot to talk about it.

  4. Hey Roc,
    Thanks for the insightful read. A few days ago, I was reading about the Compartmentalized Heap being introduced in Firefox 4. Would that work assist in preventing/obviating dangling pointer frame attacks?
    Thanks,
    Manoj

  5. Robert O'Callahan, 16 October 2010 22:25

    No. Compartments only restrict JS references.

  6. Interesting post. Can you comment on the memory overhead induced by maintaining separate freelists for the different frame types? The paper you cited claims near-zero memory overhead for most of the embedded programs they analyzed, but it sounds like your implementation must necessarily vary from theirs. Since type safety alone does not guarantee memory safety you should be cautious when classifying these issues as non-exploitable because of these mitigations.
    Is there a maximum size to the freelist for each frame type? If so, what happens when memory is freed? Likewise, where does the memory come from when the freelist is empty? If the answer to either of these questions is the system heap, then there is still the possibility of dangling pointers. If there is no maximum freelist size then there is potential for memory overhead given certain allocation patterns (although this may be unconcerning in practice).

  7. Interesting mitigation strategy. However, I keep wondering why one wouldn't simply use a conservative garbage collector to prevent dangling pointers in the first place.

  8. Robert O'Callahan, 18 October 2010 12:55

    Anonymous: The extra memory overhead is low. In particular, we already used per-document freelists per frame object *size*, so making those lists per *class* was not a big change.
    There is no maximum size to the freelists. When the freelist is empty, memory is taken from the system heap, but that doesn't create problems (if we have dangling pointers into the system heap, we've got problems independent of frame pointers...).
    Andreas: performance, mainly.

  9. Robert: that's a sad excuse. For one, Moore's law is still applicable. A 10% performance hit will be offset by new hardware within months. And I also believe that a lot of people would gladly accept a small performance hit in exchange for a safer browsing experience. Microsoft, for instance, replaced all ints in their C++ code by SafeInt, which does the right thing in cases of integer overflows. Sure, their code became a few percent slower. But they vastly reduced the number of vulnerabilities in their code. "Sure it crashes, but look how fast it is" just isn't good engineering.
    But even more depressing: There's a whole lot of reference counting going on in Mozilla, even with a cycle collector. That doesn't appear to be much faster than a parallel mark and sweep collector to me. In fact, naive reference counting is asymptotically much worse than mark and sweep.

  10. Robert O'Callahan, 18 October 2010 22:56

    Moore's law helps if your performance targets are absolute, but that is not the world we live in. People compare Firefox against other browsers running on the same hardware, and we have to be faster.
    I don't believe that Microsoft replaced usage of "int" with SafeInt everywhere. Got proof?
    Frame objects aren't reference counted, so bringing up reference counting is not relevant here.
