Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Monday 10 September 2007

Federal Government

Work on garbage collection tends to assume that all objects in the heap are being managed by a single collector. In practice this is a problem because people want to have code written in different languages sharing data. We face this problem internally because we have C++ and Javascript code sharing objects. We also face it externally because people want to be able to use Python and other languages to write XUL applications or even potentially on the Web.

One approach is to implement all those languages on top of the same collector, which often means implementing them on the same virtual machine. That can be OK but it can impose some undesirable tradeoffs when the constraints of the virtual machine or collector don't allow the language to be implemented well. You often end up with a language that has some parts filed off because they don't fit ... e.g. IronRuby can't implement first-class continuations. Or you take a memory footprint hit because the object representation imposed by the runtime isn't optimal for your favourite language. Or maybe the engineering effort required to reimplement the language is prohibitive.

Another approach is to use reference counting at the language boundaries, which is what we mainly rely on. This is nice because it can be implemented in "wrapper objects" without touching the internals of the language implementation. But this approach is unable to collect cycles of objects that span language boundaries.

If we want to collect cross-language cycles it seems inevitable that we need a way to trace through all object graphs. That means we have to modify, or at least have knowledge of, the guts of the language runtimes involved. Given that requirement, what is the minimally invasive strategy that can collect cycles across language boundaries? My hypothesis: a distributed mark and sweep global collector.

The idea is that periodically there is a "global garbage collection", which works like this:

  1. "Stop the world": All language runtimes advance all their active threads to GC safe points. In Spidermonkey, for example, this means reaching the end of a JS "request".
  2. "Mark": Request all language runtimes mark all object in their heaps from their roots.

    • While marking, a runtime may encounter a reference to an object managed by another runtime. Such references are dispatched to be marked by that runtime.

  3. "Sweep": Request all language runtimes sweep their heaps and discard unmarked objects.
  4. "Restart": Tell all runtimes to restart their threads.

Extra steps would be required to support finalization and weak references, but it would be similar to the way they work in regular mark and sweep.

A list of active language runtimes must be maintained. Cross-runtime mark requests requires a map from memory areas to runtimes. This map could be gathered from all the runtimes at the start of each global GC. This map would also help conservative collectors detect references.

This approach is really very simple. It does not constrain the representation of a runtime's objects in any way other than that they must be addressable via pointers. It does not impose any read or write barriers. It does not constrain synchronization or threading models, other than requiring runtimes to have some way to stop their worlds. It allows runtimes to use both exact and conservative roots and tracing internally.

It also allows considerable flexibility for the runtime's internal garbage collection strategy. Obviously mark-and-sweep algorithms fit right in. Most generational collectors in practice use mark and sweep for the tenured objects, so they can fit in here as long as any objects to be passed across language boundaries are immediately tenured. A pure copying collector won't work without stationary proxy objects since we don't want to force all runtimes to handle object motion; I think that's a reasonable restriction. Incremental collection algorithms will work OK, they will just have to queue the marking of edges that lead outside the runtime's own heap until the next global GC. In the same way a runtime can collect more frequently than the global GC (as long as it preserves objects that may be referenced by foreign runtimes).

One wrinkle that I haven't fully thought through yet is how runtimes with exact collection handle mark requests into their heaps that don't point to actual objects (as could be sent by a runtime using conservative collection). I would probably require that the runtime can identify whether an object is being referenced. I would probably also require that interior pointers can be resolved to mark the enclosing object. These requirements would be limited to objects that have been exposed to foreign runtimes.

This approach does not address other issues like how cross-runtime calls can actually happen. That problem is actually much easier; it can be solved non-invasively using wrappers, for example. I think it's good to factor these issues out.

It would be an interesting experiment to take a few popular language runtimes and modify them to participate in a scheme like this. I hope it can be done with a very small amount of code. In practice the hardest problem is probably convincing people that this is worth adding, supporting and shipping in the core of their runtimes, and agreeing on the nitty-gritty details of how the interface would work.

I'd really like to see something like this happen, because the competing approach of shoehorning everything into a single runtime is technically unsatisfying, will slow language and runtime innovation, and will be politically infeasible for the free software world. I first wrote about this idea in 2001 and I'm surprised that I haven't seen it reinvented either in research or the general software world.


andrew cooke
every language has to stop and wait for all the others to finish marking, since none can predict who has an incoming link to their objects. i guess this is similar to multi-threaded (multi-core) gc (but have no idea what the solution is).
Jonas Sicking
What about runtimes that uses reference counting in combination with cycle collection. Python apparently does this, and I think Perl uses some type of reference counting strategy too.
Robert O'Callahan
andrew: yep. Global incremental multi-threaded collection is possible but it's high overhead and would require all participating runtimes to implement at least write barriers, which is vastly more invasive than what I've suggested here. However, with this scheme a runtime can collect objects before the global GC happens, if they know the objects were not exposed to other runtimes.
Jonas: that could work like this:
-- keep a hash-set of objects that are being referenced by foreign runtimes
-- when passing a reference to a foreign runtime, add it to the external-references set. If it wasn't already in the set, addref the object.
-- during global GC, build a new set of externally referenced objects by observing foreign mark requests.
-- during the global sweep phase, unref every object that is in the old external-references set but not the new one.
-- replace the old external-references set with the new one.