Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Monday 7 August 2017

Stabilizing The rr Trace Format With Cap’n Proto

In the past we've modified the rr trace format quite frequently, and there has been no backward or forward compatibility. In particular most of the time when people update rr — and definitely when updating between releases — all their existing traces become unreplayable. This is a problem for rr-based services, so over the last few weeks I've been fixing it.

Prior to stabilization I made all the trace format updates that were obviously already desirable. I extended the event counter to 64 bits since a pathological testcase could overflow 2^31 events in less than a day. I simplified the event types to eliminate some unnecessary or redundant events. I switched the compression algorithm from zlib to brotli.

Of course it's not realistic to expect that the trace format is now perfect and won't ever need to be updated again. We need an extensible format so that future versions of rr can add to it and still be able to read older traces. Enter Cap’n Proto! Cap’n Proto lets us write a schema describing types for our trace records and then update that schema over time in constrained ways. Cap’n Proto generates code to read and write records and guarantees that data using older versions of the schema is readable by implementations using newer versions. (It also has guarantees in the other direction, but we're not planning to rely on them.)

This has all landed now, so the next rr release should be the last one to break compatibility with old traces. I say should, because something could still go wrong!

One issue that wasn't obvious to me when I started writing the schema is that rr can't use Cap’n Proto's Text type — because that requires text be valid UTF-8, and most of rr's strings are data like Linux pathnames which are not guaranteed to be valid UTF-8. For those I had to use the Data type instead (an array of bytes).

Another interesting issue involves choosing between signed and unsigned integers. For example a file descriptor can't be negative, but Unix file descriptors are given type int in kernel APIs ... so should the schema declare them signed or not? I made them signed, on the grounds that we can then check while reading traces that the values are non-negative, and when using the file descriptor we don't have to worry about the value overflowing as we coerce it to an int.

I wrote a microbenchmark to evaluate the performance impact of this change. It performs 500K trivial (non-buffered) system calls, producing 1M events (an 'entry' and 'exit' event per system call). My initial Cap’n Proto implementation (using "packed messages") slowed rr recording down from 12 to 14 seconds. After some profiling and small optimizations, it slows rr recording down from 9.5 to 10.5 seconds — most of the optimizations benefited both configurations. I don't think this overhead will have any practical impact: any workload with such a high frequency of non-buffered system calls is already performing very poorly under rr (the non-rr time for this test is only about 20 milliseconds), and if it occurred in practice we'd buffer the relevant system calls.

One surprising datum is that using Cap’n Proto made the event data significantly smaller — from 7.0MB to 5.0MB (both after compression with brotli-5). I do not have an explanation for this.

Another happy side effect of this change is that it's now a bit easier to read rr traces from other languages supported by Cap’n Proto.