Monday, 14 August 2017

Public Service Announcement: "localhost" Is Not Necessarily Local

Today I learned that there exist systems (presumably misconfigured, though I'm not sure in what way) where the hostname "localhost" does not resolve locally but is instead sent to a remote DNS server. In some cases that server returns a remote address, e.g. a server providing landing pages stuffed with ads.

This was breaking rr, since rr tells gdb to use (e.g.) the command "target extended-remote :1234", and apparently gdb resolves "localhost" to get the address to connect to. I've fixed rr to pass "" as an explicit local address, but who knows what other software is broken in such a configuration — possibly in insecure ways?
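A quick way to check what your resolver does with "localhost" is to ask it directly. This is a minimal sketch using only the Rust standard library (the port number is arbitrary; it's required by the API but irrelevant to resolution):

```rust
use std::net::ToSocketAddrs;

fn main() {
    // Ask the system resolver what "localhost" means on this machine.
    let addrs: Vec<_> = "localhost:1234"
        .to_socket_addrs()
        .expect("could not resolve localhost")
        .collect();
    for addr in &addrs {
        println!("{} (loopback: {})", addr, addr.ip().is_loopback());
    }
    // On a sanely configured system every result is 127.0.0.1 or ::1.
    // A resolver like the one described above would fail this check.
    assert!(addrs.iter().all(|a| a.ip().is_loopback()));
}
```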

Sunday, 13 August 2017

When Virtue Fails

This quote, popularly (but incorrectly) attributed to Marcus Aurelius, proposes indifference to religion:

Live a good life. If there are gods and they are just, then they will not care how devout you have been, but will welcome you based on the virtues you have lived by. If there are gods, but unjust, then you should not want to worship them. If there are no gods, then you will be gone, but ... will have lived a noble life that will live on in the memories of your loved ones.

But this raises the question — what if you fail to live a truly good life, in the eyes of the god(s)?

The gospel — literally, the good news — about Jesus is that indeed we all fall short, but God sent Jesus into the world to take the punishment that we deserve, and through him we can receive forgiveness.

There is, of course, a catch. We have to accept that forgiveness, and that requires taking Jesus seriously. Thus pseudo-Aurelian indifference breaks down.

Monday, 7 August 2017

Stabilizing The rr Trace Format With Cap’n Proto

In the past we've modified the rr trace format quite frequently, and there has been no backward or forward compatibility. In particular most of the time when people update rr — and definitely when updating between releases — all their existing traces become unreplayable. This is a problem for rr-based services, so over the last few weeks I've been fixing it.

Prior to stabilization I made all the trace format updates that were obviously already desirable. I extended the event counter to 64 bits since a pathological testcase could overflow 2^31 events in less than a day. I simplified the event types to eliminate some unnecessary or redundant events. I switched the compression algorithm from zlib to brotli.

Of course it's not realistic to expect that the trace format is now perfect and won't ever need to be updated again. We need an extensible format so that future versions of rr can add to it and still be able to read older traces. Enter Cap’n Proto! Cap’n Proto lets us write a schema describing types for our trace records and then update that schema over time in constrained ways. Cap’n Proto generates code to read and write records and guarantees that data using older versions of the schema is readable by implementations using newer versions. (It also has guarantees in the other direction, but we're not planning to rely on them.)
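To illustrate the kind of constrained evolution Cap'n Proto allows (this is a hypothetical example, not rr's actual schema): new fields may only be appended with fresh ordinal numbers, and a new reader encountering an old record simply sees the default value for any field the old writer didn't know about.

```capnp
# Hypothetical trace-record schema, not rr's real one.
@0xdbb9ad1f14bf0b36;  # every Cap'n Proto schema file needs a unique 64-bit ID

struct Frame {
  time @0 :Int64;
  tid  @1 :Int32;
  # A later version can append a field with the next ordinal.
  # Reading an older trace just yields this field's default value.
  ticks @2 :UInt64;
}
```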

This has all landed now, so the next rr release should be the last one to break compatibility with old traces. I say should, because something could still go wrong!

One issue that wasn't obvious to me when I started writing the schema is that rr can't use Cap’n Proto's Text type — because that requires text be valid UTF-8, and most of rr's strings are data like Linux pathnames which are not guaranteed to be valid UTF-8. For those I had to use the Data type instead (an array of bytes).
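For example (a minimal sketch, not rr code), a byte string that is a perfectly legal Linux pathname can still be rejected as UTF-8:

```rust
fn main() {
    // Linux pathnames are arbitrary non-NUL bytes; nothing requires
    // them to be valid UTF-8.
    let path_bytes: &[u8] = b"/tmp/caf\xe9"; // Latin-1 'e'-acute, invalid UTF-8
    assert!(std::str::from_utf8(path_bytes).is_err());
    println!("not valid UTF-8, so it must be stored as Data, not Text");
}
```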

Another interesting issue involves choosing between signed and unsigned integers. For example a file descriptor can't be negative, but Unix file descriptors are given type int in kernel APIs ... so should the schema declare them signed or not? I made them signed, on the grounds that we can then check while reading traces that the values are non-negative, and when using the file descriptor we don't have to worry about the value overflowing as we coerce it to an int.
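The reasoning can be sketched like this (a hypothetical helper, not rr's actual code): with a signed schema type, validation happens once at trace-reading time, and the later coercion to a kernel-facing int cannot overflow.

```rust
// Hypothetical: validate a file descriptor read from a trace record.
fn fd_from_trace(raw: i64) -> i32 {
    // A signed schema type lets us reject corrupt values up front...
    assert!(raw >= 0, "corrupt trace: negative fd {}", raw);
    assert!(raw <= i32::MAX as i64, "corrupt trace: fd {} overflows int", raw);
    raw as i32 // ...and this cast is then known to be lossless.
}

fn main() {
    assert_eq!(fd_from_trace(3), 3);
    println!("fd validated");
}
```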

I wrote a microbenchmark to evaluate the performance impact of this change. It performs 500K trivial (non-buffered) system calls, producing 1M events (an 'entry' and 'exit' event per system call). My initial Cap’n Proto implementation (using "packed messages") slowed rr recording down from 12 to 14 seconds. After some profiling and small optimizations, it slows rr recording down from 9.5 to 10.5 seconds — most of the optimizations benefited both configurations. I don't think this overhead will have any practical impact: any workload with such a high frequency of non-buffered system calls is already performing very poorly under rr (the non-rr time for this test is only about 20 milliseconds), and if it occurred in practice we'd buffer the relevant system calls.

One surprising datum is that using Cap’n Proto made the event data significantly smaller — from 7.0MB to 5.0MB (both after compression with brotli-5). I do not have an explanation for this.

Another happy side effect of this change is that it's now a bit easier to read rr traces from other languages supported by Cap’n Proto.

Monday, 31 July 2017

Selecting A Compression Algorithm For rr

rr's traces are large. Memory-mapped files account for a lot of that, but the most efficient way to handle them is "zero copy" file cloning, so we don't want to compress them during recording. Most of the rest is recordings of data copied into tracee address spaces by the kernel, plus snapshots of registers; this data is extremely compressible, often containing long runs of zeroes, for example. For a long time rr has used zlib to compress the non-mapped-file trace data, and zlib's 'deflate' algorithm often achieves compression of 8x or more.
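To see why such data compresses so well, here's a toy illustration (naive run-length encoding, nothing like a real compressor): a buffer that is mostly zero padding collapses to a handful of runs.

```rust
// Toy illustration: encode data as (byte, run-length) pairs.
fn rle(data: &[u8]) -> Vec<(u8, usize)> {
    let mut out: Vec<(u8, usize)> = Vec::new();
    for &b in data {
        if let Some(last) = out.last_mut() {
            if last.0 == b {
                last.1 += 1; // extend the current run
                continue;
            }
        }
        out.push((b, 1)); // start a new run
    }
    out
}

fn main() {
    // e.g. a snapshot with a few live values amid zero padding.
    let mut buf = vec![0u8; 4096];
    buf[0] = 0x7f;
    buf[100] = 0x42;
    let encoded = rle(&buf);
    println!("{} bytes -> {} runs", buf.len(), encoded.len()); // 4096 bytes -> 4 runs
}
```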

Of course zlib is pretty old and significantly better algorithms exist, so now seems like a good time to reevaluate that decision. I used the Squash framework to compare zlib to two contenders, brotli and zstd, on actual rr trace data: a single, fairly short Firefox run, 828MB uncompressed, treated as independent 1MB chunks because that's what rr does. Here are the results:

[Chart: compressed data size versus compression time for zlib, zstd, and brotli at various compression levels.]

I've omitted compression levels that took more than 20 seconds to compress the data. Currently rr uses zlib level 6, which takes just over 12 seconds to compress the data. Data compression occurs in parallel with the rest of recording, and uses multiple cores when it needs to, so in practice is seldom a performance bottleneck.

On this data, both brotli and zstd beat zlib by a significant margin, so we're definitely leaving some performance on the table by sticking with zlib. In particular, given the same time budget, zstd can shave 14% off the size of the non-mapped-file trace data, and brotli can shave off 17%. Alternatively, for the same trace size we could use much less CPU time — zstd level 1 compresses slightly better than zlib level 6, at 10x the speed!

For rr I think brotli level 5 is an obvious choice. For some reason there's a sudden improvement in compression at level 5, where it passes zstd and reaches roughly its optimal compression given a reasonable time budget. At level 5 we're shaving 17% off the current size and also taking 32% off the CPU time.

Apart from the performance, brotli also has a better licensing story. zstd has Facebook's standard patent license, which terminates if you sue Facebook for any patent infringement, and some organisations aren't comfortable with that. Apparently people have done patent searches and haven't found any Facebook patents covering zstd, but that is not wholly reassuring (while also being mystifying — if they're not applying for relevant patents, why not make that clear?). On the other hand, Google has made a clear commitment to license brotli royalty-free with no such conditions. Of course there could be third-party patents, but if they became a problem it would be easy to switch rr's algorithm (especially compared to the trouble they would cause for Web sites and browsers!).

Of course there are lots of other compression algorithms I could evaluate, but I guess if there are any further gains to be had, they would be very minor.

Update: Unfortunately Ubuntu doesn't have a brotli library package. (Fedora does.) So using brotli would mean everyone building rr on Ubuntu has to build brotli themselves first, or we vendor brotli into rr (or we do something truly insane like having rr pull and build brotli at build time if necessary). None of these approaches are appealing :-(. I guess there's also "rewrite rr in Rust so we can use cargo for reasonable dependency management", which is appealing but not currently practical.

I'm leaning towards vendoring brotli into rr.

Saturday, 29 July 2017

Upstream Stable Kernels Work With rr Again

Greg K-H has released stable Linux kernels 3.18.63, 4.4.79, 4.9.40, and 4.12.4, containing a (backout) fix for the regression that broke rr. 4.13-rc2 also contains the fix. 4.11 was just declared end-of-life so it will not ever be officially fixed.

Obviously distros still have to produce kernel updates containing the fix, so we're not quite out of the woods yet, but that should be soon.

I'm holding off doing the rr 4.6.0 release until distro updates that work with rr have been out for a little while. To the (limited) extent possible I'd like to avoid people trying rr while it doesn't work on their kernel.

Thursday, 27 July 2017

Let's Never Create An Ad-Hoc Text Format Again

Recently I needed code to store a small amount of data in a file. Instinctively I started doing what I've always done before, which is create a trivial custom text-based format using stdio or C++ streams. But at that moment I had an epiphany: since I was using Rust, it would actually be more convenient to use the serde library. I put the data in a custom struct (EpochManifest), added #[derive(Serialize, Deserialize)] to EpochManifest, and then just had to write:

    use std::fs::File;

    // Writing:
    let f = File::create(&manifest_path).expect("Can't create manifest");
    serde_json::to_writer_pretty(f, &manifest).unwrap();
    // Reading:
    let f = File::open(&manifest_path).expect("Can't open manifest");
    let manifest: EpochManifest = serde_json::from_reader(f).unwrap();

This is more convenient than hand-writing even the most trivial text (un)parser. It's almost guaranteed to be correct. It's more robust and maintainable. If I decided to give up on human readability in exchange for smaller size and faster I/O, it would only take a couple of changed lines to switch to bincode's compact binary encoding. It prevents the classic trap where the stored data grows in complexity and an originally simple ad-hoc text format evolves into a baroque monstrosity.

There are libraries to do this sort of thing in C/C++ but I've never used them, perhaps because importing a foreign library and managing that dependency is a significant amount of work in C/C++, whereas cargo makes it trivial in Rust. Perhaps that's why the ISO C++ wiki page on serialization provides lots of gory details about how to implement serialization rather than just telling you to use a library.

As long as I get to keep using Rust I should never create an ad-hoc text format again.

Monday, 17 July 2017

Confession Of A C/C++ Programmer

I've been programming in C and C++ for over 25 years. I have a PhD in Computer Science from a top-ranked program, and I was a Distinguished Engineer at Mozilla where for over ten years my main job was developing and reviewing C++ code. I cannot consistently write safe C/C++ code. I'm not ashamed of that; I don't know anyone else who can. I've heard maybe Daniel J. Bernstein can, but I'm convinced that, even at the elite level, such people are few and far between.

I see a lot of people assert that safety issues (leading to exploitable bugs) with C and C++ only afflict "incompetent" or "mediocre" programmers, and one need only hire "skilled" programmers (such as, presumably, the asserters) and the problems go away. I suspect such assertions are examples of the Dunning-Kruger effect, since I have never heard them made by someone I know to be a highly skilled programmer.

I imagine that many developers successfully create C/C++ programs that work for a given task, and no-one ever fuzzes or otherwise tries to find exploitable bugs in those programs, so those developers naturally assume their programs are robust and free of exploitable bugs, creating false optimism about their own abilities. Maybe it would be useful to have an online coding exercise where you are given some apparently-simple task, you write a C/C++ program to solve it, and then your solution is rigorously fuzzed for exploitable bugs. If any such bugs are found then you are demoted to the rank of "incompetent C/C++ programmer".