Monday, 31 July 2017

Selecting A Compression Algorithm For rr

rr's traces are large. Memory-mapped files account for a lot of that, but the most efficient way to handle them is "zero copy" file cloning so we don't want to compress them during recording. Most of the rest is recordings of data copied into tracee address spaces by the kernel, and plus snapshots of registers, and these data are extremely compressible — often containing long runs of zeroes, for example. For a long time rr has used zlib to compress the non-mapped-file trace data, and zlib's 'deflate' algorithm often achieves compression of 8x or more.

Of course zlib is pretty old and significantly better algorithms exist, so now seems like a good time to reevaluate that decision. I used the Squash framework to compare zlib to two contenders, brotli and zstd, on actual rr trace data: a single, fairly short Firefox run, 828MB uncompressed, treated as independent 1MB chunks because that's what rr does. Here are the results:

I've omitted compression levels that took more than 20 seconds to compress the data. Currently rr uses zlib level 6, which takes just over 12 seconds to compress the data. Data compression occurs in parallel with the rest of recording, and uses multiple cores when it needs to, so in practice is seldom a performance bottleneck.

On this data, both brotli and zstd beat zlib by a significant margin, so we're definitely leaving some performance on the table by sticking with zlib. In particular, given the same time budget, zstd can shave 14% off the size of the non-mapped-file trace data, and brotli can shave off 17%. Alternatively, for the same trace size we could use much less CPU time — zstd level 1 compresses slightly better than zlib level 6, at 10x the speed!

For rr I think brotli level 5 is an obvious choice. For some reason there's a sudden improvement in compression at level 5, where it passes zstd and reaches roughly its optimal compression given a reasonable time budget. At level 5 we're shaving 17% off the current size and also taking 32% off the CPU time.

Apart from the performance, brotli also has a better licensing story. zstd has Facebook's standard patent license, which terminates if you sue Facebook for any patent infringement, and some organisations aren't comfortable with that. Apparently people have done patent searches and haven't found any Facebook patents covering zstd, but that is not wholly reassuring (while also being mystifying — if they're not applying for relevant patents, why not make that clear?). On the other hand, Google has made a clear commitment to license brotli royalty-free with no such conditions. Of course there could be third-party patents, but if they became a problem it would be easy to switch rr's algorithm (especially compared to the trouble they would cause for Web sites and browsers!).

Of course there are lots of other compression algorithms I could evaluate, but I guess if there are any further gains to be had, they would be very minor.

Update Unfortunately Ubuntu doesn't have a brotli library package. (Fedora does.) So, using brotli would mean everyone building rr on Ubuntu has to build brotli themselves first, or we vendor brotli into rr (or we do something truly insane like have rr pull and build brotli at build time if necessary). None of these approaches are appealing :-(. I guess there's also "rewrite rr in Rust so we can use cargo to have reasonable dependency management", which is appealing but not currently practical.

I'm leaning towards vendoring brotli into rr.

Saturday, 29 July 2017

Upstream Stable Kernels Work With rr Again

Greg K-H has released stable Linux kernels 3.18.63, 4.4.79, 4.9.40, and 4.12.4, containing a (backout) fix for the regression that broke rr. 4.13-rc2 also contains the fix. 4.11 was just declared end-of-life so it will not ever be officially fixed.

Obviously distros still have to produce kernel updates containing the fix, so we're not quite out of the woods yet, but that should be soon.

I'm holding off doing the rr 4.6.0 release until distro updates that work with rr have been out for a little while. To the (limited) extent possible I'd like to avoid people trying rr while it doesn't work on their kernel.

Thursday, 27 July 2017

Let's Never Create An Ad-Hoc Text Format Again

Recently I needed code to store a small amount of data in a file. Instinctively I started doing what I've always done before, which is create a trivial custom text-based format using stdio or C++ streams. But at that moment I had an epiphany: since I was using Rust, it would actually be more convenient to use the serde library. I put the data in a custom struct (EpochManifest), added #[derive(Serialize, Deserialize)] to EpochManifest, and then just had to write:

    let f = File::create(manifest_path).expect("Can't create manifest");
    serde_json::to_writer_pretty(f, &manifest).unwrap();
and
    let f = File::open(&p).expect(&format!("opening {:?}", &p));
    let manifest = serde_json::from_reader(f).unwrap();

This is more convenient than hand-writing even the most trivial text (un)parser. It's almost guaranteed to be correct. It's more robust and maintainable. If I decided to give up on human readability in exchange for smaller size and faster I/O, it would only take a couple of changed lines to switch to bincode's compact binary encoding. It prevents the classic trap where the stored data grows in complexity and an originally simple ad-hoc text format evolves into a baroque monstrosity.

There are libraries to do this sort of thing in C/C++ but I've never used them, perhaps because importing a foreign library and managing that dependency is a significant amount of work in C/C++, whereas cargo makes it trivial in Rust. Perhaps that's why the ISO C++ wiki page on serialization provides lots of gory details about how to implement serialization rather than just telling you to use a library.

As long as I get to keep using Rust I should never create an ad-hoc text format again.

Monday, 17 July 2017

Confession Of A C/C++ Programmer

I've been programming in C and C++ for over 25 years. I have a PhD in Computer Science from a top-ranked program, and I was a Distinguished Engineer at Mozilla where for over ten years my main job was developing and reviewing C++ code. I cannot consistently write safe C/C++ code. I'm not ashamed of that; I don't know anyone else who can. I've heard maybe Daniel J. Bernstein can, but I'm convinced that, even at the elite level, such people are few and far between.

I see a lot of people assert that safety issues (leading to exploitable bugs) with C and C++ only afflict "incompetent" or "mediocre" programmers, and one need only hire "skilled" programmers (such as, presumably, the asserters) and the problems go away. I suspect such assertions are examples of the Dunning-Kruger effect, since I have never heard them made by someone I know to be a highly skilled programmer.

I imagine that many developers successfully create C/C++ programs that work for a given task, and no-one ever fuzzes or otherwise tries to find exploitable bugs in those programs, so those developers naturally assume their programs are robust and free of exploitable bugs, creating false optimism about their own abilities. Maybe it would be useful to have an online coding exercise where you are given some apparently-simple task, you write a C/C++ program to solve it, and then your solution is rigorously fuzzed for exploitable bugs. If any such bugs are found then you are demoted to the rank of "incompetent C/C++ programmer".

Saturday, 15 July 2017

Usenix ATC 2017

During the last few days I attended the Usenix ATC 2017 conference. This is a fairly eclectic but mostly systems-focused conference, largely focused on academic research but with a smattering of other sorts of projects.

On Thursday I presented my talk about rr. I only had twenty minutes, and my usual rr talk is more like an hour, so I cut a lot and talked fast, but apparently it came across reasonably well. There were some good questions, and Brendan Dolan-Gavitt was kind enough to slip in a mention that rr has saved his colleague a month of work. They apparently have a pretty good rr-based workflow for diagnosing divergence bugs in their PANDA QEMU-based record and replay system. A number of people approached me before and after the talk to discuss rr and its relationship to their projects.

One particularly relevant project presented at the conference was H3, a record-and-replay project following on from previous work by the same group. They do efficient multicore record and replay by replaying threads on all available cores, gathering observations of control flow using Intel's Processor Trace, and then formulating the problem of matching reads from shared memory with their associated writes as a constraint system which they then solve using Z3. One nice thing about this approach is that they can come up with replay behaviors involving weaker memory models than sequential consistency. They get good results on small programs but the scalability of their approach to really large applications is still unproven. I think this line of research has potential, because there are all sorts of ways to improve it: gathering more kinds of observations (especially data values), being more selective about which observations to gather, or introducing periodic stop-the-world synchronization to simplify the constraint sets. It might also be possible to combine this technique with MMU-based page ownership approaches, so that for pages that see little sharing (mostly accessed by one thread at a time) no constraint solving is required, but constraint solving is used for pages that are intensively shared. Partly because of my discussions with this group, I'm become gradually more optimistic about the prospects for multicore record-and-replay on commodity hardware, though there's a lot more work to be done.

It's hard for me to make really accurate judgements about the bulk of the research presented, because most of it was not in areas I know a lot about, but it seemed to me that like most academic conferences there were too many papers solving artificial problems that probably don't matter. Also like most academic conferences, there were no negative results — apart from, as usual, the introductions of papers that described the shortcomings of previous research being addressed in the new papers. This needs to change.

I met some new people that I was hoping to meet, but also caught up with some old friends and acquaintances — Angela Demke Brown, Bianca Schroeder, Jim Larus, Carl Waldspurger and others. It was a good time.

An Inflection Point In The Evolution Of Programming Langauges

Two recent Rust-related papers are very exciting.

Rustbelt formalizes (a simplified version of) Rust's semantics and type system and provides a soundness proof that accounts for unsafe code. This is a very important step towards confidence in the soundness of safe Rust, and towards understanding what it means for unsafe code to be valid — and building tools to check that.

This systems paper is about exploiting Rust's remarkably strong control of aliasing to solve a few different OS-related problems.

It's not often you see a language genuinely attractive to the systems research community (and programmers in general, as the Rust community shows) being placed on a firm theoretical foundation. (It's pretty common to see programming languages being advocated to the systems community by their creators, but this is not that.) Whatever Rust's future may be, it is setting a benchmark against which future systems programming languages should be measured. Key Rust features — memory safety, data-race freedom, unique ownership, and minimal space/time overhead, justified by theory — should from now on be considered table stakes.

Thursday, 6 July 2017

Bay Area Progress Report

I'm in the middle of a three-week work trip to the USA.

Monday last week I met with a couple of the people behind the Julia programming language, who have been using and contributing to rr. We had good discussions and it's good to put faces to names.

Monday night through to Friday night I was honoured to be a guest at Mozilla's all-hands meeting in San Francisco. I had a really great time catching up with a lot of old friends. I was pleased to find more Mozilla developers using rr than I had known about or expected; they're mostly very happy with the tool and some of them have been using esoteric features like chaos mode with success. We had a talk about rr and I demoed some of the new stuff Kyle and I have been working on, and talked about which directions might be most useful to Mozilla.

Saturday through Monday I went camping in Yosemite National Park with some friends. We camped in the valley on Saturday night, then on Sunday hiked down from Tioga Road (near the Lukens Lake trailhead) along Yosemite Creek to north of Yosemite Falls and camped there overnight. The next day we went up Eagle Peak for a great view over Yosemite Valley, then hiked down past the Falls back to the valley. It's a beautiful place and rather unlike any New Zealand tramping I've done — hot, dry, various sorts of unusual animals, and ever-present reminders about bears. There were a huge number of people in the valley for the holiday weekend!

Tuesday was a bit more relaxed. Being the 4th of July, I spent the afternoon with friends playing games — Bang and Lords of Waterdeep, two of my favourites.

Today I visited Brendan and his team at Brave to talk about rr. On Friday I'll give a talk at Stanford. On Monday I'll be up in Seattle giving a talk at the University of Washington, then on Tuesday I'll be visiting Amazon to talk about the prospects for rr in the cloud. Then on Wednesday through Friday I'll be attending Usenix ATC in Santa Clara and giving yet another rr talk! On Saturday I'll finally go home.

I really enjoy talking to people about my work, and learning more about people's debugging needs, and I really really enjoy spending time with my American friends, but I do miss my family a lot.