Saturday 15 July 2017
During the last few days I attended the USENIX ATC 2017 conference. This is a fairly eclectic but mostly systems-focused conference, largely devoted to academic research but with a smattering of other sorts of projects.
On Thursday I presented my talk about rr. I only had twenty minutes, and my usual rr talk is more like an hour, so I cut a lot and talked fast, but apparently it came across reasonably well. There were some good questions, and Brendan Dolan-Gavitt was kind enough to slip in a mention that rr has saved his colleague a month of work. They apparently have a pretty good rr-based workflow for diagnosing divergence bugs in their PANDA QEMU-based record and replay system. A number of people approached me before and after the talk to discuss rr and its relationship to their projects.
One particularly relevant project presented at the conference was H3, a record-and-replay project following on from previous work by the same group. They do efficient multicore record and replay by replaying threads on all available cores, gathering observations of control flow using Intel's Processor Trace, and then formulating the problem of matching reads from shared memory with their associated writes as a constraint system, which they solve using Z3. One nice thing about this approach is that they can come up with replay behaviors involving weaker memory models than sequential consistency. They get good results on small programs, but the scalability of their approach to really large applications is still unproven. I think this line of research has potential, because there are all sorts of ways to improve it: gathering more kinds of observations (especially data values), being more selective about which observations to gather, or introducing periodic stop-the-world synchronization to simplify the constraint sets. It might also be possible to combine this technique with MMU-based page ownership approaches, so that no constraint solving is required for pages that see little sharing (mostly accessed by one thread at a time), while constraint solving is used for pages that are intensively shared. Partly because of my discussions with this group, I've become gradually more optimistic about the prospects for multicore record-and-replay on commodity hardware, though there's a lot more work to be done.
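To make the read/write matching idea concrete, here is a toy sketch in Python. It is not H3's encoding, and the simplifications are mine: it assumes we observed the value returned by every read (H3 works from control-flow observations), it assumes sequential consistency, and it replaces Z3 with brute-force enumeration of interleavings. The names `THREAD_A`, `THREAD_B`, `interleavings`, `consistent` and `find_schedules` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations

# Each op is (kind, variable, value). For a write, value is what was stored;
# for a read, value is what the thread observed (an assumption of this sketch).
THREAD_A = [("w", "x", 1), ("r", "y", 0)]
THREAD_B = [("w", "y", 1), ("r", "x", 1)]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)
        ia, ib = iter(a), iter(b)
        yield [("A", next(ia)) if i in slots else ("B", next(ib)) for i in range(n)]


def consistent(schedule, initial=0):
    """Return True if every read sees the most recent write to its variable."""
    memory = {}
    for _thread, (kind, var, value) in schedule:
        if kind == "w":
            memory[var] = value
        elif memory.get(var, initial) != value:
            return False
    return True


def find_schedules(a, b):
    """Return all sequentially consistent schedules that explain the observations."""
    return [s for s in interleavings(a, b) if consistent(s)]


if __name__ == "__main__":
    for schedule in find_schedules(THREAD_A, THREAD_B):
        print(schedule)
```

For these two threads exactly one of the six interleavings is consistent: both of thread A's operations run before thread B's. If thread B's read is changed to observe `x == 0`, no sequentially consistent schedule exists. That is the classic store-buffering outcome, which real x86 hardware can produce, and it is the kind of case where a solver-based formulation that admits weaker memory models earns its keep.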
It's hard for me to make really accurate judgements about the bulk of the research presented, because most of it was not in areas I know a lot about, but it seemed to me that, like most academic conferences, there were too many papers solving artificial problems that probably don't matter. Also like most academic conferences, there were no negative results, apart from, as usual, the introductions of papers describing the shortcomings of the previous research that the new papers address. This needs to change.
I met some new people I was hoping to meet, and also caught up with some old friends and acquaintances: Angela Demke Brown, Bianca Schroeder, Jim Larus, Carl Waldspurger and others. It was a good time.