Tuesday, 30 May 2017

Should Debuggers Report Idempotent Writes?

gdb's data watchpoints are designed to only trigger when a written value actually changes; i.e., writes to memory that leave the value unchanged are deliberately ignored. (rr has to follow gdb's behavior.) I don't know why gdb does this, but it may be for historical reasons: it's impractical for a normal debugger to observe all possible writes to memory (e.g. those performed by the kernel), so a traditional strategy for implementing watchpoints is simply to re-read the watched location frequently and report changes, and obviously this can't detect idempotent updates.
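For concreteness, here's a tiny C program illustrating the behavior described above (a minimal sketch; the variable names are made up). With "watch x" set in gdb, the first assignment fires the watchpoint, but the second, idempotent one is silently skipped:

    #include <stdio.h>

    int x = 1;

    int main(void) {
        x = 2;  /* value changes from 1 to 2: the watchpoint fires */
        x = 2;  /* idempotent write: gdb deliberately stays silent */
        printf("%d\n", x);
        return 0;
    }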

Over the last year Kyle Huey and I have been building a brand-new record-and-replay-based debugger. One of the most intellectually challenging aspects of this project is sorting through the decisions made by existing debuggers, distinguishing those which are truly desirable from those made under the duress of implementation constraints that no longer apply. Handling of idempotent writes is one such decision. During rr replay we can observe all memory writes, so reporting all idempotent writes is perfectly feasible.

There are some clear cases where idempotent writes should not be ignored. For example, consider a series of printfs repeatedly overwriting the contents of an output buffer. If, at the end of the series, the user wants to understand how the final value of some byte in the buffer was obtained, we should report the last printf to write to that location, even if the byte value it happened to write was the same as that written to the location by some previous printf. Reporting the previous printf would be hopelessly confusing --- though this is, in fact, what rr+gdb do.
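Here's a contrived sketch of that scenario (the buffer and strings are invented for illustration). Asking how buf[0] got its final value should point at the second sprintf, but because that particular write is idempotent, a change-only watchpoint reports the first one instead:

    #include <stdio.h>

    char buf[32];

    int main(void) {
        sprintf(buf, "first");  /* writes 'f' to buf[0] */
        sprintf(buf, "final");  /* also writes 'f' to buf[0]: idempotent
                                   at that byte, so a change-only
                                   watchpoint never sees it */
        return 0;
    }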

On the other hand, I'm cautious. There may be some good reason to ignore idempotent writes that we just haven't figured out yet. It could possibly involve optimization — an optimizing compiler is free to introduce spurious writes if it knows they'll be idempotent — but I haven't seen or thought of a plausible example where this would be a problem.
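For illustration, here's a contrived source-level rendering of the kind of transformation I have in mind; whether any real compiler does this to a potentially-shared location is exactly the open question:

    /* Original: stores only when the value actually changes. */
    void clamp_nonneg(int *p) {
        if (*p < 0)
            *p = 0;
    }

    /* Hypothetical if-converted form: always stores, and the store
       is idempotent whenever *p was already non-negative. */
    void clamp_nonneg_opt(int *p) {
        int v = *p;
        *p = (v < 0) ? 0 : v;
    }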

Thursday, 25 May 2017

A Couple Of Papers About Commodity Multicore Record And Replay, And A Possible Way Forward

Efficient record-and-replay of parallel programs with potential data races on commodity multicore hardware is a holy-grail research problem. I think that the best papers in this area so far, though they're a bit old, are SMP-Revirt and DoublePlay.

DoublePlay is a really interesting approach but I'm skeptical it would work well in practice. Without going into too many details: for parallel applications that scale well, the minimum overhead it can achieve is 100%. Obtaining that overhead relies on breaking application execution into epochs that are reasonably long, but externally-visible output must be suppressed within each epoch, which is going to be difficult or impossible for many applications. For example, for the Apache Web server they buffer all network output in the kernel during an epoch, but this could cause problematic side effects for a more complicated application.

Therefore I think the SMP-Revirt approach is more robust and more likely to lead to a practical system. This is a fairly simple idea: apply the well-known concurrent-read, exclusive-write (CREW) model at the page level, using hardware page protections to trigger ownership changes. If you imagine a herd of single-threaded processes all sharing a memory block, then each page is mapped read-only in all processes (unowned), or read-write in one process (the owner) and no-access in all the others. When a process writes to a page it doesn't own, a trap occurs and it takes ownership; when it reads from a page that has a different owner, a trap occurs and the page becomes unowned. SMP-Revirt worked at the hypervisor level; dthreads does something similar at user level.
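To make the mechanism concrete, here's a minimal user-level sketch in the dthreads style, assuming one single-threaded process per "thread" sharing a MAP_SHARED region. Names like page_owner are invented, all locking and error handling are omitted, and only the write-fault side is shown:

    #include <signal.h>
    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    #define PAGE_SZ 4096
    #define UNOWNED (-1)

    extern int self;          /* index of this process */
    extern int *page_owner;   /* shared table: owner per page, or UNOWNED */
    extern char *region;      /* base of the shared mapping */

    static void crew_fault(int sig, siginfo_t *si, void *ctx) {
        size_t page = (size_t)((char *)si->si_addr - region) / PAGE_SZ;
        /* Simplification: treat every fault as a write fault and take
           exclusive ownership. A real implementation would decode the
           fault: reads make the page unowned (PROT_READ everywhere),
           writes make this process the owner (PROT_READ|PROT_WRITE
           here, PROT_NONE in the previous owner). */
        page_owner[page] = self;  /* needs a lock or CAS in real code */
        mprotect(region + page * PAGE_SZ, PAGE_SZ,
                 PROT_READ | PROT_WRITE);
    }

    void crew_init(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = crew_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }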

A problem with SMP-Revirt is that when there is a lot of read/write sharing, the overhead of these transitions becomes very high. In the paper, for three of their six benchmarks recording the benchmark with four-core parallelism is slower than just recording on a single core rr-style!

I think it would be quite interesting for someone to implement a page-CREW approach in the Linux kernel. One of the problems dthreads has is that it forces all threads to be separate processes so they can have distinct memory mappings. Another problem is that mprotect on individual pages creates lots of kernel VMAs. Both of these add overhead. Instead, working in the kernel you could keep the normal process/thread setup and avoid having to create a new VMA when a page assignment changes, storing ownership values (CPU index) in per-page data. Then a page state transition from unowned to owned requires a page fault to the kernel, a change to the ownership value, and a synchronous inter-processor interrupt (IPI) to CPUs with a TLB mapping for the page; a transition from owned to unowned (or a different owner) requires just an IPI to the previous owner CPU. These IPIs (and their replies) would carry per-CPU timestamp values that get logged (to per-CPU logs) to ensure the order of ownership changes can be deterministically replayed.
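Here's a rough C sketch of the write-fault path just described; every function is a hypothetical primitive (none of this exists in Linux today), and locking and the read-fault path are left out:

    #include <stdint.h>

    #define UNOWNED (-1)

    struct crew_page {
        int owner;  /* owning CPU index, or UNOWNED (shared-read) */
    };

    /* Hypothetical primitives, named for illustration only. */
    extern uint64_t ipi_revoke_ownership(int prev_cpu, struct crew_page *pg);
    extern uint64_t ipi_shootdown_readers(struct crew_page *pg);
    extern void log_transition(int cpu, struct crew_page *pg,
                               int prev, uint64_t timestamp);
    extern void map_writable_on(int cpu, struct crew_page *pg);

    /* Write-fault path: the faulting CPU takes exclusive ownership. */
    void crew_write_fault(struct crew_page *pg, int cpu) {
        int prev = pg->owner;
        if (prev == cpu)
            return;  /* raced with a concurrent fault; already owner */
        if (prev != UNOWNED) {
            /* Owned elsewhere: one targeted IPI to the previous owner,
               which flushes its TLB entry and replies with its per-CPU
               timestamp for the log. */
            uint64_t t = ipi_revoke_ownership(prev, pg);
            log_transition(cpu, pg, prev, t);
        } else {
            /* Unowned: IPI every CPU that may hold a TLB mapping for
               the page before claiming it exclusively. */
            uint64_t t = ipi_shootdown_readers(pg);
            log_transition(cpu, pg, UNOWNED, t);
        }
        pg->owner = cpu;
        map_writable_on(cpu, pg);
    }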

Then some interesting questions are: how well does this perform over a range of workloads --- not just the standard parallel benchmarks but applications like Web browsers, memcached, JVMs, etc? Can we detect on-the-fly that some cluster of threads is experiencing severe overhead and degrade gracefully so we're at least no worse than running single-core? Are there other optimizations that we could do? (E.g. the unowned concurrent-read state could be bad for pages that are frequently written, because you have to send an IPI to all CPUs instead of just one when transitioning from unowned to owned; you could avoid that by forcing such pages to always be exclusively owned. When a CPU takes exclusive ownership you might want to block other CPUs from taking ownership away for a short period of time. Etc...) I think one could do quite a bit better than SMP-Revirt --- partly because kernel-level sharing is no longer an issue, since that's no longer being recorded and replayed.

To be clear: I don't want to work on this. I'm a lot more interested in the applications of record-and-replay than the machinery that does it. I want someone else to do it for us :-).

Friday, 19 May 2017

rr Usenix Paper And Technical Report

Our paper about rr has been accepted to appear at the Usenix Annual Technical Conference. Thanks to Dawn for suggesting that conference, and to the ATC program committee for accepting it :-). I'm scheduled to present it on July 13. The paper is similar to the version we previously uploaded to arXiv.

Some of the reviewers requested more material: additional technical details, explanations of some of our design choices compared to alternatives, and reflection on our "real world" experience with rr. There wasn't space for that within the conference page limits, so our shepherd suggested publishing a tech-report version of the paper with the extra content. I've done that and uploaded "Engineering Record And Replay For Deployability: Extended Technical Report". I hope some people find it interesting and useful.

Thursday, 11 May 2017

Obscurity Inhibits Persuasion

I'm a huge fan of China Miéville's work. "The City & The City" is one of my favourite books of all time. His linguistic obscurity and inventiveness are a particular delight of his writing, even though they tax the mind. Few writers could use the word "squiddity" in earnest and get away with it!

However, when one's goal is to communicate and persuade rather than entertain, simpler language serves better. Miéville's essay "The Limits Of Utopia" would be more comprehensible to a broader audience if he had restrained his literary style. (For example, the words "coagulate", "imbricated", "malefic" and "peonage" all have simpler substitutes that would serve just as well.) A subject of this importance deserves the clearest and broadest communication; Miéville's rejection of such discipline suggests that he values his own or his readers' literary gratification more highly than the influence he could exert through his essay.

It's not just Miéville, of course. All sorts of communities --- including scientists, activists, and religionists --- use specialized vocabulary, sometimes for brevity and precision, but often to signal in-group identity. When they try to reach a broad audience, they'd be better off being simple and clear --- especially when addressing people hostile to the in-group. Yet I'm amazed how often people fail to do this --- I guess due to habit, but sometimes perhaps due to insincerity.

Saturday, 6 May 2017

Perceptions Of Violent Crime

I spent last week as a parent helper at my child's school camp at Tui Ridge near Rotorua. I found it exciting and challenging in various ways, not least just having to exercise social skills non-stop for five days solid. I spent a fair bit of time socializing with the other parents. One conversation came around to the topic of violent crime, and some of the other parents talked about how violent crime is getting worse and there seemed to be a murder every day in Auckland. I ventured to suggest that violent crime, especially murder, is declining here (and elsewhere), to general skepticism. Unfortunately I didn't have data memorized and I drowned my phone recently, so I left it at that.

On returning home it didn't take long to confirm my point. For 2014 (the latest year for which I can find data) the police reported 66 "homicide and related offenses" nationally. That is the lowest number since that dataset began (1994), when the numbers were much higher. (I don't know why numbers for 2015 and 2016 aren't available in that dataset, but I'd guess bureaucratic inefficiency is to blame. There appears to be more recent data at policedata.nz, but it's behind a Flash app so I can't be bothered.) The idea that "a murder occurs every day in Auckland" was off by an order of magnitude, in 2014 at least, and there seems no reason to believe things have gotten worse.

Wikipedia has a good discussion of how in New Zealand (and other Western countries) people tend to believe violent crime is increasing when in fact it has been generally decreasing. I've read all sorts of theories and surveys of theories about the causes of this decrease, from "increased abortion" to "declining lead levels", but it appears no-one really knows; there are probably multiple causes. It's certainly welcome!