Monday, 17 July 2017

Confession Of A C/C++ Programmer

I've been programming in C and C++ for over 25 years. I have a PhD in Computer Science from a top-ranked program, and I was a Distinguished Engineer at Mozilla where for over ten years my main job was developing and reviewing C++ code. I cannot consistently write safe C/C++ code. I'm not ashamed of that; I don't know anyone else who can. I've heard maybe Daniel J. Bernstein can, but I'm convinced that, even at the elite level, such people are few and far between.

I see a lot of people assert that safety issues (leading to exploitable bugs) with C and C++ only afflict "incompetent" or "mediocre" programmers, and one need only hire "skilled" programmers (such as, presumably, the asserters) and the problems go away. I suspect such assertions are examples of the Dunning-Kruger effect, since I have never heard them made by someone I know to be a highly skilled programmer.

I imagine that many developers successfully create C/C++ programs that work for a given task, and no-one ever fuzzes or otherwise tries to find exploitable bugs in those programs, so those developers naturally assume their programs are robust and free of exploitable bugs, creating false optimism about their own abilities. Maybe it would be useful to have an online coding exercise where you are given some apparently-simple task, you write a C/C++ program to solve it, and then your solution is rigorously fuzzed for exploitable bugs. If any such bugs are found then you are demoted to the rank of "incompetent C/C++ programmer".

Saturday, 15 July 2017

Usenix ATC 2017

During the last few days I attended the Usenix ATC 2017 conference. This is a fairly eclectic but mostly systems-focused conference, largely devoted to academic research but with a smattering of other sorts of projects.

On Thursday I presented my talk about rr. I only had twenty minutes, and my usual rr talk is more like an hour, so I cut a lot and talked fast, but apparently it came across reasonably well. There were some good questions, and Brendan Dolan-Gavitt was kind enough to slip in a mention that rr has saved his colleague a month of work. They apparently have a pretty good rr-based workflow for diagnosing divergence bugs in their PANDA QEMU-based record and replay system. A number of people approached me before and after the talk to discuss rr and its relationship to their projects.

One particularly relevant project presented at the conference was H3, a record-and-replay project following on from previous work by the same group. They do efficient multicore record and replay by replaying threads on all available cores, gathering observations of control flow using Intel's Processor Trace, and then formulating the problem of matching reads from shared memory with their associated writes as a constraint system which they then solve using Z3. One nice thing about this approach is that they can come up with replay behaviors involving weaker memory models than sequential consistency. They get good results on small programs but the scalability of their approach to really large applications is still unproven. I think this line of research has potential, because there are all sorts of ways to improve it: gathering more kinds of observations (especially data values), being more selective about which observations to gather, or introducing periodic stop-the-world synchronization to simplify the constraint sets. It might also be possible to combine this technique with MMU-based page ownership approaches, so that for pages that see little sharing (mostly accessed by one thread at a time) no constraint solving is required, but constraint solving is used for pages that are intensively shared. Partly because of my discussions with this group, I've become gradually more optimistic about the prospects for multicore record-and-replay on commodity hardware, though there's a lot more work to be done.

It's hard for me to make really accurate judgements about the bulk of the research presented, because most of it was not in areas I know a lot about, but it seemed to me that like most academic conferences there were too many papers solving artificial problems that probably don't matter. Also like most academic conferences, there were no negative results — apart from, as usual, the introductions of papers that described the shortcomings of previous research being addressed in the new papers. This needs to change.

I met some new people that I was hoping to meet, but also caught up with some old friends and acquaintances — Angela Demke Brown, Bianca Schroeder, Jim Larus, Carl Waldspurger and others. It was a good time.

An Inflection Point In The Evolution Of Programming Languages

Two recent Rust-related papers are very exciting.

Rustbelt formalizes (a simplified version of) Rust's semantics and type system and provides a soundness proof that accounts for unsafe code. This is a very important step towards confidence in the soundness of safe Rust, and towards understanding what it means for unsafe code to be valid — and building tools to check that.

This systems paper is about exploiting Rust's remarkably strong control of aliasing to solve a few different OS-related problems.

It's not often you see a language genuinely attractive to the systems research community (and programmers in general, as the Rust community shows) being placed on a firm theoretical foundation. (It's pretty common to see programming languages being advocated to the systems community by their creators, but this is not that.) Whatever Rust's future may be, it is setting a benchmark against which future systems programming languages should be measured. Key Rust features — memory safety, data-race freedom, unique ownership, and minimal space/time overhead, justified by theory — should from now on be considered table stakes.

Thursday, 6 July 2017

Bay Area Progress Report

I'm in the middle of a three-week work trip to the USA.

Monday last week I met with a couple of the people behind the Julia programming language, who have been using and contributing to rr. We had good discussions and it's good to put faces to names.

Monday night through to Friday night I was honoured to be a guest at Mozilla's all-hands meeting in San Francisco. I had a really great time catching up with a lot of old friends. I was pleased to find more Mozilla developers using rr than I had known about or expected; they're mostly very happy with the tool and some of them have been using esoteric features like chaos mode with success. We had a talk about rr and I demoed some of the new stuff Kyle and I have been working on, and talked about which directions might be most useful to Mozilla.

Saturday through Monday I went camping in Yosemite National Park with some friends. We camped in the valley on Saturday night, then on Sunday hiked down from Tioga Road (near the Lukens Lake trailhead) along Yosemite Creek to north of Yosemite Falls and camped there overnight. The next day we went up Eagle Peak for a great view over Yosemite Valley, then hiked down past the Falls back to the valley. It's a beautiful place and rather unlike any New Zealand tramping I've done — hot, dry, various sorts of unusual animals, and ever-present reminders about bears. There were a huge number of people in the valley for the holiday weekend!

Tuesday was a bit more relaxed. Being the 4th of July, I spent the afternoon with friends playing games — Bang and Lords of Waterdeep, two of my favourites.

Today I visited Brendan and his team at Brave to talk about rr. On Friday I'll give a talk at Stanford. On Monday I'll be up in Seattle giving a talk at the University of Washington, then on Tuesday I'll be visiting Amazon to talk about the prospects for rr in the cloud. Then on Wednesday through Friday I'll be attending Usenix ATC in Santa Clara and giving yet another rr talk! On Saturday I'll finally go home.

I really enjoy talking to people about my work, and learning more about people's debugging needs, and I really really enjoy spending time with my American friends, but I do miss my family a lot.

Friday, 30 June 2017

Patch On Linux Kernel Stable Branches Breaks rr

A change in 4.12rc5 breaks rr. We're trying to get it fixed before 4.12 is released, and I think that will be OK. Unfortunately that change has already been backported to 3.18.57, 4.4.72, 4.9.32 and 4.11.5 :-( (all released on June 14, and I guess arriving in distros a bit later). Obviously we'll try to get the 4.12 fix also backported to those branches, but that will take a little while.

The symptoms are that long, complex replays fail with "overshot target ticks=... by N", where N is generally a pretty large number (> 1000). If you look in the trace file, the value of N will usually be similar to the difference between the target ticks and the previous ticks value for that task --- i.e. we tried to stop after N ticks but we actually stopped after about 2*N ticks. Unfortunately, rr's own tests don't seem to be affected.

I'm not sure if there's a reasonable workaround we can use in rr, or if there is one, whether it's worth effort to deploy. That may depend on how the conversation with upstream goes.

Friday, 23 June 2017

Rising Tolerance For Static Analysis False Positives?

When I was a young graduate student working on static analysis tools, conventional wisdom was that a static analysis tool needed to have a low false-positive rate or no-one would use it. Minimizing false-positives was a large chunk of the effort in every static analysis project.

It seems that times have changed and there are now communities of developers willing to tolerate high false positive rates, at least in some domains. For example:

It will also miss really obvious bugs apparently at random, and flag non-bugs equally randomly. The price of the tool is astronomical, but it's still worthwhile if it catches bugs that human developers miss.
Indeed, I've noticed people in various projects ploughing through reams of false positives in case there are any real issues.

I'm not sure what changed here. Perhaps people have just become more appreciative of the value of subtle bugs caught by static analysis tools. Maybe it's a good time to be a developer of such tools.

Monday, 19 June 2017

Lazy Religion Tropes In Mass Media

I'm enjoying season two of Orphan Black. I hope Tatiana Maslany got paid a separate salary for each role, because she deserves every penny. Unfortunately the show falls down where so many others do: every religiously-identified person is either a chump or a psychopath. Disappointing, but expected; with few exceptions, the Christians portrayed in mass media are like none I've ever known.

I've just finished China Miéville's The Last Days Of New Paris. It's a bit over-Miévillian for me: one hundred and fifty pages of rampant imagination, but too many clever linguistic obscurities for maximal reading enjoyment. It hosts a very crisp example of the "hell exists but heaven does not" trope. Miéville frames it explicitly:

Everyone knew it helped to have a priest perform certain absurdities of his trade if it was demons you had to fight. "Why?" Thibaut asked Élise when they left again. "Why do you think it does work? It's not as if any of this stuff is true."
(Miéville's Parisian demons are explicitly of Hell.) Isn't this ... unfair? Yet it's a very common trope. Likewise I remain a huge fan of Buffy The Vampire Slayer, but it feels like a raw deal that the heroes wield crucifixes but the few ostensibly "religious" characters are villains.

I know, I know; the only way to fix this is to write my own novels and hit TV show. I'll get onto it.

Thursday, 15 June 2017

Is The x86 Architecture Sustainable?

My earlier post pointed to one strange little corner of the x86 instruction set. There is much, much more where that came from. My latest copy of Intel's "Combined Volumes" Software Developer’s Manual runs to 4,684 pages and it doesn't include new features they've announced such as CET. There's a good paper here describing some of the new features and the increasing complexity they bring. It also touches on the "feature interaction problem"; i.e., the complexity of the ISA grows superlinearly as Intel adds more features. One aspect it doesn't touch on is how many legacy compatibility features x86 processors carry, which continue to interact with new features. One of the key risks for Intel and its ecosystem is that many of the new features may end up being little-used — just like many of the legacy compatibility features — so their complexity ends up being "dead weight".

Is there a problem here for anyone other than Intel? Yes. Unnecessarily complex hardware costs us all in the end because those CPUs won't be as efficient, cheap, reliable or secure as they otherwise could be — especially as we seem to be hitting a process wall so that we can't rely on increasing transistor density to bail us out. Another issue is that complex ISAs make ISA-dependent tools — assemblers, disassemblers, debuggers, binary instrumentation tools, and even hypervisors and kernels — more difficult to build. Super-complex ISAs encourage software to depend on obscure features unnecessarily, which builds a competitive moat for Intel but makes competition at the CPU level more difficult.

The obvious technically-appealing approach — starting over with a clean-sheet architecture — by itself doesn't solve these problems in the long run (or the short run!). We have to acknowledge that architectures will inevitably grow features in response to market demand (or perceived value) and that some of those features will turn into legacy baggage. The question is, can we adjust the ecosystem to make it possible to drop legacy baggage, and is it worth doing so?

I'm not an ARM expert but the phone and embedded markets already seem to have moved in this direction. Phones use a plethora of ARM derivatives with different architectural features (though in spite of that, ARM has accumulated its own share of weirdnesses!) Can we do it in the cloud/desktop/laptop spaces? I don't see why not. Even software that reaches down to a relatively low level like a Web browser (with a complex tiered JIT, etc) doesn't rely on many complex legacy architectural features. Porting Firefox to a brand new processor architecture is a lot of work — e.g., implementing new JIT backends, FFI glue, and translating a pile of handwritten assembler (e.g. in codecs) to the new architecture. However, porting Firefox to an x86 or ARM variant minus unnecessary legacy baggage would be a lot easier. Your variant may need kernel and bootloader changes but that's certainly doable for Linux derivatives.

So, I think there's room in the market for architectures which may or may not be based on existing architectures but lack legacy features and offer less-than-total binary compatibility with a popular architecture. Perhaps the forthcoming ARM Snapdragon Windows laptops are a step in that direction.

Update Many commenters on Hacker News suggested that x86 complexity is a non-issue because decoding legacy instructions to uops requires insignificant die area. But this really misses the point: x86 complexity is not just about legacy instructions which can be decoded down to some microcode. It's about architectural features like SGX, CET, MPX, TSX, VT — plus the legacy stuff like segment registers and 286 call gates and virtual 8086 mode and so on and so on. It's about the processor state required to support those features, how those features all interact with each other, and how they increase the complexity of context switching, OS support, and so on.

New "rr pack" Command

I think there's huge potential to use rr for debugging cloud services. Apparently right now interactive debugging is mostly not used in the cloud, which makes sense — it's hard to identify the right process to debug, much less connect to it, and even if you could, stopping it for interactive analysis would likely interfere too much with your distributed system. However, with rr you could record any number of process executions without breaking your system, identify the failed runs after the fact, and debug them at your leisure.

Unfortunately there are a couple of problems making that difficult right now. One is that the largest cloud providers don't support the hardware performance counter rr needs. I'm excited to hear that Amazon has recently enabled some HW performance counters on dedicated hosts — hopefully they can be persuaded to add the retired-conditional-branch counter to their whitelist (and someone can fix the Xen PMU virtualization bug that breaks rr). Another problem is that rr's traces aren't easy to move from one machine to another. I've started addressing this problem by implementing a new rr command, rr pack.

There are two problems with rr traces. One is that on filesystems that do not support "reflink" file copies, to keep recording overhead low we sometimes hardlink files into the trace, or for system libraries we just assume they won't change even if we can't hardlink them. This means traces are not fully self-contained in the latter case, and in the former case the recording can be invalidated if the files change. The other problem is that every time an mmap occurs we clone/link a new file into the trace, even if a previous mmap mapped the same file, because we have no fast way of telling if the file has changed or not. This means traces appear to contain large numbers of large files but many of those files are duplicates.

rr pack fixes both of those problems. You run it on a trace directory in-place. It deduplicates trace files by computing a cryptographic hash (BLAKE2b, 256 bits) of each file and keeping only one file for any given hash. It identifies files the trace depends on that live outside the trace directory (hardlinked files, and system libraries we assumed wouldn't change) and copies them into the trace directory. It rewrites trace records (the mmaps file) to refer to the new files, so the trace format hasn't changed. You should be able to copy around the resulting trace, and modify any files outside the trace, without breaking it. I tried pretty hard to ensure that interrupted rr pack commands leave the trace intact (using fsync and atomic rename); of course, an interrupted rr pack may not fully pack the trace so the operation should be repeated. Successful rr pack commands are idempotent.

We haven't really experimented with trace portability yet so I can't say how easy it will be to just zip up a trace directory and replay it on a different computer. We know that currently replaying on a machine with different CPUID values is likely to fail, but we have a solution in the works for that — Kyle's patches to add ARCH_SET_CPUID to control "CPUID faulting" are in Linux kernel 4.12 and will let rr record and replay CPUID values.

How I Found A 20-Year-Old Linux Kernel Bug

Recently I improved the rr tests to test that the kernel doesn't write more memory than we expect during system calls. We place syscall output buffers at the ends of pages so that the end of the buffer is immediately followed by an unmapped page. If the kernel reads or writes too much memory then the system call will fail with EFAULT (or something worse will happen).

This found a few bugs in rr's assumptions about how much memory the kernel reads and writes. Interestingly, it also exposed a very old kernel bug: certain wireless ioctls are supposed to take a 32-byte memory parameter but the kernel actually fails with EFAULT if fewer than 40 bytes are available. (The extra bytes are not modified.) I tried to be a good citizen by reporting the bug and I'm pleased to say that it's actually being fixed!

The bug was apparently introduced in Linux 2.1.15, released December 12, 1996. It's interesting that it wasn't found and fixed until now. I guess not many programs use these ioctls, and those that do probably use buffers that are always followed by at least eight more bytes of data, e.g. any buffer on the stack. Then again, if you wrote a program that allocated those buffers using "malloc", I guess once in a while it would fail if your allocator happens to land one at the end of a page.

This class of bugs --- "small overrunning read that doesn't get used" --- could also be a problem in user-space. I don't recall seeing other bugs in this class, though.

Friday, 9 June 2017

Another Case Of Obscure CPU Nondeterminism

One of the "big bets" of rr is that commodity CPUs running user-space code really are deterministic in practice, or at least that the nondeterminism can be efficiently detected and controlled, without resorting to pervasive binary instrumentation. We're mostly winning that bet on x86 (but not ARM), but I'm uncomfortable knowing that otherwise-benign architectural changes could break us at any time. (One reason I'm keen to increase rr's user base is to make system designers care about not breaking us.)

I recently discovered the obscure XGETBV instruction. This is a problematic instruction for rr because it returns the contents of internal system registers that we can't guarantee will be unchanged from run to run. (It's a bit weird that the contents of these registers are allowed to leak to userspace, but anyway...) Fortunately it's barely used in practice; the only use I've seen of it so far is by the dynamic loader ld.so, to return the contents of the XINUSE register. This obscure register tracks, for each task, whether certain CPU components are known to be in their default states. For example if your x86-64 process has never used any (legacy floating-point) x87 instructions, bit 0 of XINUSE should be clear. That should be OK for rr, since whether certain kinds of instructions have executed should be the same between recording and replay. Unfortunately I have observed cases where the has-used-x87 bit gets unexpectedly flipped on; by singlestepping the tracee it's pretty clear the bit is being set in code that has nothing to do with x87, and the point where it is set, or if it happens at all, varies from run to run. (I have no way to tell whether the bit is being set by hardware or the kernel.) This is disturbing because it means that XGETBV is effectively a nondeterministic instruction. Fortunately, the ld.so users are just testing to see whether AVX has been used, so they mask off the x87 bit almost immediately, eliminating the nondeterminism from the program state before it can do any damage (well, unless you were super unlucky and took a context switch between XGETBV and the mask operation). I haven't seen any bits other than the x87 bit being unexpectedly set.

Fortunately it seems we can work around the problem completely in rr by setting the x87-in-use bit immediately after every exec. If the bit is always set, it doesn't matter if something intermittently sets it again. Another day, another bullet dodged!

Thursday, 1 June 2017

WebAssembly: Mozilla Won

Mozilla staff are being very diplomatic and restrained by allowing WebAssembly to be portrayed as a compromise between the approaches of asm.js and PNaCl. They have good reasons for being so, but I can be a bit less restrained. asm.js and PNaCl represented quite different visions for how C/C++ code should be supported on the Web, and I think WebAssembly is a big victory for asm.js and Mozilla's vision.

Considered as a Web platform feature, PNaCl had three major problems from the beginning, two of which it inherited from NaCl. By design, in NaCl and PNaCl, application code was highly isolated from the world of Javascript and HTML, running in its own process and behaving much like an opaque plugin with very limited interactions with its host page (since any interactions would have to happen over IPC). This led to a much bigger problem, which is that to interact with the outside world (P)NaCl code needed some kind of platform API, and Google decided to create an entirely brand-new platform API — "Pepper" — which necessarily duplicated a lot of the functionality of standard Web platform APIs. To make things even worse, neither Pepper nor PNaCl's LLVM-based bytecode had a proper specification, let alone one with multi-vendor buy-in.

Therefore any non-Chrome-based browser wishing to implement PNaCl would have had to reverse-engineer a Chromium-bug-compatible Pepper spec and reimplement it, or more likely import a large amount of Chromium code to implement Pepper/NaCl. Likewise they'd have had to import a large amount of LLVM code for the PNaCl compiler. Both imports would have to stay in sync with whatever Google did. This would mean lots more code bloat, maintenance and spec work, and more work for Web developers too, not to mention being a severe blow to Web standards. Mozilla people (including me) explained the unacceptability of all this to relevant Google people early on, but to no avail.

Mozilla responded with the asm.js project to show that one could achieve similar goals with minimal extensions to the standard-based Web platform. asm.js sets up a Javascript array to represent memory and compiles C/C++ to Javascript code operating on the array. It avoids those big PNaCl issues: asm.js code can interact with JS and the DOM easily since it shares the JS virtual machine; it specifies no new platform APIs, instead relying on the (already standardized and interoperably implemented) Web platform APIs; and very little new specification work had to be done, since it mostly relies on already-specified JS semantics.

In these key areas WebAssembly follows asm.js, not PNaCl. WebAssembly applications or components interact freely with JS and AFAIK in all browsers WebAssembly is implemented as part of the JS VM. WebAssembly defines no new platform APIs other than some APIs for loading and linking WebAssembly code, relying on standards-based Web APIs for everything else. WebAssembly differs from asm.js by defining a bytecode format with some new operations JS doesn't have, so some spec work was required (and has been done!). Like asm.js, WebAssembly application call-stacks are maintained by the JS VM, outside the memory addressable by the application, which reduces the exploitability of application bugs. (Though, again like asm.js and unlike PNaCl, the compiler is trusted.)

I'm not belittling the contributions of Google and others to WebAssembly. They have done a lot of good work, and ditching PNaCl was an admirable decision for the good of the Web — thank you! Often in these contests proclaiming a "winner" is unimportant or even counterproductive, and it's true that in a sense Web developers are the real winners. I'm calling it out here because I think the good and essential work that Mozilla continues to do to improve the standards-based Web platform is too often overlooked. I also want credit to go to Mozilla's amazing asm.js/WebAssembly team, who went up against Google with far fewer resources but a better approach, and won.

Tuesday, 30 May 2017

Should Debuggers Report Idempotent Writes?

gdb's data watchpoints are designed to only trigger when a written value actually changes; i.e., writes to memory that leave the value unchanged are deliberately ignored. (rr has to follow gdb's behavior.) I don't know why gdb does this, but it may be for historical reasons: it's impractical for a normal debugger to observe all possible writes to memory (e.g. those performed by the kernel), so a traditional strategy for implementing watchpoints is simply to re-read the watched location frequently and report changes, and obviously this can't detect idempotent updates.

Over the last year Kyle Huey and I have been building a brand-new record-and-replay-based debugger. One of the most intellectually challenging aspects of this project is sorting through the decisions made by existing debuggers, distinguishing those which are truly desirable from those made under the duress of implementation constraints that no longer apply. Handling of idempotent writes is one such decision. During rr replay we can observe all memory writes, so reporting all idempotent writes is perfectly feasible.

There are some clear cases where idempotent writes should not be ignored. For example, consider a series of printfs repeatedly overwriting the contents of an output buffer. If, at the end of the series, the user wants to understand how the final value of some byte in the buffer was obtained, we should report the last printf to write to that location, even if the byte value it happened to write was the same as that written to the location by some previous printf. Reporting the previous printf would be hopelessly confusing --- though this is, in fact, what rr+gdb do.

On the other hand, I'm cautious. There may be some good reason to ignore idempotent writes that we just haven't figured out yet. It could possibly involve optimization — an optimizing compiler is free to introduce spurious writes if it knows they'll be idempotent — but I haven't seen or thought of a plausible example where this would be a problem.

Thursday, 25 May 2017

A Couple Of Papers About Commodity Multicore Record And Replay, And A Possible Way Forward

Efficient record-and-replay of parallel programs with potential data races on commodity multicore hardware is a holy-grail research problem. I think that the best papers in this area so far, though they're a bit old, are SMP-Revirt and DoublePlay.

DoublePlay is a really interesting approach but I'm skeptical it would work well in practice. Without going into too many details, for parallel applications that scale well the minimum overhead it can obtain is 100%. Obtaining that overhead relies on breaking application execution into epochs that are reasonably long, but externally-visible output must be suppressed within each epoch, which is going to be difficult or impossible to do for many applications. For example, for the Apache Web server they buffer all network output in the kernel during an epoch, but this could cause problematic side effects for a more complicated application.

Therefore I think the SMP-Revirt approach is more robust and more likely to lead to a practical system. This is a fairly simple idea: apply the well-known concurrent-read, exclusive-write (CREW) model at the page level, using hardware page protections to trigger ownership changes. If you imagine a herd of single-threaded processes all sharing a memory block, then each page is mapped read-only in all processes (unowned), or read-write in one process (the owner) and no-access in all the others. When a process writes to a page it doesn't own, a trap occurs and it takes ownership; when it reads from a page that has a different owner, a trap occurs and the page becomes unowned. SMP-Revirt worked at the hypervisor level; dthreads does something similar at user level.

A problem with SMP-Revirt is that when there is a lot of read/write sharing, the overhead of these transitions becomes very high. In the paper, for three of their six benchmarks recording the benchmark with four-core parallelism is slower than just recording on a single core rr-style!

I think it would be quite interesting for someone to implement a page-CREW approach in the Linux kernel. One of the problems dthreads has is that it forces all threads to be separate processes so they can have distinct memory mappings. Another problem is that mprotect on individual pages creates lots of kernel VMAs. Both of these add overhead. Instead, working in the kernel you could keep the normal process/thread setup and avoid having to create a new VMA when a page assignment changes, storing ownership values (CPU index) in per-page data. Then a page state transition from unowned to owned requires a page fault to the kernel, a change to the ownership value, and a synchronous inter-processor-interrupt to CPUs with a TLB mapping for the page; a transition from owned to unowned (or a different owner) requires just an IPI to the previous owner CPU. These IPIs (and their replies) would carry per-CPU timestamp values that get logged (to per-CPU logs) to ensure the order of ownership changes can be deterministically replayed.

Then some interesting questions are: how well does this perform over a range of workloads --- not just the standard parallel benchmarks but applications like Web browsers, memcached, JVMs, etc? Can we detect on-the-fly that some cluster of threads is experiencing severe overhead and degrade gracefully so we're at least no worse than running single-core? Are there other optimizations that we could do? (E.g. the unowned concurrent-read state could be bad for pages that are frequently written, because you have to send an IPI to all CPUs instead of just one when transitioning from unowned to owned; you could avoid that by forcing such pages to always be exclusively owned. When a CPU takes exclusive ownership you might want to block other CPUs from taking ownership away for a short period of time. Etc...) I think one could do quite a bit better than SMP-Revirt --- partly because kernel-level sharing is no longer an issue, since that's no longer being recorded and replayed.

To be clear: I don't want to work on this. I'm a lot more interested in the applications of record-and-replay than the machinery that does it. I want someone else to do it for us :-).

Friday, 19 May 2017

rr Usenix Paper And Technical Report

Our paper about rr has been accepted to appear at the Usenix Annual Technical Conference. Thanks to Dawn for suggesting that conference, and to the ATC program committee for accepting it :-). I'm scheduled to present it on July 13. The paper is similar to the version we previously uploaded to arXiv.

Some of the reviewers requested more material: additional technical details, explanations of some of our design choices compared to alternatives, and reflection on our "real world" experience with rr. There wasn't space for that within the conference page limits, so our shepherd suggested publishing a tech-report version of the paper with the extra content. I've done that and uploaded "Engineering Record And Replay For Deployability: Extended Technical Report". I hope some people find it interesting and useful.

Thursday, 11 May 2017

Obscurity Inhibits Persuasion

I'm a huge fan of China Miéville's work. "The City & The City" is one of my favourite books of all time. Particular features of his writing are his delightful linguistic obscurity and inventiveness, even though they tax the mind. Few writers could use the word "squiddity" in earnest and get away with it!

However, when one's goal is to communicate and persuade instead of entertain, simpler language would serve better. Miéville's essay "The Limits Of Utopia" would be more comprehensible to a broader audience if he had restrained his literary style. (For example, the words "coagulate", "imbricated", "malefic" and "peonage" all have simpler substitutes that would serve just as well.) A subject of this importance deserves the clearest and broadest communication; Miéville's rejection of such discipline suggests that he values his own or his readers' literary gratification more highly than the influence he could exert through his essay.

It's not just Miéville, of course. All sorts of communities --- including scientists, activists, and religionists --- use specialized vocabulary, sometimes for brevity and precision, but often to identify as in-group. When they try to reach a broad audience, they'd be better off to be simple and clear --- especially when addressing people hostile to the in-group. Yet I'm amazed how often people fail to do this --- I guess due to habit, but sometimes perhaps due to insincerity.

Saturday, 6 May 2017

Perceptions Of Violent Crime

I spent last week as a parent helper at my child's school camp at Tui Ridge near Rotorua. I found it exciting and challenging in various ways, not least just having to exercise social skills non-stop for five days solid. I spent a fair bit of time socializing with the other parents. One conversation came around to the topic of violent crime, and some of the other parents talked about how violent crime is getting worse and there seemed to be a murder every day in Auckland. I ventured to suggest that violent crime, especially murder, is declining here (and elsewhere), to general skepticism. Unfortunately I didn't have data memorized and I drowned my phone recently, so I left it at that.

On returning home it didn't take long to confirm my point. For 2014 (the latest year for which I can find data) the police reported 66 "homicide and related offenses" nationally. That is the lowest number since that dataset began (1994), when the numbers were much higher. (I don't know why numbers for 2015 and 2016 aren't available in that dataset, but I guess bureaucratic inefficiency. There appears to be more recent data at policedata.nz, but it's behind a Flash app so I can't be bothered.) The idea that "a murder occurs every day in Auckland" was off by an order of magnitude, in 2014 at least, and there seems no reason to believe things have gotten worse.

Wikipedia has a good discussion of how in New Zealand (and other Western countries) people tend to believe violent crime is increasing when in fact it has been generally decreasing. I've read all sorts of theories and surveys of theories about the causes of this decrease, from "increased abortion" to "declining lead levels", but it appears no-one really knows; there are probably multiple causes. It's certainly welcome!

Sunday, 30 April 2017

Call Out China For Their Treatment Of NK Escapees

Nothing really new here in the stories of North Korea's prison camps, but I think one important aspect is consistently underemphasized:

Several weeks later, guards and inmates were ordered to gather as the pair, their bodies battered from torture, were dragged back behind the barbed wire. They had been caught in China and returned to the repressive regime.

"The two brothers were beheaded in front of everyone," said Lim. "They called everyone to watch as a warning not to flee. The other prisoners then had to throw stones at them."

Rounding up North Korean escapees and sending them back for torture and execution is barbaric and indefensible and should be embarrassing to China and Chinese people. The escapees want to reach South Korea; surely NGOs could take them there at no greater cost or inconvenience to China than the current policy.

It seems to me human rights organizations, news organizations and political leaders should be making this point at every opportunity ... but it rarely seems to come up. I don't know why.

Sunday, 23 April 2017

One Does Simply Walk Into Mordor

... on the Tongariro Northern Circuit. This central North Island "Great Walk" is a loop circumnavigating Mount Ngauruhoe (aka "Mount Doom") and much of the Tongariro volcanic complex. The terrain varies but is mostly a mixture of lava fields and somewhat barren alpine plains scoured by old pyroclastic flows (most recently by the "super-colossal" Taupo eruption of roughly 1,800 years ago). It doesn't take much imagination, or sophisticated marketing, to see it as (a peaceful and beautiful version of) Mordor.

We completed the Circuit yesterday, after four days of hiking. It's certainly possible to do it in three or even two days, but we took it easy and had time to do all the side tracks: the Tongariro summit track on day two (Mangatepopo Hut to Oturere Hut), the Ohinepango springs near Waihohonu on day three, and the Upper Tama Lake track on day four (Waihohonu Hut to Whakapapa Village).

Thank God, we had perfect weather. We also had a great time lounging around at the huts talking to other trampers of all ages, and even playing a variety of games. Overall it was another wonderful "Great Walk" experience. The only improvement would have been if one of the volcanoes was erupting, but you can't have everything!

The new Waihohonu Hut was impressive. It's probably the best DoC hut I've yet seen: multiple outdoor decks with picnic tables, well insulated, a very spacious common area with wood-burning stove, eight gas cooking stoves, solar-powered LED lighting, and even solar water heating!

After destroying a couple of cameras by getting them wet while tramping (only cheap point-and-shoots), I bought a waterproof camera. On this trip I took some underwater photos and they turned out quite well!

Wednesday, 5 April 2017

Rust Optimizations That C++ Can't Do (Version 2)

My last post was wrong in the details, so let me show a better example. Rust code:

pub extern fn foo(v: &u64, callback: fn()) -> u64 {
  let mut sum = 0;
  for _ in 0..100 {
    sum += *v;
    callback();
  }
  sum
}
C++ version here.

The inner loop of the Rust version turns into:

.LBB0_1:
        inc     ebx
        call    r14
        add     r12, r15
        cmp     ebx, 100
        jl      .LBB0_1
sum is in r12 and the evaluation of *v has been hoisted out of the loop with the result in r15. In the compiled C++ code the value has not been hoisted:
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        add     rbx, qword ptr [r15]
        call    r14
        dec     ebp
        jne     .LBB0_1

Of course the performance difference in this case would be negligible, but if *v were replaced with a more complicated expression, the difference would increase. (No, I don't know why the Rust code counts up instead of counting down to zero like the C++ version, which gives slightly better code; someone should fix that.)

Here it's really clear that the semantics of Rust are making this optimization possible. In C++ v could be a reference to a global variable which is modified by the callback function, in which case hoisting the load would be incorrect.

Several people pointed out that with LTO/WPO a C++ compiler might be able to determine that such cases don't happen, in this case that no function passed into callback directly or indirectly modifies v. That's what I meant in my previous post by "a C++ compiler could do some analysis to show that can't happen" ... but as I continued, "those analyses don't scale". The larger and more complicated your code gets, the more likely it is that static analysis will fail to prove the invariants you care about. Sometimes, due to dynamic linking or run-time code generation, relevant code just isn't available to be analyzed. In other cases the analysis algorithms aren't sophisticated enough to prove what you care about (often because no such algorithm is known), or adequate algorithms exist but they were too expensive to deploy. Perhaps the worst cases for sophisticated global analysis are when the property you care about isn't even true, because you accidentally violated it in some part of the program, but you'll never know, because the only effect of that accident is that performance got a little bit worse somewhere else. (E.g. with this example if you accidentally change one of the callback functions to indirectly modify v, your hypothetical WPO compiler does a slightly worse job and you won't know why.)

What we need are language-level invariants that apply globally and support aggressive optimization using only local analysis. C++ has some, but Rust's are stronger. We also need those invariants to be checked and enforced, not just "it's undefined behavior if you do this"; (safe) Rust wins again on that.

Rust Optimizations That C++ Can't Do

Update: dbaupp on Reddit pointed out that this particular example does not support the point I'm trying to make. I'll explain why below. I followed up with a new post with a better example.


Consider the Rust code

pub struct Foo {
    v1: u64,
    v2: u64,
}
pub fn foo(t: &Foo) -> u64 {
    t.v1 + t.v2
}

The stable Rust compiler, in Release mode, generates

_ZN8rust_out3foo17h154ee47e2fd80a9eE:
 leaq (%rdi,%rsi), %rax
 retq

Hang on! foo is supposed to take a reference, so shouldn't it take a pointer as its parameter and do two loads to get the values of v1 and v2? Instead it receives the values of v1 and v2 in %rdi/%rsi and adds them using leaq as a three-operand "add" instruction. The Rust compiler has decided (correctly) that even though you wrote a reference, in this case it's more efficient to just pass the parameter by value. This is good to know as a Rust developer; I can use references freely without worrying about whether it would be more efficient to pass the value directly.

C++ compilers can't do this optimization, for a couple of reasons. One reason is that C++ has a standard ABI that dictates how parameters are passed; it requires that references are always passed as pointers. A more interesting reason is that in general this optimization is not safe in C++. An immutable Rust reference promises that the referenced value does not change (except for some special cases involving UnsafeCell which are easily detected by the compiler), but not so for C++ const references; a "const" value can be modified through other mutable references, so copies induced by pass-by-value could change program behavior. For a simple case like this a C++ compiler could do some analysis to show that can't happen, but those analyses don't scale (e.g. if foo called, and was called by, unknown code, all bets are off).

Here Rust's strong control over aliasing is enabling improved optimization, not just improved safety. Update: Actually, as dbaupp points out, this optimization example is not specific to Rust: LLVM is doing it via local analysis (the sort I mentioned above), and clang++ produces the same results for the equivalent C++ code. Indeed, Rust in general would need to preserve references, because you can convert a reference to a raw pointer and compare pointers for equality.

It's easy to find other examples, such as improved code motion and common subexpression elimination. I think with more study of large Rust programs, more investment in the Rust compiler, and tighter control over unsafe Rust, we will probably discover that Rust enables many more interesting optimizations.

One of the promises of hair-shirt functional languages was that referential transparency and other strong semantic guarantees would enable extremely aggressive optimizations, which would recover the performance lost by moving away from the stateful semantics of hardware. That didn't really work out, but with Rust we might get the aggressive optimizations on top of semantics that are already very efficient.

Sunday, 2 April 2017

Pararaha Valley

This weekend my son and I went out to camp overnight at Pararaha Valley in the Waitakere Ranges west of Auckland. We got dropped off on Piha Road at the north end of the Upper Huia Dam track, near the centre of the Ranges, and I planned to walk from there for 5-6 hours to reach the campsite near the west coast. After one and a half hours we reached the dam, and unfortunately discovered that the next section, Nugget Track, was closed to protect against kauri die-back. We had no choice but to retrace our steps for another hour and a half. The Upper Huia track is pretty rough, so at 2pm we were back where we started, having used up a lot of energy :-(. (The trailhead has a map showing track closures but Nugget Track was not clearly marked as closed, so I'll contact the council about that.)

I wasn't sure what to do but decided we could probably reach the campsite before dark by following roads instead. So we walked west along Piha Road, south along Lone Kauri Rd towards Karekare, then crossed over into Pararaha Valley via the Buck Taylor Track. We got to the campsite around 6pm, having walked for about 7.5 hours at a fairly quick pace, so I was pretty tired. My son was fine :-).

The Pararaha Valley campsite is a lovely spot. I thought the swamp, er I mean wetland, at the bottom of the valley would mean a lot of mosquitoes but that wasn't a problem. The site documentation says the site has only stream water, which needs to be treated, but there's a cooking shelter with roof runoff into a tank, and we drank the tank water untreated with no problems. (One of the other people at the campsite told me he'd drunk it two weeks ago while running the Hillary Trail (67km!).)

This morning we had a short two hour walk from the campsite up the coast along the beach to Karekare to get picked up. We got back to the city in time for church :-).

Let's Make NZ More Expensive For Tourists

I agree with Labour leader Andrew Little on this. Tourist numbers are growing, and making NZ a more expensive destination is a perfectly fair and rational response to fast-growing tourist numbers. NZ is never going to be a budget destination; better to be a high-price luxury destination with money reinvested to maintain and improve our facilities and environment, than a low-price destination with an environment degraded by too many tourists. I don't think we're anywhere close to being seen as a "rip-off".

FWIW many backpackers I meet while tramping are here for months; for these people the impact of a border-crossing tax would be amortized. Fewer people making longer trips is also more climate-friendly.

Monday, 27 March 2017

The Parable Of The Workers In The Vineyard Really Is About Grace

A small matter perhaps, but a "professor of New Testament and Jewish Studies at Vanderbilt University Divinity School" has claimed that Matthew 20:1-16 is about economic justice:

This parable tells the story of a series of workers who come in at different points of the day, but the owner pays them all the same amount. The parable is sometimes read with an anti-Jewish lens, so that the first-hired are the "Jews" who resent the gentiles or the sinners entering into God's vineyard. Nonsense again.

"Jesus' first listeners heard not a parable about salvation in the afterlife but about economics in present. They heard a lesson about how the employed must speak on behalf of those who lack a daily wage."

This interpretation must be popular in some circles, because I once heard it preached in a sermon by a guest speaker. I was frustrated and mystified then, and now; I just don't see grounds for rejecting the traditional eschatological application, in which the landowner is God and the "denarius" is salvation. The parable's introduction says "The kingdom of heaven is like ..." and its conclusion says "So the last will be first, and the first will be last". The passage immediately leading up to the parable is even more clearly eschatological and ends with the same formula:

Everyone who has left houses or brothers or sisters or father or mother or wife or children or fields for my sake will receive a hundred times as much and will inherit eternal life. But many who are first will be last, and many who are last will be first.

Even allowing for the likelihood that the text was arranged thematically rather than chronologically, clearly the writer thought the passages belonged together.

It's true that "the kingdom of heaven" sometimes refers to God's will being done on earth, but the context here is strongly against that. Furthermore, the traditional interpretation fits the text perfectly and aligns with Jesus' other teachings. God is absurdly generous while still being just; those thought to be most religious are on shaky ground; lost sheep are welcomed in. The traditional interpretation is not inherently anti-Jewish since it applies just as well to the Jewish undesirables ("tax collectors and sinners") beloved by Jesus as to Gentile converts.

Many Biblical passages extol economic justice. This isn't one of them.

Wednesday, 22 March 2017

Blogging Vs Academic Publishing

Adrienne Felt asked on Twitter:

academic publishing is too onerous & slow. i'm thinking about starting a blog to share chrome research instead. thoughts?
It depends on one's goals. I think if one's primary goal is to disseminate quality research to a broad audience, then write papers, publish them to arXiv, and update them in response to comments.

I've thought about this question a fair bit myself, as someone who's interacted with the publication system over two decades as a writer and reviewer but who won't perish if I don't publish. (Obviously if you're an academic you must publish and you can stop reading now...) Academic publishing has many problems which I won't try to enumerate here, but my main issues with it are the perverse incentives it can create and the erratic nature of the review process. It does have virtues...

Peer review is the most commonly cited virtue. In computer science at least, I think it's overrated. A typical paper might be reviewed by one or two experts and a few other reviewers who are capable, but unfamiliar with the technical details. Errors are easy to miss, fraud even easier. Discussions about a paper after it has been published are generally much more interesting than the reviews, because you have a wider audience who collectively bring more expertise and alternative viewpoints. Online publishing and discussion could be a good substitute if it's not too fragmented (I dislike the way Hacker News, Reddit etc fragment commentary), and if substantive online comments are used to improve the publication. Personally I try to update my blog posts when I get particularly important comments; that has the problem that fewer people read the updated versions, but at least they're there for later reference. It would be good if we had a way to archive comments like we do for papers.

Academic publishing could help identify important work. I don't think it's a particularly good tool for that. We have many other ways now to spread the word about important work, and I don't think cracking open the latest PLDI proceedings for a read-through has ever been an efficient use of time. Important work is best recognized months or years after publication.

The publishing system creates an incentive for people to take the time to systematically explain and evaluate their work and creates community standards for doing so. That's good, and it might be that if academic publishing wasn't a gatekeeper then over time those benefits would be lost. Or, perhaps the evaluation procedures within universities would maintain them.

Academic publishing used to be important for archival but that is no longer an issue.

Personally, the main incentives for me to publish papers now are to get the attention of academic communities I would otherwise struggle to reach, and to have fun hanging out with them at conferences. The academic community remains important to me because it's full of brilliant people (students and faculty) many of whom are, or will be, influential in areas I care about.

Thoughts On "Java and Scala’s Type Systems are Unsound" And Fuzz Testing

I just belatedly discovered Java and Scala’s Type Systems are Unsound from OOPSLA last year. It's a lovely paper. I would summarize it as "some type system soundness arguments depend on 'nonsense types' (e.g. generic types with contradictory bounds on type parameters) having no instances; if 'null' values can inhabit those types, those arguments are invalid". Note that the unsoundness does not lead to memory safety issues in the JVM.

The paper is a good illustration of how unexpected feature interactions can create problems for type systems even when a feature doesn't seem all that important at the type level.

The paper also suggests (implicitly) that Java's type system has fallen into a deep hole. Even without null, the interactions of subtyping, generics and wildcards are immensely complicated. Rust's rejection of subtyping (other than for lifetimes, which are tightly restricted) causes friction for developers coming from languages where subtyping is ubiquitous, but seems very wise for the long run.

I think the general issue shown in this paper could arise in other contexts which don't have 'null'. For example, in a lazy language you can create a value of any type by calling a function that diverges. In a language with an explicit Option type, if T is a nonsense type then Option<T> is also presumably a nonsense type, but the value None inhabits it.

The paper discusses some methodological improvements that might detect this sort of mistake earlier. One approach it doesn't mention is fuzz testing. It seems to me that the examples in the paper are small enough to be found by fuzz testing techniques searching for programs which typecheck but contain obviously unsound constructs (e.g. a terminating function which can cast its parameter value to any type). Checking soundness via fuzz testing has been done to a small extent with Rust (see paper), but I think more work in that direction would be fruitful.

Tuesday, 21 March 2017

Deterministic Hardware Performance Counters And Information Leaks

Summary: Deterministic hardware performance counters cannot leak information between tasks, and more importantly, virtualized guests.

rr relies on hardware performance counters to help measure application progress, to determine when to inject asynchronous events such as signal delivery and context switches. rr can only use counters that are deterministic, i.e., executing a particular sequence of application instructions always increases the counter value by the same amount. For example rr uses the "retired conditional branches" (RCB) counter, which always returns exactly the number of conditional branches actually retired.

rr currently doesn't work in environments such as Amazon's cloud, where hardware performance counters are not available to virtualized guests. Virtualizing hardware counters is technically possible (e.g. rr works well in Digital Ocean's KVM guests), but for some counters there is a risk of leaking information about other guests, and that's probably one reason other providers haven't enabled them.

However, if a counter's value can be influenced by the behavior of other guests, then by definition it is not deterministic in the sense above, and therefore it is useless to rr! In particular, because the RCB counter is deterministic ("proven" by a lot of testing), we know it does not leak information between guests.

I wish Intel would identify a set of counters that are deterministic, or at least free of cross-guest information leaks, and Amazon and other cloud providers would enable virtualization of them.

Thursday, 16 March 2017

Using rr To Debug Go Programs

rr seems to work well with Go programs, modulo some limitations of using gdb with Go.

The DWARF debuginfo produced by the standard Go compiler is unsatisfactory, because it does not include local variables held in registers. Instead, build with gccgo.

Some Go configurations generate statically linked executables. These work under rr, but they're slow to record and replay because our syscall-interception library is not loaded. When using rr with Go, be sure to generate dynamically linked executables.

Running Go tests using go test by default builds your project and then runs the tests. Recording the build step is wasteful, especially when the compiler is statically linked (see above). So build the test executable using go test -compiler gccgo -c and then run the test executable directly under rr.

Saturday, 25 February 2017

Against Online Voting

There's a push on by some to promote online voting in New Zealand. I'm against it.

Government has called for local government to take the lead in online voting, with the need to guarantee the integrity of the system - most notably safety from hacking - the top priority.

But Hulse questioned some politicians' interest in a digital switchover.

...

Hulse said the postal system isn't bullet-proof either.

"Government seems to want absolute watertight guarantees that it is not hackable or open to any messing with," she said.

"I don't know why people seem to feel somehow the postal vote has this magic power of being completely safe. It can be, and has been, hacked in its own way by people just going round at night and picking up ballot papers."

I think the fears around last year's US election emphasized more than ever the importance of safeguarding the electoral process. Hacking is a greater threat than, say, people stealing postal ballot papers for a few reasons.

Stealing postal ballots is necessarily risky. A thief must be physically present and there is a risk of being caught. In contrast, many hacks can be done from anywhere and it's relatively easy to conceal the source of the attack.

The risk and effort required to steal ballots is proportional to the number of ballots stolen. In contrast, successful hacks are often easy to scale up to achieve any desired effect with no extra work or risk.

Some fraction of people are likely to notice their postal ballot went missing and notify the authorities, who can then investigate, check for other missing ballots, and invalidate postal ballots en masse if necessary. Hacks against online voting systems can be much more stealthy.

People systematically overestimate the security of electronic systems, partly because a relatively small group has the background to fully understand how they work and the risks involved --- and even that group is systematically overconfident.

A spokeswoman for the Mayor's office said Goff was not available for comment due to other commitments this week, but said the Mayor endorsed the push for online voting.

Deputy Mayor, Bill Cashmore, also said while there is no clear timeline on when a trial might run, online voting is also something he would like to see introduced - pointing out it would probably be well-received by young voters and a good way of maximising the reach of the democratic process.

That's all very well, but confidence in the integrity of the voting process is also an important driver for participation.

The fundamental problems with online voting are well-understood by the research community. The NZ Government should continue holding the line on this, at least for Parliamentary elections. For local body elections (which the Mt Albert election referenced above is not), voter participation is so low that you can make the argument that reducing the integrity of the election is worthwhile if online voting significantly increases the participation rate. However, there is evidence it may not.

Friday, 24 February 2017

306 Points In "Lords Of Waterdeep"

A new record for our household.

It was a three-player "long" game. Nine Plot Quests is probably too many but they were good value and created a lot of synergies. The most important quest was Shelter Zhentarim Agents, which I completed first; I probably drew about 15 corruption during the game, so had a constant supply of Intrigue cards, and with only two other players I was able to play two or three Intrigues per round (a couple of buildings with play-Intrigue helped). I also completed Seize Citadel Of The Bloody Hand early and Recruit Lieutenant in the fourth round.

I discovered a nice trick after completing Fence Goods For Duke Of Darkness and building Hall Of The Three Lords: arrange to have three or four turns in a row (the Lieutenant usually plays last after the other players have placed their agents, then you can reassign agents from Waterdeep Harbor), then place an agent on Hall Of The Three Lords and place three rogues on three vacant action spaces for ten victory points. In your following turns, place your agents on those action spaces and scoop your rogues back up as well as taking the benefit of the action; Fence Goods For Duke Of Darkness gives you two gold for each one, so you get a bonus six gold. If one or two of those action spaces are The Librarium or other buildings that let you take resources and place additional resources elsewhere for your next agent to pick up, so much the better.

Sunday, 19 February 2017

"New Scientist" And The Meaning Of Life

A recent issue of New Scientist has a headline article on "The Meaning Of Life". It describes research showing correlations between a sense of purpose and various positive attributes such as health and happiness. It then touches on a few ways people can try to enhance their sense of purpose.

Unfortunately the second paragraph gives the game away:

As human beings, it is hard for us to shake the idea that our existence must have significance beyond the here and now. Life begins and ends, yes, but surely there is a greater meaning. The trouble is, these stories we tell ourselves do nothing to soften the harsh reality: as far as the universe is concerned, we are nothing but fleeting and randomly assembled collections of energy and matter. One day, we will all be dust.

Quite so --- assuming a secular worldview. In that context, the question "what is my purpose?" has no answer. The best you can hope for is to grab hold of some idea, make it your purpose and refrain from asking whether you have chosen correctly.

To me, even before I became a Christian, that approach seemed unsatisfactory --- too much like intentional self-delusion. Concluding "there is no greater meaning" and then carrying on as if there were one didn't seem honest.

Later I came to believe that Christianity is objectively true. That means God has created us for a specific (yet broad) purpose --- "to glorify God and enjoy him forever", in the words of the Westminster Shorter Catechism. The question has an answer. We'll spend eternity unpacking it, though.

Monday, 6 February 2017

What Rust Can Do That Other Languages Can't, In Six Short Lines

struct X {
  y: Y
}
impl X {
  fn y(&self) -> &Y { &self.y }
}

This defines an aggregate type containing a field y of type Y directly (not as a separate heap object). Then we define a getter method that returns the field by reference. Crucially, the Rust compiler will verify that all callers of the getter prevent the returned reference from outliving the X object. It compiles down to what you'd expect a C compiler to produce (pointer addition) with no space or time overhead.
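To see the lifetime check in action, here's a minimal self-contained sketch (the `Y` type and its `value` field are stand-ins I've invented for illustration; the original six lines assume `Y` is defined elsewhere):

```rust
struct Y {
    value: u32,
}

struct X {
    y: Y, // stored inline, not as a separate heap object
}

impl X {
    // Getter returning an interior reference; the borrow checker ties the
    // lifetime of the returned &Y to the borrow of self.
    fn y(&self) -> &Y {
        &self.y
    }
}

fn main() {
    let x = X { y: Y { value: 42 } };
    let r = x.y(); // r borrows from x
    assert_eq!(r.value, 42); // fine: x is still alive here

    // This, by contrast, is rejected at compile time ("`x2` does not live
    // long enough"), because the reference would outlive the X:
    // let escaped = { let x2 = X { y: Y { value: 7 } }; x2.y() };
}
```

The commented-out block is the key point: the caller, not the getter, is where the compiler enforces that the interior reference cannot escape.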

As far as I know, no other remotely mainstream language can express this, for various reasons. C/C++ can't do it because the compiler does not perform the lifetime checks. (The C++ Core Guidelines lifetime checking proposal proposes such checks, but it's incomplete and hasn't received any updates for over a year.) Most other languages simply prevent you from giving away an interior reference, or require y to refer to a distinct heap object from the X.

This is my go-to example every time someone suggests that modern C++ is just as suitable for safe systems programming as Rust.

Update As I half-expected, people did turn up a couple of non-toy languages that can handle this example. D has some special-casing for this case, though its existing safety checks are limited and easily circumvented accidentally or deliberately. (Those checks are in the process of being expanded, though it's hard to say where D will end up.) Go can also do this, because its GC supports interior pointers. That's nice, though you're still buying into the fundamental tradeoffs of GC: some combination of increased memory usage, reduced throughput, and pauses. Plus, relying on GC means it's nearly impossible to handle the "replace a C library" use case.

Saturday, 4 February 2017

rr 4.5.0 Released

We just released rr 4.5.0. Massive thanks to Keno Fischer who probably contributed more to this release than anyone else.

This is another incremental release. rr is already pretty good at what it does --- single-core recording, replaying, and debugging with gdb and reverse execution. The incremental nature of recent releases reflects that. It's still a significant amount of work to keep updating rr to fix bugs and cope with new kernel and hardware features as they start getting used by applications. Notably, this release adds support for Intel Knights Landing and Kaby Lake CPUs, contains a workaround for a regression caused by a Linux kernel security fix, and adds support for recording software using Hardware Lock Elision (and includes detection of a KVM bug that breaks our HLE support). This release also contains a fix that enables recording of Go programs.

We have some encouraging reports of rr adoption and rr users pretty much ceasing use of standalone gdb. However, rr definitely needs more users. When hardware and OS issues crop up, the bigger the user-base, the easier it will be to get the attention of the people whose cooperation we need. So, if you're a happy rr user and you want to help make sure rr doesn't stop working one day, please spread the word :-).

Let me say a tiny little bit about future plans. I plan to keep maintaining rr --- with the help of Keno and Kyle and other people in the community. I don't plan to add major new user-facing functionality to rr (i.e. going beyond gdb), but rather build such things as separate projects on top of rr. rr has a stable-in-practice C++ API that's amenable to supporting alternative debugger implementations, and I'd be happy to help people build on it. I've built but not yet released a framework that uses parts of DynamoRIO to enable low-overhead binary instrumentation of rr replays; I plan to keep that closed at least until we have money coming in. The one major improvement that we expect we'll add to rr some time this year is reworking trace storage to enable easy trace portability and fix some existing issues.

I Really Admire Jehovah's Witnesses

... and other door-to-door evangelists.

I just had a couple of JWs visit me. I always cut through the script by asking door-to-door evangelists who they're with. Anyone who won't tell me straight-up I'd shoo off immediately, but people generally are open about it. On hearing they were JWs, I confirmed that they don't believe Jesus is God and asked them what they make of John chapter 1. We fenced amicably about those issues a little bit and I think they decided I wasn't worth their time.

Of course I think JWs are totally wrong about this key doctrine. However, I really admire the zeal and fortitude of anyone who does honest door-to-door evangelism, because I've done just enough of other kinds of evangelism to know how hard it would be. I believe that these people are at least partly motivated by genuine love for the lost. As Penn Jillette observed, if you think you've come across truth that's life-changing for everyone, only a coward or a criminal would fail to evangelize --- though door-to-door may not be the most effective approach.

Regardless, bravo!

Tuesday, 31 January 2017

A Followup About AV Test Reports

Well, my post certainly got a lot of attention. I probably would have put a bit more thought into it had I known it was going to go viral (150K views and counting). I did update it a bit but I see no reason to change its main thrust.

A respondent asked me to comment on the results of some AV product comparison tests that show Microsoft's product, the one I recommended, in a poor light. I had a look at one of the reports they recommended and I think my comments are worth reproducing here.

In that test, MS got 97% (1570 out of 1619) ... lower than most of the other products, but the actual difference is very small.

The major problem with tests like this is that they are designed to fit the strengths of AV products and avoid their weaknesses. The report doesn't say how they acquire their malware samples, but I guess they get them from the same sources AV vendors do. (They're not writing their own, since they say they independently verify that their malware samples "work" on an unprotected machine.) So what they're testing here is the ability of AV software to recognize malware that's been in the wild long enough to be recognized and added to someone's database. That's exactly the malware that AV systems detect. But in the real world users will often encounter malware that hasn't had time to be classified yet, and possibly (e.g. in spear-phishing cases) never will be.

A more realistic test would include malware like that: testers could generate a whole lot of their own malware that isn't in any database and see how AV products perform on it. My guess is that the detection rate would be around 0% if they did it realistically, partly because in the real world malware authors can iterate on their malware until they're sure the major AV products won't detect it. (And no, it doesn't really matter whether the AV classifier uses heuristics or neural nets or whatever, except that using neural nets makes it devilishly hard to understand false positives and negatives.)

So for the sake of argument let's suppose 20% of attacks in the real world use new unclassified malware and 80% use old malware and none of the 20% are detected by AV products. In this report, that would be 405 additional malware samples not detected by any product. Now Microsoft scores 77.6% and the best (F-Secure in this case) scores 79.9%. That difference doesn't look as important now.
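To make that adjustment explicit, here's a quick sketch of the arithmetic (the 20% figure is my hypothetical from above, not data from the report):

```rust
// Recompute a detection rate under the assumption that some fraction of
// real-world attacks use new, unclassified malware that no product detects.
fn adjusted_rate(detected: f64, classified_samples: f64, new_fraction: f64) -> f64 {
    // If classified samples are (1 - new_fraction) of all attacks, the
    // implied total is classified_samples / (1 - new_fraction).
    let total = classified_samples / (1.0 - new_fraction);
    detected / total
}

fn main() {
    // Microsoft detected 1570 of the 1619 classified samples (97%).
    // Adding the hypothetical 20% of undetected new malware:
    let ms = adjusted_rate(1570.0, 1619.0, 0.2) * 100.0;
    println!("Microsoft adjusted: {:.1}%", ms); // ~77.6%
}
```

The implied total comes out at about 2024 samples, i.e. roughly 405 on top of the 1619 tested, matching the numbers above.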

The other major issue with this whole approach is that it takes no account of the downsides of AV products. If a product slows down your system massively (even more than other products), that doesn't show up here. If a product blocks all kinds of valid content, that doesn't show up here. If a product introduces huge security vulnerabilities --- even if they're broadly known --- that doesn't show up here. If a product spams the user incessantly with annoying messages (that teach them to ignore security warnings altogether, poisoning the human ecosystem), that doesn't show up here.

This limited approach to testing probably does more harm than good, because to the extent AV vendors care about these test results, they'll optimize for them at the expense of those other important factors that aren't being measured.

There are also some issues with this particular test report. For example they say how important it is to test up-to-date software, and then test "Microsoft Windows 7 Home Premium SP1 64-Bit, with updates as of 1st July 2016" and Firefox version 43.0.4. But on 1st July 2016, the latest versions were Windows 10 and Firefox 47.0.1. Another issue I have with this particular report's product comparisons is that I suspect all it's really measuring is how closely each vendor's malware-sample import pipeline matches the tester's. Maybe F-Secure won that benchmark because they happen to get their malware samples from exactly the same sources as AV-Comparatives and the other products use slightly different sources. The source of malware samples is critical here and I can't find anywhere they say what it is.

Of course there may be research that's better than this report. This just happens to be one recommended to me by an AV sympathizer.

Update Bruce points out that page 6 of the report, second paragraph, does describe a bit more about how they acquired their samples, and they say they scan the Internet themselves for malware samples. I don't know how I missed that! But there's still critical information missing like the lead time between a sample being scanned and then tested. I think the issues I wrote about above still apply.

Monday, 30 January 2017

Tripling Down Against USA Conference Hosting

Well, that escalated quickly!

I wrote "Really, Please Stop Booking International Conferences In The USA" and then the very next day chaos erupted as Trump's executive order on the "seven Muslim countries" appeared and went into effect.

I'm less open-borders than a lot of the people currently freaking out --- but regardless, the order is cruel and capricious, especially how it shut out people with valid visas and even green cards without warning. And I think this reinforces my point about conferences: how can you book a conference with international attendees in a country where this happens?

Administration staff are already talking about other disturbing changes:

Miller also noted on Saturday that Trump administration officials are discussing the possibility of asking foreign visitors to disclose all websites and social media sites they visit, and to share the contacts in their cell phones. If the foreign visitor declines to share such information, he or she could be denied entry. Sources told CNN that the idea is just in the preliminary discussion level.

No thanks :-(.

It has been pointed out that Trump's EO might actually force some organizations to hold more conferences in the USA because some US residents may not be able to return home if they go abroad. That makes some sense but it feels wrong.

On another note, around now the top US universities start their annual drive to recruit foreign grad students. I wonder how that's going to go. The professoriat not being exactly Trump country, they'll have to reconcile their fears with a message that foreign students are going to be OK. Careers depend on it.

Friday, 27 January 2017

Rustbelt Is Hiring

Rustbelt is looking for postdocs.

Most academic CS research projects strike me as somewhere between "could be useful" and "hahahaha no". Rustbelt is one of the very few in the category of "oh please hurry up we need those results yesterday". That's exactly the sort of project researchers should be flocking to.

The long-term goal here is to specify formal semantics for Rust's safe and unsafe code and have a practical, tool-supported methodology for verifying that modules containing unsafe code uphold Rust's safety guarantees. This would lead to an ecosystem where most developers mostly write safe code and that's relatively easy to do, but some developers sometimes write unsafe code and that's a lot harder to do because you have to formally verify that it's OK. In exchange, you know that your unsafe code isn't undermining the benefits of Rust; it's "actually safe". I think this would be a great situation to have. We need the results as soon as possible because they will suggest changes to Rust and/or what is considered "valid unsafe code", and the sooner they happen, the easier that would be.

rr Talk At Auckland C++ Meetup, February 21

I've signed up to give a talk about rr at the Auckland C++ Meetup on February 21, at 6pm. Should be fun!

Really, Please Stop Booking International Conferences In The USA

Last year I opined that international organizations should stop booking conferences in the USA in case Trump became President and followed through on his promise to ban Muslims from entering the country.

That particular risk soared when he was unexpectedly elected, then decreased given he seems to have backtracked on a blanket Muslim ban. Still, it feels like almost anything could happen, and because conference locations have to be booked pretty far ahead, IMHO prudence still means preferring non-USA venues until the future becomes clearer. (The pre-Trump ESTA changes asking for social media account data are also quite troubling. It makes me wonder whether I'm making trouble just posting this ... and I shouldn't have to.)

In particular, if you're one of those people who sees Trump as a clear and present danger to American democracy and social order, it doesn't really make sense for you to be comfortable organizing international conferences there.

(To be clear: I'm not calling for some kind of boycott to punish Americans for making "the wrong choice". It's simply about making sure international conferences are as inclusive and congenial as possible.)

Thursday, 26 January 2017

Disable Your Antivirus Software (Except Microsoft's)

I was just reading some Tweets and an associated Hackernews thread and it reminded me that, now that I've left Mozilla for a while, it's safe for me to say: antivirus software vendors are terrible; don't buy antivirus software, and uninstall it if you already have it (except, on Windows, for Microsoft's).

Update (Perhaps it should go without saying --- but you also need your OS to be up-to-date. If you're on Windows 7 or, God forbid, Windows XP, third party AV software might make you slightly less doomed.)

At best, there is negligible evidence that major non-MS AV products give a net improvement in security. More likely, they hurt security significantly; for example, see bugs in AV products listed in Google's Project Zero. These bugs indicate that not only do these products open many attack vectors, but in general their developers do not follow standard security practices. (Microsoft, on the other hand, is generally competent.)

Furthermore, as Justin Schuh pointed out in that Twitter thread, AV products poison the software ecosystem because their invasive and poorly-implemented code makes it difficult for browser vendors and other developers to improve their own security. For example, back when we first made sure ASLR was working for Firefox on Windows, many AV vendors broke it by injecting their own ASLR-disabled DLLs into our processes. Several times AV software blocked Firefox updates, making it impossible for users to receive important security fixes. Major amounts of developer time are soaked up dealing with AV-induced breakage, time that could be spent making actual improvements in security (recent-ish example).

What's really insidious is that it's hard for software vendors to speak out about these problems because they need cooperation from the AV vendors (except for Google, lately, maybe). Users have been fooled into associating AV vendors with security and you don't want AV vendors bad-mouthing your product. AV software is broadly installed and when it breaks your product, you need the cooperation of AV vendors to fix it. (You can't tell users to turn off AV software because if anything bad were to happen that the AV software might have prevented, you'll catch the blame.) When your product crashes on startup due to AV interference, users blame your product, not AV. Worse still, if they make your product incredibly slow and bloated, users just think that's how your product is.

If a rogue developer does speak out, the PR hammer comes down (even though they were probably right to speak!). But now I'm free! Bwahahaha!

Monday, 16 January 2017

Browser Vendors And Business Interests

On Twitter it has been said that "browser-vendors support web-standards when and only when those standards align with their own business interests". That's not always true, and even if it was, "business interests" are broad enough to make surprising results possible in a competitive browser market.

Mozilla, as a nonprofit, isn't entirely driven by "business interests". Mozilla often acts for "the good of the Web" even when that costs them money. (Example: pressing on with their own browser engine instead of switching to Chromium.)

Other vendors perceive (rightly or wrongly) that being seen to "do the right thing" has some business value. There is a PR and marketing effect, but also a recruiting effect; being seen as an evil empire makes it harder to recruit talented staff when other good options are available. To some extent Mozilla's existence has encouraged other vendors to compete on virtue.

Competitive markets can force vendors to implement standards they otherwise might not want to. For example, Apple needs an iOS browser that can render the modern Web or they'll leak market share to Android, so they're forced to implement Web platform improvements that you might think are not in the interests of their App Store business.

Decisions about Web standards and implementations are often made by individuals keen to "do the right thing" even if it might clash with corporate priorities. Everyone's good at rationalizing their decisions to themselves and others.

Saturday, 14 January 2017

Browser Vendors Are Responsible For The State Of Web Standards

The W3C and WHATWG are mainly just forums. They have some policies that may help or hurt standards work a little bit, but most responsibility for the state of Web standards rests on the participants in standards groups, especially browser vendors, who drive the development of most Web standards and are responsible for implementing most of them too. Therefore, most of the blame for problems in Web standards should be assigned to the specific browser vendors who generated and implemented those standards, and influenced them in various directions.

For example, the reason media element playback, Web Audio and MediaStreams are all quite different APIs is because when I proposed a MediaStream-based alternative to Web Audio, no other browser vendors were interested. Google already had separate teams working on Web Audio and MediaStreams and was already shipping behind a prefix, Apple was barely engaged at all in the Web Audio working group, and Microsoft was completely disengaged. It's not because of anything specifically wrong with the W3C or WHATWG. (FWIW I'm not saying my proposal was necessarily better than what we got; there are technical reasons why it's hard to unify these APIs.)

In fact, even assigning responsibility to individual browser vendors glosses over important distinctions. For example, even within the Chrome team you've got teams who care a lot about Web standards and teams who care a bit less.

One way to make positive change for Web standards is to single out specific undesirable behavior by specific vendors and call them out on it. I know (from both giving and receiving :-) ) that that has an impact. Assigning blame more generally is less likely to have impact.

Sunday, 8 January 2017

Parenting Notes

  • My oldest son just got a new phone. By mutual agreement, it's a dumbphone. He's trying to activate it and feels obliged to read the terms and conditions before agreeing to them. Somehow he's even more of a stickler for rules than I am; I've been trying to teach him how you sometimes need to break rules, but it's tricky to do that without wounding his conscience. He's struggling through the obtuse legalese. It's a real coming-of-age moment.
  • My children don't watch much TV and their Internet usage is restricted, but they read a lot of books and I'm supposed to keep track of them. I try hard to get them to read books I've already read and liked, but I still end up having to read a lot of popular child-oriented fiction, much of which is trash. Why aren't more of the ultra-popular teen series better written? There are some good writers, like J.K. Rowling, but so many others just churn out formula with mediocre writing and people lap it up. Right now I'm in Michael Grant's Gone series, which has very average writing but at least combines some good ideas in interesting ways. Could be worse; there's Rick Riordan, who writes similarly but without the ideas. The same rubbish exists in adult fiction, but I don't have to read that.
  • My kids don't like movies. I struggle to get them to go and see any movie, Star Wars, whatever, or even watch them at home. I don't understand it.
  • On the flip side, they love modern board games. That's excellent because my wife and I do too. It's a bit tough trying to work during the school holidays while they're constantly playing great games like Dominion, Lords of Waterdeep, Settlers, etc.

Saturday, 7 January 2017

Cheltenham Beach

Today we drove to Devonport to walk around North Head and Cheltenham Beach. It was a lovely summer day and Cheltenham Beach looked amazing, especially for a beach that's more or less in the heart of Auckland.

Wednesday, 4 January 2017

How China Can Pressure North Korea

This CNN article claims that there's very little even China can do to influence North Korea. I wonder why that is, because (though I'm relatively uninformed) it seems to me China could have a big impact on North Korea by reforming their policy for handling North Korean escapees.

Currently Chinese policy is that any escapees are returned to North Korea. This is inhumane because those returnees are jailed and often tortured, not to mention the hardships endured by escapees trying to reach South Korea while traveling in secret. I understand that China doesn't want to host a flood of North Korean refugees, but I don't know why they couldn't coordinate with South Korea to efficiently ship any and all North Korean escapees to South Korea. (Maybe South Korea doesn't want this, but they could be made to accept it.) This would probably lead to people pouring out of North Korea, which would put significant pressure on the regime. Not only would this give China political leverage; it would also be a far more humane policy.