Monday, 29 September 2014

Upcoming rr Talk

Currently I'm in the middle of a 3-week visit to North America. Last week I was at a Mozilla graphics-team work week in Toronto. This week I'm mostly on vacation, but I'm scheduled to give a talk about rr at MIT this Thursday, covering rr's design and how it compares to other approaches. I'll make the content of that talk available on the Web in some form as well. Next week I'm also mostly on vacation but will be in Mountain View for a couple of days for a planning meeting. Fun times!

Tuesday, 9 September 2014

rr 2.0 Released

Thanks to the hard work of our contributors, rr 2.0 has been released. It has many improvements over our 1.0 release:

  • gdb's checkpoint, restart and delete checkpoint commands are supported.
    These are implemented using new infrastructure in rr 2.0 for fast cloning of replay sessions.
  • You can now run debuggee functions from gdb during replay.
    This is a big feature for rr, since normally a record-and-replay debugger will only replay what happened during recording --- and of course, function calls from gdb did not happen during recording. So under the hood, rr 2.0 introduces "diversion sessions", which run arbitrary code instead of following a replay. When you run a debuggee function from gdb, we clone the current replay session to a diversion session, run your requested function, then destroy the diversion and resume the replay. (A rough sketch of this flow appears just after this list.)
  • Issues involving Haswell have been fixed. rr now runs reliably on Intel CPU families from Westmere to Haswell.
  • Support for running rr in a VM has been improved. Due to a VMWare bug, rr is not as reliable in VMWare guests as in other configurations, but in practice it still works well.
  • Trace compression has been implemented, with compression ratios of 5-40x depending on workload, dramatically reducing rr's storage and I/O usage.
  • Many many bugs have been fixed to improve reliability and enable rr to handle more diverse workloads.
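
To make the diversion-session machinery a little more concrete, here is a minimal C++ sketch of the flow described above. The names (ReplaySession, DiversionSession, clone_diversion and so on) are illustrative assumptions for this post, not rr's actual internal API.

// Illustrative only: these names and interfaces are assumptions, not rr's real internals.
struct DiversionSession {
  // Runs arbitrary debuggee code (e.g. a gdb-requested function call)
  // instead of following the recorded trace.
  void run_requested_function() {}
};

struct ReplaySession {
  // Fast clone of the current replay state; the same cloning infrastructure
  // backs gdb checkpoints.
  DiversionSession clone_diversion() { return DiversionSession(); }
  void resume_replay() {}
};

// What happens when gdb asks to call a debuggee function during replay:
void call_debuggee_function(ReplaySession& replay) {
  {
    DiversionSession diversion = replay.clone_diversion(); // snapshot the current replay state
    diversion.run_requested_function();                    // run the gdb-requested call in the copy
  }                                                        // diversion destroyed; side effects discarded
  replay.resume_replay();                                  // faithful replay continues where it left off
}

The hard part in practice is making that clone step cheap enough to run on demand, which is what the fast session-cloning infrastructure mentioned above provides.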

All the features normally available from gdb now work with rr, making this an important milestone.

The ability to run debuggee functions makes it much easier to use rr to debug Firefox. For example you can dump DOM, frame and layer trees at any point during replay. You can debug JavaScript to some extent by calling JS engine helpers such as DumpJSStack(). Some Mozilla developers have successfully used rr to fix real bugs. I use it for most of my Gecko debugging --- the first of my research projects that I've actually wanted to use :-).

Stephen Kitt has packaged rr for Debian.

Considerable progress has been made towards x86-64 support, but it's not ready yet. We expect x86-64 support to be the next milestone.

I recorded a screencast showing a quick demo of rr on Firefox:

Monday, 8 September 2014

VMWare CPUID Conditional Branch Performance Counter Bug

This post will be uninteresting to almost everyone. I'm putting it out as a matter of record; maybe someone will find it useful.

While getting rr working in VMWare guests, we ran into a tricky little bug. Typical usage of CPUID, e.g. to detect SSE2 support, looks something like this (written out here as C using GCC's <cpuid.h> helpers):

#include <cpuid.h>
unsigned int S = 1;                // e.g. subfunction 1, whose EDX bit 26 reports SSE2
unsigned int eax, ebx, ecx, edx;
__cpuid(0, eax, ebx, ecx, edx);    // CPUID(0): EAX is the maximum supported subfunction M
if (S <= eax) {                    // conditional branch between the two CPUIDs
  __cpuid(S, eax, ebx, ecx, edx);  // execute subfunction S
}
Thus, CPUID calls often occur in pairs with a conditional branch between them. The bug is that in a VMWare guest, when we count the number of conditional branches executed, the conditional branch between those two CPUIDs is usually (but not always) omitted from the count. We assume this is a VMWare bug because it does not happen on the same hardware outside of a VM, and it does not happen in a KVM-based VM.

Experiments show that some code sequences trigger the bug and other equivalent sequences don't. Single-stepping and other kinds of interference suppress the bug. My best guess is that VMWare optimizes some forms of the above code, perhaps to reduce the number of VM exits, and in so doing skips execution of the conditional branch, without taking into account that this might perturb performance counter values. Admittedly, it's unusual for software to rely on precise performance counter values the way rr does.

This sucks for rr because rr relies on these counts being accurate. We sometimes find that replay diverges because one of these conditional branches was not counted during recording but is counted during replay. (The other way around is possible too, but less frequently observed.) We have some heuristics and workarounds, but it's difficult to fully work around without adding significant complexity and/or slowdown.

The bug is easily reproduced: just use rr to record and replay anything simple. When replaying, rr automatically detects the presence of the bug and prints a warning on the console:

rr: Warning: You appear to be running in a VMWare guest with a bug
    where a conditional branch instruction between two CPUID instructions
    sometimes fails to be counted by the conditional branch performance
    counter. Partial workarounds have been enabled but replay may diverge.
    Consider running rr not in a VMWare guest.
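
If you want to poke at the miscount yourself, here is a rough sketch of the kind of probe you could write with perf_event_open. Two caveats about what is assumed here: rr actually programs a model-specific raw event for retired conditional branches, whereas this sketch uses the kernel's generic branch-instructions event, and the measured count also includes a few branches executed inside the libc wrappers, so the interesting signal is whether repeated runs (and runs on bare metal versus inside VMWare) agree with each other, not the absolute number.

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cpuid.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// There is no glibc wrapper for perf_event_open, so call it via syscall().
static long perf_event_open(struct perf_event_attr* attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
  return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void) {
  struct perf_event_attr attr;
  memset(&attr, 0, sizeof(attr));
  attr.size = sizeof(attr);
  attr.type = PERF_TYPE_HARDWARE;
  // Generic stand-in for the model-specific "retired conditional branches"
  // event that rr actually uses.
  attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
  attr.exclude_kernel = 1;
  attr.disabled = 1;

  int fd = (int)perf_event_open(&attr, 0 /* this task */, -1 /* any cpu */, -1, 0);
  if (fd < 0) { perror("perf_event_open"); return 1; }

  unsigned int eax, ebx, ecx, edx, S = 1;
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  // The CPUID / conditional branch / CPUID pattern described above.
  __cpuid(0, eax, ebx, ecx, edx);
  if (S <= eax) {
    __cpuid(S, eax, ebx, ecx, edx);
  }
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

  uint64_t branches = 0;
  if (read(fd, &branches, sizeof(branches)) != (ssize_t)sizeof(branches)) {
    perror("read"); return 1;
  }
  // Compare this number across runs and environments rather than expecting an
  // exact value; per the bug above, VMWare sometimes drops the branch between
  // the two CPUIDs from the count.
  printf("counted %llu branches around the CPUID pair\n",
         (unsigned long long)branches);
  close(fd);
  return 0;
}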

Steps forward:

  • Find a way to report this bug to VMWare.
  • Linux hosts can run rr in KVM-based VMs or directly on the host. Xen VMs might work too.
  • Parallels apparently supports PMU virtualization now; if Parallels doesn't have this bug, it might be the best way to run rr on a Mac or Windows host.
  • We can add a "careful mode" that would probably almost always replay successfully, albeit with additional overhead.
  • The bug is less likely to show up once rr supports x86-64. At least in Firefox, CPUID instructions are most commonly used to detect the presence of SSE2, which is unnecessary on x86-64.
  • In practice, recording Firefox in VMWare generally works well without hitting this bug, so maybe we don't need to invest a lot in fixing it.

Monday, 11 August 2014

Milestones On The Road To Christianity

Around the age of 20 I found myself struggling with some fairly deep philosophical questions. The most important was this: assuming (as I did) naturalism is true, then what should I do?

It seemed clear to me then (and still does) that if naturalism is true, the is-ought problem is insurmountable. There can be no objective moral truths or goals. The best we can do is identify commonly held moral values and pursue them. Unfortunately --- if honesty is one of those values --- we cannot tell others that their behavior is, in any objective sense, wrong. For example, we observe that Hitler's moral opinions are different from ours, but we could not claim that our moral opinions are intrinsically more valid. All we could do is wage war against him and hope our side prevails. Might makes right.

That doesn't make naturalism incoherent, but it opens a chasm between what naturalists can really believe about moral statements and the way almost everyone uses them in practice. The more die-hard naturalists are prone to say things like "naturalism is true, and therefore everyone should ... (stop believing in God, etc)" without respecting the limitation that the consequent ought-statements are subjective opinions, not objectively rational facts. It's really very difficult to be a proper moral relativist through-and-through!

Making this all much more difficult was my awareness of being able to reshape my own moral opinions. The evolutionary-psychology approach of "these are the values imbued by my primate brain; work them out" seems totally inadequate when the rational part of my brain can give priority to any subset of values (or none) and use that as justification for rewriting the others. Given a real choice between being a hero and a monster, on what grounds can one make that decision? It seemed a bit narrow-minded to reject monstrosity simply because it was less popular.

This all made me very dissatisfied with naturalism as a worldview. If it's true, but is powerless to say how one should live --- indeed, denies that there can be any definitive guidance on how to live --- it's inadequate. Like a scientific theory that lacks predictive power, whether or not it's true, one has to keep looking for more.

(OK, I was a weird kid, but everyone thinks about this, right?)

Friday, 8 August 2014

Choose Firefox Now, Or Later You Won't Get A Choice

I know it's not the greatest marketing pitch, but it's the truth.

Google is bent on establishing platform domination unlike anything we've ever seen, even from late-1990s Microsoft. Google controls Android, which is winning; Chrome, which is winning; and key Web properties in Search, Youtube, Gmail and Docs, which are all winning. The potential for lock-in is vast and they're already exploiting it, for example by restricting certain Google Docs features (e.g. offline support) to Chrome users, and by writing contracts with Android OEMs forcing them to make Chrome the default browser. Other bad things are happening that I can't even talk about. Individual people and groups want to do the right thing but the corporation routes around them. (E.g. PNaCl and Chromecast avoided Blink's Web standards commitments by declaring themselves not part of Blink.) If Google achieves a state where the Internet is really only accessible through Chrome (or Android apps), that situation will be very difficult to escape from, and it will give Google more power than any company has ever had.

Microsoft and Apple will try to stop Google but even if they were to succeed, their goal is only to replace one victor with another.

So if you want an Internet --- which means, in many ways, a world --- that isn't controlled by Google, you must stop using Chrome now and encourage others to do the same. If you don't, and Google wins, then in years to come you'll wish you had a choice and have only yourself to blame for spurning it now.

Of course, Firefox is the best alternative :-). We have a good browser, and lots of dedicated and brilliant people improving it. Unlike Apple and Microsoft, Mozilla is totally committed to the standards-based Web platform as a long-term strategy against lock-in. And one thing I can say for certain is that of all the contenders, Mozilla is least likely to establish world domination :-).

Thursday, 17 July 2014

Multiverses And Anthropic Reasoning

I liked this article summarizing the current state of science regarding multiverse theories. It's very clear and well-illustrated, and, as far as I know, accurate.

This quote is particularly interesting:

So as appealing as the idea is that there are other Level 1 Multiverses out there with different constants than our own, we have good physical reasons based on observable evidence to think it’s unlikely, and zero good reasons (because wanting it to be so is not a good reason) to think it’s likely.

He doesn't mention why anyone would "want it to be so", i.e. believe that other universes of a "Level 1 Multiverse" could have different constants to our own. However, I'm pretty sure he had in mind the selection-bias explanation for the anthropic coincidences. That is, if we accept that only a narrow range of possible values for the fundamental physical constants is compatible with the existence of intelligent life (and most scientists do, I think), then we would like to be able to explain why our universe's constants are in that range. If there is an abundance of different universes, each with different values for the physical constants, then most of them would be dead but a lucky few would sustain intelligent life, and naturally we can only observe one of the latter.

This reasoning relies on the assumption that there is an abundance of different universes with different values for the physical constants. Scientists obviously would prefer to be able to deduce this from observations rather than pull it out of thin air. As discussed in the above article, theories of chaotic inflation --- which are reasonably well-grounded in observations of our own universe --- predict the existence of alternate universes. If those universes could have different values for physical constants (or even different physical laws), we'd have an observationally-grounded theory that predicts exactly the kind of multiverse needed to power the selection-bias explanation for the anthropic coincidences. Unfortunately for proponents of that explanation, the science isn't working out.

Of course, the selection-bias explanation could still be valid, either because new information shows that chaotic-inflation universes can get different constants after all, or because we assume another level of multiverse, whose existence is not due to chaotic inflation. However, many scientists (such as in the article above) find the assumption of a higher-level multiverse quite unsatisfactory.

Unsurprisingly, I'm comfortable with the explanation that our universe was intentionally created for intelligent life to live in. Incidentally, you don't need to be a classical theist to adopt this explanation; some atheist philosophers argue with varying degrees of seriousness that we are (probably) living in a simulation.