Tuesday, 9 September 2014

rr 2.0 Released

Thanks to the hard work of our contributors, rr 2.0 has been released. It has many improvements over our 1.0 release:

  • gdb's checkpoint, restart and delete checkpoint commands are supported.
    These are implemented using new infrastructure in rr 2.0 for fast cloning of replay sessions.
  • You can now run debuggee functions from gdb during replay.
    This is a big feature for rr, since normally a record-and-replay debugger will only replay what happened during recording --- and of course, function calls from gdb did not happen during recording. So under the hood, rr 2.0 introduces "diversion sessions", which run arbitrary code instead of following a replay. When you run a debuggee function from gdb, we clone the current replay session to a diversion session, run your requested function, then destroy the diversion and resume the replay.
  • Issues involving Haswell have been fixed. rr now runs reliably on Intel CPU families from Westmere to Haswell.
  • Support for running rr in a VM has been improved. Due to a VMWare bug, rr is not as reliable in VMWare guests as in other configurations, but in practice it still works well.
  • Trace compression has been implemented, with compression ratios of 5-40x depending on workload, dramatically reducing rr's storage and I/O usage.
  • Many many bugs have been fixed to improve reliability and enable rr to handle more diverse workloads.

All the features normally available from gdb now work with rr, making this an important milestone.

The ability to run debuggee functions makes it much easier to use rr to debug Firefox. For example you can dump DOM, frame and layer trees at any point during replay. You can debug Javascript to some extent by calling JS engine helpers such as DumpJSStack(). Some Mozilla developers have successfully used rr to fix real bugs. I use it for most of my Gecko debugging --- the first of my research projects that I've actually wanted to use :-).

Stephen Kitt has packaged rr for Debian.

Considerable progress has been made towards x86-64 support, but it's not ready yet. We expect x86-64 support to be the next milestone.

I recorded a screencast showing a quick demo of rr on Firefox:

Monday, 8 September 2014

VMWare CPUID Conditional Branch Performance Counter Bug

This post will be uninteresting to almost everyone. I'm putting it out as a matter of record; maybe someone will find it useful.

While getting rr working in VMWare guests, we ran into a tricky little bug. Typical usage of CPUID. e.g. to detect SSE2 support, looks like this pseudocode:

CPUID(0); // get maximum supported CPUID subfunction M
if (S <= M) { 
  CPUID(S); // execute subfunction S
}
Thus, CPUID calls often occur in pairs with a conditional branch between them. The bug is that in a VMWare guest, when we count the number of conditional branches executed, the conditional branch between those two CPUIDs is usually (but not always) omitted from the count. We assume this is a VMWare bug because it does not happen on the same hardware outside of a VM, and it does not happen in a KVM-based VM.

Experiments show that some code sequences trigger the bug and other equivalent sequences don't. Single-stepping and other kinds of interference suppress the bug. My best guess is that VMWare optimizes some forms of the above code, perhaps to reduce the number of VM exits, and in so doing skips execution of the conditional branch, without taking into account that this might perturb performance counter values. Admittedly, it's unusual for software to rely on precise performance counter values the way rr does.

This sucks for rr because rr relies on these counts being accurate. We sometimes find that replay diverges because one of these conditional branches was not counted during recording but is counted during replay. (The other way around is possible too, but less frequently observed.) We have some heuristics and workarounds, but it's difficult to fully work around without adding significant complexity and/or slowdown.

The bug is easily reproduced: just use rr to record and replay anything simple. When replaying, rr automatically detects the presence of the bug and prints a warning on the console:

rr: Warning: You appear to be running in a VMWare guest with a bug
    where a conditional branch instruction between two CPUID instructions
    sometimes fails to be counted by the conditional branch performance
    counter. Partial workarounds have been enabled but replay may diverge.
    Consider running rr not in a VMWare guest.

Steps forward:

  • Find a way to report this bug to VMWare.
  • Linux hosts can run rr in KVM-based VMs or directly on the host. Xen VMs might work too.
  • Parallels apparently supports PMU virtualization now; if Parallels doesn't have this bug, it might be the best way to run rr on a Mac or Windows host.
  • We can add a "careful mode" that would probably almost always replay successfully, albeit with additional overhead.
  • The bug is less likely to show up once rr supports x86-64. At least in Firefox, CPUID instructions are most commonly used to detect the presence of SSE2, which is unnecessary on x86-64.
  • In practice, recording Firefox in VMWare generally works well without hitting this bug, so maybe we don't need to invest a lot in fixing it.

Monday, 11 August 2014

Milestones On The Road To Christianity

Around the age of 20 I found myself struggling with some fairly deep philosophical questions. The most important was this: assuming (as I did) naturalism is true, then what should I do?

It seemed clear to me then (and still does) that if naturalism is true, the is-ought problem is insurmountable. There can be no objective moral truths or goals. The best we can do is identify commonly held moral values and pursue them. Unfortunately --- if honesty is one of those values --- we cannot tell others that their behavior is, in any objective sense, wrong. For example, we observe that Hitler's moral opinions are different from ours, but we could not claim that our moral opinions are intrinsically more valid. All we could do is wage war against him and hope our side prevails. Might makes right.

That doesn't make naturalism incoherent, but it opens a chasm between what naturalists can really believe about moral statements and the way almost everyone uses them in practice. The more die-hard naturalists are prone to say things like "naturalism is true, and therefore everyone should ... (stop believing in God, etc)" without respecting the limitation that the consequent ought-statements are subjective opinions, not objectively rational facts. It's really very difficult to be a proper moral relativist through-and-through!

Making this all much more difficult was my awareness of being able to reshape my own moral opinions. The evolutionary-psychology approach of "these are the values imbued by my primate brain; work them out" seems totally inadequate when the rational part of my brain can give priority to any subset of values (or none) and use that as justification for rewriting the others. Given a real choice between being a hero and a monster, on what grounds can one make that decision? It seemed a bit narrow-minded to reject monstrosity simply because it was less popular.

This all made me very dissatisfied with naturalism as a worldview. If it's true, but is powerless to say how one should live --- indeed, denies that there can be any definitive guidance how to live --- it's inadequate. Like a scientific theory that lacks predictive power, whether it's true or not, one has to keep looking for more.

(OK, I was a weird kid, but everyone thinks about this, right?)

Friday, 8 August 2014

cf1e5386ecde9c2eb9416c9b07416686

Choose Firefox Now, Or Later You Won't Get A Choice

I know it's not the greatest marketing pitch, but it's the truth.

Google is bent on establishing platform domination unlike anything we've ever seen, even from late-1990s Microsoft. Google controls Android, which is winning; Chrome, which is winning; and key Web properties in Search, Youtube, Gmail and Docs, which are all winning. The potential for lock-in is vast and they're already exploiting it, for example by restricting certain Google Docs features (e.g. offline support) to Chrome users, and by writing contracts with Android OEMs forcing them to make Chrome the default browser. Other bad things are happening that I can't even talk about. Individual people and groups want to do the right thing but the corporation routes around them. (E.g. PNaCl and Chromecast avoided Blink's Web standards commitments by declaring themselves not part of Blink.) If Google achieves a state where the Internet is really only accessible through Chrome (or Android apps), that situation will be very difficult to escape from, and it will give Google more power than any company has ever had.

Microsoft and Apple will try to stop Google but even if they were to succeed, their goal is only to replace one victor with another.

So if you want an Internet --- which means, in many ways, a world --- that isn't controlled by Google, you must stop using Chrome now and encourage others to do the same. If you don't, and Google wins, then in years to come you'll wish you had a choice and have only yourself to blame for spurning it now.

Of course, Firefox is the best alternative :-). We have a good browser, and lots of dedicated and brilliant people improving it. Unlike Apple and Microsoft, Mozilla is totally committed to the standards-based Web platform as a long-term strategy against lock-in. And one thing I can say for certain is that of all the contenders, Mozilla is least likely to establish world domination :-).

Thursday, 17 July 2014

Multiverses And Anthropic Reasoning

I liked this article summarizing the current state of science regarding multiverse theories. It's very clear and well-illustrated, and, as far as I know, accurate.

This quote is particularly interesting:

So as appealing as the idea is that there are other Level 1 Multiverses out there with different constants than our own, we have good physical reasons based on observable evidence to think it’s unlikely, and zero good reasons (because wanting it to be so is not a good reason) to think it’s likely.

He doesn't mention why anyone would "want it to be so", i.e. believe that other universes of a "Level 1 Multiverse" could have different constants to our own. However, I'm pretty sure he had in mind the selection-bias explanation for the anthropic coincidences. That is, if we accept that only a narrow range of possible values for the fundamental physical constants are compatible with the existence of intelligent life (and most scientists do, I think), then we would like to be able to explain why our universe's constants are in that range. If there are an abundance of different universes, each with different values for the physical constants, then most of them would be dead but a lucky few would sustain intelligent life, and naturally we can only observe one of those latter.

This reasoning relies on the assumption that there are abundance of different universes with different values for the physical constants. Scientists obviously would prefer to be able to deduce this from observations rather than pull it out of thin air. As discussed in the above article, theories of chaotic inflation --- which are reasonably well-grounded in observations of our own universe --- predict the existence of alternate universes. If those universes could have different values for physical constants (or even different physical laws), we'd have an observationally-grounded theory that predicts exactly the kind of multiverse needed to power the selection-bias explanation for the anthropic coincidences. Unfortunately for proponents of that explanation, the science isn't working out.

Of course, the selection-bias explanation could still be valid, either because new information shows that chaotic-inflation universes can get different constants after all, or because we assume another level of multiverse, whose existence is not due to chaotic inflation. However, many scientists (such as in the article above) find the assumption of a higher-level multiverse quite unsatisfactory.

Unsurprisingly, I'm comfortable with the explanation that our universe was intentionally created for intelligent life to live in. Incidentally, you don't need to be a classical theist to adopt this explanation; some atheist philosophers argue with varying degrees of seriousness that we are (probably) living in a simulation.

Thursday, 3 July 2014

Implementing Scroll Animations Using Web Animations

It's fashionable for apps to perform fancy animations during scrolling. Some examples:

  • Parallax scrolling
  • Sticky headers that scroll normally until they would scroll out of view and then stop moving
  • Panels that slide laterally as you scroll vertically
  • Elements that shrink as their available space decreases instead of scrolling out of view
  • Scrollable panels that resist scrolling as you get near the end

Obviously we need to support these behaviors well on the Web. Also obviously, we don't want to create a CSS property for each of them. Normally we'd handle this diversity by exposing a DOM API which lets developers implement their desired behavior in arbitrary Javascript. That's tricky in this case because script normally runs on the HTML5 event loop which is shared with a lot of other page activities, but for smooth touch tracking these scrolling animation calculations need to be performed reliably at the screen refresh rate, typically 60Hz. Even for skilled developers, it's easy to have a bug where once in a while some page activity (e.g. an event handler working through some unexpected large data set) blows the 16ms budget to make touch dragging less than perfect, especially on low-end mobile devices.

There are a few possible approaches to fixing this. One is to not provide any new API, hope that skilled developers can avoid blowing the latency budget, and carefully engineer the browser to minimize its overhead. We took this approach to implementing homescreen panning in FirefoxOS. This approach sounds fragile to me. We could make it less fragile by changing event dispatch policy during a touch-drag, e.g. to suppress the firing of "non-essential" event handlers such as setTimeouts, but that would add platform complexity and possibly create compatibility issues.

Another approach would be to move scroll animation calculations to a Worker script, per an old high-level proposal from Google (which AFAIK they are not currently pursuing). This would be more robust than main-thread calculations. It would probably be a bit clumsy.

Another suggestion is to leverage the Web's existing and proposed animation support. Basically we would allow an animation on an element to be use another element's scroll position instead of time as the input to the animation function. Tab Atkins proposed this with declarative CSS syntax a while ago, though it now seems to make more sense as part of Web Animations. This approach is appealing because this animation data can be processed off the main thread, so these animations can happen at 60Hz regardless of what the main thread is doing. It's also very flexible; versions of all of the above examples can be implemented using it.

One important question is how much of the problem space is covered by the Web Animations approach. There are two sub-issues:

  • What scrolling effects depend on more than just the scroll position, e.g. scroll velocity? (There certainly are some, such as headers that appear when you scroll down but disappear when you scroll up.)
  • For effects that depend on just the scroll position, which ones can't be expressed just by animating CSS transforms and/or opacity as a function of the scroll position?
If people are aware of scrolling effects in either of those two categories, it would be very useful to hear about them.