Wednesday, 9 April 2014

Getting Back To Work

... is what we need now. So let me give a brief summary of what's happening with me work-wise.

Last year I fully divested my direct reports. No more management work to do, yay! I think they probably all ended up with a better manager than they had.

I've been involved in a lot of different projects and a lot of helping other people get their work done. In between I managed to carve out a few things of my own:

  • I built reftests for async panning to try to stem the tide of regressions we were encountering there. Unfortunately that's not quite done because most of those tests aren't running on TBPL yet.
  • I worked on CSS scroll snapping with an intern. Unfortunately the spec situation got bogged down; there's an impasse between us and Microsoft over the design of the CSS feature, and Google has decided not to do a CSS feature at all for now and try to do it with JS instead. I'm skeptical that will work will, and looking forward to their proposal, but it's taking a while.
  • I landed an implementation of the CSS OM GeometryUtils interface, described in hacks.mozilla.org blog posts here and here. This fixes a functionality gap in the Web platform and was needed by our devtools team. Almost everything you would need to know about the CSS box geometry of your page is now easy to get.
  • Lately I've been doing work on rr. People are trying to use it, and they've been uncovering bugs and inconveniences that Chris Jones and I have been fixing as fast as we can. Terrence Cole used it successfully to help fix a JS engine bug! I'm using it a lot myself for general-purpose debugging, and enjoying it. I want to spend a few more days getting some of the obvious rough edges we've found filed off and then make a new release.

Looking forward, I expect to be working on making our async scrolling fully generic (right now there are some edge cases where we can't do it), and working on some improvements to our MediaStreamGraph code for lower latency video and audio.

Fighting Media Narratives

... is perhaps futile. A lot of what I have to say has already been said. Yet, in case it makes a difference:

  • Almost all Mozilla staff supported keeping Brendan Eich as CEO, including many prominent LGBT staff, and many made public statements to that effect. A small number of Tweeters calling for him to step down got all the media attention. The narrative that Mozilla staff as a group "turned against Brendan" is false. It should go without saying, but most Mozilla staff, especially me, are very upset that he's gone. I've known him, worked with him and fought alongside him (and sometimes against him :-) ) for fourteen years and having him ripped away like this is agonizing.
  • The external pressure for Brendan to step down was the primary factor driving the entire situation. The same issue exploded in 2012 but there was less pressure and he got through it. No doubt Mozilla could have handled it better but the narrative that blames Mozilla for Brendan's departure misses the big picture. Boycotting Mozilla (or anyone for that matter) for cracking under intense pressure is like shooting a shell-shocked soldier.
  • As a Christian, Mozilla is as friendly a workplace as any tech organization I've known --- which is to say, not super friendly, but unproblematic. Because of our geographic spread --- not just of headcount, but of actual power --- and our broad volunteer base I think we have more real diversity than many of our competitors. The narrative that Mozilla as a group has landed on one side of the culture war is false, or at least no more true than for other tech organizations. In fact one thing I've really enjoyed over the last couple of weeks is seeing a diverse set of Mozilla people pull together in adversity and form even closer bonds.

I'll also echo something else a lot of people are saying: we have to fix Internet discourse somehow. It's toxic. I wrote about this a while back, and this episode has made me experience the problem at a whole new level. I'll throw one idea out there: let's communicate using only recorded voice+video messages, no tweets, no text. If you want to listen to what I have to say, you have to watch me say it, and hopefully that will trigger some flickers of empathy. If you want me to listen to you, you have to show me your face. Want to be anonymous, do it the old-fashioned way and wear a mask. Yeah I know we'd have to figure out searchability, skimmability, editing, etc etc. Someone get to work on it.

Mozilla Matters

How much does the world need Mozilla? A useful, if uncomfortable, thought experiment is to consider what the world would be like without Mozilla.

Consider the world of Web standards. Microsoft doesn't contribute much to developing new Web features, and neither does Apple these days. Mozilla and Google do. Google, per Blink's own policy (mirroring our own), relies on feedback and implementation by other browser vendors, i.e. usually us. If you take Mozilla out of the equation, it's going to be awfully hard to apply the "two independent implementations" test for new Web features. (Especially since Webkit and Blink still have so much shared heritage.) For example it's hard to see how important stuff like Web Audio, WebGL and WebRTC would have become true multi-vendor standards without us. Without us, most progress would depend on unilateral extensions by individual vendors. That has all the downsides of a single-implementation ecosystem --- a single implementation's bugs become the de-facto standard; the standards process, if there even is one, becomes irrelevant; and even more power accrues to the dominant vendor.

In the bigger picture, it would be dangerous to leave the Web --- and the entire application platform space --- in the hands of three very large US corporations who have few scruples and who each have substantial non-Web interests to protect, including proprietary desktop and mobile platforms. It has always been very important that there be a compelling vendor-neutral platform for people to deploy content and apps on, a platform without gatekeepers and without taxes. The Web is that platform ---- for now. Mozilla is dedicated to preserving and enhancing that, but the other vendors are not.

Mozilla has plenty of faults, and lots of challenges, but our mission is as important as ever ... probably more important, given how computing devices and the Internet penetrate more and more of our lives. More than ever, pursuing our mission is the greatest good I can do with the talents God has given me.

Monday, 7 April 2014

Responsible Self-Censorship

People may be wondering why I, as one of the most notorious Christians at Mozilla, have been silent during the turmoil of recent days. It is simply because I haven't been able to think of anything to say that won't cause more harm in the current environment. It is not because I am afraid, malicious, or suffering from a lack of freedom of speech in any way. As soon as I have the environment and the words to say something helpful, I will.

By "the current environment" I mean the situation where many of the culture warriors of all stripes are picking over every utterance looking for something to be angry against or something to fuel their anger against others.

Update Actually, right after posting that, I thought of something to say.

I have never loved the people of Mozilla as much as I do right now. With just a few exceptions, your commitment and grace under fire is inspiring and (in the old-fashioned sense) awesome. I'm glad I'm here for this.

Thursday, 27 March 2014

Conflict

I was standing in a long line of people waiting to get their passports checked before departing Auckland to the USA. A man pushed into the line ahead of me. People around me sighed and muttered, but no-one did anything. This triggered my "if no-one else is willing to act, it's up to me" instinct, so I started a conversation:

Roc: Shouldn't you go to the back of the line?
Man: I'm in a hurry to get to my flight.
There are only two flights to the USA in the immediate future and AFAIK neither of them is imminent.
Roc: Which flight are you on?
Man: What's the big deal? I'm hardly going to slow you down.
Roc: But it's rude.
Man: So what do you want me to do?
Roc: I want you to move to the back of the line.
Man: You're going to have to make me.

At that point I couldn't think of anything to say or do so I let it go. It turned out that he was on my flight and it would have been unpleasant if he'd ended up sitting next to me.

I was a bit shocked by this incident, though I probably shouldn't have been. This kind of unapologetic rudeness is not something I encounter very often from strangers in face-to-face situations. I guess that means I'm fortunate. Surprisingly I felt quite calm during the conversation and only felt rage later.

Embarrassingly-much-later, it dawned on me that Jesus wants me to forgive this man. I am now making the effort to do so.

Wednesday, 26 March 2014

Introducing rr

Bugs that reproduce intermittently are hard to debug with traditional techniques because single stepping, setting breakpoints, inspecting program state, etc, is all a waste of time if the program execution you're debugging ends up not even exhibiting the bug. Even when you can reproduce a bug consistently, important information such as the addresses of suspect objects is unpredictable from run to run. Given that software developers like me spend the majority of their time finding and fixing bugs (either in new code or existing code), nondeterminism has a major impact on my productivity.

Many, many people have noticed that if we had a way to reliably record program execution and replay it later, with the ability to debug the replay, we could largely tame the nondeterminism problem. This would also allow us to deliberately introduce nondeterminism so tests can explore more of the possible execution space, without impacting debuggability. Many record and replay systems have been built in pursuit of this vision. (I built one myself.) For various reasons these systems have not seen wide adoption. So, a few years ago we at Mozilla started a project to create a new record-and-replay tool that would overcome the obstacles blocking adoption. We call this tool rr.

Design

Here are some of our key design parameters:

  • We focused on debugging Firefox. Firefox is a complex application, so if rr is useful for debugging Firefox, it is very likely to be generally useful. But, for example, we have not concerned ourselves with record and replay of hostile binaries, or highly parallel applications. Nor have we explored novel techniques just for the sake of novelty.
  • We prioritized deployability. For example, we deliberately avoided modifying the OS kernel and even worked hard to eliminate the need for system configuration changes. Of course we ensured rr works on stock hardware.
  • We placed high importance on low run-time overhead. We want rr to be the tool of first resort for debugging. That means you need to start getting results with rr about as quickly as you would if you were using a traditional debugger. (This is where my previous project Chronomancer fell down.)
  • We tried to take a path that would lead to a quick positive payoff with a small development investment. A large speculative project in this area would fall outside Mozilla's resources and mission.
  • Naturally, the tool has to be free software.

I believe we've achieved those goals with our 1.0 release.

There's a lot to say about how rr works, and I'll probably write some followup blog posts about that. In this post I focus on what rr can do.

rr records and replays a Linux process tree, i.e. an initial process and the threads and subprocesses (indirectly) spawned by that process. The replay is exact in the sense that the contents of memory and registers during replay are identical to their values during recording. rr provides a gdbserver interface allowing gdb to debug the state of replayed processes.

Performance

Here are performance results for some Firefox testsuite workloads: These represent the ratio of wall-clock run-time of rr recording and replay over the wall-clock run-time of normal execution (except for Octane, where it's the ratio of the normal execution's Octane score over the record/replay Octane scores ... which, of course, are the same). It turns out that reftests are slow under rr because Gecko's current default Linux configuration draws with X11, hence the rendered pixmap of every test and reference page has to be slurped back from the X server over the X socket for comparison, and rr has to record all of that data. So I also show overhead for reftests with Xrender disabled, which causes Gecko to draw locally and avoid the problem.

I should also point out that we stopped focusing on rr performance a while ago because we felt it was already good enough, not because we couldn't make it any faster. It can probably be improved further without much work.

Debugging with rr

The rr project landing page has a screencast illustrating the rr debugging experience. rr lets you use gdb to debug during replay. It's difficult to communicate the feeling of debugging with rr, but if you've ever done something wrong during debugging (e.g. stepped too far) and had that "crap! Now I have to start all over again" sinking feeling --- rr does away with that. Everything you already learned about the execution --- e.g. the addresses of the objects that matter --- remains valid when you start the next replay. Your understanding of the execution increases monotonically.

Limitations

rr has limitations. Some are inherent to its design, and some are fixable with more work.

  • rr emulates a single-core machine. So, parallel programs incur the slowdown of running on a single core. This is an inherent feature of the design. Practical low-overhead recording in a multicore setting requires hardware support; we hope that if rr becomes popular, it will motivate hardware vendors to add such support.
  • rr cannot record processes that share memory with processes outside the recording tree. This is an inherent feature of the design. rr automatically disables features such as X11 shared memory for recorded processes to avoid this problem.
  • For the same reason, rr tracees cannot use direct-rendering user-space GL drivers. To debug GL code under rr we'll need to find or create a proxy driver that doesn't share memory with the kernel (something like GLX).
  • rr requires a reasonably modern x86 CPU. It depends on certain performance counter features that are not available in older CPUs, or in ARM at all currently. rr works on Intel Ivy Bridge and Sandy Bridge microarchitectures. It doesn't currently work on Haswell and we're investigating how to fix that.
  • rr currently only supports x86 32-bit processes. (Porting to x86-64 should be straightforward but it's quite a lot of work.)
  • rr needs to explicitly support every system call executed by the recorded processes. It already supports a wide range of syscalls --- syscalls used by Firefox --- but running rr on more workloads will likely uncover more syscalls that need to be supported.
  • When used with gdb, rr does not provide the ability to call program functions from the debugger, nor does it provide hardware data breakpoints. These issues can be fixed with future work.

Conclusions

We believe rr is already a useful tool. I like to use it myself to debug Gecko bugs; in fact, it's the first "research" tool I've ever built that I like to use myself. If you debug Firefox at the C/C++ level on Linux you should probably try it out. We would love to have feedback --- or better still, contributions!

If you try to debug other large applications with rr, you will probably encounter rr bugs. Therefore we are not yet recommending rr for general-purpose C/C++ debugging. However, if rr interests you, please consider trying it out and reporting/fixing any bugs that you find.

We hope rr is a useful tool in itself, but we also see it as just a first step. rr+gdb is not the ultimate debugging experience (even if gdb's backtracing features get an rr-based backend, which I hope happens!). We have a lot of ideas for making vast improvements to the debugging experience by building on rr. I hope other people find rr useful as a building block for their ideas too.

I'd like to thank the people who've contributed to rr so far: Albert Noll, Nimrod Partush, Thomas Anderegg and Chris Jones --- and to Mozilla Research, especially Andreas Gal, for supporting this project.

Monday, 24 March 2014

Mozilla And The Silicon Valley Cartel

This article, while overblown (e.g. I don't think "no cold calls" agreements are much of a problem), is an interesting read, especially the court exhibits attached at the end. One interesting fact is that Mozilla does not appear on any of Google's do-not-call lists --- yay us! (I can confirm first-hand that Mozilla people have been relentlessly cold-emailed by Google over the years. They stopped contacting me after I told them not to even bother trying until they open a development office in New Zealand.)

Exhibit 1871 is also very interesting since it appears the trigger that caused Steve Jobs to fly off the handle (which in turn led to the collusion at the heart of the court case) was Ben Goodger referring members of the Safari team for recruitment by Google, at least one of whom was formerly from Mozilla.