Sunday, 24 June 2018

Yosemite: Clouds Rest And Half Dome

On Saturday morning, immediately after the Mozilla All Hands, I went with some friends to Yosemite for an outstanding five-night, five-day hiking-and-camping trip! We hiked from the Cathedral Lakes trailhead all the way down to Yosemite Valley, ascending Clouds Rest and Half Dome along the way. The itinerary:

  • Saturday night: camped at Tuolumne Meadows
  • Sunday: hiked from Cathedral Lakes trailhead past the Cathedral Lakes to Sunrise High Sierra Camp
  • Monday: hiked from Sunrise HSC past the Sunrise Lakes to camp just north of Clouds Rest
  • Tuesday: hiked up and over Clouds Rest and camped just north of the trail leading up to Half Dome
  • Wednesday: left most of our gear in camp, climbed Half Dome, returned to camp, and hiked down to camp in Little Yosemite Valley
  • Thursday: hiked out to Yosemite Valley

Apart from the first day, each day was relatively short in terms of distance, but the first few days were quite strenuous regardless because of the altitude. I've never spent much time above 2500m and I was unusually short of breath. The highest points on the trail were around 3000m, where the air pressure was down to about 700 millibars.

The weather was (predictably) good: cold at night the first couple of nights, warmer later, but always warm and sunny during the day.

We saw lots of animals — deer, marmots, chipmunks, woodpeckers, other birds, lizards, and other animals you don't see in New Zealand. Also lots of interesting trees, flowers and other plants.

The mosquitoes at Sunrise HSC were terrible in the morning! My friend said it was the worst he'd ever seen, even having grown up in South Florida.

I've never camped for so many consecutive nights before — in New Zealand we usually stay in huts. I got to use my "squeeze bag" mechanical water filter a lot; it works very well and doesn't have the latency of the chemical purifiers.

Swimming in the Merced River at Little Yosemite Valley after a hot day felt very good!

I thought my fear of heights would kick in climbing the cables to get to the top of Half Dome, but it didn't at all. The real challenge was upper body strength, using my arms to pull myself up the cables — my strength is all in my legs.

Needless to say, Clouds Rest and Half Dome had amazing views and they deserve their iconic status. I'm very thankful to have had the chance to visit them.

My companions on the trip were also great, a mix of old friends and new. Thank you.

Monday, 11 June 2018

Bay Area Visit

I'm on my way to San Francisco for a guest visit to the Mozilla All Hands ... thanks, Mozilla!

After that, I'll be taking a break for a few days and going hiking with some friends. Then I'll be spending a week or so visiting more people to talk about rr and Pernosco. Looking forward to all of it!

Sunday, 10 June 2018

Crypto-Christians In Tech

This is not about cryptocurrencies; for that, watch this. Nor is it about cryptography. It's about the hidden Christians working in tech.

I sometimes get notes from Christian tech people thanking me for being open about my Christian commitment, because they feel that few of their colleagues are. That matches my experience, but it's a combination of factors: most tech people aren't Christians, but more are than you think — they're just not talking about it. Both of these are sad, but I expect the former. The latter is more problematic. I would encourage my brothers and sisters in tech to shine brighter. Here are some concerns I've had — or heard — over the years:

What can I do without being a jerk?
When asked what I did during the weekend, I say I worshiped the Creator. Sometimes I just say I went to church.

Sometimes I write blog posts about Christ. People don't have to read them if they don't want to.

I used to put Christian quotes in my email signature, but I got bashed over that and decided it wasn't worth fighting over. Now my email signatures are obscured. Those who seek, find. I should try emoji.

Sometimes I'm probably a jerk. Sorry!

Won't my career suffer?
It may. People I've worked with (but not closely) have told me they look down on me because I'm a Christian. Surely more have thought so, but not said so. But Jesus is super-clear that we need to take this on the chin and respond with love.

I don't want to be associated with THOSE OTHER Christians.
I know, right? This is a tough one because the easy path is to disavow Christians who embarrass us, but I think that is often a mistake. I could write a whole post about this, but Christians need unity and that sometimes means gritting our teeth and acknowledging our relationship with people who are right about Christ and wrong about everything else.

Another side of this is that if your colleagues only know of THOSE OTHER Christians (or perhaps just those who are particularly thick-skinned or combative), they need you to show them an alternative.

Whoa, persecution!
No. Claiming I've ever experienced persecution would embarrass me among my brothers and sisters who really have.

People are generally very good about it, especially in person. People who are jerks about it generally turn out to be jerks to everyone. In the long run, being open about your faith will reduce the number of awkward conversations people have around you about how awful those Christians are, not knowing where you stand. But this is not about our comfort anyway.

What if I screw up and give people a bad impression?
Bad news: you will. Good news: if you were perfect, you might give people the false impression that Christianity is about being a good person (or worse, trying to make other people "good"). But of course it isn't: it's about us recognizing our sin, seeking reconciliation with people and God, and obtaining forgiveness through Christ; not just once, but every day. How can we demonstrate that if we never fail?

Saturday, 26 May 2018

rr 5.2.0 Released

I released rr 5.2.0. This is a minor update that fixes some bugs, improves chaos mode a bit, and adds some features to help with trace portability. If rr's working for you, you probably don't need to upgrade.

Friday, 25 May 2018

Intel CPU Bug Affecting rr Watchpoints

I investigated an rr bug report and discovered an annoying Intel CPU bug that affects rr replay using data watchpoints. It doesn't seem to be hit very often in practice, which is good because I don't know any way to work around it. It turns out that the bug is probably covered by an existing Intel erratum for Skylake and Kaby Lake (and probably later generations, but I'm not sure), which I even blogged about previously! However, the erratum does not mention watchpoints and the bug I've found definitely depends on data watchpoints being set.

I was able to write a stand-alone testcase to characterize the bug. The issue seems to be that if a rep stos (and probably rep movs) instruction writes between 1 and 64 bytes (inclusive), and you have a read or write watchpoint in the range [64, 128) bytes from the start of the writes (i.e., not triggered by the instruction), then one spurious retired conditional branch is (usually) counted. The alignment of the writes does not matter, and it's not related to speculative execution.
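The trigger conditions above can be restated as a small predicate. This is only a sketch of the description in this post, not the stand-alone testcase itself: the function name and parameters are hypothetical, it treats the watchpoint as a single address (ignoring its length), and it says nothing about whether a given CPU actually miscounts.

```python
def spurious_branch_possible(write_start: int, write_len: int, watch_addr: int) -> bool:
    """Return True if a write/watchpoint pair matches the conditions described above.

    write_start: address of the first byte written by the rep stos
    write_len:   number of bytes written
    watch_addr:  address of a read or write data watchpoint (hypothetical
                 simplification: the watchpoint's own length is ignored)
    """
    # The instruction must write between 1 and 64 bytes, inclusive.
    small_write = 1 <= write_len <= 64
    # The watchpoint must sit in [64, 128) bytes from the start of the writes,
    # i.e. beyond anything the instruction actually touches.
    offset = watch_addr - write_start
    in_window = 64 <= offset < 128
    return small_write and in_window
```

For example, a 16-byte write at 0x1000 with a watchpoint at 0x1040 satisfies the conditions, while moving the watchpoint to 0x1080 (offset 128) does not.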

If you find rr failing during replay with watchpoints set, and the failures go away if you remove the watchpoints, it could well be this bug. Broadwell and earlier don't seem to have the bug.

A possible workaround would be to disable "fast-string optimization" in the kernel at boot time. I don't think there's any way for users to force this to happen in Linux, currently, but someone could write a kernel patch adding a command-line option for that and send it upstream. It would be great if they did!

Fortunately this sort of bug does not affect Pernosco.

Update: Pernosco

In the two years since I left Mozilla I've been working on a new debugger (with Kyle Huey, most of that time). We call it Pernosco — "know thoroughly". Pernosco takes rr recordings of failing runs, analyzes them in the cloud, and provides a Web interface for developers to debug them. It uses components of rr but the debugger implementation is completely different. Pernosco's data-oriented approach has features and performance that rr's approach cannot match — and nor can any other debugger. We've reached the point where external users have used Pernosco to fix real bugs, and their response has been very enthusiastic. It's exciting!

Developers spend a lot of time figuring out bugs. We think Pernosco will create a lot of value for software development organizations. We intend to capture some of that value by offering Pernosco as a paid cloud service. We have many ideas for making debugging easier and more fun, and we need a sustainable business model to make them happen. It's especially important because debugging is an area that has been chronically under-invested in across the industry.

Saturday, 12 May 2018

rr Chaos Mode Improvements

rr's chaos mode introduces nondeterminism while recording application execution, to try to make intermittent bugs more reproducible. I'm always interested in hearing about bugs that cannot be reproduced under chaos mode, especially if those bugs have been diagnosed. If we can figure out why a bug was not reproducible under chaos mode, we can often extend chaos mode to make it reproducible, and this improves chaos mode for everyone. If you encounter such a bug, please file an rr issue about it.

I just landed one such improvement. To trigger a specific SpiderMonkey JS engine bug, some thread X had to do a FUTEX_WAKE to wake up thread Y, then immediately yield to let thread Y run for a while without X running any further. rr chaos mode assigns random priorities to threads and strictly adheres to them, so in some runs it would assign X a low priority and Y a high priority and schedule Y whenever both were runnable. However, rr's syscall buffering optimization means the rr supervisor process is not notified after the FUTEX_WAKE and has no opportunity to interrupt X and schedule Y instead, so we keep running the lower-priority X thread, violating our scheduling policy. (Chaos mode randomizes scheduling intervals so it was possible for X to yield at the right point, but very unlikely because the "window of vulnerability" is very small.) The fix is quite easy: in chaos mode, FUTEX_WAKE should not use the syscall buffering optimization. This adds some overhead, but hopefully not all that much, because every FUTEX_WAKE is normally paired with a FUTEX_WAIT (futex-using code should not issue a FUTEX_WAKE if there are no waiters), and a FUTEX_WAIT yields, which is already an expensive operation.
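A toy model may make the scheduling problem easier to see. The sketch below is not rr's scheduler: the function, the tick-based timeslice, and the fixed priorities are all assumptions made for illustration. It models only the idea that the supervisor can reschedule either when a timeslice expires or when it is notified of a system call, and that a buffered FUTEX_WAKE produces no such notification.

```python
from typing import List


def simulate(buffer_futex_wake: bool, timeslice: int, ticks: int = 6) -> List[str]:
    """Return which thread runs at each tick in a toy strict-priority scheduler."""
    priority = {"X": 1, "Y": 2}       # higher number = higher priority (assumed)
    runnable = {"X"}                  # Y starts blocked, as if in FUTEX_WAIT
    current, left = "X", timeslice
    x_program = iter(["futex_wake"])  # X's first op; afterwards it just works
    trace = []
    for _ in range(ticks):
        trace.append(current)
        supervisor_notified = False
        if current == "X" and next(x_program, "work") == "futex_wake":
            runnable.add("Y")         # the wake makes Y runnable
            # A buffered syscall completes without the supervisor seeing it.
            supervisor_notified = not buffer_futex_wake
        left -= 1
        if supervisor_notified or left == 0:
            # Strict priorities: always pick the highest-priority runnable thread.
            current = max(runnable, key=priority.__getitem__)
            left = timeslice
    return trace


print(simulate(buffer_futex_wake=True, timeslice=4))
print(simulate(buffer_futex_wake=False, timeslice=4))
```

With buffering, X keeps running after the wake until its timeslice runs out, even though the higher-priority Y is runnable. Without buffering, the supervisor gets control right after the FUTEX_WAKE and switches to Y, which is the interleaving the bug needed.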

The same sorts of issues exist for other system calls that can make another higher-priority thread runnable, and I've added some slightly more elaborate fixes for those.

One day I should do a proper evaluation of these techniques and publish them...