Monday, 31 December 2018

Vox On Nietzsche

When I was thinking of becoming a Christian I wanted to read some anti-Christian books. I'd heard Nietzsche was worth reading so I read The Anti-Christ and Twilight Of The Idols. If anything they pushed me towards Christ: rather than presenting arguments against Christianity, they assume it's false and then rant about the implications of that — implications which are wholly unattractive to anyone reluctant to give up on morality. So I can recommend those books to anyone :-).

I was reminded of that by this Vox piece. The author tries to put some distance between Nietzsche and the "alt-right" but only partially succeeds. It's certainly true that atheist alt-righters, in rejecting Jesus but idolizing secular Christendom, have it exactly the wrong way around (though I'm glad they understand Jesus is incompatible with their ideology). It's also correct that Nietzsche argued for demolishing the trappings of Christianity that people hold onto after rejecting Jesus. Unfortunately for the Vox thesis, as far as I read, Nietzsche focused his contempt not on the geopolitics of "Christendom", but (quoting Vox) "egalitarianism, community, humility, charity, and pity". In this, Nietzsche is on the side of Nazis and against progressives and other decent human beings.

The Vox author points out that Nietzsche himself was against racism and anti-Semitism, but those who embrace his philosophy, who "reckon with a world in which there is no foundation for our highest values", can end up anywhere. If you see "egalitarianism, community, humility, charity, and pity" as non-obligatory or contemptible, your prejudices are likely to blossom into racism and worse. Fortunately Nietzsche's philosophy is incompatible with human nature, our imago Dei; intellectuals (both actual and aspiring) pay lip service to "a world in which there is no foundation for our highest values", but they do not and cannot live that way.

Friday, 21 December 2018

Hollyford Track

Previously I recounted our Milford Track trip up to the point where the rest of our group departed, leaving my children and me in Milford. On the morning of December 12 we flew in a light plane from Milford up the coast to Martins Bay; from there we walked inland up the Hollyford Valley over the following four days until we reached the lower end of the Hollyford road.

The flight itself was a great experience. We flew down the Milford Sound to the ocean and turned north to fly up the coast to Martins Bay. We were flying pretty low and got a great view of the Sound, the rugged and relatively inaccessible Fiordland coast, and the bottom end of the Hollyford Valley. Our pilot didn't have other passengers that day, so he brought along his dive gear and went diving at Martins Bay after he dropped us off, leaving his plane parked beside the tiny gravel airstrip.

We walked for about an hour from the airstrip to Martins Bay Hut and spent the rest of the day based there. Probably my best moment of the trip happened almost right away! I thought I'd try swimming across the Hollyford River to the sandspit, but as soon as I got into the water four dolphins appeared and swam around me for a couple of minutes until, presumably, they got bored. That was an amazing experience and completely unexpected. I felt blessed and privileged. Apparently dolphins and seals often swim from the ocean up the Hollyford River all the way to the head of Lake McKerrow, which must be around 15km inland.

That day we also visited the Long Reef seal colony, about 20 minutes' walk from Martins Bay Hut. We were a bit nervous since December is pupping season for the seals, and indeed we met a seal on the track who barked at us, sending us running the other way! I also saw, from a distance, a Fiordland crested penguin.

By the evening of that day five other trampers had arrived at Martins Bay Hut, but it's a large hut with plenty of room for up to 24 so it still felt very spacious.

The following day we walked to Hokuri Hut along the shore of Lake McKerrow and had a relaxing afternoon. It rained, but only after we'd arrived at the hut. (In fact we didn't use our rain jackets at all on the Hollyford Track.) A couple of the trampers from Martins Bay Hut joined us, and we also had a couple who had come from Demon Trail Hut. A group of four visited the hut; they had rafted down the Pyke River and the Hollyford River to Lake McKerrow and were planning to fly out once they reached Martins Bay. Rather than stay in the hut they camped by the lake. Apparently they saw seals catching fish down there.

On the third day we walked the infamous Demon Trail along Lake McKerrow to McKerrow Island Hut. It's several hours of picking one's way over piles of large, slippery rocks. We took it slowly and it didn't bother us, but we were glad to reach the end. We crossed "3-wire bridges" for the first time and mostly enjoyed them.

We'd been warned that McKerrow Island Hut was dirty and rodent-infested, but despite the hut being a bit old (built in the 1960s) it seemed fine, and the location is wonderful — a very short track leads to a beach with great views down Lake McKerrow. We saw no sign of rodents, though they may have been deterred because we had six people in the hut that night. Two of them were pack-rafting from the Hollyford road end, down the Hollyford River, out to Martins Bay, then carrying their rafts to Big Bay, over to the Pyke River, and back to the Hollyford confluence.

Our fourth day was pretty easy, about six hours of walking to get to the Hidden Falls Hut. On the fifth day we walked for just two and a half hours to reach the Hollyford Road end, a fine riverside spot to wait for a couple of hours for a shuttle to pick us up.

The Hollyford was a harder walk than a Great Walk, and would have been harder still with less perfect weather, but it was a bit quieter and the Hollyford Valley is just as stunning, so it was well worth doing. As you'd expect the trampers we met were, on average, a lot more hard-core. Apparently we just missed meeting a couple of Chileans who walked from the road to the ocean and back carrying surfboards, which sounds crazy. We met a few guys who had done the pack-rafting round trip from the Hollyford Road end to Martins Bay to Big Bay and back down the Pyke River in just over 24 hours, which is also crazy. We took it relatively easy and I'm happy with that.

Thursday, 20 December 2018

Milford Track 2018

Earlier this month I spent 11 days in the South Island walking the Milford Track and then, after a short break in Milford, the Hollyford Track.

It was my second time on the famous Milford Track. I took my kids again, and this time went with some friends from Auckland Chinese Presbyterian Church. We booked it back in June, in the first hour or two after bookings opened for this summer; it's the most popular track in New Zealand and books up very fast. Despite that popularity, though, the track itself isn't busy, because numbers are limited by the booking system: only 40 unguided walkers are allowed to start each section of track per day. There are another 40 or so guided walkers staying at the Ultimate Hikes lodges, but they start an hour or two behind the unguided walkers each day, so you seldom see many of them.

Once again we were lucky to have mostly good weather. Unlike last time, the weather on our first day (December 7) was excellent. The boat trip up to the end of Lake Te Anau to the trailhead is a wonderful start to the experience; you feel yourself leaving civilization behind as you enter the Fiordland mountains via the fjords of Lake Te Anau.

Our only rainy day was the third day (out of four), when we crossed Mackinnon Pass. Unfortunately this meant that once again I could not see the view at the pass, which is apparently spectacular on a good day. I guess I'll have to try again sometime! Next time, if the weather's good on day two, I should go as fast as possible up the Clinton Valley to Mintaro Hut, drop my gear there and carry on up to the pass for a look around before returning to Mintaro. I guess a reasonably fit person without a pack can probably get to the top from the hut in an hour and a half.

Bad weather days on these trips don't bother me that much since I will probably be able to go again if I really want to. I feel bad for foreign visitors who are much less likely to have that chance!

I did get a chance to explore Lake Mintaro and its streams this time. It's very close to the hut and well worth a walk around.

I'm not very good at identifying wildlife but I think we saw a number of whio (blue ducks). They're still endangered but it appears their numbers are rebounding thanks to the intensive predator trapping going on in the Clinton and Arthur valleys and elsewhere. Apparently it is now quite rare for the trappers to catch stoats there. There is a beech mast this season which will probably mean large-scale aerial poison drops will be needed this winter to keep rats down.

Overall I really enjoyed the time with family and friends, met some interesting people, and thanked God for the beauty of Fiordland both in the sun and in the wet. We had a particularly good time stopping for over an hour at Giant's Gate Falls near the end of the track, where the warmth of the sun and the spray from the falls mostly keep the sandflies at bay.

After we got to Milford on the last day most of our group checked into Milford Lodge and cleaned up. The next day we did a Milford Sound cruise with some kayaking, which was lots of fun. Then the rest of our group bussed out to Te Anau while the kids and I stayed another night before starting the Hollyford Track on December 12. That deserves its own blog post.

Wednesday, 28 November 2018

Capitalism, Competition And Microsoft Antitrust Action

Kevin Williamson writes an ode to the benefits of competition and capitalism, one of his themes being the changing fortunes of Apple and Microsoft over the last two decades. I'm mostly sympathetic, but in a hurry to decry "government intervention in and regulation of the part of our economy that is, at the moment, working best", he forgets or neglects to mention the antitrust actions brought by the US government against Microsoft in the mid-to-late 1990s. Without those actions, there is a high chance things could have turned out very differently for Apple. At the very least, we do not know what would have happened without those actions, and no-one should use the Apple/Microsoft rivalry as an example of glorious laissez-faire capitalism that negates the arguments of those calling for antitrust action today.

Would Microsoft have invested $150M to save Apple in 1997 if they hadn't been under antitrust pressure since 1992? In 1994 Microsoft settled with the Department of Justice, agreeing to refrain from tying the sale of other Microsoft products to the sale of Windows. It is reasonable to assume that the demise of Apple, Microsoft's only significant competitor in desktop computer operating systems, would have increased the antitrust scrutiny on Microsoft. At that point Microsoft's market cap was $150B vs Apple's $2B, so $150M seems like a cheap and low-risk investment by Gates to keep the US government off his back. I do not know of any other rational justification for that investment. Without it, Apple would very likely have gone bankrupt.

In a world where the United States v. Microsoft Corporation (2001) antitrust lawsuit didn't happen, would the iPhone have been as successful? In 1999 I was so concerned about the potential domination of Microsoft over the World Wide Web that I started making volunteer contributions to (what became) Firefox (which drew me into working for Mozilla until 2016). At that time Microsoft was crushing Netscape with superior engineering, lowering the price of the browser to zero, bundling IE with Windows and other hardball tactics that had conquered all previous would-be Microsoft competitors. With total domination of the browser market, Microsoft would be able to take control of Web standards and lead Web developers to rely on Microsoft-only features like ActiveX (or later Avalon/WPF), making it practically impossible for anyone but Microsoft to create a browser that could view the bulk of the Web. Web browsing was an important feature for the first release of the iPhone in 2007; indeed for the first year, before the App Store launched, it was the only way to do anything on the phone other than use the built-in apps. We'll never know how successful the iPhone would have been without a viable Web browser, but it might have changed the competitive landscape significantly. Thankfully Mozilla managed to turn the tide to prevent Microsoft's total browser domination. As a participant in that battle, I'm convinced that the 2001 antitrust lawsuit played a big part in restraining Microsoft's worst behavior, creating space (along with Microsoft blunders) for Firefox to compete successfully during a narrow window of opportunity when creating a viable alternative browser was still possible. (It's also interesting to consider what Microsoft could have done to Google with complete browser domination and no antitrust concerns.)

We can't be sure what the no-antitrust world would have been like, but those who argue that Apple/Microsoft shows antitrust action was not needed bear the burden of showing that their counterfactual world is compelling.

Sunday, 25 November 2018

Raglan

We spent a couple of days in Raglan celebrating our wedding anniversary. It's a small coastal town a couple of hours' drive south of Auckland, famous for surfing, and quiet at this time of year, though I hear it gets very busy in the summer holidays. We had a relaxing time, exploring the town a little and driving down the coast to Te Toto Gorge to climb Mt Karioi. That's a smaller version of the nearby Mt Pirongia — the summit area comprises many steep volcanic hillocks and ridges, all covered in dense bush. The track is rough, and in one place there are chains to help climb up and down. After two and a half hours we got as far as the lookout, with fabulous views over Raglan and the coast further north, and decided not to bother with the extra hour to the true summit. The views over the ocean on the way up were quite spectacular; we could see the snowy peaks of Mt Taranaki and, from the lookout, Mt Ruapehu — each about 170km away.

Tuesday, 13 November 2018

Comparing The Quality Of Debug Information Produced By Clang And Gcc

I've had an intuition that clang produces generally worse debuginfo than gcc for optimized C++ code. It seems that clang builds have more variables "optimized out" — i.e. when stopped inside a function where a variable is in scope, the compiler's generated debuginfo does not describe the value of the variable. This makes debuggers less effective, so I've attempted some quantitative analysis of the issue.

I chose to measure, for each parameter and local variable, the range of instruction bytes within its function over which the debuginfo can produce a value for this variable, and also the range of instruction bytes over which the debuginfo says the variable is in scope (i.e. the number of instruction bytes in the enclosing lexical block or function). I add those up over all variables, and compute the ratio of variable-defined-bytes to variable-in-scope-bytes. The higher this "definition coverage" ratio, the better.
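To make the aggregation concrete, here is a minimal Rust sketch of just that last step (this is not the actual debuginfo-quality code; the type and field names are invented, and some DWARF-walking code is assumed to have already produced the per-variable byte counts):

// Minimal sketch of the aggregation step only. Assumes earlier code has
// walked the DWARF and produced, for each parameter or local variable, the
// number of instruction bytes where a value is available and the number of
// instruction bytes in its enclosing scope.

struct VariableCoverage {
    defined_bytes: u64, // bytes covered by the variable's location descriptions
    scope_bytes: u64,   // bytes in the enclosing lexical block or function
}

// The "definition coverage" ratio: total defined bytes divided by total
// in-scope bytes, summed over all variables. Higher is better.
fn definition_coverage(vars: &[VariableCoverage]) -> f64 {
    let defined: u64 = vars.iter().map(|v| v.defined_bytes).sum();
    let in_scope: u64 = vars.iter().map(|v| v.scope_bytes).sum();
    if in_scope == 0 {
        return 0.0;
    }
    defined as f64 / in_scope as f64
}

fn main() {
    // Hypothetical numbers for two variables in one function.
    let vars = [
        VariableCoverage { defined_bytes: 120, scope_bytes: 200 },
        VariableCoverage { defined_bytes: 0, scope_bytes: 200 },
    ];
    println!("definition coverage = {:.2}", definition_coverage(&vars));
}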

This metric has some weaknesses:

  • DWARF debuginfo doesn't give us accurate scopes for local variables; the defined-bytes for a variable defined halfway through its lexical scope will be about half of its in-scope-bytes even if the debuginfo is perfect, so the ideal ratio is less than 1 (and unfortunately we can't compute it).
  • In debug builds, and sometimes in optimized builds, compilers may give a single definition for the variable value that applies to the entire scope; this improves our metric even though the results are arguably worse.
  • Sometimes compilers produce debuginfo that is simply incorrect; our metric doesn't account for that.
  • Not all variables and functions are equally interesting for debugging, but this metric weighs them all equally.
  • The metric assumes that the points of interest for a debugger are equally distributed over instruction bytes.

On the other hand, the metric has strengths:

  • It's relatively simple.
  • It focuses on what we care about.
  • It depends only on the debuginfo, not on the generated code or actual program executions.
  • It's robust to constant scaling of code size.
  • We can calculate it for any function or variable, which makes it easy to drill down into the results and lets us rank all functions by the quality of their debuginfo.
  • We can compare the quality of debuginfo between different builds of the same binary at function granularity.
  • It's sensitive to optimization decisions such as inlining; that's OK.

I built a debuginfo-quality tool in Rust to calculate this metric for an arbitrary ELF binary containing DWARF debuginfo. I applied it to the main Firefox binary libxul.so built with clang 8 (8.0.0-svn346538-1~exp1+0~20181109191347.1890~1.gbp6afd8e) and gcc 8 (8.2.1 20181105 (Red Hat 8.2.1-5)) using the default Mozilla build settings plus ac_add_options --enable-debug; for both compilers that sets the most relevant options to -g -Os -fno-omit-frame-pointer. I ignored the Rust compilation units in libxul since they use LLVM in both builds.

In our somewhat arbitrary metric, gcc is significantly ahead of clang for both parameters and local variables. "Parameters" includes the parameters of inlined functions. As mentioned above, the ideal ratio for local variables is actually less than 1, which explains at least part of the difference between parameters and local variables here.

gcc uses some debuginfo features that clang doesn't know about yet. An important one is DW_OP_GNU_entry_value (standardized as DW_OP_entry_value in DWARF 5). This defines a variable (usually a parameter) in terms of an expression to be evaluated at the moment the function was entered. A traditional debugger can often evaluate such expressions after entering the function, by inspecting the caller's stack frame; our Pernosco debugger has easy access to all program states, so such expressions are no problem at all. I evaluated the impact of DW_OP_GNU_entry_value and the related DW_OP_GNU_parameter_ref by configuring debuginfo-quality to treat definitions using those features as missing. (I'm assuming that gcc only uses those features when a variable value is not otherwise available.)

DW_OP_GNU_entry_value has a big impact on parameters but almost no impact on local variables. It accounts for the majority, but not all, of gcc's advantage over clang for parameters. DW_OP_GNU_parameter_ref has almost no impact at all. However, in most cases where DW_OP_GNU_entry_value would be useful, users can work around its absence by manually inspecting earlier stack frames, especially when time-travel is available. Therefore implementing DW_OP_GNU_entry_value may not be as high a priority as these numbers would suggest.

Improving the local variable numbers may be more useful. I used debuginfo-quality to compare two binaries (clang-built and gcc-built), computing, for each function, the difference in the function's definition coverage ratios, looking only at local variables and sorting functions according to that difference:

debuginfo-quality --language cpp --functions --only-locals ~/tmp/clang-8-libxul.so ~/tmp/gcc-8-libxul.so
This gives us a list of functions starting with those where clang is generating the worst local variable information compared to gcc (and ending with the reverse). There are a lot of functions where clang failed to generate any variable definitions at all while gcc managed to generate definitions covering the whole function. I wonder if anyone is interested in looking at these functions and figuring out what needs to be fixed in clang.

Designing and implementing this kind of analysis is error-prone. I've made my analysis tool source code available, so feel free to point out any improvements that could be made.

Update Helpful people on Twitter pointed me to some excellent other work in this area. Dexter is another tool for measuring debuginfo quality; it's much more thorough than my tool, but less scalable and depends on a particular program execution. I think it complements my work nicely. It has led to ongoing work to improve LLVM debuginfo. There is also CheckDebugify infrastructure in LLVM to detect loss of debuginfo, which is also driving improvements. Alexandre Oliva has an excellent writeup of what gcc does to preserve debuginfo through optimization passes.

Update #2 Turns out llvm-dwarfdump has a --statistics option which measures something very similar to what I'm measuring. One difference is that if a variable has any definitions at all, llvm-dwarfdump treats the program point where it's first defined as the start of its scope. That's an assumption I didn't want to make. There is a graph of this metric over the last 5.5 years of clang, using clang 3.4 as a benchmark. It shows that things got really bad a couple of years ago but have since been improving.

Sunday, 4 November 2018

What Is "Evil" Anyway?

I found this Twitter thread insightful, given its assumptions. I think that, perhaps inadvertently, it highlights the difficulties of honest discussion of evil in a secular context. The author laments:

It is beyond us, today, to conclude that we have enemies whose moral universe is such that loyalty to our own morality requires us to understand it and them as evil.
That is, evil means moral principles (and the people who hold them) which are incompatible with our own. That definition is honest and logical, and I think probably the best one can do under physicalist assumptions. Unfortunately it makes evil entirely subjective; it means other people can accurately describe us and our principles as evil in just the same way as we describe them as evil. All "evil" must be qualified as "evil according to me" or "evil according to you".

This is a major problem because (almost?) nobody actually thinks or talks about evil this way in day to day life, neither explicitly nor implicitly. Instead we think and act as if "evil" is an objective fact independent of the observer. Try tacking on to every expression of moral outrage the caveat "... but I acknowledge that other people have different moral assumptions which are objectively just as valid". It doesn't work.

Christians and many other monotheists avoid this problem by identifying a privileged frame of moral reference: God's. Our moral universe may or may not align with God's, or perhaps we don't care, or we may have trouble determining what God's is, but at least it lets us define evil objectively.

The Twitter thread raises a further issue: when one encounters evil people — people whose moral universe is incompatible with our own — what shall we do? Without a privileged frame of moral reference, one can't honestly seek to show them they are wrong. At best one can use non-rational means, including force, to encourage them to change their assumptions, or if that fails, perhaps they can be suppressed and their evil neutralized. This too is most unsatisfactory.

The Christian worldview is a lot more hopeful. We believe in a standard by which all will be measured. We believe in God's justice for transgressions. We believe in redemption through Jesus for those who fall short (i.e. everyone). We seek to love those who (for now) reject God's moral universe ... a group which sometimes includes ourselves. We see that even those most opposed or indifferent to God's purposes can change. These beliefs are not purely subjective, but grounded in objective truths about what God has done and is doing in the world.

Thursday, 1 November 2018

Comments on "REPT: Reverse Debugging of Failures in Deployed Software"

It's a pretty good paper! In fact, it won a "best paper" award.

The basic idea is to use Intel PT to gather a control-flow trace and keep just the last segment in a memory buffer, and dump that buffer as part of a crash memory dump. Then they perform some inference based on final memory values and the control-flow trace to figure out what the values of registers must have been during execution leading up to the crash. The inference is non-trivial; they have to use an iterative approach and sometimes make optimistic assumptions that later are corrected. With the register values and memory values they infer, they can provide some amount of reverse-execution debugging over the time interval leading up to the crash, with negligible overhead during normal execution. It's all done in the context of Windows and WinDbg.
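To give a flavour of that inference, here is a minimal Rust sketch of a single backwards pass over a toy instruction set (this is not the paper's algorithm; the instruction set, register file and names are all invented). Values are recovered where an instruction is invertible and marked unknown where it isn't:

// Toy instructions over a register file of u64 values.
#[derive(Clone, Copy, Debug)]
enum Insn {
    LoadImm { dst: usize, imm: u64 }, // dst = imm; destroys the old dst, so not invertible
    AddImm { dst: usize, imm: u64 },  // dst += imm; invertible by subtracting on the way back
}

type Regs = Vec<Option<u64>>; // None = value unknown at this point

// Given the final register values and the executed instructions (oldest first),
// return the inferred register state before each instruction.
fn infer_backwards(final_regs: &Regs, trace: &[Insn]) -> Vec<Regs> {
    let mut states = vec![Regs::new(); trace.len() + 1];
    states[trace.len()] = final_regs.clone();
    for i in (0..trace.len()).rev() {
        let mut prev = states[i + 1].clone();
        match trace[i] {
            // The old value of dst was overwritten, so going backwards it is unknown.
            Insn::LoadImm { dst, .. } => prev[dst] = None,
            // dst_after = dst_before + imm, so dst_before = dst_after - imm.
            Insn::AddImm { dst, imm } => prev[dst] = prev[dst].map(|v| v.wrapping_sub(imm)),
        }
        states[i] = prev;
    }
    states
}

fn main() {
    let trace = [Insn::LoadImm { dst: 0, imm: 5 }, Insn::AddImm { dst: 0, imm: 3 }];
    let final_regs: Regs = vec![Some(8)];
    println!("trace = {:?}", trace);
    for (i, regs) in infer_backwards(&final_regs, &trace).iter().enumerate() {
        println!("state {}: {:?}", i, regs);
    }
}

The real system also has to propagate values forwards through memory and iterate until it converges, which is where the optimistic assumptions mentioned above come in.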

One qualm I have about this paper's approach is that their optimistic assumptions mean in some cases they report incorrect data values. They're able to show that their values are correct most of the time, but I would hesitate to show users data that has a significant chance of being incorrect. There might be some room for improvement here, e.g. distinguishing known-correct values from maybe-incorrect values or using more sophisticated confidence estimation.

Overall using PT to capture control flow to be saved in crash dumps seems like a really good idea. Everyone should do that. The details of the inference are probably less important, but some kind of inference like in this paper seems really useful for crash dump analysis. I think you'll still want full rr-style recording when you can afford it, though. (I think it's possible that one day we'll be able to have full recording with negligible overhead ... a man can dream!)

I have a couple of quibbles. The paper doesn't describe how they handle the x86 EFLAGS register. This is important because if EFLAGS is part of their register state, then even simple instructions like ADD aren't reversible, because they clobber the flags; but if EFLAGS is not part of their register state, then they can't deduce the outputs of instructions like CMOVxx, SETxx, ADC etc. I asked the lead author and they confirmed they have special handling for EFLAGS.

Another quibble I have is that they don't describe how they chose the bugs to evaluate REPT against, so it's difficult to be confident that they didn't cherry-pick bugs where REPT performs well. Unfortunately, this is typical for computer science papers in areas where there is no accepted standard corpus of test subjects. Our field should raise standards by expecting authors to document protocols that avoid the obvious biases — not just test cherry-picking, but also tuning tools to work well on the same tests we then report results for.

Sunday, 28 October 2018

Auckland Half Marathon 2018

This morning I ran the Auckland Half Marathon, for the sixth year in a row, fifth year barefoot. I got my best time ever: official time 1:45:07, net time 1:44:35. I had to push hard, and I've been sore all day! Glad to get under 1:45 net.

Last year I applied climbing tape to the balls of my feet to avoid the skin wearing through. Unfortunately I lost that tape and forgot to get more, so I tried some small patches of duct tape but they came off very early in the race. Nevertheless my feet held up pretty well, perhaps because I've been running more consistently during the year and perhaps because my feet are gradually toughening up year after year. The road was wet today because it rained overnight, but I can't tell whether that is better or worse for me.

Thursday, 25 October 2018

Problems Scaling A Large Multi-Crate Rust Project

We have 85K lines of Rust code implementing the backend of our Pernosco debugger. To impose some modularity constraints and to reduce build times, from the beginning we organized our code as a large set of crates in a single Cargo workspace in a single Gitlab repository. Currently we have 48 crates. This has mostly worked pretty well but as the number of our crates keeps increasing, we have hit some serious scalability problems.

The most fundamental issue is that many crates build one or more executables — e.g. command-line tools to work with data managed by the crate — and most crates also build an executable containing tests (per standard Rust conventions). Each of these executables is statically linked, and since each crate on average depends on many other crates (both our own and third-party), the total size of the executables is growing at roughly the square of the number of crates. The problem is especially acute for debug builds with full debuginfo, which are about five times larger than release builds built with debug=1 (optimized builds with just enough debuginfo for stack traces to include inlined functions). To be concrete, our 85K line project builds 4.2G of executables in a debug build, and 750M of executables in release. There are 20 command-line tools and 81 test executables, of which 50 actually run no tests (the latter are small, only about 5.7M each).

The large size of these executables slows down builds and creates problems for our Gitlab CI as they have to be copied over the network between build and test phases. But I don't know what to do about the problem.

We could limit the number of test executables by moving all our integration tests into a single executable in some super-crate ... though that would slow down incremental builds because that super-crate would need to be rebuilt a lot, and it would be large.

We could limit the number of command-line tools in a similar way — combine all the tool executables into a single super-tool crate that uses the "Swiss Army knife" approach, deciding what its behavior should be by examining argv[0]. Again, this would penalize incremental builds.
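A minimal Rust sketch of that dispatch (the tool names here are hypothetical; the single binary would be installed under each tool name, e.g. via symlinks):

use std::env;
use std::path::Path;
use std::process::exit;

fn main() {
    // Dispatch on the name we were invoked as (argv[0]).
    let argv0 = env::args().next().unwrap_or_default();
    let name = Path::new(&argv0)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("")
        .to_string();
    match name.as_str() {
        "dump-trace" => dump_trace(),
        "index-db" => index_db(),
        other => {
            eprintln!("unknown tool name: {}", other);
            exit(1);
        }
    }
}

// Hypothetical tool entry points; in practice these would call into the
// relevant workspace crates.
fn dump_trace() { println!("dump-trace: not implemented in this sketch"); }
fn index_db() { println!("index-db: not implemented in this sketch"); }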

Cargo supports a kind of dynamic linking with its dylib option, but I'm not sure how to get that to work. Maybe we could create a super-crate that reexports every single crate in our workspace, attach all tests and binary tools to that crate, and ask Cargo to link that crate as a dynamic library, so that all the tests and tools are linking to that library. This would also hurt incremental builds, but maybe not as much as the above approaches. Then again, I don't know if it would actually work.
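For what it's worth, here is a sketch of what that reexporting super-crate might look like; I haven't verified that this actually works, and the crate names are made up:

// lib.rs of a hypothetical "umbrella" crate. Its Cargo.toml would ask for a
// dynamic library and depend on every workspace crate, something like:
//
//   [lib]
//   crate-type = ["dylib"]
//
//   [dependencies]
//   queries = { path = "../queries" }
//   storage = { path = "../storage" }
//
// lib.rs then just reexports those crates, so test and tool binaries can
// depend on this one crate and link against a single shared object instead of
// statically linking the whole dependency graph into every executable.

pub extern crate queries;
pub extern crate storage;
// ... one line per workspace crate ...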

Another option would be to break up the project into separate independently built subprojects, but that creates a lot of friction.

Another possibility is that we should simply use fewer, bigger crates. This is probably more viable than it was a couple of years ago, when we didn't have incremental rustc compilation.

I wonder if anyone has hit this problem before, and tried the above solutions or come up with any other solutions.

Wednesday, 24 October 2018

Harmful Clickbait Headline About IT Automation

Over the years a number of parents have asked me whether they should steer their children away from IT/programming careers in case those jobs are automated away in the future. I tell them that the nature of programming work will continue to change but programming will be one of the last jobs to be fully automated; if and when computers don't need to be programmed by people, they will be capable of almost anything. (It's not clear to me why some people are led to believe programming is particularly prone to automation; maybe because they see it as faceless?)

Therefore I was annoyed by the headline in the NZ Herald this morning: "Is AI about to unseat our programmers?" Obviously it's deliberate clickbait, and it wasn't chosen by Juha Saarinen, whose article is actually quite good. But I'm sure some of the parents I've talked to, and many others like them whom I'll never know, will have their fears reinforced by this headline, and perhaps some people will be turned away from programming careers to their detriment, and the detriment of New Zealand's tech industry.

Monday, 22 October 2018

The Fine Line Between Being A Good Parent And A Bad Parent

Two incidents will illustrate.

Early 2013: I take my quite-young kids to hike to the summit of Mt Taranaki. The ascent is more grueling than I expected; there's a long scree slope which is two steps forward, one step back. My kids start complaining, then crying. I have to decide whether to turn back or to cajole them onward. There are no safety issues (the weather is perfect and it's still early in the day), but the stakes feel high: if I keep pushing them forward but we eventually fail, I will have made them miserable for no good reason, and no-one likes a parent who bullies their kids. I roll the dice and press on. We make it! After the hike, we all feel it was a great achievement and the kids agree we did the right thing to carry on.

Two weeks ago: I take my kids and a couple of adult international students from our church on an overnight hiking trip to the Coromandel Peninsula. On Friday we hike for four hours to Crosbies Hut and stay there overnight. It's wonderful — we arrive at the hut around sunset in glorious weather, eat a good meal, and the night sky is awesome. The next day I return to our starting point, pick up the car and drive around to Whangaiterenga campsite in Kauaeranga Valley so my kids and our guests can descend into the valley by a different route that crosses Whangaiterenga stream a few times. I had called the visitor's centre on Friday to confirm that that track is open and the stream crossings are easy. My kids are now quite experienced (though our guests aren't) and should be able to easily handle this on their own. I get to the pickup point ahead of schedule, but two hours after I expected them to arrive, they still haven't :-(.

To cut the story short, at that point I get a text message from them and after some communication they eventually walk out five hours late. They were unable to pick up the trail after the first stream crossing (maybe it was washed out), and had to walk downstream for hours, also taking a detour up a hill to get phone reception temporarily. The kids made good decisions and gained a lot of confidence from handling an unexpected situation on their own.

What bothers me is that both of these situations could easily have turned out differently. In neither case would there have been any real harm — the weather in Coromandel was excellent and an unexpected night in the bush would have been perfectly safe given the gear they were carrying (if indeed they weren't found before dark). Nevertheless I can see that my decisions could have looked bad in hindsight. If we make a habit of taking these kinds of small risks — and I think we should! — then not all of them are going to pay off. I think, therefore, we should be forgiving of parents who take reasonable risks even if they go awry.

Tuesday, 2 October 2018

The Costs Of Programming Language Fragmentation

People keep inventing new programming languages. I'm surprised by how many brand-new languages are adopted by more than just their creators, despite the network effects that would seem to discourage such adoption. Good! Innovation and progress in programming languages depend on such adoption. However, let's not forget that fragmentation of programming languages reduces the sum of those beneficial network effects.

One example is library ecosystems. Every new language needs a set of libraries for commonly used functionality. Some of those libraries can be bindings to existing libraries in other languages, but it's common for new languages to trigger reimplementation of, e.g., container data structures, HTTP clients, and random number generators. If the new language did not exist, that effort could have been spent on improving existing libraries or some other useful endeavour.

Another example is community support. Every new language needs an online community (IRC, StackOverflow, etc) for developers to help one another with questions. Fragmenting users across communities makes it harder for people to find answers.

Obviously the efforts needed to implement and maintain languages and runtimes themselves represent a cost, since focusing those efforts on a smaller number of languages would normally mean better results.

I understand the appeal of creating new programming languages from scratch; like other green-field development, the lure of freedom from other people's decisions is hard to resist. I understand that people's time is their own to spend. However, I hope people consider carefully the social costs of creating a new programming language especially if it becomes popular, and understand that in some cases creating a popular new language could actually be irresponsible.

Tuesday, 25 September 2018

More Realistic Goals For C++ Lifetimes 1.0

Over two years ago I wrote about the C++ Lifetimes proposal and some of my concerns about it. Just recently, version 1.0 was released with a blog post by Herb Sutter.

Comparing the two versions shows many important changes. The new version is much clearer and more worked-out, but there are also significant material changes. In particular the goal has changed dramatically. Consider the "Goal" section of version 0.9.1.2: (emphasis original)

Goal: Eliminate leaks and dangling for */&/iterators/views/ranges
We want freedom from leaks and dangling – not only for raw pointers and references, but all generalized Pointers such as iterators—while staying true to C++ and being adoptable:
1. We cannot tolerate leaks (failure to free) or dangling (use-after-free). For example, a safe std:: library must prevent dangling uses such as auto& bad = vec[0]; vec.push_back(); bad = 42;.
Version 1.0 doesn't have a "Goal" section, but its introduction says
This paper defines the Lifetime profile of the C++ Core Guidelines. It shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time.
The new goal is much more modest, I think much more reasonable, and highly desirable! (Partly because "modern C++" has introduced some extremely dangerous new idioms.)

The limited scope of this proposal becomes concrete when you consider its definition of "Owner". An Owner can own at most one type of data and it has to behave much like a container or smart pointer. For example, consider a data structure owning two types of data:

#include <memory>
using std::unique_ptr;

class X {
public:
    X() : a(new int(0)), b(new char(0)) {}
    int* get_a() { return &*a; }
    char* get_b() { return &*b; }
private:
    unique_ptr<int> a;
    unique_ptr<char> b;
};
This structure cannot be an Owner. It is also not an Aggregate (a struct/class whose public fields are treated as separate variables for the purposes of the analysis). It has to be a Value. The analysis has no way to refer to data owned by Values; as far as I can tell, there is no way to specify or infer accurate lifetimes for the return values of get_a and get_b, and apparently in this case the analysis defaults to conservative assumptions that do not warn. (The full example linked above has a trivial dangling pointer with no warnings.) I think this is the right approach, given the goal is to catch some common errors involving misuse of pointers, references and standard library features. However, people need to understand that code free of C++ Lifetime warnings can still easily cause memory corruption. (This vindicates the title of my previous blog post to some extent; insofar as C++ Lifetimes was intended to create a safe subset of C++, that promise has not eventuated.)

The new version has much more emphasis on annotation. The old version barely mentioned the existence of a [[lifetime]] annotation; the new version describes it and shows more examples. It's now clear you can use [[lifetime]] to group function parameters into lifetime-equivalence classes, and you can also annotate return values and output parameters.

The new version comes with a partial Clang implementation, available on godbolt.org. Unfortunately that implementation seems to be very partial. For example the following buggy program is accepted without warnings:

int& f(int& a) {
    return a;
}
int& hello() {
    int x = 0;
    return f(x);
}
It's pretty clear from the spec that this should report a warning, and the corresponding program using pointers does produce a warning. OTOH there are some trivial false positives I don't understand:
int* hello(int*& a) {
    return a;
}
:2:5: warning: returning a dangling Pointer [-Wlifetime]
    return a;
    ^
:1:12: note: it was never initialized here
int* hello(int*& a) {
           ^
The state of this implementation makes it unreliable as a guide to how this proposal will work in practice, IMHO.

Monday, 17 September 2018

The Danger Of GMail's "Smart Replies"

At first I was annoyed by GMail's "Smart Reply" buttons because they represent a temptation to delegate (more) shaping of my human interactions to Google's AI ... a temptation that, for some reason known only to Google, can be disabled in the GMail mobile app but not the desktop Web client. I do not want the words I use to communicate, or the words others use to communicate to me, to be shaped by the suggestions of an algorithm that is most likely opaque even to its masters, let alone a mere consumer like me.

I just realized, though, that they're potentially a lot worse than that. I got an email suggesting I take an action, and the suggested "smart replies" are:

  • Sounds like a good idea.
  • I like that idea.
  • Yes, I agree.
But ... what if I don't agree? Does showing me only positive responses actually prime my brain to make me more likely to agree? Is it possible to tweak the wording of an email to ensure the algorithm produces responses of a particular type? (Probably.) More importantly, did anyone at Google actually consider and study such effects before rolling out this feature? Or did the team just roll out the feature, collect the bonus, and move on? If they did study it, are the results public and what were they? Wouldn't it be wise to require this kind of study and disclosure before subtly interfering with the cognitive processes of hundreds of millions of people?

For now I'm switching back to GMail Classic, and when (I assume) Google forces the new UI on me anyway, the path of least resistance will be to use a Firefox extension to block the Smart Reply buttons (yay Web!). Of course hundreds of millions of people will unwittingly submit to Google's reckless mental meddling.

Tuesday, 4 September 2018

"Crazy Rich Asians"

Pretty good movie. A few observations... (Spoilers!)

I don't know what the ultra-rich really get up to, but for me the most absurd part of the movie was the MJ scene. Eleanor's early hand was trash; there was no way she could have amassed the pungs-and-bamboos winning hand she did, not with Rachel also collecting bamboos.

Maybe I misunderstood everything, but didn't the Astrid-Michael subplot undermine the main plot by proving Eleanor was right all along? Michael and Astrid set aside their different backgrounds and family disapproval to marry (presumably for love), but Michael couldn't cope with the pressure and ruined their marriage ... just like Eleanor fears will happen with Rachel. Main plot: true love wins! Subplot: ... er no it doesn't.

The entire movie screams "FIRST WORLD PROBLEMS". In particular the idea that a man like Michael could not simply be grateful for his situation is marginally plausible but nearly unforgivable.

My source tells me the actors' Cantonese was pretty bad.

I'd watch Michelle Yeoh read the phone book.

Saturday, 1 September 2018

Rangitoto Fog

Visiting Rangitoto is one of my favourite things to do in Auckland. Catch the 9:15am ferry from the downtown terminal, arrive on the island just before 10am, walk up to the top, see the incredible views over Auckland and the Hauraki Gulf, and then back down via the lava caves and easily make the 12:45pm ferry getting you back to the city by 1:30pm. You've experienced a unique 600-year-old island with extraordinary geology, flora and fauna, and had a good walk, in four hours.

Today was extra-special. Very thick fog blanketed the harbour and inner Gulf, and the ferry proceeded very slowly to the island; the trip that normally takes 30 minutes took 75. We passed a number of becalmed yachts that apparently were supposed to be racing, but instead were drifting aimlessly through the fog. It was surreal. Once we finally reached the island and headed inland, we almost immediately left the fog, but the fog left behind spiderwebs sparkling with dew and rocks steaming in the sun. From Rangitoto's summit we could still see large fog banks covering Waiheke Island, Motuihe Island, and much of the inner Gulf. It was wonderful!

Friday, 24 August 2018

Long Live The Desktop Computer

Eight years ago I bought a Dell Studio XPS 8100 desktop for a home computer at a moderate price (NZD 3,100). I've just replaced a failing 1TB hard drive with a 500GB SSD, but other than that I've done no upgrades. What's interesting to me is that it's still a perfectly good machine: quad-core i7, 12GB RAM, NVIDIA GPU with 2GB VRAM. Everything I do, this machine could still do well, including software development for work. I guess if I wanted to play the latest AAA game titles or use a 4K monitor on it, I'd be unhappy, but I can't think of anything else I'd even consider doing that would be a problem, and those could be addressed by upgrading the video card. If this machine doesn't fail catastrophically I can see us continuing to use it for many more years. (I run Linux on it; the situation might be different if it was Windows.)

This is interesting because up until 2010 I'd been in the habit of upgrading computers at least every five years because they would improve dramatically over that time in ways that mattered to me. That stopped happening. It hasn't entirely stopped for everyone — Mozilla developers are getting new desktops with double-digit numbers of cores to speed up Firefox builds — but I run my heavy-duty workloads in the cloud now, because really big machines aren't efficiently utilized by a single developer. I guess the economics of utilization and colocation will make cloud-based heavy lifting (not necessarily public clouds) increasingly prevalent over time.

One of the implications is that declining desktop sales don't necessarily mean declining desktop usage. I think they must at least partly reflect longer upgrade cycles.

Another implication is that component reliability for desktops is becoming more important. It doesn't really matter if parts wear out after five years, if you're going to replace the whole machine before then anyway. If the expected lifespan of a machine is fifteen years, it's worth buying more reliable parts.

Another implication is that longevity bottlenecks might shift to relatively minor features like what types of USB ports your machine has. I guess some of this can be alleviated by upgrades and dongles, but it's worth thinking about.

Friday, 17 August 2018

ASAN And LSAN Work In rr

AddressSanitizer has worked in rr for a while. I just found that LeakSanitizer wasn't working and landed a fix for that. This means you can record an ASAN build and if there's an ASAN error, or LSAN finds a leak, you can replay it in rr knowing the exact addresses of the data that leaked — along with the usual rr goodness of reverse execution, watchpoints, etc. Well, hopefully. Report an issue if you find more problems.

Interestingly, LSAN doesn't work under gdb, but it does work under rr! LSAN uses the ptrace() API to examine threads when it looks for leaks, and it can't ptrace a thread that gdb is already ptracing (the ptrace design deeply relies on there being only one ptracer per thread). rr uses ptrace too, but when one rr tracee thread tries to ptrace another rr tracee thread, rr emulates the ptrace calls so that they work as if rr wasn't present.

Tuesday, 14 August 2018

Diagnosing A Weak Memory Ordering Bug

For the first time in my life I tracked a real bug's root cause to incorrect usage of weak memory orderings. Until now weak memory bugs were something I knew about but had subconsciously felt were only relevant to wizards coding on big iron, partly because until recently I've spent most of my career using desktop x86 machines.

Under heavy load a Pernosco service would assert in Rust's std::thread::Thread::unpark() with the error "inconsistent state in unpark". Inspecting the code led to the disturbing conclusion that the only way to trigger this assertion was memory corruption; the value of self.inner.state should always be between 0 and 2 inclusive, and if so then we shouldn't be able to reach the panic. The problem was nondeterministic but I was able to extract a test workload that reproduced the bug every few minutes. I tried recording it in rr chaos mode but was unable to reproduce it there (which is not surprising in hindsight since rr imposes sequential consistency).

With a custom panic handler I was able to suspend the process in the panic handler and attach gdb to inspect the state. Everything looked fine; in particular the value of self.inner.state was PARKED so we should not have reached the panic. I disassembled unpark() and decided I'd like to see the values of registers in unpark() to try to determine why we took the panic path, in particular the value of self.inner (a pointer) loaded into RCX and the value of self.inner.state loaded into RAX. Calling into the panic handler wiped those registers, so I manually edited the binary to replace the first instruction of the panic handler with UD2 to trigger an immediate core-dump before registers were modified.

The core-dump showed that RCX pointed to some random memory and was not equal to self.inner, even though we had clearly just loaded it from there! The value of state in RAX was loaded correctly via RCX, but was garbage because we were loading from the wrong address. At this point I formed the theory the issue was a low-level data race, possibly involving relaxed memory orderings — particularly because the call to unpark() came from the Crossbeam implementation of Michael-Scott lock-free queues. I inspected the code and didn't see an obvious memory ordering bug, but I also looked at the commit log for Crossbeam and found that a couple of memory ordering bugs had been fixed a long time ago; we were stuck on version 0.2 while the released version is 0.4. Upgrading Crossbeam indeed fixed our bug.
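For anyone who hasn't run into this class of bug, here is a minimal Rust illustration of the kind of mistake involved (this is not the actual Crossbeam bug): a value is published through an atomic flag using Ordering::Relaxed, so on weakly-ordered hardware the reader can see the flag set while still reading stale data. Using Release/Acquire (or SeqCst) on the flag restores the intended ordering:

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static DATA: AtomicUsize = AtomicUsize::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let writer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        // BUG: Relaxed does not order the DATA store before the READY store.
        // This should be Ordering::Release.
        READY.store(true, Ordering::Relaxed);
    });
    let reader = thread::spawn(|| {
        // This should be Ordering::Acquire.
        while !READY.load(Ordering::Relaxed) {}
        // On x86 this will almost always print 42, but on weakly-ordered
        // hardware it is allowed to print 0.
        println!("data = {}", DATA.load(Ordering::Relaxed));
    });
    writer.join().unwrap();
    reader.join().unwrap();
}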

Observation #1: stick to sequential consistency unless you really need the performance edge of weaker orderings.

Observation #2: stick to sequential consistency unless you are really, really smart and have really really smart people checking your work.

Observation #3: it would be really great to have user-friendly tools to verify the correctness of unsafe, weak-memory-dependent code like Crossbeam's.

Observation #4: we need a better way of detecting when dependent crates have known subtle correctness bugs like this (security bugs too). It would be cool if the crates.io registry knew about deprecated crate versions and cargo build warned about them.

Monday, 13 August 2018

The Parallel Stream Multiplexing Problem

Imagine we have a client and a server. The client wants to create logical connections to the server (think of them as "queries"); the client sends a small amount of data when it opens a connection, then the server sends a sequence of response messages and closes the connection. The responses must be delivered in-order, but the order of responses in different connections is irrelevant. It's important to minimize the start-to-finish latency of connections, and the latency between the server generating a response and the client receiving it. There could be hundreds of connections opened per second and some connections produce thousands of response messages. The server uses many threads; a connection's responses are generated by a specific server thread. The client may be single-threaded or use many threads; in the latter case a connection's responses are received by a specific client thread. What's a good way to implement this when both client and server are running in the same OS instance? What if they're communicating over a network?

This problem seems quite common: the network case closely resembles a Web browser fetching resources from a single server via HTTP. The system I'm currently working on contains an instance of this internally, and communication between the Web front end and the server also looks like this. Yet even though the problem is common, as far as I know it's not obvious or well-known what the best solutions are.

A standard way to handle this would be to multiplex the logical connections into a single transport. In the local case, we could use a pair of OS pipes as the transport, a client-to-server pipe to send requests and a server-to-client pipe to return responses. The client allocates connection IDs and the server attaches connection IDs to response messages. Short connections can be very efficient: a write syscall to open a connection, a write syscall to send a response, maybe another write syscall to send a close message, and corresponding read syscalls. One possible problem is server write contention: multiple threads sending responses must make sure the messages are written atomically. In Linux this happens "for free" if your messages are all smaller than PIPE_BUF (4096), but if they aren't you have to do something more complicated, the simplest being to hold a lock while writing to the pipe, which could become a bottleneck for very parallel servers. There is a similar problem with client read contention, which is mixed up with the question of how you dispatch received responses to the thread reading from a connection.
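As a concrete sketch of the server side of that transport (the frame layout here is invented for illustration), each response can be framed as [connection id][payload length][payload] and written while holding a lock, so that frames larger than PIPE_BUF can't interleave:

use std::io::{self, Write};
use std::sync::Mutex;

struct ResponsePipe {
    // The server-to-client pipe, shared by all server threads.
    writer: Mutex<Box<dyn Write + Send>>,
}

impl ResponsePipe {
    fn send(&self, connection_id: u32, payload: &[u8]) -> io::Result<()> {
        // Build the whole frame first, then write it under the lock so frames
        // from different threads can't interleave.
        let mut frame = Vec::with_capacity(8 + payload.len());
        frame.extend_from_slice(&connection_id.to_le_bytes());
        frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        frame.extend_from_slice(payload);
        // This lock is the potential bottleneck discussed above: every server
        // thread serializes here.
        let mut writer = self.writer.lock().unwrap();
        writer.write_all(&frame)
    }
}

fn main() -> io::Result<()> {
    // Stand-in for the real pipe: write to stdout for illustration.
    let pipe = ResponsePipe { writer: Mutex::new(Box::new(io::stdout())) };
    pipe.send(7, b"hello")
}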

A better local approach might be for the client to use an AF_UNIX socket to send requests to the server, passing with each request message a file descriptor for a fresh pipe that the server should use to respond to the client. It takes a few more syscalls, but client threads need no user-space synchronization, and server threads need none after a request has been dispatched to one of them. A pool of pipes in the client might help.

The network case is harder. A naive approach is to multiplex the logical connections over a TCP stream. This suffers from head-of-line-blocking: a lost packet can cause delivery of all messages to be blocked while the packet is retransmitted, because all messages across all connections must be received in the order they were sent. You can use UDP to avoid that problem, but you need encryption, retransmits, congestion control, etc so you probably want to use QUIC or something similar.

The Web client case is interesting. You can multiplex over a WebSocket much like a TCP stream, with the same disadvantages. You could issue an HTTP request for each logical connection, but this would limit the number of open connections to some unknown maximum, and could have even worse performance than the WebSocket if the browser and server don't negotiate QUIC + HTTP2. A good solution might be to multiplex the connections into a RTCDataChannel in non-ordered mode. This is probably quite simple to implement in the client, but fairly complex to implement in the server because the RTCDataChannel protocol is complicated (for good reasons AFAIK).

This multiplexing problem seems quite common, and its solutions interesting. Maybe there are known best practices or libraries for this, but I haven't found them yet.

Monday, 30 July 2018

Gerv

I'm sad that Gerv is no longer with us, but I'm also glad because I'm confident he is in the presence of Jesus, awaiting the final resurrection.

I never spent very much time with him, but I really appreciated getting together at Mozilla events with Gerv and a small group of other Mozilla Christians to pray every morning. That tradition continues, and long may it do so!

I have always been inspired by the way Gerv and his family lived their lives to the full, to the glory of God, in the face of his long illness. I've had a sheltered life of little contact with sickness and death, but that will probably not last, and I expect in times to come I will treasure Gerv's example.

Wednesday, 11 July 2018

Why Isn't Debugging Treated As A First-Class Activity?

Mark Côté has published a "vision for engineering workflow at Mozilla": part 2, part 3. It sounds really good. These are its points:

  • Checking out the full mozilla-central source is fast
  • Source code and history is easily navigable
  • Installing a development environment is fast and easy
  • Building is fast
  • Reviews are straightforward and streamlined
  • Code is landed automatically
  • Bug handling is easy, fast, and friendly
  • Metrics are comprehensive, discoverable, and understandable
  • Information on “code flow” is clear and discoverable

Consider also Gitlab's advertised features:

  • Regardless of your process, GitLab provides powerful planning tools to keep everyone synchronized.
  • Create, view, and manage code and project data through powerful branching tools.
  • Keep strict quality standards for production code with automatic testing and reporting.
  • Deploy quickly at massive scale with integrated Docker Container Registry.
  • GitLab's integrated CI/CD allows you to ship code quickly, be it on one - or one thousand servers.
  • Configure your applications and infrastructure.
  • Automatically monitor metrics so you know how any change in code impacts your production environment.
  • Security capabilities, integrated into your development lifecycle.

One thing developers spend a lot of time on is completely absent from both of these lists: debugging! Gitlab doesn't even list anything debugging-related in its missing features. Why isn't debugging treated as worthy of attention? I genuinely don't know — I'd like to hear your theories!

One of my theories is that debugging is ignored because people working on these systems aren't aware of anything they could do to improve it. "If there's no solution, there's no problem." With Pernosco we need to raise awareness that progress is possible and therefore debugging does demand investment. Not only is progress possible, but debugging solutions can deeply integrate into the increasingly cloud-based development workflows described above.

Another of my theories is that many developers have abandoned interactive debuggers because they're a very poor fit for many debugging problems (e.g. multiprocess, time-sensitive and remote workloads — especially cloud and mobile applications). Record-and-replay debugging solves most of those problems, but perhaps people who have stopped using a class of tools altogether stop looking for better tools in the class. Perhaps people equate "debugging" with "using an interactive debugger", so when trapped in "add logging, build, deploy, analyze logs" cycles they look for ways to improve those steps, but not for tools to short-circuit the process. Update This HN comment is a great example of the attitude that if you're not using a debugger, you're not debugging.

Sunday, 24 June 2018

Yosemite: Clouds Rest And Half Dome

On Saturday morning, immediately after the Mozilla All Hands, I went with some friends to Yosemite for an outstanding five-night, five-day hiking-and-camping trip! We hiked from the Cathedral Lakes trailhead all the way down to Yosemite Valley, ascending Clouds Rest and Half Dome along the way. The itinerary:

  • Saturday night: camped at Tuolumne Meadows
  • Sunday: hiked from Cathedral Lakes trailhead past the Cathedral Lakes to Sunrise High Sierra Camp
  • Monday: hiked from Sunrise HSC past the Sunrise Lakes to camp just north of Clouds Rest
  • Tuesday: hiked up and over Clouds Rest and camped just north of the trail leading up to Half Dome
  • Wednesday: left most of our gear in camp, climbed Half Dome, returned to camp, and hiked down to camp in Little Yosemite Valley
  • Thursday: hiked out to Yosemite Valley

Apart from the first day, each day was relatively short in terms of distance, but the first few days were quite strenuous regardless because of the altitude. I've never spent much time above 2500m and I was definitely unusually short of breath. The highest points on the trail were around 3000m, where the air pressure was down to 700 millibars.

The weather was (predictably) good: cold at night the first couple of nights, warmer later, but always warm and sunny during the day.

We saw lots of animals — deer, marmots, chipmunks, woodpeckers, other birds, lizards, and other animals you don't see in New Zealand. Also lots of interesting trees, flowers and other plants.

The mosquitoes at Sunrise HSC were terrible in the morning! My friend said it was the worst he'd ever seen, even having grown up in South Florida.

I've never camped for so many consecutive nights before — in New Zealand we usually stay in huts. I got to use my "squeeze bag" mechanical water filter a lot; it works very well and doesn't have the latency of the chemical purifiers.

Swimming in the Merced River at Little Yosemite Valley after a hot day felt very good!

I thought my fear of heights would kick in climbing the cables to get to the top of Half Dome, but it didn't at all. The real challenge was upper body strength, using my arms to pull myself up the cables — my strength is all in my legs.

Needless to say, Clouds Rest and Half Dome had amazing views and they deserve their iconic status. I'm very thankful to have had the chance to visit them.

My companions on the trip were also great, a mix of old friends and new. Thank you.