Tuesday, 27 December 2016

On "Arrival"

Summary: beautifully shot, thought-provoking nonsense that's worth watching. High marks for imagination, but can't we have a science fiction movie that is imaginative, thought-provoking and stays logical for the entire duration?

Some spoiler-laden complaints follow...

The arrow of time is not a Sapir-Whorf phenomenon.

No-one builds high technology without acquiring the same basic mathematics we have.

There aren't indestructible substances that we can't figure out anything about.

Presented with a technology that controls gravity, you'd have scientists and their equipment jam-packed around it monitoring and experimenting 24-7 for years.

We don't need to worry about how to communicate with alien visitors. They'd learn our languages with ease and talk to us if and only if they want to.

I hope in real life the USA would put together a decent-sized science team instead of relying on a couple of quirky individuals.

Why would Louise tell Ian about their daughter's future if she knew he'd leave?

What happens if Louise tries to prevent what she foresees? Maybe this is connected to the previous question.

It was a nice change to have a leading couple in which the man was the superfluous accessory/love interest.

Saturday, 24 December 2016

October rr Talk Online

I should have put this up earlier, but my rr talk at Google in October is online. It's similar to previous rr talks I've given, but towards the end I talk more about future directions for debugging.

Wednesday, 21 December 2016

Disemploy The Middle/Upper Class

Machines will increasingly disrupt human employment. I wrote about this a few years ago, and I stand by what I wrote; it's hard to see a future where most humans are still employable.

The most obvious humane solution would be to tax the machine-owners heavily and redistribute the money as some kind of guaranteed income. This would require major attitude changes across the political spectrum. A post-work future has other issues too, like how to replace work as a source of self-worth. We need to have serious discussions and plans, but we don't. One reason is that the burden will fall initially on mostly lower-income people, because their jobs have tended to be easier to automate; I and my middle/upper-class peers feel safe continuing to reap the benefits of the automated society for a while yet.

Slowing down economically-motivated technological progress is hard, but directing investments to accelerate it in selected areas is not hard. So I say socially-conscious technologists and investors should focus on disrupting the employment of the middle and upper classes. When lawyers, accountants and middle managers are losing their jobs en masse along with poorer people, we are much more likely to see equitable solutions.

Some Comments On "Sapiens"

Over the weekend I read Sapiens, by Yuval Harari. It's a good read even if you've already read a lot of pop-science/history, telling a history of the human race with emphasis on "revolutions" --- "cognitive", "agricultural", "industrial", etc. Unfortunately, the more he ventures into areas where I have detailed knowledge --- in particular, Christianity and computing --- the more it becomes apparent that he's sloppy.

He clearly has no sympathy for Christianity, and monotheism in general, which is fine, but he doesn't seem to try to treat them fairly. For example he carefully describes the St. Bartholomew's Day massacre as being solely about Catholics and Protestants disputing over doctrines of salvation; he surely knows that other issues, including struggles over temporal power, were deeply involved. Elsewhere, on economics, he says religions hold "money is the root of all evil", but he or his editors should have known this is a popular misquotation of 1 Timothy 6:10: "For the love of money [φιλαργυρία, philargyria, "avarice"] is the root of all evil". His interpretation of Matthew 19:16-30, the parable of the rich young ruler, is likewise off. There's a throw-away line (I don't have the text here) where he suggests Christianity grew mainly or solely through conquest and bloodshed, which it did at times but did not do, for example, in its critical first few centuries, nor in its recent explosive growth in Africa and Asia.

That sort of thing is normal among Harari's pop-science peers so I hardly fault him for it. He betters them by also showering skepticism on post-Enlightenment intellectual projects. He points out that the idea of "all people are created equal" is absurd once you rip away its theistic underpinnings. He states clearly that societies cannot function without shared myths, or at least "belief in belief" (e.g., belief in the value of money). In the conclusion of his book, he takes atheism to its bitter end and acknowledges its great problem: "what then shall we do?" I can't give him full marks, because throughout the book he backslides and shows irrational dedication to principles such as "individual suffering matters" without acknowledging that this is just his shared myth — but overall, well done.

He talks a bit about the computing revolution and the future of history, but there he misses the mark. He talks about computer viruses evolving via mutation and natural selection, which hasn't happened and isn't going to happen for various reasons, but he says very little about AI, or about how the Internet is reshaping society, which are much more important.

Generally when an author is sloppy in areas you know about, you should be cautious about everything else they say. I have to hope he's better than that, but it's still a good read regardless.

Saturday, 19 November 2016

Overcoming Stereotypes One Parent At A Time

I just got back from a children's sports club dinner where I hardly knew anyone and apparently I was seated with the other social leftovers. It turned out the woman next to me was very nice and we had a long conversation. She was excited to hear that I do computer science and software development, and mentioned that her daughter is starting university next year and strongly considering CS. I gave my standard pitch about why CS is a wonderful career path --- hope I didn't lay it on too thick. The daughter apparently is interested in computers and good at maths, and her teachers think she has a "logical mind", so that all sounded promising and I said so. But then the mother started talking about how that "logical mind" wasn't really a girly thing and asking whether the daughter might be better doing something softer like design. I pushed back and asked her not to make assumptions about what women and men might enjoy or be capable of, and mentioned a few of the women I've known who are extremely capable at hard-core CS. I pointed out that while CS isn't for everyone and I think people should try to find work they're passionate about, the demand and rewards are often greater for people in more technical roles.

This isn't the first time I've encountered mothers to a greater or lesser extent steering their daughters away from more technical roles. I've done a fair number of talks in high schools promoting CS careers, but at least for girls maybe targeting their parents somehow would also be worth doing.

I'll send this family some links to Playcanvas and other programming resources and hope that they, plus my sales pitch, will make a difference. One of the difficulties here is that you never know or find out whether what you did matters.

Thursday, 17 November 2016

Stop Saying "Xs Do Y" Disingenuously

I keep seeing inflammatory statements about members of some group where it's left deliberately unclear whether they mean some, most, or all members of the group.

For example, someone will tweet "Australians club baby penguins!" The intent is to encourage outrage against the malevolence of Australians. This is unjust (I guess), but if pressed the author will fall back on the defense that indeed more than one Australian is known to have bashed penguins and "that's all they meant".

To some extent they're exploiting a defect of the English language :-(.

An extension of this fallacy is to say "Xs do Y and yet they do Z" where some Xs do Y and some Xs do Z and there is some inherent contradiction between Y and Z. This is designed to highlight dishonest or incoherent behaviour, but of course glosses over the possibility that the sets of Xs doing Y and Xs doing Z have a small intersection. This is especially frustrating for members of X who do Y but not Z. Example: "Yesterday Australians liked cute animals, today Australians club baby penguins, WTF?"

Sure, these examples are ridiculous but real ones are everywhere. (I don't cite real examples because their content would distract from the point.)

Update A couple of people mentioned that this is a subcategory of motte-and-bailey behaviour. Actually I really like that Slate Star Codex blog. There's an article there that is quite similar to a theme I wrote about.

Monday, 14 November 2016

Handling Hardware Lock Elision In rr

Intel's Hardware Lock Elision feature lets you annotate instructions with prefixes to indicate that they perform lock/unlock operations. The CPU then turns those into hardware memory transactions so that the instructions in the locked region are performed speculatively and only committed at the unlock. The difference between HLE and the more capable RTM transactional memory support is that HLE is supposed to be fully transparent. The prefixes are ignored on non-HLE-supporting CPUs so you can just add them to your code and things will hopefully get faster --- no CPUID checks are necessary. Unfortunately, by default, Intel's hardware performance counters count events in aborted transactions, even though they didn't really happen in terms of user-space effects. Thus when rr records a program that uses HLE, our conditional branch counter may report a value higher than the number of conditional branches that "really" executed, and this breaks rr. (FWIW we discovered this problem when Emilio was using rr to debug intermittent failures in Servo using the latest version of the Rust parking_lot crate.)

For RTM we have some short-term hacks to disable RTM usage in glibc, and the medium-term solution is to use "CPUID faulting" to trap CPUID and modify the feature bits to pretend RTM is not supported. This approach doesn't work for HLE because there is no need to check CPUID before using it.

Fortunately Intel provides an IN_TXCP flag that you can set on a hardware performance counter to indicate that it should not count events in aborted transactions. This is exactly what we need. However, for replay we need to be able to program the PMU to send an interrupt after a certain number of events have occurred, and the Linux kernel prevents us from doing that for IN_TXCP counters. Apparently that's because if you request an interrupt after a small number of events and then execute an HLE transaction that generates more than that number of events, the CPU will detect the overflow, abort the transaction and roll the counter back to its pre-transaction value. The kernel then notices there wasn't really an overflow and restarts the transaction, and you're in an infinite loop.

The solution to our dilemma is to use two counters to count conditional branches. One counter is used to generate interrupts, and it is allowed to count events in aborted transactions. Another counter uses IN_TXCP to avoid counting events in aborted transactions, and we use this counter only for measurement, never for generating interrupts. This setup works well. It means that during replay our interrupt might fire early, because the interrupt counter counted events in aborted transactions, but that's OK because we already have a mechanism to carefully step forward to the correct stopping point.

There is one more wrinkle. While testing this new approach I noticed that there are some cases where the IN_TXCP counter reports spurious events. This is obviously a nasty little bug in the hardware, or possibly the kernel. On my system you can reproduce it just by running perf stat -e r5101c4 -e r2005101c4 ls --- the second event is just the IN_TXCP version of the first event (retired conditional branches), so should always report counts less than or equal to the first event, but I get results like

 Performance counter stats for 'ls':
         1,994,374      r5101c4
         1,994,382      r2005101c4

I have a much simpler testcase than ls which I'll try to get someone at Intel to look at. For now, we're working around it in rr by using the results of the regular counter when the IN_TXCP counter's value is larger. This should work as long as an IN_TXCP overcount doesn't occur in an execution sequence that also uses HLE, and both of those are hopefully rare.
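
The relationship between the two raw events and the workaround can be sketched in a few lines. The two event codes in the perf command differ by exactly 0x200000000, which is consistent with IN_TXCP being bit 33 of the raw event config; treat that bit position as an assumption here. The helper names are hypothetical and this is only an illustration of the idea, not rr's actual code.

```python
# Raw event for retired conditional branches, as used in the perf command above.
RETIRED_COND_BRANCHES = 0x5101C4

# Assumption: IN_TXCP is bit 33 of the raw event config, which matches
# the difference between the two events passed to perf stat.
IN_TXCP = 1 << 33


def in_txcp_variant(raw_event: int) -> int:
    """Return the raw event with the IN_TXCP flag set."""
    return raw_event | IN_TXCP


def effective_branch_count(regular: int, in_txcp: int) -> int:
    """Hypothetical helper mirroring the workaround described above.

    The IN_TXCP counter should never exceed the regular counter, so a
    larger IN_TXCP value is treated as spurious and the regular count wins.
    """
    return regular if in_txcp > regular else in_txcp


print(hex(in_txcp_variant(RETIRED_COND_BRANCHES)))   # 0x2005101c4
print(effective_branch_count(1_994_374, 1_994_382))  # 1994374
```

With the counts from the ls run above, the helper falls back to the regular counter's value. When the IN_TXCP counter is lower, as it should be after an aborted transaction, its value is used instead.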

Sunday, 13 November 2016

Misinterpreting Close Contests

I often read about close rugby games decided by a margin equal to or less than the value of one penalty kick. Many reports will interpret the result as meaning the winning side was better-coached, has a knack for winning tight games, is generally better, etc. However, a gust of wind, a muscle twitch, or many other events mostly outside anyone's control could have changed the result, in which case the analysis would have been completely different. Thus, such analysis is nonsense.

The same problem afflicts analysis of close election results. If the margin of victory is very small, who actually won is not evidence of any preexisting condition, but it is often erroneously interpreted as strong evidence for the appeal of some candidate, or the effectiveness of some strategy, etc. In a close contest, "who won" will have a dramatic effect on the future but is irrelevant to explaining the past.

To avoid this cognitive bias, I wish rugby writers reporting on a close game would complete their analysis before they watch the last few minutes of the game. Likewise for people reporting on elections.
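
The rugby argument can be made concrete with a toy model. The sketch below assumes the final margin is normally distributed around the better team's true edge; the specific numbers (a 2-point edge, a 12-point standard deviation, "close" meaning within one 3-point penalty kick) are invented for illustration, not taken from real match data.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def better_team_share_of_close_wins(edge: float, sd: float, close: float) -> float:
    """Among games decided by at most `close` points, the fraction won by
    the team whose expected margin is `edge` points.

    Assumes the final margin is normally distributed around `edge`.
    """
    wins = normal_cdf(close, edge, sd) - normal_cdf(0.0, edge, sd)
    losses = normal_cdf(0.0, edge, sd) - normal_cdf(-close, edge, sd)
    return wins / (wins + losses)


# Assumed numbers: a 2-point true edge, a 12-point standard deviation,
# and "close" meaning within one 3-point penalty kick.
share = better_team_share_of_close_wins(edge=2.0, sd=12.0, close=3.0)
print(f"{share:.3f}")
```

Under these assumptions the better team wins only slightly more than half of the close games, so knowing who won a close game says very little about which team was better.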

Friday, 11 November 2016

Welcoming Richard Dawkins

Richard Dawkins wants New Zealand to invite Trump/Brexit-refugee scientists to move here to create "the Athens of the modern world".

I appreciate the compliment he pays my country (though, to be honest, I don't know why he singled us out). I would be delighted to see that vision happen, but in reality it's not going to. Every US election the rhetoric ratchets up and people promise to move here, but very very few of them follow through. Even Dawkins acknowledges it's a pipe-dream. This particular dream is inconceivable because "the Athens of the modern world" would need a gigantic amount of steady government funding for research, and that's not going to happen here.

To be honest it's a little bit frustrating to keep hearing people talk about moving to New Zealand without following through ... it feels like being treated more as a rhetorical device than a real place and people. That said, I personally would be delighted to welcome any science/technology people who really want to move here, and New Zealand's immigration system makes that relatively easy. I'd be especially delighted for Richard Dawkins to follow his own lead.

Wednesday, 9 November 2016

Dangerous Permissions

I know that "man's anger does not produce the righteousness that God desires". But I also know from Jesus' example that there is such a thing as valid, righteous anger. The danger is that when I feel a desire to be unreasonably angry, which is fairly often, I try to find a way to classify it as "righteous anger" so I have permission, or even an obligation, to let that feeling flourish. Better still if some other person validates my anger, commends it, or commands it as a duty! Often the Internet is an excellent resource for finding validation for my anger.

This is especially pernicious when the permission is actually reasonable but I use it as an excuse to enjoy rage. It's easy to look at the world and feel justified anger at, say, the latest irresponsible behavior of technology vendors. But since there's little I can do about it, feeding such anger is counterproductive; it makes me an angry, grumpy, frustrated person with an inflated sense of superiority and scorn.

A completely different sort of example is racism. Some forms of racism come easily to me and I have to be constantly vigilant against them. If someone (not naming names, cough cough) or some community was telling me it's OK to be racist (explicitly or implicitly), my job would be that much harder. Although there isn't any sort of "righteous racism", there are justified complaints about aspects of specific cultures that can be used as a cover for racist feelings. To dodge this problem, many communities try to exterminate the permission by treating any criticism of culture (or at least certain cultures) as de facto racist; this goes too far. I just try to avoid thinking about these issues unless it's really necessary, which it seldom is, and when I do think about them I have to be very very careful.

I think everyone needs to learn to identify those moments when we're seeking permission to indulge a sinful desire, and understand that even valid grounds for that permission do not validate the desire.

Sunday, 30 October 2016

Auckland Half Marathon #4

Official time 1:46:34, net time 1:45:14. This is my new personal best; I've been getting faster every year. This year I did a bit more training than last year, but I'm a bit disappointed because I didn't push myself as hard as last year.

At this pace and distance my feet aren't quite tough enough; both this year and last year a spot of skin on my right foot wore through and started bleeding. I don't feel it while I'm running but it's not a good look, so either I need to do more training or wear my Vibrams if I'm going to run faster or further.

Thursday, 27 October 2016

Implications Of ASLR Side-Channel Attacks

This paper is a pretty interesting attack on kernel ASLR, based on observing timing differences in (invalid) accesses to kernel memory from user-space with Intel's TSX hardware transactional memory primitives. There is other recent work on kernel ASLR information leaks based on cache timing and BTB timing. I was a bit underwhelmed by the BTB attacks but this TSX-based attack is much stronger. My guess is that there are a lot more timing-channel and other side-channel attacks to be discovered under this threat model. While I was at Berkeley, I was quite stunned to hear that Intel's SGX enclave extension was designed for a threat model that explicitly excludes side-channel attacks!

This all reminds me of the wave of CFI bypass attacks. CFI and ASLR are supposed to reduce exploitability of bugs providing attack primitives like buffer overflows, but in the long run, they may not be much good. This increases the importance of denying attackers access to these primitives in the first place. Programming languages that reduce the TCB are an important part of that. Glad to be writing most of our new code in Rust!

Tangentially, I wonder whether the publication of the TSX-timing-channel paper was a good thing overall (other than for the careers of the researchers), given the paper's conclusion that there are no practical countermeasures without hardware changes or significant performance degradation ... not even a microcode update to disable TSX is available. Ostensibly the value of attack papers, like white-hat security analysis, is to stimulate the creation and deployment of defensive countermeasures, but in this case there really aren't any. Would we all be better off, in practice, if the issue had been reported to hardware vendors and silently fixed, with microcode updates available for older hardware, before publication? Like Andrew Myers, I feel that the incentives to favour attack research over defense mean we're spending a lot of public money to mostly make the security situation worse.

Valuing America

There's a lot of anti-Americanism around. Some of it is justified, but a lot of it is a knee-jerk reaction that is unjustified and dangerous. It's especially dangerous when it takes the form of false equivalence: "sure, Russia/China/North Korea/Iran/Saudi Arabia is bad, but the USA is bad too". That position is wrong, whether it's taken by left-wing activists or Donald Trump.

In the former countries (and many others), if you're a popular opponent of the government you are very likely to end up in prison or dead. To a first approximation, countries that look more like America in their economic systems and institutions deliver more freedom and less poverty to their citizens. (China's history is a great validating experiment.) It is no accident that Western Europe turned out better than Eastern Europe, or that former Soviet satellite states would rather be part of NATO and Europe than in the Russian orbit, or that South Korea and North Korea turned out very differently, or that Taiwan and Hong Kong turned out better than mainland China. I think it's telling that "peace activists" in New Zealand reliably protest all sorts of US military action but don't bother about Russia slaughtering civilians in Syria, for example; they unwittingly show their respect for the USA by holding it to a higher standard.

That false equivalence creates various problems. It undermines support for sanctions and other action against the more problematic countries. Worse, it encourages the idea that it would be good for US global power to be substantially curtailed. To me, that seems like a very bad idea: I am very concerned about existential threats, especially nuclear weapons proliferation, and without strong US leadership that will more quickly become a free-for-all with disastrous consequences for humanity. It would be excellent if other great powers, like China or the EU, decided to take these responsibilities seriously, but that doesn't seem to be happening. Indeed, Russia and China recklessly enable Iran, North Korea and others for their own short-term ends. They (and much of the West) privately expect the USA to be the world's policeman and clean up the mess, while publicly berating it whenever that goes wrong.

The opposite error would be to uncritically accept US systems and actions as good. Most people in New Zealand and elsewhere are pretty far away from that error (except when they cherry-pick evidence to support their arguments). Personally I think New Zealand, for example, gets a lot of things more right than the USA does. That does not make the USA dispensable.

Thursday, 20 October 2016

Dell, Your Website Security Is Broken

You can download firmware and BIOS updates from Dell. Unfortunately the download link is plain HTTP :-(. Fortunately the page provides SHA hashes for the download, which are even correct --- though I imagine practically no-one checks them. Unfortunately, the download page itself is plain HTTP so those hashes can't be trusted either :-(.

Interestingly, the download page is available via HTTPS as well, but Google searches for "Dell bios update" etc point to the insecure version of the site. I have no idea why that would be.
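
For anyone who does want to check a download, here is a minimal sketch of the comparison. It assumes the published hash is SHA-256 (the page only says "SHA") and, crucially, that the hash was copied from the HTTPS version of the page; a hash fetched over plain HTTP proves nothing. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large firmware images need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's digest with a hash obtained over a trusted channel."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())
```

Usage would look like matches_published_hash("bios_update.exe", hash_from_https_page), where both the file name and the variable are placeholders.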

Wednesday, 19 October 2016

Pivoting To Cyber-Forestry

Forestry work is dangerous. Wouldn't it be great if we could save money and lives using autonomous drones to trim trees in rural and urban areas? We could call it cyber-forestry, or we could call it frickin' flying chainsaws ... I could go either way.

You may be thinking of potential downsides, like what happens if the flying chainsaws get hacked. Don't worry, our security people will be the best in the business. Well, among the best. Well, as good as Yahoo's anyway.

Government regulation could become an issue. We'll use the Uber approach: scale first, lobby later. With thousands of flying chainsaws deployed everywhere, who dares regulate us?

Yeah OK, I admit this could turn out badly for humanity. But hey, if we don't do it, someone else will. Better for us to be first, because we're wiser and smarter than them. Whoever they are.

Sunday, 16 October 2016

Ironic World Standards Day

Apparently World Standards Day is on October 14. Except in the USA it's celebrated on October 27 and in Canada on October 5.

Are they trying to be ironic?

Saturday, 15 October 2016

Tawharanui Revisited

Today I visited Tawharanui peninsula again with some friends and family. The weather wasn't great --- it was a bit wet, windy and cold at times --- but it was still a good trip.

Tawharanui is protected by a predator-proof fence, and has populations of some of NZ's endangered native birds. We were lucky enough to see some North Island saddlebacks.

At a lookout at the end of the peninsula we could see a number of extremely large "lion's mane" jellyfish in the water. These have also been washing up further north recently. My parental vanity was gratified by my children immediately recalling the Sherlock Holmes story The Adventure Of The Lion's Mane.

Monday, 3 October 2016

rr Paper: "Lightweight User-Space Record And Replay"

Earlier this year we submitted the paper Lightweight User-Space Record And Replay to an academic conference. Reviews were all over the map, but ultimately the paper was rejected, mainly on the contention that most of the key techniques have (individually) been presented in other papers. Anyway, it's probably the best introduction to how rr works and how it performs that we currently have, so I want to make it available now in the hope that it's interesting to people.

Update The paper is now available on arXiv.

Sunday, 2 October 2016

Bay Area Talks About rr And Beyond, October 2-7

I'm visiting the Bay Area in the coming week to talk about rr and where it could go. It's a mix of organised talks and smaller meetings. The talks are as follows:

  • Monday October 3, 2:30pm: Dropbox (SF office, 333 Brannan St)
  • Tuesday October 4, 11:00am: Google (MTV-PLY-2)
  • Thursday, October 6, 10:00am: Intel (Santa Clara, 3600 Juliette Lane)
  • Friday, October 7, 3:30pm: Berkeley (511 Soda Hall)

I haven't scheduled anything at Mozilla this time. They've already heard me talk about rr a lot :-).

Thanks to all the people who helped me organise this. I still have some free time in my schedule for additional meetings as necessary.

Saturday, 1 October 2016

rr 4.4.0 Released

I just pushed out the release of rr 4.4.0. It's mostly the usual reliability and syscall coverage improvements. There are a few highlights:

  • Releases are now built with CMAKE_BUILD_TYPE=Release. This significantly improves performance on some workloads.
  • We support recording and replaying Chromium-based applications, e.g. chromium-browser and google-chrome. One significant issue was that Chromium (via SQLite) requires writes performed by syscalls to be synced automatically with memory-maps of the same file, even though this is technically not required by POSIX. Another significant issue is that Chromium spawns a Linux task with an invalid TLS area, so we had to ensure rr's injected preload library does not depend on working glibc TLS.
  • We support Linux 4.8 kernels. This was a significant amount of work because in 4.8, PTRACE_SYSCALL notifications moved to being delivered before seccomp notifications instead of afterward. (It's a good change, though, because as well as fixing a security hole, it also improves rr recording performance; the number of ptrace notifications for each ptrace-recorded syscall decreases from 3 to 2.) This also uncovered a serious (to rr) kernel bug with missing PTRACE_EVENT_EXIT notifications, which fortunately we were able to get fixed upstream (thanks to Kees Cook).
  • Keno Fischer contributed some nice performance improvements to the "slow path" where we are forced to context-switch between tracees and rr.
  • Tom Tromey contributed support for accessing thread-local variables while debugging during replay. This is notable because the "API" glibc exports for this is ghastly.
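
The Chromium item above depends on a kernel behaviour that is easy to demonstrate: on Linux, a write made through a syscall shows up immediately in a shared memory-map of the same file. The sketch below only illustrates that behaviour on a scratch file; it says nothing about how rr itself handles the case.

```python
import mmap
import os
import tempfile

# Create a one-page file and map it MAP_SHARED (Python's default on Unix).
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\0" * mmap.PAGESIZE)
    view = mmap.mmap(fd, mmap.PAGESIZE)  # shared, read/write mapping

    # Modify the file through a syscall rather than through the mapping.
    os.pwrite(fd, b"hello", 0)

    # On Linux the mapping and the syscall share the same page cache pages,
    # so the write is visible through the mapping without any msync().
    seen = view[:5]
    print(seen)
    view.close()
finally:
    os.close(fd)
    os.unlink(path)
```

A recorder that performs the write on the tracee's behalf has to reproduce this visibility, otherwise code reading through the mapping sees stale data.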

Tuesday, 20 September 2016

Is Apple A Christian Environment?

There's an article in Tech.Mic, syndicated to the New Zealand Herald and probably other places, about sexism in Apple's workplace culture. Most of it sounds all too plausible :-(. I haven't seen many examples of this sort of behaviour at Mozilla or elsewhere, but I wouldn't expect to have since, very sadly, I haven't had many chances to work with women in the office.

One detail doesn't sound right, though:

"several people" who have quit, citing a "white, male, Christian, misogynist, sexist environment"

Is Apple really a Christian environment? In my experience of the Silicon Valley technical workforce, Christians are extremely scarce and most of those keep quiet about it at work ... often for good reason. The environment is, in fact, fairly Christian-hostile. In social settings I have sometimes been in conversations where people started talking about what idiots those Christians are, assuming everyone present shares such views, and I'm sure there are more when I'm not around! I find it hard to believe Apple is different. (I know it's a cult, but not that kind of cult :-).)

I guess the quote probably meant "Christian" as some kind of general, malign force of culture and tradition, only loosely connected to actual Christians and Christianity. Regardless, it's frustrating to be saddled with.

Monday, 12 September 2016

Theism And The Simulation Argument

The simulation argument is quite popular these days; Elon Musk famously finds it compelling. I don't, but for those who do, I think it has an important consequence that is obvious yet underappreciated: if we live in a simulation, then a form of theism is true. If the simulation argument is valid, then atheism is improbable.

The agent responsible for the simulation would be the God of our universe: it intentionally designed, created and sustains our universe for some purpose. (That purpose might have nothing to do with us or our pocket of the universe, but since life on Earth is the most complicated system we have observed so far, we can at least take as a working hypothesis that the purpose is connected to us.) This already implies we have at least a God of classical deism.

Furthermore, if we can use our simulations as an analogy, we should assume the simulating agent has ongoing access to the substrate running the simulation and the ability to modify its state. In other words, the agent is likely to be omnipotent (in our universe) and miracles are possible. (This is not the "can make a rock so heavy he can't lift it" straw-man omnipotence targeted by many atheists, but it's well within Biblical parameters.) It's also possible for the agent to reveal itself to us, and it might want to.

The other two "omnis" do not seem to follow. I see no reason to believe that the simulating agent would be omniscient, even just about our universe. Likewise I see no reason to believe it would be omnibenevolent. However, being omnipotent, it could make us believe it was omniscient and omnibenevolent if it wanted to.

I don't expect to see proponents of the simulation argument in churches en masse anytime soon :-). However, they should be taking theism seriously and atheism not seriously at all.

Saturday, 3 September 2016

Auckland Food 2016

Here are some places I like at the moment, valuing not just quality but also interest and value for money.

  • Uncle Man's on Karangahape Road (Malaysian). Best roti in the city in my opinion. Just a little bit better than Selera and Papa Rich.
  • Cafe Abyssinia in Mt Roskill. The only Ethiopian food in Auckland that I know of.
  • Kairali in Royal Oak. Good cheap South Indian food.
  • Jade Town on Dominion Road. Uyghur food, i.e., far-western Chinese. Get there early, around 6pm, or their specials are sold out.
  • Hog Heaven in Newmarket. The pulled pork sandwich is delicious. Go there soon because there's not much traffic and I fear it has not long to live.
  • Momotea in Newmarket and elsewhere. Decent food but their real attraction is their Taiwanese-style drinks and desserts. I like the black sesame milkshake and the "Momo Redbean Ice", perhaps the only time I've found a dessert to be bigger than it looks on the menu.
  • Viet Kitchen on Dominion Road. The pork-and-prawn pancake is delicious.
  • Hansan in Newmarket and elsewhere. Nominally also Vietnamese, but a different style that I've been told is more Cambodian. I think the food's generally less good than Viet Kitchen but it has milkshakes and desserts which I greatly enjoy, especially the coconut-banana-sago pudding.
  • Tombo in Newmarket, Korean-Japanese. The lunch boxes are good, and the dinner buffet is great, if a little overpriced.
  • Smile Dessert in Somerville. High-end Chinese dessert restaurant which also sells light meals. I like the walnut and black sesame soup.
  • Dae Bak in the city. Great value Korean BBQ buffet ($20 lunch, $22 dinner). A bit less variety than Gangnam Style in Takapuna, but cheaper, easier to get to, and a less cringe-worthy name.

Wednesday, 31 August 2016

Avoiding Cache Writebacks For Freed Memory

I wonder how much memory traffic is generated by the CPU writing out evicted cache lines for memory locations the application knows are dead because they belong to freed memory. This would be interesting to measure. If it's significant, then perhaps the application (or some lower level of the software stack) should be trying more aggressively to reuse freed memory, or perhaps it would be valuable to have hardware support for invalidating cache lines without writing them back. Though I suppose the latter case is problematic since it would make buggy software nondeterministic. At the OS level Linux has madvise(..., MADV_DONTNEED) to clear unwanted pages, but I'm not aware of anything analogous for the CPU cache.

Update: Xi Yang pointed me at a paper on exactly this topic: Cooperative Cache Scrubbing.

“For modern Java programs, 10 to 60% of DRAM writes are useless, because the data on these lines are dead - the program is guaranteed to never read them again.”

That's a lot more than I expected! It would be very interesting to know what these numbers are for C++ or Rust programs. If the numbers hold up, it sounds like it would be worth adding some kind of hardware support to quash these writes. It would be nice to avoid the problem of nondeterministic behavior for buggy software; I wonder if you could have a "zero cache line" instruction that sets the cache line to zero at each level of cache, marks the lines as clean, and writes zeroes to RAM, all more efficiently than a set of non-temporal writes of zero. Actually you might want to be able to do this for a large range of virtual addresses all at once, since programs often free large chunks of memory.

Wednesday, 24 August 2016

Random Thoughts On Rust: crates.io And IDEs

I've always liked the idea of Rust, but to tell the truth until recently I hadn't written much Rust code. Now I've written several thousand lines of Rust code and I have some more informed comments :-). In summary: Rust delivers, with no major surprises, but of course some aspects of Rust are better than I expected, and others worse.

cargo and crates.io

cargo and crates.io are awesome. They're probably no big deal if you've already worked with a platform that has similar infrastructure for distributing and building libraries, but I'm mainly a systems programmer, which until now meant C and C++. (This is one note in a Rust theme: systems programmers can have nice things.) Easy packaging and version management encourage modularising and publishing code. Knowing that publishing to crates.io gives you some protection against language/compiler breakage is also a good incentive.

There's been some debate about whether Rust should have a larger standard library ("batteries included"). IMHO that's unnecessary; my main issue with crates.io is discovery. Anyone can claim any unclaimed name and it's sometimes not obvious what the "best" library is for any given task. An "official" directory matching common tasks to blessed libraries would go a very long way. I know from browser development that an ever-growing centrally-supported API surface is a huge burden, so I like the idea of keeping the standard library small and keeping library development decentralised and reasonably independent of language/compiler updates. It's really important to be able to stop supporting an unwanted library, letting its existing users carry on using it without imposing a burden on anyone else. However, it seems likely that in the long term crates.io will accumulate a lot of these orphaned crates, which will make searches increasingly difficult until the discovery problem is addressed.

IDEs

So far I've been using Eclipse RustDT. It's better than nothing, and good enough to get my work done, but unfortunately not all that good. I've heard that others are better but none are fantastic yet. It's a bit frustrating because in principle Rust's design enables the creation of extraordinarily good IDE support! Unlike C/C++, Rust projects have a standard, sane build system and module structure. Rust is relatively easy to parse and is a lot simpler than C++. Strong static typing means you can do code completion without arcane heuristics. A Rust IDE could generate code for you in many situations, e.g.: generate a skeleton match body covering all cases of a sum type; generate a skeleton trait implementation; one-click #[derive] annotation; automatically add use statements and update Cargo.toml; automatically insert conversion trait calls and type coercions; etc. Rust has quite comprehensive style and naming guidelines that an IDE can enforce and assist with. (I don't like a couple of the standard style decisions --- the way rustfmt sometimes breaks very().long().method().call().chains() into one call per line is galling --- but it's much better to have them than a free-for-all.) Rust is really good at warning about unused cruft up to crate boundaries --- a sane module system at work! --- and one-click support for deleting it all would be great. IDEs should assist with semantic versioning --- letting you know if you've changed stable API but haven't revved the major version. All the usual refactorings are possible, but unlike mainstream languages you can potentially do aggressive code motion without breaking semantics, by leveraging Rust's tight grip on side effects. (More about this in another blog post.)
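To make the first of those suggestions concrete, here is a rough sketch of the kind of skeleton an IDE could generate for a match over a sum type. The `Shape` enum and the `describe` function are hypothetical names invented for this illustration; in practice the tool would emit the arms and a human would fill in the bodies.

```rust
// Hypothetical sum type, invented for this sketch.
pub enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
    Unit,
}

// One arm per variant. The compiler already rejects a match that misses a
// variant, so the full set of arms is known statically and could be generated.
pub fn describe(shape: &Shape) -> String {
    match shape {
        Shape::Circle { radius } => format!("circle of radius {}", radius),
        Shape::Rect { width, height } => format!("{} x {} rectangle", width, height),
        Shape::Unit => String::from("unit"),
    }
}
```

If a variant is later added to `Shape`, the match stops compiling until a new arm is written, which is exactly the point where a generated arm would help.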

I guess one Rust feature that makes an IDE's job difficult is type inference. For C and C++ an IDE can get adequate information for code-completion etc by just parsing up to the user's cursor and ignoring the rest of the containing scope (which will often be invalid or non-existent). That approach would not work so well in Rust, because in many cases the types of variables depend on code later in the same function. The IDE will need to deal with partially-known types and try very hard to quickly recover from parse errors so later code in the same function can be parsed. It might be a good idea to track text changes and reuse previous parses of unchanged text instead of trying to reparse it with invalid context. On the other hand, IDEs and type inference have some synergy because the IDE can display inferred types.
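A small example of the problem, as a sketch (the function name `total_length` is made up for illustration): the element type of the vector is not known on the line where it is declared, only once a later line pushes into it.

```rust
pub fn total_length(words: &[&str]) -> usize {
    // At this point the type is only known as Vec<_>; an IDE that stops
    // parsing at a cursor on this line cannot say what the elements are.
    let mut lengths = Vec::new();
    for w in words {
        // This later line is what pins the type down to Vec<usize>.
        lengths.push(w.len());
    }
    lengths.iter().sum()
}
```

An IDE offering completion on `lengths` before the `push` line would have to look ahead in the function, or else work with a partially-known type.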

Saga Of The Exiles

When I was younger I was a huge fan of Julian May's sci-fi/fantasy Saga Of The Exiles. I've re-read some of it recently to see if it's really all that good. It is, and I highly recommend it. The setting is strong: in the not-far future, humans develop magical psychic powers and join the Milieu, a society of similarly-psychic alien races. This society's misfits are sent through a one-way time portal six million years into Earth's past --- where it turns out to be already occupied by warring alien races with similar powers. May places some deeply interesting characters into this setup, then charts their adventures over the course of the four main books and introduces more major elements. Another book, Intervention, explores the events leading from our present (well, 1980s present) to joining the Milieu, and caps off the Exiles story arc with a truly epic twist.

In my opinion, it's all very well done. May seems to be a polymath; according to Wikipedia she wrote thousands of encyclopedia articles about science, and also helped write a Catholic catechism. Along with swords, psycho-sorcery and sci-fi trappings, her stories are full of convincing geology, flora, and fauna, and she pulls in some theology too.

In addition to the books mentioned above, she wrote a "Milieu trilogy" set after the Intervention. I found these disappointing and would recommend not reading them. The main problem is that the key events in that period have already been outlined in the other books; to give the Milieu trilogy some suspense, she adds new plot elements that don't seem to fit, and then the really important events are given short shrift.

For some reason these books seem to be less well known than contemporaneous opuses such as The Belgariad or The Chronicles of Thomas Covenant. That's a shame. I think they'd be great material for a TV series. You'd need an enormous special-effects budget, but there's lots of material to enthrall audiences (and yes, plenty of sex and violence), and more big and original ideas than the usual "Tolkien with the serial numbers filed off".

Tuesday, 16 August 2016

False Accusations

One of the problems with false or hyperbolic accusations is that they encourage your target to do the very thing you falsely accuse them of doing. E.g., if the company isn't collecting private data but you make people believe they are, then perhaps they might as well collect it. If you actually care about the issue in question, this is likely to be counterproductive to your cause.

I understand the urge to lacerate an opponent and I've been guilty of it myself, but I'll try to keep the above in mind.

Another reason to restrain oneself is given by C. S. Lewis:

“Suppose one reads a story of filthy atrocities in the paper. Then suppose that something turns up suggesting that the story might not be quite true, or not quite so bad as it was made out. Is one's first feeling, 'Thank God, even they aren't quite so bad as that,' or is it a feeling of disappointment, and even a determination to cling to the first story for the sheer pleasure of thinking your enemies are as bad as possible? If it is the second then it is, I am afraid, the first step in a process which, if followed to the end, will make us into devils. You see, one is beginning to wish that black was a little blacker. If we give that wish its head, later on we shall wish to see grey as black, and then to see white itself as black. Finally we shall insist on seeing everything -- God and our friends and ourselves included -- as bad, and not be able to stop doing it: we shall be fixed for ever in a universe of pure hatred.”

Sunday, 7 August 2016

Why I Don't Watch "Game Of Thrones"

(A bit of a followup on my last post...) On Friday night someone was talking about Game Of Thrones and, seeing as I'm a D&D fan and generally enthusiastic about faux-medieval fantasy, I felt obliged to explain why I don't watch it. Not wanting to sound sanctimonious or self-righteous, I explained "I've heard it has more sex and violence than I can handle" ... an answer hopefully winsome but quite true!

A few clarifications for the record... Most of my Christian friends watch it and I don't judge them over it. I know my favourite pastor watches it! It may be fine for them. In this matter I'm probably the weaker brother that Paul talks about in Romans 14.

One thing that turned me off it was an on-set report I read, where the director made it clear the sex was aimed to titillate. I fear I'd end up watching it for the wrong reasons.

If I really cared to participate in the pop-culture phenomenon, I'd read the books. Turns out I don't. (Yes, I know the books are full of sex and violence too, but to some extent they're limited by one's own imagination.) I probably will read the books anyway, because I've heard they're good, but I have a long reading list. Also I prefer to invest my time in long sagas only if I'm confident they end satisfactorily (X-Files, I'm looking at you!) so I should probably hold out until George R. R. Martin gets it done.

Changing Attitudes To Pornography

For most of my life, mainstream culture in the English-speaking West has been highly accepting of pornography. Mass media near-universally portrays its production and consumption as benign, even positive. I read a respectable parenting book advocating that parents deliberately introduce pornography to their sons in a controlled manner. Those of us who label pornography as unhealthy have been marginalised as anti-sex, anti-joy prudes who seek control over others for envious or selfish reasons. Even just avoiding it oneself is viewed as slightly pathological.

Things seem to be changing, in some parts of NZ culture. Over the years the NZ Herald news site has run quite a few stories, on and off, on the damage done by addiction to online porn. It seems to me that these stories have been increasing in frequency. More interestingly, in the old days almost every author felt required to include some disclaimer along the lines of "I'm no prude, but ---". (No-one wants to appear guilty of that greatest of sins!) Now I often don't even see those disclaimers anymore.

I hope we reach the point soon where most of the culture can take the online porn problem seriously. It's difficult to tackle for so many reasons :-(. Technological development is rapidly outpacing the culture's and our brains' ability to adjust; we're still coming to grips with the effects of ubiquitous access to images and video, but VR porn is just around the corner. Some of the groups most opposed to porn --- the traditionally religious, and some decidedly anti-religious feminist groups --- find it difficult to cooperate. (On the flip side, porn proponents contain an unholy alliance of "boys will be boys" conservatism with anything-goes modern libertinism, who have no trouble at all cooperating!) A lot of the opposition to porn has been crippled by an attraction to blanket censorship that is ineffective and has dangers of its own.

I'd like to see more client-side tools to help people control their exposure to pornography. That includes hiding clickbait links to temptation as well as actual pornographic content. I'd like to see better education from parents and in schools about the harmful effects of pornography. I'd like to see, across the cultural spectrum, recognition that the way some men view women as primarily sex objects is a huge problem, and pornography is partly to blame. I think we may be slowly moving in the right direction, at least in some places.

Monday, 1 August 2016

The True Story Of "Amazing Grace"

"Amazing Grace" is a great hymn. We sang it at my wedding, because it's distinctively Christian but non-Christians are still comfortable singing it. The story behind is is also great, and I've heard preachers summarize it a few times: slave trader John Newton converts to Christianity, repudiates slavery, and writes "Amazing Grace" to express his remorse and celebrate God's grace. That summary is technically true, but it doesn't do justice to the story.

Most importantly, that summary obscures the fact that John Newton remained involved in the slave trade for many years after his apparent conversion. It wasn't immediately obvious to him that his occupation was incompatible with his Christian commitments --- no doubt partly because his livelihood depended on it not being obvious. Clearly, though, over time he came to completely repudiate slavery and see his former career as a dire sin. His wretched personal history made him acutely aware of God's grace and made him an influential advocate for abolition.

There are many other interesting details. As a young seaman he was so ill-disciplined he was, incredibly, punished for excessive profanity. For eighteen months he was practically enslaved himself in West Africa. He went through several cycles of drawing near to God and relapsing into terrible sins. It's a long story of a great sinner gradually becoming a great saint.

Friday, 22 July 2016

Further Improving My Personal Digital Security

A few months ago I moved my 2FA secrets (my Github account and three Google accounts) from a phone app to a Yubikey. Recently, somewhat inspired by Daniel Pocock's blog posts about SMS and phone security --- plus other news --- I've decided to reduce the trust in my phone further.

I don't want my phone to be usable in an account-recovery attack, so I've removed it as a recovery option for my Google and Github accounts. To not increase the risk of losing control of those accounts unrecoverably, I bought a second Yubikey as a backup and regenerated 2FA secrets for those accounts onto both Yubikeys. (For both Google and Github, generating 2FA secrets invalidates existing ones, but it's easy enough to load a secret into any number of devices while the QR code for the new secret is visible.) I generated new backup verification codes and printed them without saving them anywhere. (Temporary data for the print job might linger on my laptop storage, though that's encrypted with a decent password. More worrying is that the printer might keep data around... I probably should have copied them down by hand!)

Unfortunately my other really important account --- my online banking account --- is weakly protected by comparison. Westpac's personal-banking system uses simple user-name-and-password logon. There are heuristics to detect "suspicious" transfers, which you need to confirm with a code sent to your phone by SMS. This is quite unsatisfactory, though not unsatisfactory enough to justify the trouble of switching banks (given that generally Westpac would reimburse me for losses due to my account being compromised).

Thursday, 7 July 2016

Ordered Maps For Stable Rust

The canonical ordered-map type for Rust is std::collections::BTreeMap, which looks nice, except that the range methods are marked unstable and so can't be used at all except with the nightly-build Rust toolchain. Those methods are the only way to perform operations like "find first element greater than a given key", so BTreeMap is mostly useless in stable Rust.
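For reference, this is roughly what the "find first element greater than a given key" operation looks like with the range method. It is only a sketch and assumes a toolchain on which `BTreeMap::range` is available (nightly-only at the time of writing); `first_greater` is a hypothetical helper name.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

/// Returns the first entry whose key is strictly greater than `key`, if any.
pub fn first_greater(
    map: &BTreeMap<u32, &'static str>,
    key: u32,
) -> Option<(u32, &'static str)> {
    // An exclusive lower bound and no upper bound: the first item yielded
    // is the smallest key greater than `key`.
    map.range((Excluded(key), Unbounded))
        .next()
        .map(|(k, v)| (*k, *v))
}
```

Without a range-style method, the only alternative is iterating from the start of the map, which throws away the point of using an ordered structure.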

This wouldn't be a big deal if crates.io had a good ordered-map library that worked with stable Rust ... but, as far as I can tell, until now it did not. I didn't want to switch to Rust nightly just to use ordered maps, so I solved this problem by forking container-rs's bst crate, modifying it to work on stable Rust (which meant ripping out a bunch of "unstable" annotations, fixing a few places that required unstable "box" syntax, and fixing some test code that depended on unboxed closures), and publishing the result as stable_bst. (Note: I haven't actually gotten around to using it yet, so maybe it's broken, but at least its tests pass.)

So, if you want to use ordered maps with stable Rust, give it a try. bst has a relatively simple implementation and, no doubt, is less efficient than BTreeMap, but it should be comparable to the usual C++ std::map implementations.

Currently it supports only C++-style lower_bound and upper_bound methods for finding elements less/greater than a given key. range methods similar to BTreeMap could easily be added, using a local copy of the unstable standard Bound type. I'm not sure if I'll bother but I'd accept PRs.

Update: I realized the lower_bound and upper_bound methods were somewhat useless since they only return forward iterators, so I bit the bullet, implemented the range/range_mut methods, removed lower_bound/upper_bound and the reverse iterators which are superseded by range, and updated crates.io.

FWIW I really like the range API compared to C++-style upper/lower_bound. I always have to think carefully to use the C++ API correctly, whereas the range API is easy to use correctly: you specify upper and lower bounds, each of which can be unbounded, exclusive or inclusive, just like in mathematics. A nice feature of the range API (when implemented correctly!) is that if you happen to specify a lower bound greater than the upper bound, it returns an empty iterator, instead of returning some number of wrong elements --- or crashing exploitably --- as the obvious encoding in C++ would do.
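A short sketch of the three kinds of bound, written against the standard library's `BTreeMap` (again assuming a toolchain where `range` is available). One caveat: the standard library's `BTreeMap::range` is documented to panic when the start is greater than the end, rather than returning an empty iterator, so this sketch checks for empty ranges first. `keys_between` and `is_empty_range` are hypothetical helpers, not part of any library.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// True if no integer can lie between the two bounds.
fn is_empty_range(lo: &Bound<i32>, hi: &Bound<i32>) -> bool {
    match (lo, hi) {
        (Unbounded, _) | (_, Unbounded) => false,
        (Included(a), Included(b)) => a > b,
        (Included(a), Excluded(b))
        | (Excluded(a), Included(b))
        | (Excluded(a), Excluded(b)) => a >= b,
    }
}

/// Collects the keys lying between `lo` and `hi`.
pub fn keys_between(map: &BTreeMap<i32, ()>, lo: Bound<i32>, hi: Bound<i32>) -> Vec<i32> {
    // Return an empty result for inverted bounds instead of letting
    // the standard library's range method panic.
    if is_empty_range(&lo, &hi) {
        return Vec::new();
    }
    map.range((lo, hi)).map(|(k, _)| *k).collect()
}
```

With keys 1 through 5 in the map, `keys_between(&m, Included(2), Excluded(4))` yields 2 and 3, and swapping the bounds yields nothing at all, which is the behaviour described above.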

Another somewhat obscure but cool feature of range is that the values for bounds don't have to be exactly the same type as the keys, if you set up traits correctly. ogoodman on github pointed out that in some obscure cases you want range endpoints that can't be expressed as key values. Their example is keys of type (A, B), lexicographically ordered, where B does not have min or max values (e.g., arbitrary-precision integers), and you want a range containing all keys with a specific value for A. With the BTreeMap and stable_bst::TreeMap APIs you can handle this by making the bounds be type B', where B' is B extended with artificial min/max values, and defining traits to order B/B' values appropriately.

Saturday, 2 July 2016

Itanium Zombie Claims Another Victim

Oracle owes HP $3B for (temporarily) dropping support for Itanium CPUs in 2011. This is the latest in a long line of embarrassments caused by that architecture. Its long, sad and very expensive story is under-reported and under-appreciated in the industry, probably because Intel, thanks to its x86 near-monopoly, ended up shrugging it off with no long-lasting ill effects. I'm disappointed about that; market efficiency requires that companies that make such enormous blunders should suffer. Ironically Intel's partners who jumped on the Itanium bandwagon --- HP, SGI, DEC, and even software companies such as Microsoft and Oracle --- ended up suffering a lot more than Intel did. Someone should do a proper retrospective and try to tally up the billions of dollars wasted and the products and companies ruined.

It was all so foreseeable, too. I was in graduate school during Itanium development and there was massive skepticism in the CMU CS department that Itanium's explicit ILP would ever work well in the face of the unpredictable runtime behavior of real code. People correctly predicted that the compiler advances required were, in fact, unachievable. Corporate agendas, large budgets, and some over-optimistic academic researchers trumped common sense.

This amusing graph is a fine illustration of the folly of trusting "industry analysts", if any were needed.

Friday, 1 July 2016

rr 4.3.0 Released

I've just released rr 4.3.0. This release doesn't have any major new user-facing features, just a host of small improvements:

  • AVX (i.e. YMM) registers are exposed through gdb.
  • Optimizations for I/O-heavy tracees on btrfs. I highly recommend putting tracee data and the traces on the same btrfs filesystem to take advantage of this.
  • Support for dconf's shared memory usage.
  • Much better support for vfork.
  • Much better support for ptrace. This allows rr to record rr replay.
  • Support for tracees calling setuid.
  • Support for tracees compiled with AddressSanitizer.
  • Support for optimized release builds via cmake -DCMAKE_BUILD_TYPE=Release (thanks to Keno Fischer).
  • Keno Fischer also dived into the guts of rr and did some nice cleanups.
  • As always, syscall support was expanded and many minor bugs fixed.
  • This release has been tested on Ubuntu 16.04 and Fedora 24 (as well as older distros).
  • With the help of Brad Spengler, we got rr working on grsecurity kernels. (The changes to grsecurity only landed a few days ago.)

In this release I've fixed the last known intermittent test failure! Some recent Linux kernels have a regression in performance counter code that very rarely causes some counts to be lost. This regression seems to be fixed in 4.7rc5 which I'm currently running.

Ubuntu 16.04 was released with gdb 7.11.0, which contains a serious regression that makes it very unreliable with rr. The bug is fixed in gdb 7.11.1 which is shipping as an update to 16.04, so make sure to update.

Wednesday, 29 June 2016

Nexus 5X vs Wettest June Hour In Auckland's History

I had lunch with a friend in Newmarket today. I walked there from home, but it was raining so I half-ran there holding my phone and keys under my jacket. Unfortunately when I got there I was only holding my keys :-(. I ran around a bit looking for the phone, couldn't find it, and decided lunch with my friend was more important.

So later I walked home, keeping an eye out for my phone in vain --- during which it was really pouring; activated the Android Device Manager locator; drove to where it said the phone was; looked around (still raining) but couldn't find it and couldn't activate the ring (due to not having a phone!); drove home and activated the "call me when you find this phone" screen. Not long after that a miracle happens: a kid calls me on the phone. Not only has he found it and is willing to wait on the street for fifteen minutes while I drive back to pick it up, but the phone still works despite having been out in the rain for the wettest June hour in Auckland's history. Seriously, on my way home there were torrents of water in the gutters and whole streets were flooded. Congratulations to LG and Google, thanks to God the phone wasn't simply washed away, and thanks to the Grammar boys who found it and waited for me. (The latter I was able to thank directly by giving them my copy of China Mieville's wonderful The City And The City.)

Relearning Debugging With rr

As I've mentioned before, once you have a practical reverse-execution debugger like rr, you need to learn new debugging strategies to exploit its power, and that takes time. (Almost all of your old debugging strategies still work --- they're just wasting your time!) A good example presented itself this morning. A new rr user wanted to stop at a location in JIT-generated code, and modified the JIT compiler to emit an int3 breakpoint instruction at the desired location --- because that's what you'd do with a regular debugger. But with rr there's no need: you can just run past the generation of the code, determine the address of your generated instruction after the fact (by inserting a logging statement at the point where you would have triggered generation of int3, if you must), set a hardware execution breakpoint at that address, and reverse-execute until that location is reached.

One of the best reasons I've heard for not using rr was given by Jeff: "I don't want to forget how to debug on other platforms".

Tuesday, 28 June 2016

Handling Read-Only Shared Memory Usage In rr

One of rr's main limitations is that it can't handle memory being shared between recorded processes and not-recorded processes, because writes by a not-recorded process can't be recorded and replayed at the right moment. This mostly hasn't been a problem in practice. On Linux desktops, the most common cases where this occurs (X, pulseaudio) can be disabled via configuration (and rr does this automatically). However, there is another common case --- dconf --- that isn't easily disabled via configuration. When applications read dconf settings for the first time, the dconf daemon hands them a shared-memory buffer containing a one-byte flag. When those settings change, the flag is set. Whenever an application reads cached dconf settings, it checks this flag to see if it should refetch settings from the daemon. This is very efficient but it causes rr replay to diverge if dconf settings change during recording, because we don't replay that flag change.

Fortunately I've been able to extend rr to handle this. When an application maps the dconf memory, we map that memory into the rr supervisor process as well, and then replace the application mapping with a "shadow mapping" that's only shared between rr and the application. Then rr periodically checks to see whether the dconf memory has changed; if it has, then we copy the changes to the shadow mapping and record that we did so. Essentially we inject rr into the communication from dconf to the application, and forward memory updates in a controlled manner. This seems to work well.

That "periodic check" is performed every time the recorded process completes a traced system call. That means we'll forward memory updates immediately when any blocking system call completes, which is generally what you'd want. If an application busy-waits on an update, we'll never forward it and the application will deadlock, but that could easily be fixed by also checking for updates on a timeout. I'd really like to have a kernel API that lets a process be notified when some other process has modified a chunk of shared memory, but that doesn't seem to exist yet!
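The forwarding scheme can be modelled in a few lines. This is a toy sketch and not rr's actual code: plain byte vectors stand in for the real shared-memory mappings, the shadow is assumed to start zeroed, and `ShadowForwarder` and its methods are hypothetical names.

```rust
/// Toy model of the shadow mapping: `shadow` is what the recorded
/// application sees, `recorded` is the log of forwarded updates.
pub struct ShadowForwarder {
    pub shadow: Vec<u8>,
    /// Forwarded updates as (offset, new byte).
    pub recorded: Vec<(usize, u8)>,
}

impl ShadowForwarder {
    pub fn new(len: usize) -> Self {
        ShadowForwarder { shadow: vec![0; len], recorded: Vec::new() }
    }

    /// Models the check made when the recorded process completes a traced
    /// system call: bytes that differ from the real memory are copied into
    /// the shadow and logged. Returns true if anything was forwarded.
    pub fn on_syscall_exit(&mut self, real: &[u8]) -> bool {
        let mut changed = false;
        for (offset, (dst, src)) in self.shadow.iter_mut().zip(real).enumerate() {
            if *dst != *src {
                *dst = *src;
                self.recorded.push((offset, *src));
                changed = true;
            }
        }
        changed
    }

    /// Models replay: the real memory is never consulted, only the log.
    /// (A real replay would apply each update at the point it was recorded.)
    pub fn replay(len: usize, recorded: &[(usize, u8)]) -> Vec<u8> {
        let mut shadow = vec![0; len];
        for &(offset, byte) in recorded {
            shadow[offset] = byte;
        }
        shadow
    }
}
```

The property that matters is that the application only ever observes changes that went through `on_syscall_exit`, so replaying the log reproduces exactly what it saw during recording.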

This technique would probably work for other cases where shared memory is used as a one-way channel from not-recorded to recorded processes. Using shared memory as a one-way channel from recorded to not-recorded processes already works, trivially. So this leaves shared memory that's read and written from both sides of the recording boundary as the remaining hard (unsupported) case.

Monday, 27 June 2016

Dear Ubuntu, Please Fix Your Debuginfo Packaging

Fedora has a delightfully simple approach to supplying debuginfo for its packages: you run sudo dnf debuginfo-install <original-package-name> and everything you need to debug that code is installed (including sources). It also installs debuginfo for dependent packages.

Ubuntu does not. With Ubuntu, you first have to figure out the name of the debuginfo package to install. This is not easy. Once you've figured that out, you install the package and find that it's mostly useless because it doesn't contain sources. So you try to install sources using sudo apt source ..., but that fails because "You must put some 'source' URIs in your sources.list". Then you look at /etc/apt/sources.list and discover that all the necessary deb-src URLs are there, but commented out by default. (Why? Because otherwise it would make things too easy?) Then you uncomment those URLs and try again, but get the same error. Then you wail and gnash your teeth a bit and figure out you need to run sudo apt-get update before retrying. Finally you've got the source package installed, then you run rr replay (or gdb if you must) and discover that it's not finding those sources. Then, unless you're more stubborn than I am (or just have more time to burn), you give up in despair.

Seriously Ubuntu (or Debian), just copy Red Hat here. Please.

Friday, 24 June 2016

Democracy Is Impressive

Two very surprising things happened in the last few months: US Republicans nominated Donald Trump as their candidate for President, and Britons voted to leave the EU. These events surprised me because they were fiercely opposed by the wealthy, politically powerful elites that are normally portrayed as running the world. I think it's amazing and inspiring that popular uprisings were able to overcome those forces. That fact should be celebrated, and recalled to refute defeatist rhetoric.

Unfortunately I can't celebrate the actual decisions themselves. I think that in the former case the people made a disastrous mistake, and in the latter case that remains to be seen. But the possibility that people will misuse their power is the price of democracy and all human freedom. I think it's a price worth paying.

Thursday, 23 June 2016

PlayCanvas Is Impressive

I've been experimenting on my children with different ways to introduce them to programming. We've tried Stencyl, Scratch, JS/HTML, Python, and Codecademy with varying degrees of success. It's difficult because, unlike when I learned to program 30 years ago, it's hard to quickly get results that compare favourably with a vast universe of apps and content they've already been exposed to. Frameworks and engines face a tradeoff between power, flexibility and ease-of-use; if it's too simple then it's impossible to do what you want to do and you may not learn "real programming", but if it's too complex then it may just be too hard to do what you want to do or you won't get results quickly.

Recently I discovered PlayCanvas and so far it looks like the best approach I've seen. It's a Web-based 3D engine containing the ammo.js (Bullet) physics engine, a WebGL renderer, a WYSIWYG editor, and a lot more. It does a lot of things right:

  • Building in a physics engine, renderer and visual editor gives a very high level of abstraction that lets people get impressive results quickly while still being easy to understand (unlike, say, providing an API with 5000 interfaces, one of which does what you want). Stencyl does this, but the other environments I mentioned don't. But Stencyl is only 2D; supporting 3D adds significant power without, apparently, increasing the user burden all that much.
  • Being Web-based is great. There's no installation step, it works on all platforms (I guess), and the docs, tutorials, assets, forkable projects, editor and deployed content are all together on the Web. I suspect having the development platform be the same as the deployment platform helps. (The Stencyl editor is Java but its deployed games are not, so WYS is not always WYG.)
  • Performance is good. The development environment works well on a mid-range Chromebook. Deployed games work on a new-ish Android phone.
  • So far the implementation seems robust. This is really important; system quirks and bugs make learning a lot harder, because novices can't distinguish their own bugs from system bugs.
  • The edit-compile-run cycle is reasonably quick, at least for small projects. Slow edit-compile-run cycles are especially bad for novices who'll be making a lot of mistakes.
  • PlayCanvas is programmable via a JS component model. You write JS components that get imported into the editor and are then attached to scene-graph entities. Components can have typed parameters that appear in the editor, so it's pretty easy to create components reusable by non-programmers. However, for many behaviors (e.g. autonomously-moving objects) you probably need to write code --- which is a good thing. It's a bit harder than Scratch/Stencyl but since you're using JS you have more power and develop more reusable skills, and cargo-culting and tweaking scripts works well. You actually have access to the DOM if you want although mostly you'd stick to the PlayCanvas APIs. It looks like you could ultimately do almost anything you want, e.g. add multiplayer support and voice chat via WebRTC.
  • PlayCanvas has WebVR support though I haven't tried it.
  • It's developed on GitHub and MIT-licensed, so if something's broken or missing, someone can step in and fix it.

So far I'm very impressed and my child is getting into it.

Friday, 17 June 2016

Managing Vast, Sparse Memory On Linux

For a problem I'm working on, an efficient solution is to use a 32GB array, most of whose entries are unused. Fortunately this is no problem at all on 64-bit Linux using a couple of simple tricks.

The first trick is to allocate the array using the little-known (to me) MAP_NORESERVE option:

p = mmap(nullptr, 32ULL * 1024 * 1024 * 1024, PROT_READ | PROT_WRITE,
         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
(I thought this was the default on Linux, so I'm not sure why it's needed, but it is with 4.5.5-201.fc23.x86_64 at least. Oddly enough, if I omit that flag and just do two 16GB mmaps, that works too.) Now you can write to that region and only the pages you write to will actually be allocated. Good times.
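
As a rough illustration of the first trick, here is a minimal sketch that reserves a lazily backed region and touches a handful of scattered pages. The helper names reserve_sparse and touch_scattered are hypothetical, and MAP_PRIVATE is an assumption about the mapping type; only MAP_NORESERVE and MAP_ANONYMOUS come from the discussion here.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reserve `bytes` of zero-filled address space without reserving swap for it.
// Returns nullptr on failure. (Hypothetical helper; MAP_PRIVATE is assumed.)
inline uint8_t* reserve_sparse(size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  return p == MAP_FAILED ? nullptr : static_cast<uint8_t*>(p);
}

// Write one byte into each of `count` evenly spaced locations. Only the pages
// containing those bytes should end up backed by physical memory.
inline void touch_scattered(uint8_t* base, size_t bytes, size_t count) {
  assert(base != nullptr && count > 0 && count <= bytes);
  const size_t stride = bytes / count;
  for (size_t i = 0; i < count; ++i) {
    base[i * stride] = 1;
  }
}
```

Calling reserve_sparse(32ULL << 30) would request the full 32GB; whether that succeeds still depends on the kernel's overcommit settings.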

Now, once in a while I want to clear that memory or just some subsections of it. memset(array, 0, size) is out of the question since it would take a very long time and would also probably cause all those pages to be allocated, killing my process if I'm lucky and the entire system if I'm less lucky.

Fortunately we have madvise(array, size, MADV_DONTNEED)! For a MAP_ANONYMOUS mapping like ours, this delightful system call simply frees all the pages in the range and instructs the kernel to use fresh zero pages next time the memory is read or written. Not only is it theoretically efficient, it's fast in practice. I can touch 10,000 scattered pages in my 32GB array and then zero the entire array with one MADV_DONTNEED in 12ms total.
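
The clearing step can be sketched in the same spirit. This assumes a private anonymous mapping on Linux, where MADV_DONTNEED gives zero-fill-on-demand behaviour; clear_sparse is a hypothetical helper name, not an existing API.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Drop every page in [base, base + bytes) so that the next read or write sees
// fresh zero pages. Returns true on success. (Hypothetical helper; assumes a
// private anonymous mapping on Linux.)
inline bool clear_sparse(void* base, size_t bytes) {
  // madvise requires the start address to be page-aligned.
  const auto page = static_cast<uintptr_t>(sysconf(_SC_PAGESIZE));
  assert(reinterpret_cast<uintptr_t>(base) % page == 0);
  return madvise(base, bytes, MADV_DONTNEED) == 0;
}
```

The same call works on a page-aligned subsection of the array, which covers the "just some subsections" case.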

It would be nice if tricks like these worked for pages with values other than just zeroes. For example, Firefox sometimes needs large allocations filled with 0xFF000000 for graphics buffers, or special bit patterns representing "poisoned" memory for security purposes. I think you could implement something like that using userfaultfd or mapping a file from a custom FUSE filesystem, but they would probably be significantly slower.

Wednesday, 15 June 2016

Nastiness Works

One thing I experienced many times at Mozilla was users pressuring developers with nastiness --- ranging from subtle digs to vitriolic abuse, trying to make you feel guilty and/or trigger an "I'll show you! (by fixing the bug)" response. I know it happens in most open-source projects; I've been guilty of using it myself.

I particularly dislike this tactic because it works on me. It really makes me want to fix bugs. But I also know one shouldn't reward bad behavior, so I feel bad fixing those bugs. Maybe the best I can do is call out the bad behavior, fix the bug, and avoid letting that same person use that tactic again.

Perhaps you're wondering "what's wrong with that tactic if it gets bugs fixed?" Development resources are finite so every bug or feature is competing with others. When you use nastiness to manipulate developers into favouring your bug, you're not improving quality generally, you're stealing attention away from other issues whose proponents didn't stoop to that tactic and making developers a little bit miserable in the process. In fact by undermining rational triage you're probably making quality worse overall.

Monday, 13 June 2016

"Safe C++ Subset" Is Vapourware

In almost every discussion of Rust vs C++, someone makes a comment like:

the subset of C++14 that most people will want to use and the guidelines for its safe use are already well on their way to being defined ... By following the guidelines, which can be verified statically at compile time, the same kind of safeties provided by Rust can be had from C++, and with less annotation effort.
This promise is vapourware. In fact, it's classic vapourware in the sense of "wildly optimistic claim about a future product made to counter a competitor". (Herb Sutter says in comments that this wasn't designed with the goal of "countering a competitor" so I'll take him at his word (though it's used that way by others). Sorry Herb!)

(FWIW the claim quoted above is actually an overstatement of the goals of the C++ Core Guidelines to which it refers, which say "our design is a simpler feature focused on eliminating leaks and dangling only"; Rust provides important additional safety properties such as data-race freedom. But even just the memory safety claim is vapourware.)

To satisfy this claim, we need to see a complete set of statically checkable rules and a plausible argument that a program adhering to these rules cannot exhibit memory safety bugs. Notably, languages that offer memory safety are not just claiming you can write safe programs in the language, nor that there is a static checker that finds most memory safety bugs; they are claiming that code written in that language (or the safe subset thereof) cannot exhibit memory safety bugs.

AFAIK the closest to this C++ gets is the Core Guidelines Lifetimes I and II document, last updated December 2015. It contains only an "informal overview and rationale"; it refers to "Section III, analysis rules (forthcoming this winter)", which apparently has not yet come forth. (I'm pretty sure they didn't mean the New Zealand winter.) The informal overview shows a heavy dependence on alias analysis, which does not inspire confidence because alias analysis is always fragile. The overview leaves open critical questions about even trivial examples. Consider:

unique_ptr<int> p;
void foo(const int& v) {
  p = nullptr;
  cout << v;
}
void bar() {
  p = make_unique<int>(7);
  foo(*p);
}
Obviously this program is unsafe and must be forbidden, but what rule would reject it? The document says
  • In the function body, by default a Pointer parameter param is assumed to be valid for the duration of the function call and not depend on any other parameter, so at the start of the function lset(param) = param (its own lifetime) only.
  • At a call site, by default passing a Pointer to a function requires that the argument’s lset not include anything that could be invalidated by the function.
Clearly the body of foo is OK by those rules. For the call to foo from bar, it depends on what is meant by "anything that could be invalidated by the function". Does that include anything reachable via global variables? Because if it does, then you can't pass anything reachable from a global variable to any function by reference, which is crippling. But if it doesn't, then what rejects this code?

Update: Herb points out that example 7.1 covers a similar situation with raw pointers. That example indicates that anything reachable through a global variable cannot be passed to a function by raw pointer or reference. That still seems like a crippling limitation to me. You can't, for example, copy-construct anything (indirectly) reachable through a global variable:

unique_ptr<Foo> p;
void bar() {
  p = make_unique<Foo>(...);
  Foo xyz(*p); // Forbidden!
}

This is not one rogue example that is easily addressed. This example cuts to the heart of the problem, which is that understanding aliasing in the face of functions with potentially unbounded side effects is notoriously difficult. I myself wrote a PhD thesis on the subject, one among hundreds, if not thousands. Designing your language and its libraries from the ground up to deal with these issues has been shown to work, in Rust at least, but I'm deeply skeptical it can be bolted onto C++.


Aren't clang and MSVC already shipping previews of this safe subset? They're implementing static checking rules that no doubt will catch many bugs, which is great. They're nowhere near demonstrating they can catch every memory safety bug.

Aren't you always vulnerable to bugs in the compiler, foreign code, or mistakes in the safety proofs, so you can never reach 100% safety anyway? Yes, but it is important to reduce the amount of trusted code to the minimum. There are ways to use machine-checked proofs to verify that compilation and proof steps do not introduce safety bugs.

Won't you look stupid when Section III is released? Occupational hazard, but that leads me to one more point: even if and when a statically checked, plausibly safe subset is produced, it will take significant experience working with that subset to determine whether it's viable. A subset that rejects core C++ features such as references, or otherwise excludes most existing C++ code, will not be very compelling (as acknowledged in the Lifetimes document: "Our goal is that the false positive rate should be kept at under 10% on average over a large body of code").

Sunday, 12 June 2016

Mt Pirongia

At the end of April we did an overnight tramp up Mt Pirongia. Mt Pirongia is an old volcano southwest of Hamilton, not far from Kawhia on the west coast of the North Island. Its summit is 959m above sea level, most of the mountain is covered in thick bush, and there's a nice new DoC hut near the summit. It's only about two hours drive from Auckland so it's one of the closest hut tramps; I'd been thinking about doing it for years but had heard it was "hard core", and so put it off until my children were older.

On the Saturday we had pretty good weather the whole way. Up to about 700m it was pretty easy going but the central part of the mountain comprises several independent peaks, so there are numerous very steep up-and-down sections, some more climbing than walking, some with chains to help. It's tiring but there's not too much of it. It took us about five hours to get all the way to the hut. The views were spectacular; you can see out to the ocean to the west, and across the Waikato and King Country to the east. Supposedly on a clear day you can see Mt Taranaki and the central volcanic plateau, but it was too hazy for us.

The new hut is good, though it doesn't have the solar-powered LED lighting that many other new huts have. It was lovely to watch the evening descend. Because it was a one-night-only tramp, we brought better food than normal --- steaks, puddings, and bacon and eggs for breakfast. My children introduced the rest of our party to Bang! and everyone had a great time.

On Sunday we were in cloud most of the way down, but it was still lovely and took us about four hours. Overall it was, as I'd heard before, a little "hard core" and probably not the best first-time overnight tramp (the Pinnacles in Coromandel is still the best first-time overnight hut tramp from Auckland, IMHO), but well worth doing.

Whanganui River Journey

Back in April I did the Whanganui River Journey with some friends and family. This is branded as one of New Zealand's "Great Walks", but it's actually five days of canoeing from Taumarunui in the central North Island to Pipiriki, covering 145km. We stayed at campsites for three nights and at the John Coull Hut along the way.

I hadn't done a river trip like this before but I really enjoyed it. Like pretty much everyone else on the river (apart from a few jetboats) we were using open-top Canadian canoes. We were able to carry more gear than we would have while tramping. It's less tiring overall but my arms seldom get such a workout. The scenery is quite different; no vistas from mountain-tops, but seeing gorges and river valleys from the river has its own charm. The river gives you regular challenges as you shoot rapids and risk capsizing. Three of our four canoes had a capsize during the trip, which is actually no big deal, and in retrospect was an experience worth having :-). I discovered that there's a big difference between "container waterproof enough to keep rain out" and "container waterproof while submerged in a fast-flowing river".

Along the way we stopped for a walk to the "Bridge To Nowhere", a big concrete bridge across a valley in the middle of the bush. A road was built to it to support settlement in the area, but was too difficult to maintain so eventually was abandoned and the settlers moved out. Apparently exactly one car ever drove over the bridge. It was unfortunate for the settlers, many of whom were World War I veterans, but Whanganui National Park is a real treasure now.

Our canoe rental company (Blazing Paddles) was good. They picked us up at Pipiriki and drove us and our gear back to base at Taumarunui. That road trip took a couple of hours and was a lot of fun; I talked extensively to the driver, whose family has been in the central North Island area (one of my favourite places) for a hundred years. There are also some spectacular views of the volcanic plateau and the mighty three volcanoes Tongariro, Ngauruhoe and Ruapehu.

The photos here were taken by my friends after my own camera got wet on the first day...

Friday, 10 June 2016

Some Dynamic Measurements Of Firefox On x86-64

This follows up on my previous measurements of static properties of Firefox code on x86-64 with some measurements of dynamic properties obtained by instrumenting code. These are mostly for my own amusement but intuitions about how programs behave at the machine level, grounded in data, have sometimes been unexpectedly useful.

Dynamic properties are highly workload-dependent. Media codecs are more SSE/AVX intensive than regular code so if you do nothing but watch videos you'd expect qualitatively different results than if you just load Web pages. I used a mixed workload that starts Firefox (multi-process enabled, optimized build), loads the NZ Herald, scrolls to the bottom, loads an article with a video, plays the video for several seconds, then quits. It ran for about 30 seconds under rr and executed about 60 billion instructions.

I repeated my register usage result analysis, this time weighted by dynamic execution count and taking into account implicit register usage such as push using rsp. The results differ significantly depending on whether you count the consecutive iterations of a repeated string instruction (e.g. rep movsb) as a single instruction execution or one instruction execution per iteration, so I show both. Unlike the static graphs, these results are for all instructions executed anywhere in the process(es), including JITted code, not just libxul.

  • As expected, registers involved in string instructions get a big boost when you count string instruction repetitions individually. About 7 billion of the 64 billion instruction executions "with string repetitions" are iterations of string instructions. (In practice Intel CPUs can optimize these to execute 64 iterations at a time, under favourable conditions.)
  • As expected, sp is very frequently used once you consider its implicit uses.
  • String instructions aside, the dynamic results don't look very different from the static results. Registers R8 to R11 look a bit more used in this graph, which may be because they tend to be allocated in highly optimized leaf functions, which are more likely to be hot code.

  • The surprising thing about the results for SSE/AVX registers is that they still don't look very different to the static results. Even the bottom 8 registers still aren't frequently used compared to most general-purpose registers, even though I deliberately tried to exercise codec code.
  • I wonder why R5 is the least used bottom-8 register by a significant margin. Maybe these results are dominated by a few hot loops that by chance don't use that register much.

I was also interested in exploring the distribution of instruction execution frequencies:

A dot at position x, y on this graph means that fraction y of all instructions executed at least once is executed at most x times. So, we can see that about 19% of all instructions executed are executed only once. About 42% of instructions are executed at most 10 times. About 85% of instructions are executed at most 1000 times. These results treat consecutive iterations of a string instruction as a single execution. (It's hard to precisely define what it means for an instruction to "be the same" in the presence of dynamic loading and JITted code. I'm assuming that every execution of an instruction at a particular address in a particular address space is an execution of "the same instruction".)
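
To make the definition of a point on that graph concrete, here is a small sketch that computes y for a given x from per-instruction execution counts. The function name and the input layout (one count per distinct instruction address) are assumptions for illustration; this is not the instrumentation code used for the measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// counts[i] is how many times the i-th distinct instruction was executed
// (every entry is at least 1). Returns the fraction of those instructions
// that were executed at most `x` times. (Hypothetical helper.)
inline double fraction_executed_at_most(const std::vector<uint64_t>& counts,
                                        uint64_t x) {
  assert(!counts.empty());
  const auto n = std::count_if(counts.begin(), counts.end(),
                               [x](uint64_t c) { return c <= x; });
  return static_cast<double>(n) / static_cast<double>(counts.size());
}
```

Evaluating this for a range of x values, say powers of ten, would give the points of a cumulative curve like the one described.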

Interestingly, the five most frequently executed instructions are executed about 160M times. Those instructions are for this line, which is simply filling a large buffer with 0xff000000. gcc is generating quite slow code:

132e7b2: cmp    %rax,%rdx
132e7b5: je     132e7d1 
132e7b7: movl   $0xff000000,(%r9,%rax,4)
132e7bf: inc    %rax
132e7c2: jmp    132e7b2 
That's five instructions executed for every four bytes written. This could be done a lot faster in a variety of different ways --- rep stosd or rep stosq would probably get the fast-string optimization, but SSE/AVX might be faster.
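
As one possible shape for the SSE variant, here is a minimal sketch that stores four pixels per iteration when SSE2 is available and falls back to a scalar loop otherwise. fill_pixels is a hypothetical helper, not the Firefox code, and I haven't measured it against rep stosd.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Fill `n` 32-bit pixels with `value`. (Hypothetical helper for illustration.)
inline void fill_pixels(uint32_t* dst, size_t n, uint32_t value) {
  assert(dst != nullptr || n == 0);
  size_t i = 0;
#if defined(__SSE2__)
  // Broadcast the value into all four lanes and store 16 bytes per iteration.
  const __m128i v = _mm_set1_epi32(static_cast<int>(value));
  for (; i + 4 <= n; i += 4) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);
  }
#endif
  // Scalar tail (and the whole buffer on targets without SSE2).
  for (; i < n; ++i) {
    dst[i] = value;
  }
}
```

The unaligned store keeps the sketch simple; an aligned destination would allow _mm_store_si128 instead.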

Are Dynamic Control-Flow Integrity Schemes Worth Deploying?

Most exploits against C/C++ code today rely on hijacking CPU-level control flow to execute the attacker's code. Researchers have developed schemes to defeat such attacks based on the idea of control flow integrity: characterize a program's "valid control flow", and prevent deviations from valid control flow at run time. There are lots of CFI schemes, employing combinations of static and dynamic techniques. Some of them don't even call themselves CFI, but I don't have a better term for the general definition I'm using here. Phrased in this general way, it includes control-transfer instrumentation (CCFIR etc), pointer obfuscation, shadow stacks, and even DEP and ASLR.

Vendors of C/C++ software need to consider whether to deploy CFI (and if so, which scheme). It's a cost/benefit analysis. The possible benefit is that many bugs may become significantly more difficult --- or even impossible --- to exploit. The costs are complexity and run-time overhead.

A key question when evaluating the benefit is, how difficult will it be for CFI-aware attackers to craft exploits that bypass CFI? That has two sub-questions: how often is it possible to weaponize a memory-safety bug that's exploited via control-flow hijacking today, with an exploit that is permitted by the CFI scheme? And, crucially, will it be possible to package such exploitation techniques so that weaponizing common C/C++ bugs into CFI-proof exploits becomes cheap? A very interesting paper at Oakland this year, and related work by other authors, suggests that the answer to the first sub-question is "very often" and the answer to the second sub-question is "don't bet against it".

Coincidentally, Intel has just unveiled a proposal to add some CFI features to their CPUs. It's a combination of shadow stacks with dynamic checking that the targets of indirect jumps/calls are explicitly marked as valid indirect destinations. Unlike some more precise CFI schemes, you only get one-bit target identification; a given program point is a valid destination for all indirect transfers or none.
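
The two checks can be modelled in a few lines. This is a toy software model with made-up names, treating addresses as plain integers; it is only meant to show what "one-bit target identification" and a shadow stack each accept and reject, not how Intel's hardware implements them.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy model of the two checks. (Hypothetical class, not Intel's design.)
class CfiModel {
 public:
  // Mark an address as a legal destination for any indirect jump or call.
  void mark_valid_target(uint64_t addr) { valid_targets_.insert(addr); }

  // One-bit check: a target is valid for all indirect transfers or for none.
  bool indirect_transfer_ok(uint64_t target) const {
    return valid_targets_.count(target) != 0;
  }

  // A call records its return address on the shadow stack.
  void on_call(uint64_t return_addr) { shadow_stack_.push_back(return_addr); }

  // A return is allowed only if it matches the most recently recorded address.
  bool on_return(uint64_t return_addr) {
    if (shadow_stack_.empty() || shadow_stack_.back() != return_addr) {
      return false;  // mismatch: a real implementation would raise a fault
    }
    shadow_stack_.pop_back();
    return true;
  }

 private:
  std::unordered_set<uint64_t> valid_targets_;
  std::vector<uint64_t> shadow_stack_;
};
```

Note that indirect_transfer_ok takes no source address: any indirect branch may reach any marked target, which is exactly the imprecision described above.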

So will CFI be worth deploying? It's hard to say. If you're offered a turnkey solution that "just works" with negligible cost, there may be no reason not to use it. However, complexity has a cost, and we've seen that sometimes complex security measures can even backfire. The tail end of Intel's document is rather terrifying; it tries to enumerate the interactions of their CFI feature with all the various execution modes that Intel currently supports, and leaves me with the impression that they're generally heading over the complexity event horizon.

Personally I'm skeptical that CFI will retain value over the long term. The Oakland DOP paper is compelling, and I think we generally have lots of evidence that once an attacker has a memory safety bug to work on, betting against the attacker's ingenuity is a loser's game. In an arms race between dynamic CFI (and its logical extension to dynamic data-flow integrity) and attackers, attackers will probably win, not least because every time you raise the CFI bar you'll pay with increased complexity and overhead. I suggest that if you do deploy CFI, you should do so in a way that lets you pull it out if the cost-benefit equation changes. Baking it into the CPU does not have that property...

One solution, of course, is to reduce the usage of C/C++ by writing code in a language whose static invariants are strong enough to give you CFI, and much stronger forms of integrity, "for free". Thanks to Rust, the old objections that memory-safe languages were slow, tied to run-time support and cost you control over resources don't apply anymore. Let's do it.

Monday, 6 June 2016

How To Track Down Divergence Bugs In rr

Brendan Dolan-Gavitt asked me to write about how I debug rr itself. That's an interesting topic because it's a challenging problem; as I wrote a while ago, you sometimes feel you're in debugging hell so that others can go to heaven.

Brendan was talking about divergence bugs, i.e. bugs where replaying a recorded execution goes "off the rails" because there's some source of nondeterminism that differs between recording and replay. These can be further divided into recording bugs, where nondeterminism wasn't recorded correctly, and replay bugs, where there's enough information in the trace to replay but we replayed incorrectly.

rr saves the values of all general-purpose registers, the program counter, and the conditional branch counter, at most trace events, i.e. signals, context switches, and system calls that don't take the syscall-buffering fast path. We typically detect divergence bugs when replay reaches the execution point of the next event but the replay register/counter state doesn't match the recorded state. This includes cases where a different system call is performed, and most cases where behavior changes during replay produce writes to stdout/stderr that differ from those during recording. Here's an example:

[ERROR /home/kfischer/rr-vanilla/src/Registers.cc:303:maybe_print_reg_mismatch() errno: SUCCESS] rdx 0 != 0xff (replaying vs. recorded)
[FATAL /home/kfischer/rr-vanilla/src/Registers.cc:423:compare_register_files() errno: SUCCESS]
 (task 24036 (rec:24003) at time 1132365)
 -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:4473519809/4473519809)
Launch gdb with
  gdb /home/kfischer/.local/share/rr/latest-trace/mmap_hardlink_23_wine64
and attach to the rr debug server with:
  target remote :24036
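
The check behind a message like that can be sketched as a comparison of two register snapshots. This is a simplified illustration, not rr's actual Registers.cc code; RegisterFile and find_register_mismatches are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Register name -> value, captured at a trace event. (Hypothetical type.)
using RegisterFile = std::map<std::string, uint64_t>;

// Returns one message per register whose replayed value differs from the
// recorded one. Both snapshots are assumed to hold the same register names.
inline std::vector<std::string> find_register_mismatches(
    const RegisterFile& replaying, const RegisterFile& recorded) {
  assert(replaying.size() == recorded.size());
  std::vector<std::string> mismatches;
  for (const auto& entry : recorded) {
    const uint64_t replayed = replaying.at(entry.first);
    if (replayed != entry.second) {
      std::ostringstream msg;
      msg << entry.first << ' ' << std::hex << std::showbase << replayed
          << " != " << entry.second << " (replaying vs. recorded)";
      mismatches.push_back(msg.str());
    }
  }
  return mismatches;
}
```

A non-empty result at an event is the point where replay would stop and offer the emergency debugger.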

In rr, most divergences are detected pretty soon after they are caused. In practice almost any change in control flow causes the conditional-branch counter to diverge. So the first thing I do is look for clues that something unusual happened just before the divergence was detected. One way to do that is to look at the trace leading up to the current point. If a rare system call or signal happened just before we diverged, it's worth looking closely at our handling of that! Another way is to use the emergency debugger mentioned above. This provides a gdbserver interface so you can inspect the state of the diverged process with gdb. Looking at the call stack can show that the process was doing something unusual that needs to be checked, especially if you have application source. Surprisingly often, with this data, I'm able to make some inspired guesses and write a standalone testcase that reproduces the problem, and from there it's usually easy.

When we have a divergence in one or a few register values, such as the issue linked above, that almost always means control flow did not diverge and we can make progress by reasoning backwards to figure out where that bad value must have come from. As you'd hope, rr itself is very useful for this; you can replay right up to the divergence point and reverse-stepi to see where the bad value is set. If reverse-execution is busted for some reason, we can tell rr to singlestep over some region of the trace and dump all register values at each step --- slow, but usually gives you the information you need. These techniques don't work if control flow diverged.

For harder bugs we generally need to be able to reproduce the bug by re-recording instead of just working off a single recorded trace, if we're going to make any progress. Re-recording with the system-call buffering optimization disabled is very helpful; it's a lot slower, but we'll check registers/counters at every single system call and catch divergences earlier. If the bug disappears entirely, we can safely bet it's in the syscall-buffering logic itself and we can easily narrow down which syscall wrapper is responsible. A similar but more aggressive approach which I've used in some desperate situations is to set the scheduler to context-switch every small number of conditional branches, and suppress noop-reschedule elision, so the trace records the register state very frequently ... effective, but often impractically slow.

Another tool in our toolbox is memory checksumming. To catch bugs where a memory location diverges and we don't catch the divergence until much later, rr has command-line options to periodically checksum memory regions and include those checksums in the trace. During replay we check the checksums to see if memory has diverged. For when you're close to the target, you can also dump actual memory contents and compare them. I don't use this facility very much, but when you need it, you really need it.
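
The idea can be sketched as follows. The checksum here is FNV-1a, picked only because it is short; I'm not claiming it is what rr uses, and Region, checksum_all and first_diverged_region are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A contiguous range of memory to checksum. (Hypothetical type.)
struct Region {
  const uint8_t* start;
  size_t length;
};

// 64-bit FNV-1a over one region (an assumption, chosen for brevity).
inline uint64_t checksum_region(const Region& r) {
  uint64_t h = 14695981039346656037ULL;
  for (size_t i = 0; i < r.length; ++i) {
    h ^= r.start[i];
    h *= 1099511628211ULL;
  }
  return h;
}

// Recording side: one checksum per region, to be stored with the trace.
inline std::vector<uint64_t> checksum_all(const std::vector<Region>& regions) {
  std::vector<uint64_t> sums;
  for (const Region& r : regions) {
    sums.push_back(checksum_region(r));
  }
  return sums;
}

// Replay side: index of the first region whose checksum differs, or -1.
inline int first_diverged_region(const std::vector<Region>& regions,
                                 const std::vector<uint64_t>& recorded) {
  assert(regions.size() == recorded.size());
  for (size_t i = 0; i < regions.size(); ++i) {
    if (checksum_region(regions[i]) != recorded[i]) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

Knowing which region diverged first narrows the search; dumping and diffing that region's contents is the natural next step.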

For bugs where we know we've recorded the correct data but replay goes wrong, we have the ability to record rr replay under rr (and, obviously, replay it), so we can actually use reverse-execution debugging to debug rr's replay code. I haven't used this a lot yet, since I got it working relatively recently, and sometimes it falls over ("and then you have two problems"), but when it works you get the feeling of lightning shooting out of your fingertips.

One general rule that I learned at Mozilla is to be very aggressive about using assertions to check invariants. This helps detect bugs closer to the root cause. Generally rr has assertions enabled all the time; the performance impact is not usually an issue since rr's efficiency (apart from the syscallbuf code) mostly comes down to switching from the application to rr as infrequently as possible. If we're running rr code, we're already losing.

Some of the hardest divergence bugs we've had in rr have been misbehavior in the hardware conditional-branch counter, either induced by OS/VM issues or by actual hardware bugs. These are agony. Definitely study Intel's errata sheets, read kernel source, and test carefully across different CPU/VM configurations.

I hope this helps! I'm always glad to answer questions about rr :-).