Sunday, 21 January 2018

Neal Stephenson's "Seveneves" (Mild Spoilers)

There's much discussion of orbital mechanics, disguised as a story. The rest isn't as good.

OK, actually I rather enjoyed it, but only because I'm a sucker for apocalyptic fiction and hard-ish science, and I gave immense credit for the chutzpah of his opening sentence, in which the moon explodes for no reason.

I found his treatment of religion more annoying than usual for sci-fi. His atheist wish-fulfillment fantasy "then everyone realized there's no God" is par for the course. Projecting thousands of years of human development without belief in God recurring, and with no other apparent solution to the meaning of life, is sloppy but also usual. What really grates is the ending, which reveals that — surprise! — people do care about having a supernatural purpose and, oddly, a powerful cabal has found one but they're keeping it secret. It reminded me of Contact where after relentlessly bashing religious rubes, at the very end Sagan reveals that the universe has been designed by, if not God, something seriously God-like. I find their lack of faith in lack of faith disturbing.

Wednesday, 17 January 2018

Long-Term Consequences Of Spectre And Its Mitigations

The dust is settling on the initial wave of responses to Spectre and Meltdown. Meltdown was relatively simple to deal with; we can consider it fixed. Spectre is much more difficult and has far-reaching consequences for the software ecosystem.

The community is treating Spectre as two different issues, "variant 1" involving code speculatively executed after a conditional branch, and "variant 2" involving code speculatively executed via an indirect branch whose predicted destination is attacker-controlled. I wish these had better names, but c'est la vie.

Spectre variant 1 mitigations

Proposals for mitigating variant 1 have emerged from Webkit, the Linux kernel, and Microsoft. The first two propose similar ideas: masking array indices so that even speculative array loads can't load out-of-bounds. MSVC takes a different approach, introducing LFENCE instructions to block speculative execution when the load address appears to be guarded by a range check. Unfortunately, Microsoft says:

"It is important to note that there are limits to the analysis that MSVC and compilers in general can perform when attempting to identify instances of variant 1. As such, there is no guarantee that all possible instances of variant 1 will be instrumented under /Qspectre."

This seems to be a great weakness, as developers won't know whether this mitigation is actually effective on their code.

The Webkit and Linux kernel approaches have the virtue of being predictable, but at the cost of requiring manual code changes. The fundamental problem is that in C/C++ the compiler generally does not know with certainty the array length associated with an array lookup, thus the masking code must be introduced manually. Webkit goes further and adds protection against speculative loads guarded by dynamic type checks, but again this must be done manually in many cases since C/C++ have no built-in tagged union type.
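To make the index-masking idea concrete, here is a minimal sketch of the arithmetic in Python. It is an illustration only: the function names and the 64-bit word size are assumptions made for this example, it is not the code from either project, and Python itself gives no guarantees about speculative execution. In C or JIT-generated code the same branch-free computation would sit between the bounds check and the load, so that even a mispredicted branch can only load from index 0.

```python
WORD_BITS = 64  # assumed machine word size; indices and sizes must be below 2**63


def index_mask(index: int, size: int) -> int:
    """Return -1 (all ones) if 0 <= index < size, otherwise 0, without branching."""
    # In range: index and (size - 1 - index) are both non-negative, so their OR
    # is non-negative; complementing makes it negative, and the arithmetic
    # shift smears the sign bit into -1.
    # Out of range: one of the two operands is negative, so the OR is negative,
    # its complement is non-negative, and the shift yields 0.
    return ~(index | (size - 1 - index)) >> (WORD_BITS - 1)


def masked_load(array, index):
    """Bounds-checked load whose index is also clamped by the mask."""
    if 0 <= index < len(array):  # the ordinary, architecturally visible check
        safe_index = index & index_mask(index, len(array))
        return array[safe_index]  # an out-of-range index would collapse to 0
    raise IndexError(index)
```

For example, index_mask(3, 4) is -1, so the index passes through unchanged, while index_mask(4, 4) is 0, so the index is forced to 0. The point of computing the mask arithmetically is that it depends on the data rather than on a predicted branch.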

I think "safe" languages like Rust should generalize the idea behind Webkit's mitigations: require that speculatively executed code adhere to the memory safety constraints imposed by the type system. This would make Spectre variant 1 a lot harder to exploit. It would subsume every variant 1 mitigation I've seen so far, and could be automatic for safe code. Unsafe Rust code would need to be updated.

Having said that, there could be variant-1 attacks that don't circumvent the type system, that none of these mitigations would block. Consider a browser running JS code:

  let x = bigArray[iframeElem.contentWindow.someProperty];

Conceivably that could get compiled to some mix of JIT code and C++ that does
  if (iframeElemOrigin == selfDocumentOrigin) {
    index = ... get someProperty ...
    x = bigArray[index];
  } else {
    ... error ...
  }
The speculatively executed code violates no type system invariants, but could leak the value of the property across origins. This example suggests that complete protection against Spectre variant 1 will require draconian mitigations, either pervasive and expensive code instrumentation or deep (and probably error-prone) analysis.

Spectre variant 2 mitigations

There are two approaches here. One is microcode and silicon changes to CPUs to enable flushing and/or disabling of indirect branch predictors. The other is "retpolines" — replace indirect branches with an instruction sequence that doesn't trigger the indirect branch predictor. (More precisely, that doesn't use the BTB; the RSB prediction is used instead, but its prediction is directed to a safe destination address.) Apparently the Linux community is advising all compilers and assembly writers to avoid all indirect branches on Intel even in user-space. This means, for example, that we should update rr's handwritten assembly to avoid indirect branches. On the other hand, Microsoft is not giving such advice and apparently is not planning to introduce retpoline support in MSVC. I don't know why this difference is occurring, but it seems like a problem.

Assuming the Linux community advice is followed, things get even more complicated. Future CPUs can be secure against variant 2 without requiring retpolines. We will want to avoid retpolines on those CPUs for performance reasons. Also, Intel's future CET control-flow-integrity hardware will not work with retpolines, so we'll want to turn retpolines off for security! So software will need to determine at run-time whether retpolines should be used. JITs and handwritten assembly will need to add code to do that. This is going to be a burden on lots of software developers for a very long time.

Security/performance tradeoffs

There is now a significant performance penalty for running untrusted code. If you know for sure there is no malicious code running in your (virtual) machine you can turn off these mitigations and get significant performance wins. This wasn't really true before. (Unikernels reaped some performance benefits but created too many other problems to be generally useful.) Inventorying the entire collection of software running in your VM to verify that it's all trusted may be difficult in practice and reduces defense-in-depth ... but no doubt people will be tempted to do it.

We could see increased interest in source-based distributions like Gentoo. Recompiling your software stack to include just the mitigations that you need could bring performance benefits.

Javascript implications

The isolation boundary between Javascript and a browser content process's native code is not visible to the CPU, which makes hardware mitigations difficult to use for JS, and for any other system running code in the same process with different levels of trust. It's hard to say what the immediate implications of this are, but I think it makes "one site per process" policies in browsers more appealing in the long term, at least as an option to deploy in case some future difficult-to-mitigate vulnerability hits. Right now browsers are trying to keep the problem manageable by making it difficult for JS to extract information from the timing channel (by limiting timer resolution and disabling features like SharedArrayBuffer that can be used to implement high-resolution timers), but this unfortunately limits the power of Web applications compared to native applications. For example, as long as these restrictions last we can't run idiomatic parallel Rust code in browsers via WebAssembly :-(. Also I suspect in the medium term attackers will find other ways to read the timing channel that will be less feasible to disable.
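As a rough illustration of what "limiting timer resolution" means, here is a small Python sketch. The function name and the 1 ms resolution are hypothetical choices for this example and are not what any particular browser uses; real implementations may also add jitter. The idea is simply that timestamps are rounded down to a coarse grid, so two events closer together than the grid spacing can report the same time.

```python
import math


def coarsen(timestamp_ms: float, resolution_ms: float) -> float:
    """Round a timestamp down to a multiple of resolution_ms."""
    return math.floor(timestamp_ms / resolution_ms) * resolution_ms


# Two events 0.3 ms apart become indistinguishable at a (hypothetical) 1 ms resolution.
before = coarsen(10.1, 1.0)
after = coarsen(10.4, 1.0)
elapsed = after - before  # 0.0: a cache hit and a cache miss would look the same
```

This is also why a shared-memory counter incremented by another thread is a problem: it acts as a clock that ignores the rounding entirely.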

I think it would be a grave mistake to simply give up on mixing code with different trust labels in the same address space. Apart from having to redesign a lot of software, that would set a hard lower bound on the cost of transitioning between trust zones. It would be much better if hardware mitigations can be designed to be usable within a single address space.

Other attacks

Perhaps the biggest question is whether we are seeing just the start of a flood of serious attacks based on Spectre-like ideas. I think it's entirely possible, and if so, then dealing with those attacks piecemeal as they surface is going to be incredibly expensive and painful. There is even a possibility that the cost of mitigations will compound as mitigations interfere with one another and fewer and fewer people are capable of understanding what's going on. Therefore I hope and pray that people in positions of power — CPU vendors, big software vendors, etc — work together to come up with comprehensive, preventative fixes that simply rule out these classes of attacks, and don't let themselves be entirely consumed by demands for immediate responses to zero-day vulnerabilities. I applaud the sentiment of RISC-V's statement to this end, self-serving as it is.

Sunday, 14 January 2018

Captain Sonar

Captain Sonar is a very interesting board game when played in "real-time" mode. It's basically a complicated upgrade of "Battleships", played by two teams each representing the crew of a submarine, each with an assigned role, each submarine trying to hunt down and destroy the other. In real-time mode, each team can make moves as quickly as they're able to carry them out. A team that can make moves more quickly than their opponents has a large advantage. That makes the game more stressful than any other board game I've played, since there is constant pressure to act quickly and minimize time taken to think ... but under such pressure it's easy to make mistakes that will damage your own submarine or make you lose track of your opponents.

The stress makes some people dislike the game, which is understandable, but other people love it. If you enjoy this sort of thing, it makes for an intense, shared experience which magnifies the fun. It's notable that after each real-time game I've seen, the entire group has spent at least ten minutes talking over the game and reliving the highlights. This doesn't happen as much in other games with this group.

It seems that a lot of the skill in the game is about each player focusing their attention on just the information needed for their role, blocking out all the other activity, while still coordinating with their team members as needed and perhaps overhearing information from the other team if relevant to their role, all under pressure. It's a very interesting exercise from that point of view. It might be quite a good activity for "team building", although as mentioned some people are going to hate it.

For certain kinds of jobs, perhaps introduce Captain Sonar as part of the interview process :-).

One of the problems with the game is that it's easy for a mistake made by one side to disadvantage the other side, such as when a captain announces a move but records a different move. When my team wins, I have uncomfortable doubts that we might have won because we made a mistake. We mitigate this problem by having two people operate as referees, each checking the moves made by one team, but even that doesn't fully eliminate the problem. Adding some software to the game might possibly help here.

Tuesday, 9 January 2018

Hooray For cargo build --all-targets

Rust 1.23 just came out and the release notes failed to mention what is, for me, by far its best new feature: cargo build --all-targets and cargo check --all-targets work properly. For the first time, we have a single command that really builds all targets — examples, tests, etc — in a single pass. Likewise we finally have a single command that checks all that code. This has improved our build times and made cargo check much more useful. I'm surprised there isn't more fuss about it.

Monday, 8 January 2018

The Fight For Patent-Unencumbered Media Codecs Is Nearly Won

Apple joining the Alliance for Open Media is a really big deal. Now all the most powerful tech companies — Google, Microsoft, Apple, Mozilla, Facebook, Amazon, Intel, AMD, ARM, Nvidia — plus content providers like Netflix and Hulu are on board. I guess there's still no guarantee Apple products will support AV1, but it would seem pointless for Apple to join AOM if they're not going to use it: apparently AOM membership obliges Apple to provide a royalty-free license to any "essential patents" it holds for AV1 usage.

It seems that the only thing that can stop AOM and AV1 eclipsing patent-encumbered codecs like HEVC is patent-infringement lawsuits (probably from HEVC-associated entities). However, the AOM Patent License makes that difficult. Under that license, the AOM members and contributors grant rights to use their patents royalty-free to anyone using an AV1 implementation — but your rights terminate if you sue anyone else for patent infringement for using AV1. (It's a little more complicated than that — read the license — but that's the idea.) It's safe to assume AOM members do hold some essential patents covering AV1, so every company has to choose between being able to use AV1, and suing AV1 users. They won't be able to do both. Assuming AV1 is broadly adopted, in practice that will mean choosing between making products that work with video, or being a patent troll. No doubt some companies will try the latter path, but the AOM members have deep pockets and every incentive to crush the trolls.

Opus (audio) has been around for a while now, uses a similar license, and AFAIK no patent attacks are hanging over it.

Xiph, Mozilla, Google and others have been fighting against patent-encumbered media for a long time. Mozilla joined the fight about 11 years ago, and lately it has not been a cause célèbre, being eclipsed by other issues. Regardless, this is still an important victory. Thanks to everyone who worked so hard for it for so long, and special thanks to the HEVC patent holders, whose greed gave free-codec proponents a huge boost.

Sunday, 7 January 2018

Ancient Browser-Wars History: MD5-Hashed Posts Declassified

2007-2008 was an interesting time for Mozilla. In the market, Firefox was doing well, advancing steadily against IE. On the technical front we were doing poorly. Webkit was outpacing us in performance and rapid feature development. Gecko was saddled with design mistakes and technical debt, and Webkit captured the mindshare of open-source contributors. We knew Google was working on a Webkit-based browser which would probably solve Webkit's market-share problems. I was very concerned and, for a while, held the opinion that Mozilla should try to ditch Gecko and move everything to Webkit. For me to say so loudly would have caused serious damage, so I only told a few people. In public, I defended Gecko from unfair attacks but was careful not to contradict my overall judgement.

I wasn't the only one to be pessimistic about Gecko. Inside Mozilla, under the rubric of "Mozilla 2.0", we thrashed around for a considerable time trying to come up with short-cuts to reducing our technical debt, such as investments in automatic refactoring tools. Outside Mozilla, competitors expected to rapidly outpace us in engine development.

As it turned out, we were all mostly wrong. We did not find any silver bullets, but just by hard work Gecko mostly kept up, to an extent that surprised our competitors. Weaknesses in Webkit — some expedient shortcuts taken to boost performance or win points on specific compatibility tests, but also plain technical debt — became apparent over time. Chrome boosted Webkit, but Apple/Google friction also caused problems that eventually resulted in the Blink fork. The reaction to Firefox 57 shows that Gecko is still at least competitive today, even after the enormous distraction of Mozilla's failed bet on FirefoxOS.

One lesson here is that even insiders can be overly pessimistic about the prospects of an old codebase; dedicated, talented staff working over the long haul can do wonders, and during that time your competitors will have a chance to develop their own problems.

Another lesson: in 2007-2008 I was overly focused on toppling IE (and Flash and WPF), and thought having all the open-source browsers sharing a single engine implementation wouldn't be a big problem for the Web. I've changed my mind completely; the more code engines share, the more de facto standardization of bugs we would see, so having genuinely separate implementations is very important.

I'm very grateful to Brendan and others for disregarding my opinions and not letting me lead Mozilla down the wrong path. It would have been a disaster for everyone.

To let off steam, and leave a paper trail for the future, I wrote four blog posts during 2007-2008 describing some of my thoughts, and published their MD5 hashes. The aftermath of the successful Firefox 57 release seems like an appropriate time to harmlessly declassify those posts. Please keep in mind that my opinions have changed.

  1. January 21, 2007: declassified
  2. December 1, 2007: declassified
  3. June 5, 2008: declassified
  4. September 7, 2008: declassified

On Keeping Secrets

Once upon a time I was at a dinner at a computer science conference. At that time the existence of Chrome was a deeply guarded secret; I knew of it, but I was sworn to secrecy. Out of the blue, one of my dinner companions turned to me and asked "is Google working on a browser?"

This was a terrible dilemma. I could not answer "no" or "I don't know"; Christians mustn't lie. "Yes" would have betrayed my commitment. Refusing to answer would obviously amount to a positive answer, as would any obvious attempt to dodge the question ("hey I think that's Donald Knuth over there!").

I can't remember exactly what I said, but it was something evasive, and I remember feeling it was not satisfactory. I spent a lot of time later thinking about what I should have said, and what I should say or do if a similar situation arises again. Perhaps a good answer would have been: "aren't you asking the wrong person?" Alternatively, go for a high-commitment distraction, perhaps a cleverly triggered app that self-dials a phone call. "You're going into labour? I'll be right there!" (Note: not really, this would also be a deception.) It's worth being prepared.

One thing I really enjoyed about working at Mozilla was that we didn't have many secrets to keep. Most of the secrets I had to protect were about other companies. Minimizing one's secrecy burden generally seems like a good idea, although I can't eliminate it because it's often helpful to other people for them to be able to share secrets with me in confidence.

Update: The situation for Christians has some nuance.

Saturday, 6 January 2018

Meltdown/Spectre Needs Better Disclosure

There's far too much confusion around these vulnerabilities. One problem is naming: "Meltdown" is OK, but "Spectre" is being used to refer to a whole family of information leaks triggered by speculative execution, two specific examples of which are highlighted in the paper, which happen to be similar to two examples shown by Project Zero, but we don't have good names for those examples so people are calling them all kinds of things, e.g. "variant 1" and "variant 2", which isn't very helpful. Also the Spectre paper describes user-space-only attacks but Project Zero introduced attacks against the kernel and hypervisor; when someone says "we've blocked Spectre" it's not at all clear what they've done. It would have been better to have specific names for different issues that are going to be addressed separately, and a distinct name for the whole family.

We have Intel saying they're releasing microcode updates that fix everything, and Google and Amazon saying they've fixed everything in their clouds, but there are still efforts under way to fix various Spectre issues in the Linux kernel, so obviously we're still some way away from the complete stack being fully protected, especially against Spectre "variant 1" both in the kernel and user-space. Google says that recompiling everything with retpolines blocks "variant 2", but people on LKML who seem to be in the loop with Intel say that retpolines aren't a reliable mitigation on Skylake.

Some confusion is understandable given the accelerated disclosure schedule, but I hope this gets sorted out soon. It's important for the CPU vendors and the cloud vendors to say exactly what mitigations they have deployed, what attacks they are not mitigating, and what parts of the problem they expect their downstream customers to take responsibility for.

Another thing that has to happen: brains trusts inside the disclosure zone need to take a step back from desperate attempts to mitigate the known exploits, and figure out a long term plan for dealing with side-channel attacks leaking privileged data through supposedly-hidden hardware state. The Spectre paper portrays their attacks as the tip of an iceberg, and I suspect the authors are right. Blocking specific attacks one by one with expensive mitigations may not be sustainable, and is definitely not sustainable for products that can't be on a rapid update cycle. It just got harder to write secure code, and on our current course it is going to keep getting harder. The RISC-V statement about this is self-serving but the right sentiment.