Monday, 13 June 2016

"Safe C++ Subset" Is Vapourware

In almost every discussion of Rust vs C++, someone makes a comment like:

"the subset of C++14 that most people will want to use and the guidelines for its safe use are already well on their way to being defined ... By following the guidelines, which can be verified statically at compile time, the same kind of safeties provided by Rust can be had from C++, and with less annotation effort."

This promise is vapourware. In fact, it's classic vapourware in the sense of "wildly optimistic claim about a future product made to counter a competitor". (Herb Sutter says in comments that this wasn't designed with the goal of "countering a competitor" so I'll take him at his word (though it's used that way by others). Sorry Herb!)

(FWIW the claim quoted above is actually an overstatement of the goals of the C++ Core Guidelines to which it refers, which say "our design is a simpler feature focused on eliminating leaks and dangling only"; Rust provides important additional safety properties such as data-race freedom. But even just the memory safety claim is vapourware.)

To satisfy this claim, we need to see a complete set of statically checkable rules and a plausible argument that a program adhering to these rules cannot exhibit memory safety bugs. Notably, languages that offer memory safety are not just claiming you can write safe programs in the language, nor that there is a static checker that finds most memory safety bugs; they are claiming that code written in that language (or the safe subset thereof) cannot exhibit memory safety bugs.

AFAIK the closest to this C++ gets is the Core Guidelines Lifetimes I and II document, last updated December 2015. It contains only an "informal overview and rationale"; it refers to "Section III, analysis rules (forthcoming this winter)", which apparently has not yet come forth. (I'm pretty sure they didn't mean the New Zealand winter.) The informal overview shows a heavy dependence on alias analysis, which does not inspire confidence because alias analysis is always fragile. The overview leaves open critical questions about even trivial examples. Consider:

unique_ptr<int> p;
void foo(const int& v) {
  p = nullptr;  // destroys the int that v refers to
  cout << v;    // reads through a dangling reference
}
void bar() {
  p = make_unique<int>(7);
  foo(*p);
}
Obviously this program is unsafe and must be forbidden, but what rule would reject it? The document says:
  • In the function body, by default a Pointer parameter param is assumed to be valid for the duration of the function call and not depend on any other parameter, so at the start of the function lset(param) = param (its own lifetime) only.
  • At a call site, by default passing a Pointer to a function requires that the argument’s lset not include anything that could be invalidated by the function.
Clearly the body of foo is OK by those rules. For the call to foo from bar, it depends on what is meant by "anything that could be invalidated by the function". Does that include anything reachable via global variables? Because if it does, then you can't pass anything reachable from a global variable to any function by reference, which is crippling. But if it doesn't, then what rejects this code?
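To make the hazard concrete without actually reading freed memory, here is a minimal sketch of the same call shape. The Tracked type and its destroyed flag are hypothetical stand-ins for the int in the example, added only so the destruction can be observed; foo and bar return a bool instead of printing.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the int in the example: its destructor records
// that it ran, so the hazard can be observed without touching freed memory.
struct Tracked {
    explicit Tracked(bool& flag) : destroyed(flag) {}
    ~Tracked() { destroyed = true; }
    bool& destroyed;
};

std::unique_ptr<Tracked> p;  // mutable global owner, as in the example

// Same shape as foo: takes a reference, then resets the global owner.
// Returns whether the referent was destroyed while `v` was still in scope.
bool foo(const Tracked& v) {
    bool& flag = v.destroyed;  // taken while v is still valid
    p = nullptr;               // destroys the object v refers to
    return flag;               // v itself is never used after this point
}

// Same shape as bar: the argument is reachable only through the global.
bool bar() {
    bool destroyed = false;
    p = std::make_unique<Tracked>(destroyed);
    return foo(*p);
}
```

Calling bar() should report that the object behind v was destroyed before foo returned, which is exactly the state in which the original cout << v would read a dangling reference. Nothing in foo's signature reveals this to the caller.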

Update: Herb points out that example 7.1 covers a similar situation with raw pointers. That example indicates that anything reachable through a global variable cannot be passed to a function by raw pointer or reference. That still seems like a crippling limitation to me. You can't, for example, copy-construct anything (indirectly) reachable through a global variable:

unique_ptr<Foo> p;
void bar() {
  p = make_unique<Foo>(...);
  Foo xyz(*p); // Forbidden!
}

This is not one rogue example that is easily addressed. This example cuts to the heart of the problem, which is that understanding aliasing in the face of functions with potentially unbounded side effects is notoriously difficult. I myself wrote a PhD thesis on the subject, one among hundreds, if not thousands. Designing your language and its libraries from the ground up to deal with these issues has been shown to work, in Rust at least, but I'm deeply skeptical it can be bolted onto C++.


Aren't clang and MSVC already shipping previews of this safe subset? They're implementing static checking rules that no doubt will catch many bugs, which is great. They're nowhere near demonstrating they can catch every memory safety bug.

Aren't you always vulnerable to bugs in the compiler, foreign code, or mistakes in the safety proofs, so you can never reach 100% safety anyway? Yes, but it is important to reduce the amount of trusted code to the minimum. There are ways to use machine-checked proofs to verify that compilation and proof steps do not introduce safety bugs.

Won't you look stupid when Section III is released? Occupational hazard, but that leads me to one more point: even if and when a statically checked, plausibly safe subset is produced, it will take significant experience working with that subset to determine whether it's viable. A subset that rejects core C++ features such as references, or otherwise excludes most existing C++ code, will not be very compelling (as acknowledged in the Lifetimes document: "Our goal is that the false positive rate should be kept at under 10% on average over a large body of code").


  1. This matches my experience. MSVC's checker doesn't handle shared_ptr and the work being done on the clang checker seems stalled. I've had an open issue on how shared_ptr will be handled since February and there hasn't been much indication that there's a workable solution.

  2. Interesting point about aliasing being hard. There was a discussion on swift-users about aliasing issues with "inout" parameters, and someone pointed to a very long design document, that also looks like a bunch of voodoo, basically promising "less unsafe" behavior.

  4. Note that the code example doesn't compile as-is with Clang 3.8 and libc++, as the make_unique call fails to automatically deduce the template parameter. make_unique makes it work. Good example!

  5. Oh, i see, the < and > were eaten away by HTML parsing. I meant make_unique<int> , I suppose that's what happened to your snippet.

  6. Author here (of the Lifetimes paper you're mentioning):

    It is a work in progress, and I expected to get more done over the winter, but C++17 is happening: Things got very busy over the past 6-9 months as ISO C++ is now at the "feature freeze" phase of its cycle which happens next week, when we're closing the feature set of C++17 and sending it out for its primary international comment ballot over the summer. This phase always causes extra work in the run-up months approaching feature freeze as everyone collectively works on what can make the cut for this release. Because I lead ISO C++ organizationally (as well as getting some of my own proposals into C++17), and of course Bjarne is heavily involved in its evolution technically, C++17 has had to take priority so as to get the standard out on time, and it meant we've been distracted from other important things, including this. I plan to pick Lifetimes up again this summer.

    The last comment in the post is absolutely correct that experience is needed to establish how well our approach works. That also takes time, but it's essential. So part of my work over the next year will be applying it to more and more existing real-world code; to paraphrase an NBA expression, "real code don't lie."

    BTW, I like Rust (though I don't use it); there are lots of good languages. I'm not aware that Bjarne or I ever said anything about Rust, and this work wasn't designed with Rust in mind, or to compete with it; we're just working on making C++ better, and this is part of starting to surface results from work that I've been spending time on over the past decade. I hope that improving one language isn't viewed as a threat to another; in any case that's not a motivation here. FWIW, several of the Rust designers approached us about our Lifetimes design after the talks and again this month, and I think we're going to get together over the summer to chat (after this next standards meeting is over and we all get to recover for a few weeks).

    1. Thanks for being gracious. I've updated my post to avoid impugning your motives.

  7. P.S.: Oh, and that code example is covered by the rule you cite:

    "At a call site, by default passing a Pointer to a function requires that the argument’s lset not include anything that could be invalidated by the function."

    The call to foo(*p) is passing a reference (which is a non-owning Pointer) whose lset is {p'}, and p' can be invalidated by writing to p. Because p is global and writeable it could be modified by foo(), and so the call foo(*p) would be diagnosed as an error because foo() could invalidate the reference parameter.

    (My paper doesn't explicitly cover raw references and for simplicity mentions only raw pointers, leaving the references to be treated analogously, but you can see this covered by what is in the paper if you just change the parameter to a * instead of &, and change the call to foo(p.get()) instead of foo(*p). -- Same example, just passing a raw pointer instead of using a reference, and now it's explicitly covered by the rules in the paper.)

    1. Thanks for that clarification. I've updated my post.

      So you're taking the approach that anything reachable from a global variable cannot be passed by reference to a function. This seems pretty severe to me since by "reachable" we must include "indirectly reachable". Talking about this in terms of references also helps show how severe this is, since it means you can't for example copy data reachable through a global variable using copy-constructors.

    2. Almost right: Anything *owned* by a *mutable* global variable could have the bug you just showed. If it's not an owner, or not mutable, the problem doesn't arise. And of course mutable global variables are problematic for many reasons, so we should all be discouraging those (e.g., if there must be a widely shared object, encapsulate it)...

      I agree the rule will likely be noisy, especially in existing code. My overall goal is to annotate as little as possible, and for the annotations you do write for legacy code to be mostly of the "trust me / suppress this warning" variety. Then you can make statements of the form "type- and memory-safe except for where rules are explicitly suppressed" which is the guarantee I'm aiming for, so that if a safety error does occur we've greatly reduced the surface area that needs to be checked or debugged by a human, and made all unsafe code greppable because it's annotated.
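The mutable-versus-const distinction in the reply above can be checked with standard type traits. This is only a sketch of the language-level difference, not of the Lifetimes checker itself; the names Owner, fixed, read and demo are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

using Owner = std::unique_ptr<int>;

// A mutable owner can be reset by any function that can name it,
// which destroys its pointee.
static_assert(std::is_assignable<Owner&, std::nullptr_t>::value,
              "mutable owner: `p = nullptr` compiles");

// A const owner cannot be reassigned, so a callee has no way to free the
// int through it (short of const_cast).
static_assert(!std::is_assignable<const Owner&, std::nullptr_t>::value,
              "const owner: `p = nullptr` is rejected");

const Owner fixed = std::make_unique<int>(7);  // hypothetical const global

// Takes a reference, like foo in the post.
int read(const int& v) { return v; }

// Passes a reference to an int that stays alive for the whole call,
// because nothing can reset `fixed`.
int demo() { return read(*fixed); }
```

Note that const on the unique_ptr protects only the ownership, not the int: *fixed is still a mutable int, but it cannot be destroyed out from under a reference.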

    3. Actually I think this means you can't call nonstatic member functions on objects owned via mutable global variables. Which means you might as well ban ownership via mutable global variables completely.

      I think I see some other severe problems too but I guess I should just wait for Lifetimes III to come out.
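The member-function concern can be sketched the same way as the foo example, assuming the implicit this pointer is treated like any other raw-pointer parameter (the comment above only conjectures this; the paper's analysis rules are not yet published). Widget, g, shutdown and call_through_global are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose destructor records that it ran.
struct Widget {
    explicit Widget(bool* flag) : destroyed(flag) {}
    ~Widget() { *destroyed = true; }
    bool shutdown();  // defined below, after the global owner
    bool* destroyed;
};

std::unique_ptr<Widget> g;  // hypothetical mutable global owner

// An ordinary nonstatic member function: `this` is an implicit pointer
// argument. Returns whether *this was destroyed before the call returned.
bool Widget::shutdown() {
    bool* flag = destroyed;  // copy the member while *this is alive
    g = nullptr;             // destroys *this
    return *flag;            // no member access after this point
}

bool call_through_global() {
    bool destroyed = false;
    g = std::make_unique<Widget>(&destroyed);
    return g->shutdown();    // `this` points at the object g owns
}
```

From the caller's side, g->shutdown() looks like any other method call, so a rule strict enough to reject it would plausibly have to reject every nonstatic member call made through a mutable global owner.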

  8. I observed essentially the same problem with shared_ptr and wrote about it a few months ago:

    It's even more serious with shared_ptr than the global variables problem, and unless I'm missing something huge I think it kind of cripples the whole project with no way I can see to repair it.