Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Tuesday 23 February 2016

Rewrite Everything In Rust

I just read Dan Kaminsky's post about the glibc DNS vulnerability and its terrifying implications. Unfortunately it's just one of many, many, many critical software vulnerabilities that have made computer security a joke.

It's no secret that we have the technology to prevent most of these bugs. We have programming languages that practically guarantee important classes of bugs don't happen. The problem is that so much of our software doesn't use these languages. Until recently, there were good excuses for that; "safe" programming languages have generally been unsuitable for systems programming because they don't give you complete control over resources, and they require complex runtime support that doesn't fit in certain contexts (e.g. kernels).

Rust is changing all that. We now have a language with desirable safety properties that offers the control you need for systems programming and does not impose a runtime. Its growing community shows that people enjoy programming in Rust. Servo shows that large, complex Rust applications can perform well.

For the good of the Internet, and in fact humanity, we need to migrate our software from C/C++ to Rust (or something better) as quickly as possible. Here are some steps we need to take:

  • Use Rust. Using Rust instead of C/C++ makes code safer. Using Rust instead of any other language grows the Rust community, making it easier for other people to use Rust.
  • Rewrite code in Rust. Starting with our most critical infrastructure, rewrite C/C++ code to use Rust instead.
  • Extend Rust's guarantees. We can extend the classes of bugs that Rust prevents. For example, we should try to make Rust release builds check for integer overflow by default.
  • Verify "unsafe" Rust code. Sometimes the Rust type system is not strong enough to let you prove to the compiler that your code is safe, so you have to mark code blocks "unsafe". With tool support, you could instead generate a mathematical proof that your code is safe, checked by the compiler. (See Rustbelt.)
  • Make Rust implementations safer. Bugs in the compiler can invalidate the language's guarantees. We know how to build compilers that generate, along with the compiled code, a proof that the code maintains the language guarantees --- a proof that can be checked by a simple, trusted proof checker. Part of this would involve proving properties of the Rust language itself.
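
As a small illustration of the integer overflow point above, here is a minimal sketch of handling overflow explicitly so that behaviour does not depend on the build mode. The function name `total_len` and its parameters are hypothetical, chosen only for this example.

```rust
// Hypothetical helper: adds two lengths and reports overflow instead of
// wrapping silently.
fn total_len(header: u32, payload: u32) -> Option<u32> {
    // checked_add returns None on overflow in every build mode.
    header.checked_add(payload)
}

fn main() {
    assert_eq!(total_len(8, 16), Some(24));
    // The overflowing case is reported rather than wrapped.
    assert_eq!(total_len(u32::MAX, 1), None);
    // When wraparound is really intended, it can be spelled out explicitly.
    assert_eq!(u32::MAX.wrapping_add(1), 0);
    println!("ok");
}
```

Cargo also lets a project opt in to overflow checks for release builds by setting `overflow-checks = true` under `[profile.release]`.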

Of course, the language doesn't have to be Rust, but Rust is the best candidate I know of at this time.

This is a huge amount of work, but consider the enormous ongoing costs --- direct and indirect --- of the vulnerabilities that Rust would have prevented.

Comments

Anonymous
you go first :-) But seriously, I wonder why so many new projects are started using the old, known unsafe, languages? Is it inertia? Fear of the unknown? Or do the newer languages have just as many problems, just of a different type? Dunno. Glad I don't have to teach C++ any more, anyway. - Colin
Robert
It's probably a combination of factors --- inertia, fear of the unknown, fewer libraries, fear of having a smaller pool of potential contributors... Most of these fears will be mitigated the more people write code in Rust. Of course Mozilla is working on rewriting browser code in Rust. Of course, rewriting a huge amount of code is daunting. But as you say, for new projects, it shouldn't be such an issue.
Anonymous
Some of us have to build for systems outside of the Linux/OSX/Windows universe. For example, we have C++ compilers for AIX but no Rust compiler. So, C++ it is.
Corey
Good ol' Blub paradox? A lot of C++ programmers I talk to dismiss Rust as "another trendy language," "Why would you ever learn that?" etc. They think C++ can already do everything Rust can do, "What's the big deal about Rust? C++ has smart pointers, C++17 will add that feature," etc. etc. The C++ programmer argues that C++ supports safe programming, and if you can't program safely in C++, then you just suck, so Rust isn't needed (seriously, when I took a jab at C++, my question was, what parts of C++ am I supposed to use and which are forbidden? How am I supposed to navigate the practices of "good" C++ programming without it being a full time job?). It's kind of funny how most programmers think they're the "good programmer," invincible, and won't make mistakes, while everyone else just sucks (and really, this isn't a flaw of C++ programmers, but of human nature). But I think one thing they miss is that Rust takes the burden off the programmer to "program correctly", and puts that as the compiler's job to enforce. This means even us mortal programmers can write safe code.
Brian
OCaml is type and memory safe as long as you avoid the Marshal library. It still hasn't taken the world by storm even though it has been around for a few decades. Guaranteed safety just isn't valued.
Robert
In fact there's no shortage of languages with guaranteed safety that are massively used in practice. Java, C#, JavaScript and Python are just a few. (They vary in the strength of that guarantee but they're all reasonably sound.) The problem is that those languages, and OCaml, don't give you complete control over resources, and require complex runtime support. Therefore you can't use them in a kernel and you can't use them as a library you'd link into programs in a different language using standard mechanisms. When was the last time someone wrote an OCaml library that you would want to write Java and C# bindings for?
jan Tepo
Rust is hard. No, seriously. It takes time and effort to adjust your thinking from C to Rust. And many developers (me) are too lazy to do that. So they just keep writing more and more code in C.
mayankleoboy1
Why not make it a rule that any new code added to Gecko should be written in Rust? Nothing tells the world that you believe in a language better than using it yourself in production code.
Robert
A lot of the code being added to Gecko has complicated interactions with lots of Gecko C++ code. That makes it hard to write in Rust. But yes, for new code added to Gecko we should be --- and are --- asking whether it can be written in Rust.
Anonymous
Geeze, can you say installed base and ROI. Not to mention programming resources, because the world is just chock full of Rust programmers.
Anonymous
Go to - https://www.rust-lang.org/ Edit the given example by removing comments ... get compilation error and think why just why
Robert
I think that try-it-out box is currently entirely broken.
Jason Dusek
Yeah, I see it too. So weird. Maybe the build product from the one with the comment is cached?
Anonymous
It is working when I try now. I've used the online editor a lot to test things, and I've enjoyed using Rust a lot so far. It has a bit of a learning curve.
Anonymous
Write everything in GO !! Starting with a simple tour www.golang.org
Jason Dusek
It's not clear how you'd do systems programming in a language like that, or how Go would provide libraries that other systems -- Postgres, Python, Nginx -- could load and use. Systems programming is one part doing things right and another part helping others to do so. ("Write everything in Java" is the rallying cry of an earlier generation; and it's about as realistic.)
Robert
Yeah, Go just isn't the right language for this, for the reasons I alluded to in my second paragraph.
Anonymous
I'm all for abandoning C++ for something better, but I think Nim is a much more suitable candidate for the job. Not only is it a better language as a whole (check it out if you don't believe me), but it compiles to C++, so C and C++ can be freely reused with it. So with Nim, you don't have to rewrite everything, but instead reuse everything and write your new code in it.
Robert
Nim lacks Rust's ownership type system. The implications of that are too big to write in a blog comment, but basically in Rust you can build much more powerful statically-checked abstractions for safe memory management, parallelism, etc.
Anonymous
rvalues and unique pointers are in Nim's imminent future. Both are functionally borrowed from C++ and can have better compile-time errors, making it feel much like Rust in this regard. (Currently you can hack things around with shallowCopy or ref-s of seq-s, but it is indeed not very helpful.) Alas, the language is still in active development and essential features are yet to be implemented. :) Rust is much more mature (because of its much, much larger user base). The reason I support Nim over Rust is its much more powerful metaprogramming capabilities. Basically whatever piece of code you ever saw in any language that made you go "Wow! This is so cool!" is (or will soon be) possible to write in Nim, quite likely with a big performance improvement.
Robert
Rust's ownership type system is more powerful than rvalues and unique pointers; safe borrowing is also essential. E.g. Rust's type system can be used to prevent iterator invalidation, but C++'s rvalues and unique pointers can't. Rust's metaprogramming is also pretty great.
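
A minimal sketch of the iterator invalidation point, assuming a hypothetical helper `collect_lengths`. The commented-out line is the kind of mutation the borrow checker rejects while the vector is being iterated.

```rust
// Hypothetical helper used only for illustration.
fn collect_lengths(names: &mut Vec<String>) -> Vec<usize> {
    let mut lengths = Vec::new();
    for name in names.iter() {
        // `names` is immutably borrowed for the duration of the loop, so
        // uncommenting the next line is a compile-time error, not a crash:
        // names.push(name.clone());
        lengths.push(name.len());
    }
    // Once the loop's borrow has ended, mutation is allowed again.
    names.push(String::from("done"));
    lengths
}

fn main() {
    let mut names = vec![String::from("ab"), String::from("cde")];
    let lengths = collect_lengths(&mut names);
    assert_eq!(lengths, vec![2, 3]);
    assert_eq!(names.len(), 3);
}
```

The equivalent C++ loop that calls `push_back` on a `std::vector` while iterating over it compiles without complaint and may invalidate the iterator at run time.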
Jason Dusek
Maybe when Nim develops stronger safety guarantees, we can rewrite some system libraries, filesystems or VPN servers in it. There is no shortage of work to be done here. And since Nim, Rust (and Swift) all have such clean handling of native code, there's no reason that using one limits us when we want to use the other.
Anonymous
Citations needed: " Servo shows that large, complex Rust applications can perform well."
Steve Klabnik
> There is essentially no webpage out there that it cannot get through at multiple hundreds of frames-per-second https://air.mozilla.org/bay-area-rust-meetup-february-2016/
kirbyfan64sos
Not trying to be rude... But isn't this what people said about C++ and Java and a slew of other languages? "Write everything in Java! It's awesome!" I mean, Rust has cool ideas, but modern C++ is strong in its own right.
Robert
No. Java doesn't give you complete control over resources and requires complex runtime support. C++ gives you those things, but not safety properties.
Germán Diago
@Corey There are many reasons why C++ (I talk about C++ not C) is still used. I am a long-term C++ user. Rust looks good to me. But I invested a lot of time to learn C++. Though, that is not the only reason I am not leaving it: 1. The C++ committee has a body of the best experts in the world and is very active. 2. It is widely used. 3. Many of the features available in Rust for safety can be reached, though not in the language itself, through *reasonable* discipline, static analysis tools and more. To get an idea, take a look at these talks and you will see that C++ is already targeting subsets + static analysis that provide type-safe, bounds-safe and lifetime-safe profiles. Add C compatibility and wide availability and you will understand that, as a tool, it is very useful. Here are links to two very interesting videos about writing good and safe C++: you will see it is not so difficult. Tools are coming to support these guidelines as well. https://www.youtube.com/watch?v=hEx5DNLWGgA https://www.youtube.com/watch?v=1OEu9C51K2A
Unknown
Modern C++ has added a lot, but does it have a real module system yet instead of one that is founded upon textual inclusion, and has anybody actually committed to adopting it? That alone is almost worth staying off C++, but it's not as if there aren't other reasons if you wanted to look for more.
Germán Diago
There is a modules proposal already in the works for at least a couple of years and two experimental implementations: Microsoft Compiler and Clang.
Robert
It remains to be seen whether the CCG will guarantee memory safety. Last I checked, it had gaping soundness holes related to aliasing. (Note: most of the complexity and innovation in Rust is about controlling aliasing.) When all the soundness holes are closed, then we'll have to evaluate whether it's usable to program in. At the very least, Rust is years ahead here. Even if those hold up, you have to consider whether rewriting your C++ code to fit the safe subset is actually worthwhile, compared to rewriting in Rust which gives you additional benefits. It's certain that any truly sound subset of C++ will be rather different from the C++ code that people currently write. What I'm looking for from Herb Sutter is not just a set of good guidelines and tools to check for them, but also a proof --- or at least an argument --- that if the tool finds no errors in a piece of code then that code cannot be the source of memory safety bugs.
Germán Diago
Did you see the video? It *can* be done. See Herb Sutter's video and demos. There is a subset that, with the help of analysis, is *guaranteed* to be type-, bounds- and lifetime-safe. Which is what you are asking for. We all know Rust was designed for that from the ground up; no one will deny that. But judging by your comment, I think you did not see the video. It is a bit long, but Herb's video is exactly about this topic. Bonus: when you use all these profiles (bounds-, type- and memory-safe), you can opt in to disable these features on a per-block or per-statement basis via attributes, deactivating *only* the profile you want. So you can have a piece of code where you know that you will violate type-safety via a cast but will still check for the rest of the rules. It is not an all-or-nothing thing like other languages where you just have safe/unsafe. In this regard it will be superior, which is saying a lot, taking into account the compatibility constraints of C++.
Robert
I read the paper when it came out. There were gaping holes related to the treatment of aliasing. AFAIK no new paper has been produced since.
Robert
Maybe you could point me to the latest version of the write-up containing the soundness argument?
Anonymous
> Add C compatibility and wide availability and you will understand that, as a tool, it is very useful. Rust has a zero-overhead C FFI; it is compatible with C. Rust is not a syntactic superset of C, but being one gains very little. C calling into idiomatic C++ is just as complicated as C calling into Rust, because good C++/Rust APIs use features C simply doesn't have. C++ has a slight advantage calling into C, because the C++ compiler can parse C's headers, but there are external tools for doing that (rust-bindgen), and even C++ doesn't interface with C libraries exactly the same way it does native C++ ones, because there are still those `extern "C" {}` blocks (this will become even worse if C++ gets modules). Providing an idiomatic wrapper API for a C library, of course, is done the same way. Not having C's syntax is a strength anyway. For example, in C++, the syntactic validity of a statement can depend on the result of a template metaprogram, which itself will depend on type information declared earlier. This means that one does not simply parse a C++ program alone; one must interleave parsing and type resolution. Clang is an incredible engineering feat, and Rustc can get that good with less effort, and can surpass it with the same amount.
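
A minimal sketch of what calling into C looks like from Rust, assuming the platform's C standard library is linked (the default on mainstream targets) and that `size_t` matches `usize` there. The wrapper name `c_string_length` is hypothetical.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

extern "C" {
    // Hand-written declaration of the C standard library's strlen;
    // a tool such as rust-bindgen can generate these from headers.
    fn strlen(s: *const c_char) -> usize;
}

// Hypothetical safe wrapper around the raw C call.
fn c_string_length(text: &str) -> usize {
    let c_text = CString::new(text).expect("input must not contain NUL bytes");
    // The call is `unsafe` because the compiler cannot check the C side.
    unsafe { strlen(c_text.as_ptr()) }
}

fn main() {
    assert_eq!(c_string_length("hello"), 5);
    println!("ok");
}
```

In the 2024 edition the declaration block is spelled `unsafe extern "C"`; the call itself is unchanged.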
Germán Diago
Latest modules proposal: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0142r0.pdf. Fresh from last week or so.
Dan L
A memory safe language is a necessary but insufficient condition to "do it right": http://blog.acolyer.org/2016/02/23/not-quite-so-broken-tls/
Robert
Of course. As it happens, Rust provides a lot more than just memory safety. For example the type system lets you create abstractions for parallelism that statically prevent data races. And as I mentioned in the post, Rust can and should be extended to provide even more guarantees.
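
A minimal sketch of the data race point, using only the standard library. The function `parallel_count` is hypothetical. Sharing the counter between threads without `Arc` and `Mutex` (for example as a plain `&mut usize`) would be rejected at compile time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: several threads increment one shared counter.
fn parallel_count(workers: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own handle to the shared, locked counter.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    assert_eq!(parallel_count(4, 1000), 4000);
    println!("ok");
}
```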
REALHIPHOPINYOURLIFE
Maybe if Rust had Algol/C style syntax people would be more willing. The fn -> () -< fn stuff turns some C and C++ people off.
Anonymous
I've never understood this; Rust's syntax *is* C-like. Notice the curly brackets-based scoping, the parenthesis-based function invocation, the fact that `-<` never appears anywhere in the syntax, the way `while` and `if` are lifted almost verbatim from C, `struct`s are lifted almost verbatim (albeit somewhat extended), `enum`s are also lifted from C (albeit extensively extended into a whole sum-type implementation), the comment syntax, a ton of operators (`&` reference, `*` dereference, `>>` and `<<` for shift, `&` and `|` for bitwise AND and OR, `&&` and `||` for boolean AND and OR, `.` for member access, `%` for remainder, `!` for invert, `'` for characters and `"` for strings, `[]` for array access, `=` for assignment and `==` for comparison, and while they also share several others I guess it doesn't count if they're centuries-old math notation), the flow control statements `return`, `break`, and `continue`, and the identifier naming rules. There are other things Rust steals from C, like most of the underlying memory and execution model, but you were just talking about the syntax. It also borrows extensively from C++ specifically, but you said "Algol/C style syntax" and I don't know Algol.
Anonymous
There's no doubt in my mind that Rust is a better language than C++. C++ has accumulated too much cruft over the years to be as simple and accessible as Rust, while not being safer. It's not a fair comparison, but that's often the case when a fresh new technology threatens an established one. At the same time, it is to be expected, since Rust was *designed* to be better than C++, with all of C++'s faults in mind. However, safety isn't the only important aspect of systems programming languages. C set the standards for object file formats, which in turn inspired many module systems, and finally package managers which give us fancy ways of copying libraries from one place to another. The C object format is at the root of most dependency hell problems, where a program needs two different versions of a library to run and can't load them both into the same process. What can Rust do to alleviate that pain ? I looked at Cargo, but it seems as prone to DH as all other package managers out there. If you could guarantee that two distinct dependencies will *always* compile together, *that* would be a killer feature ! Many languages have strong guarantees and type systems, but very few get that part right (none that I know of, at least)
Anonymous
Rust uses name mangling to alleviate this problem. It can't help when you're bringing in external C-based dependencies, but pulling in two versions of the same pure-Rust dependency Just Works. Such is life when aiming to be C-compatible.
Anonymous
Yes, but what if I pull two versions of a library and try to import a function from one of them, how does Rust know which one I mean ? Since they both share a common namespace, and the function likely hasn't been renamed from one version to another, there's no possible way of knowing. From what you wrote and the Cargo manual, it seems that transitive dependencies are always handled correctly, which is a good start. However, solving dependency hell means curing all DH-related headaches, not just the most common. Otherwise, you can be sure that someone, somewhere, at some point, will curse your module system for not keeping its promises ;-) PS: conflicting C libraries are indeed trickier to include without causing linking problems, but if you mangle all the symbols of a C library relative to its dependent Rust one, then you can pretty much achieve the same result, can't you (I think I read this described somewhere as "linker trickery") ? It's a bit more invasive, but, as you said, such is life when aiming to be C-compatible :-D
Anonymous
I think we chip away at things one step at a time. I am interested in DNS servers and caches using Rust. I like DJB's work but installing it on OS X is a huge pain.
Anonymous
This was my thinking also. DJB's work is great and could be a good starting point for a Rust based implementation.
Anonymous
I know it is a cheap shot, but I have to say it: this is a blog post written by the author of a rather large project written in C++ (rr).
Robert
Sure, but rr started a long time before Rust was ready.
Anonymous
And in terms of rewriting everything to Rust … do you plan to rewrite rr?
Robert
Not at this stage, no. rr is low priority for rewriting in a safe language because it has a small user base and doesn't deal with untrusted input. I'd like to write new rr-related code in Rust if I can, though.