Monday, 6 February 2017

What Rust Can Do That Other Languages Can't, In Six Short Lines

struct X {
  y: Y
}
impl X {
  fn y(&self) -> &Y { &self.y }
}

This defines an aggregate type containing a field y of type Y directly (not as a separate heap object). Then we define a getter method that returns the field by reference. Crucially, the Rust compiler will verify that all callers of the getter prevent the returned reference from outliving the X object. It compiles down to what you'd expect a C compiler to produce (pointer addition) with no space or time overhead.

As far as I know, no other remotely mainstream language can express this, for various reasons. C/C++ can't do it because the compiler does not perform the lifetime checks. (The C++ Core Guidelines lifetime checking proposal proposes such checks, but it's incomplete and hasn't received any updates for over a year.) Most other languages simply prevent you from giving away an interior reference, or require y to refer to a distinct heap object from the X.

This is my go-to example every time someone suggests that modern C++ is just as suitable for safe systems programming as Rust.

Update As I half-expected, people did turn up a couple of non-toy languages that can handle this example. D has some special-casing for this case, though its existing safety checks are limited and easily circumvented accidentally or deliberately. (Those checks are in the process of being expanded, though it's hard to say where D will end up.) Go can also do this, because its GC supports interior pointers. That's nice, though you're still buying into the fundamental tradeoffs of GC: some combination of increased memory usage, reduced throughput, and pauses. Plus, relying on GC means it's nearly impossible to handle the "replace a C library" use case.

27 comments:

  1. I guess the closest thing in C++ would be std::shared_ptr with the aliasing constructor. But it would be a lot more work to write, and would add some runtime costs!

    But you seem to say (I don't know Rust -- yet) that Rust enforces this lifetime constraint at compile time, right? Nice!

    ReplyDelete
    Replies
    1. Yeah, nice!

      Yeah, making 'y' a shared_ptr member here would force it to be a separate allocation. It would also mean the getter function incurs the cost of a thread-safe addref and release. It also means X and Y can't live in read-only memory.

      Delete
    2. "separate allocation"
      Just to be clear, I was referring to the aliasing ability of std::shared_ptr, where the shared_ptr seems to point at 'y' when dereferenced, but the reference count is actually tied to 'x'. So there is no separate allocation happening, 'y' is contained as a straight 'Y' within 'X'.
      (Better explanations there: http://www.codesynthesis.com/~boris/blog/2012/04/25/shared-ptr-aliasing-constructor/ )

      However all the other bad stuff is still there: addrefs&releases, reference count object that has to exist somewhere, no ROM objects...

      Delete
    3. Ah I see. Thanks for pointing that out. That's cool.

      Though it does point out that std::shared_ptr has to be two words, which sucks. I guess it always was but I just never thought it about it.

      Delete
  2. So basically this?

    ```
    class X
    attr_accessor :y
    end
    ```

    ReplyDelete
    Replies
    1. "containing a field y of type Y directly"

      Your code defines an instance variable that holds a Ruby VALUE, which is a reference, and a function that returns that reference. The content of the VALUE will be heap allocated.

      Delete
  3. And a full compilable example in Crystal:

    ```
    class Y
    end

    class X
    def initialize(y : Y)
    @y = y
    end

    def y
    @y
    end
    end

    y = Y.new
    x = X.new(y: y)
    puts x.y
    ```

    ReplyDelete
    Replies
    1. That makes 'y' use a separate heap allocation, which I explicitly said we don't want.

      Delete
    2. Not only that, Crystal has GC under the hood, which mounts the cost (but, to be fair, I don't know if it collects all of the objects or just some of the allocations are GCed).

      Delete
  4. These checks indeed are nice.
    A few things (facts):
    - We have gazillion of lines/projects in C++. One of the important reasons why C++ got popular is that it had near full backwards compat. with C. Rust is breaking that.
    - Support for architectures and tooling has decades to catch up.
    - "Compilation Provider" (Sony, Microsoft, Apple, Google, ...) barely currently provide working C and C++ toolchains. Shipping Rust, too, would be a huge investment.
    - There still is no support by a big "Compilation Provider".
    - Focus on better C++ static analysis is to a degree driven by Rust

    Personal opinion:
    - Rust is a great research project.
    - I think the Rust community has a HUGE problem. You outline exactly this with your blog. This one feature is nice. Most people I know who today write C++/System programming are senior and rarely trip on lifetime problems. It is to me by far not a as big problem as you make it out to be.
    - Rust has a huge potential to get junior devs into System programming because they do not be as proficient. So you should rather outline how easy System programming with Rust is.
    - Rust may have potential in parallelism. To me it seems however as though the section is not yet cooked through. If Rust can do parallelism straight-forward, effective and correct, THIS is the thing you should be advocating.

    ReplyDelete
    Replies
    1. I agree with much of what you say. Just a couple of replies:

      By building on LLVM Rust is actually leveraging a lot of the investment in modern C++ compilers. And Rust, at the levels above LLVM, is significantly less complicated than C++. So the situation is not as daunting as you may think.

      I think "most people I know ... rarely trip on lifetime problems" undersells the seriousness of those problems when they arise. These are serious security issues. Mozilla's browser developers are definitely elite level and they are excited about this stuff.

      It's certainly true that Rust's "parallelism without fear" story is strong. I'll write about that.

      Delete
    2. I believe the usecase of this that Manishearth described in his blog post http://manishearth.github.io/blog/2015/05/03/where-rust-really-shines/ is where no sane C++ senior could do this kind of pointer sharing without ownership problems (and thus would probably fall back to copying the memory with new heap allocations, or probably could ‘optimize’ it with shared_ptr or other smart pointer, but then would still have runtime checks with are nonexistent in Rust).

      Delete
    3. You don't think that it's a problem that every piece of widely deployed c/c++ infrastructure is effectively a door that only the government and mob can open, due to the current economics of exploit development?

      Delete
  5. In Java, this code won't produce a separate heap allocation for Y:
    class X {
    private Y y;
    }

    In most cases the heap is already pre-allocated, so creating a X object, won't result in two heap allocations.

    ReplyDelete
    Replies
    1. Those are two separate objects on the heap. That's what I mean by "separate heap allocation".

      Delete
  6. Or, in C++:
    struct X {
    Y y;
    };

    why would I need a getter for a reference if I can just refer to the member itself?

    ReplyDelete
    Replies
    1. This doesn't prevent any reference to y outliving X.

      Delete
    2. In Rust you also can do that:

      struct X {
      pub y: Y
      }

      However that is not the point. The point is that Rust will refuse to compile if returned reference would live longer than object itself. Other example would be:

      struct Matrix2x2([T; 4]);

      impl Matrix2x2 {
      fn get(&self, col: usize, row: usize) -> &T {
      self.0[col * 2 + row];
      }
      }

      In this (less trivial) example Rust would refuse to compile if you would write something like:

      let mut t: &Type;
      {
      t = matrix.get(1,1);
      }
      println!("{}", *t);

      Where it is perfectly valid in C++. Other popular example (that I often use):

      int main() {
      std::vector vec = {1,2,3};

      int& ref = vec[0];

      vec.push_back(4);

      std::cout << ref;
      return 0;
      }

      Will compile and even maybe will work, while:

      fn main() {
      let mut v = vec![1,2,3];

      let r = &r[0];

      v.push(4);

      println!("{}", r);
      }

      Will refuse to compile.

      Delete
  7. This comment has been removed by the author.

    ReplyDelete
  8. With the specified requirements (safety, no double heap allocation, being able to make a reference to an inner object), Go can do that too.

    (of course, Go also has GC, so it does have its limitations as a systems programming language)

    ReplyDelete
    Replies
    1. Yes, that's true. I'll update the post.

      Delete
  9. Nice example! I also like this one as something you can do in modern C++ and (rightfully) can't in Rust:

    #include
    #include

    int main() {
    std::vector xs = {1};
    auto& x = xs[0];
    xs.push_back(2);
    std::cout << x << std::endl;
    }

    ReplyDelete
  10. This comment has been removed by the author.

    ReplyDelete
  11. There was a paper at OOPSLA 2013 on a compiler-supported type-safe subset of C++, called Ironclad C++, which did (dynamic) lifetime checking to support such idioms.

    ReplyDelete
  12. I began to imagine the Haskell version of this, but it ended up being totally nonsensical because Haskell doesn't have "references" with a limited lifetime (since it has GC, and also just copies around as it needs).

    ReplyDelete
  13. Maybe I'm dense, but it is not apparent to me why this set of guarantees are so important. Could you elaborate on this?

    ReplyDelete
    Replies
    1. "Getter" methods that provide access to interior parts of an object are common and very useful.

      It's often a lot more efficient to return a reference (a pointer) to an interior part than copy it.

      However, C and C++ programs often have "use after free bugs" where the reference is used after the underlying object has been deleted. These bugs often create security issues and should be prevented.

      Delete