Wednesday, 24 October 2007

Abstraction Penalties, Stack Allocation And Ownership Types

Sometimes, cleaning up your code makes it slower even when it shouldn't. For example, you might extract code from a function and make it a separate function. Maybe the compiler should inline your code, but it doesn't, or it does but the result still isn't the same as the original single function. This is an abstraction penalty: you use the programming language to structure your code with additional abstractions, and you pay a performance penalty.

Grouping variables into structs or objects is a common source of abstraction penalties. For example, converting local variables nscoord x, y; to nsPoint pt; may prevent the coordinates from being placed in registers, in some compilers. Or the coordinates may need to be placed in memory if you pass an nsPoint to another function instead of two nscoords.

That aside, there's a much worse abstraction penalty associated with objects in common "safe" languages such as Java or C#: they require heap allocation and garbage collection. (I'm ignoring C# structs for now for reasons that will hopefully become obvious.) And despite what people will tell you, garbage collection is still slower than stack allocation, often by a lot. When I was doing serious Java programming a few years ago, the way to make Java code fast was still to avoid using objects or manually recycle them to avoid allocation and GC costs. And the problem is that a whole lot of language features are tied up with objects --- interfaces, casting, information hiding, code reuse. I suspect one of the biggest performance issues switching from C++ to Java or C# may be that in C++ you can stack-allocate objects, but in Java and C# you can't. Just look at all the places where Gecko uses nsAutoString or nsAutoTArray to good effect.

The problem is of course that in a safe language, you can't just hand out pointers to stack-allocated objects like candy, because someone might store one away in a dark place and then later, long after the method has returned, start using it as a dangling pointer. This cannot be allowed.

What the world needs is a type system that allows references to stack objects to be given out, but restricts their usage so that no-one can use them after the objects are destroyed. Fortunately many such systems exist. A whole family of "region calculi" have been developed, for example. However, I prefer "ownership types" as a more intuitive and generally useful framework; in particular I like Boyapati-style "explicitly parametric ownership types".

These work by adding an "ownership parameter" to each class type. These parameters are a lot like generic type parameters, but they denote the "owner" of the object. The basic rule is that an object cannot outlive its owner. This rule is enforced by the typechecker. Here's an example of how this can work:

class Vector<O@P> {
(O@P[])@this array;
int size;
Vector() { array = new (O@P[100])@this; }
void add(O@P o) { array[size++] = o; }
};
class Iterator<O@P, Q> where Q <= P {
Vector<O@P>@Q inner;
int i;
Iterator(Vector<O@P>@Q inner) { this.inner = inner; }
O@P next() { return inner.array[i++]; }
};
void f() {
C@method c1 = new C@method(), c2 = new C@method();
C@method c3 = g(c1, c2);
}
C@R g<R>(C@R c1, C@R c2) {
Vector@method v = new Vector@method<C@R>();
v.add(c1);
v.add(c2);
h<R>(v);
return new C@R();
}
void h<R,S>(Vector<C@R>@S v) where S <= R {
Iterator@method iter = new Iterator@method<C@R, S>(v);
while (C@R c = iter.next()) { ... }
}

This code creates two objects, puts them into a standard growable vector class, then passes the vector to another function which uses a standard iterator to iterate over the vector. What's cool is that the compiler knows a lot about the lifetimes of these objects. In particular the objects which are @method, i.e., owned by "the current method", cannot outlive the method --- even though we pass these objects to other methods and even store references to them in other objects.

These properties are enforced in two ways. The main constraint is that code can only refer to objects whose owners are in scope. For example, you cannot declare C@R glob; and set glob = c1; in g to "leak" a reference to c1 --- the correct R is not in scope at the global level.

Similarly, an object like Vector that takes ownership parameters can only take parameters that "outlive" the object itself. This guarantees that fields of the object cannot become dangling references. In some cases, such as Iterator, we need to add "outlives" constraints to its ownership parameters; Q <= P would be read "P outlives Q". The semantics of ownership mean that "P outlives method" for all in-scope ownership parameters P.

Given the code above, the compiler could, using only local reasoning,


  • Stack allocate c1, c2, v and iter
  • Allocate space for g's result in f's stack
  • Inline the array object directly into Vector --- the compiler knows that the array object cannot outlive its Vector, and because there is only one assignment, it cannot die before Vector either.

Therefore the above code requires no heap allocations at all. ("Inlining" objects into their containing objects is another important optimization, much like stack allocation, that common safe OO languages do not support.)

It's important to note that although a really good compiler could discover these optimizations in Java, for example, using escape analysis, there are major problems leaving it up to the compiler. The primary problem is that escape analysis is expensive and fragile and compilers don't do it very well. A secondary problem is that if you change your program so that escape analysis no longer works, the only way to tell is that your performance gets mysteriously worse. With explicit type constraints, you get a compile error instead. That's especially useful here because quite often when you create objects in a method you expect them to be method-local, and it would be great for the compiler to check that for you and give you an error if you're wrong (instead of generating code that's slow in Java or crashy in C++).

Another note is that although the annotation burden appears large, the code can be slimmed down considerably using sensible default ownership parameters and other shortcuts such as local type inference:

class Vector<O> { // O contains @P
O[] array; // fields default to @this
int size;
Vector() { array = new O[100]@this; }
void add(O o) { array[size++] = o; }
};
class Iterator<O, Q> where Q <= O@ {
Vector<O>@Q inner;
int i;
Iterator(Vector<O>@Q inner) { this.inner = inner; }
O next() { return inner.array[i++]; }
};
void f() {
// local objects default to @method
var c1 = new C(), c2 = new C();
var c3 = g(c1, c2);
}
C@R g(C@R c1, C@R c2) {
Vector v = new Vector<C@R>();
v.add(c1);
v.add(c2);
h(v);
return new C@R();
}
// implicit type and ownership parameters
void h(Vector v) {
var iter = new Iterator(v); // infer type parameters
while (var c = iter.next()) { ... }
}

There are even more tricks we can pull! For example, we know a @method object can't be accessed by other threads, so we can easily remove locking from it. In a transactional memory system we would know not to log accesses to it. When a method containing a @method object returns, we know the object must die, so we can deterministically run a destructor at that point, giving us the handy C++ pattern of using destructors for general cleanup on method exit. And when an object is about to go out of scope, we can often easily prove that the method's reference to the object is the only outstanding reference ("uniqueness"). In that case we are allowed to safely change the owner of the object to something else (which is never normally allowed)! This may require copying, depending on what optimizations have been performed. For example:

C@R f<R>() {
var c1 = new C@method(), c2 = new C@method();
var c = ... ? c1 : c2;
// may copy the object from our stack to elsewhere
return (C@R)c;
}

Tragically, this stuff doesn't really exist yet in a usable language, nor has anyone proven it to work programming at large scales. Sigh...



12 comments:

  1. Another approach that I'm playing with (for safety reasons, not for performance) in Java is to annotate fields and method parameters with escape info e.g.
    public void xxx(@TakesRef char [] p);
    public void yyy(@DoesNotTakeRef char [] p);
    public class Class1 {
    private final int [] bigBuffer;
    @MayNotStoreRef
    public int [] getBuffer() {
    return bigBuffer;
    }
    }
    and then use a checker to verify that data structures do not "leak".

    ReplyDelete
  2. Robert,
    I'm not an expert in this area, but I think your approach makes the burden on the programmer is indeed too large.
    Its true that sometimes this ownership/region/escape information would be very helpful for the compiler, and in some of those cases the specific code is performance-critical. But it's only in the intersection of those two cases when you'd want to go to the trouble of specifying all the additional detailed information.
    I think a much better workflow would be something along the lines of:
    1) write plain code without any of this extra info
    2) run the compiler
    3) the compiler will prompt with a list of places where it believes this extra info would be most helpful.
    This would be even more useful if used with a profiling compiler (e.g. Hotspot in Java):
    1) write plain code without any of this extra info
    2) comile and run
    3) the profiler will prompt with a list of places where it believes this extra info would be most helpful based on the actual profiled usage of the code.
    Just my $0.02.

    ReplyDelete
  3. The second half of this article is about stack allocation in Java:
    http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html
    "Escape analysis" basically means inferring "@method" lifetime, right? If you could infer it, why wouldn't you? (I don't think having the compiler check my intuition about objects being method-local is worth it.)

    ReplyDelete
  4. Robert O'Callahan24 October 2007 20:55

    Jason: I answered this in my post. The truth is that escape analysis is expensive and fragile, and the escape analyses implemented by real VMs is weak. And then when it fails, your program slows down and it's impossible to tell what's going on.

    ReplyDelete
  5. re: Tragically, this stuff doesn't really exist yet in a usable language, nor has anyone proven it to work programming at large scales.
    While I agree with you on this particular example, if we take "stuff" to mean the whole space of static-analysis tools we find that (1) it is eminently usable, and (2) it is extensively used.
    E.g. Grammatech, Coverity, MS static tools, etc.
    Your particular example may very well be solved next.
    p.s. I'm not affiliated with any of the companies.

    ReplyDelete
  6. Jonathan Allen24 October 2007 21:28

    This seems to be a solution looking for a problem. Memory allocation on stack-based heap like C# is just as cheap as allocating it on the local stack. The cost of deallocating can be highly variable, but if you are using short-lived objects, required for this 'solution', it is likewise cheap.
    Meanwhile you are adding a huge amount of complexity to the language and compiler, complexity that may elminate the meager performance gains you are looking for.
    And you never did expalin the problem with C# value types.

    ReplyDelete
  7. Robert O'Callahan24 October 2007 23:44

    Jonathan: it's just *not* *true* that heap allocation is as cheap as stack allocation. People, notably Andrew Appel, and legions of followers have been pushing that line for years but they are wrong in practice. There are lots of reasons for this. For example, the analysis in the original Appel paper actually says that heap allocation is faster than stack allocation if you're willing to make the heap 7 or 8 times larger than the live objects. In real life no-one will do that.
    But the real data point is that you ask people who are optimizing real Java programs whether they see a big win when they manually recycle objects instead of letting the GC handle everything, even on state-of-the-art JVMs --- and they do.
    And then there are the object inlining optimizations, which no amount of intelligent GC can replicate.
    > Meanwhile you are adding a huge amount of
    > complexity to the language and compiler
    I'm adding some complexity to the language, but I'm actually reducing the complexity of an optimizing compiler, because I don't need it to do any global static analysis.
    C# value types can't be used to do the example I gave. For one thing, you can't store references to them in other objects --- only copies.
    Kefer: static analysis is a very very large field, containing many things that are very practical and many things that are completely impractical...

    ReplyDelete
  8. I'm looking forward to comparing Apples with Apples when we get quality implementations of C++0x's GC in a range of compilers.
    The two things I feel that C++ really has over other languages (acknowledging there are many places it falls behind) is stack based object lifetime management (which gives us things like RAII, and, as you say, certain allocation and deallocation efficiencies) and an advanced template system, that other static languages don't come close to. I think C++ gets a bad rap sometimes, which is unfair because it does do some things very well.
    As for GC - I once read a wise person who said "The problem isn't garbage collection, the problem is garbage creation", which makes sense I think - he was talking specifically about speeding up production Smalltalk code, and that culling unnecessary object creation was usually all that was needed. I imagine that is just as true in Java as well many other GC-capable systems.

    ReplyDelete
  9. Cyclone (cyclone.thelanguage.org) has an alternative slant on all of this based on region calculi. The notational overhead is quite a bit lower than what you sketched here.

    ReplyDelete
  10. Robert O'Callahan25 October 2007 06:57

    Hi Greg!
    Someone should study the comparison of regions with ownership types. There are obviously huge similarities, but also interesting differences. One interesting difference is that ownership types can give you a hierarchy (because one object can own others). Another interesting difference is that ownership types can be used for other things ... in fact, I think I'm the only person pushing ownership types for memory management.

    ReplyDelete
  11. Your assumptions are all wrong. Also, you don't need annotations for this - compilers can actually infer almost all of it.
    Fortunately, that's what java will be doing. It already infers 'escape detection' (the idea that a certain object is never ever released into any code outside the method scope), starting in java 1.6. From java 1.7 and up, escape detection will be used to stack-declare objects. Java 1.6 already uses escape detection, amongst other things, to eliminate void locks (locks on objects that don't escape are useless locks).
    However, this isn't nearly as much of a speed up as you seem to suggest. You should read up on 'eden' garbage collection systems.
    This is basically how it works:
    95%+ of all objects live for VERY short periods of time, unless you use outdated and these days BACKWARDS rules such as 'create a pool of objects and re-use them'. Please don't do that. Use immutables instead, java can aggressively inline those. The 'eden' system has one heap page (which should be in cache almost all the time) where ALL new objects are created. Everytime eden is full, each object in eden that is *STILL* accessible from some place is copied over to the real heap. This is maybe only 5% or so of all objects there. Then, the pointer into eden is reset to 0 and the process starts all over again. The heap page itself is never malloced or freed - it's always in use by java and rewritten over and over.
    End result: Most objects you just create as a helper live their entire lifetime in cached pages. This is just as good as a stack variable, really.
    I don't get why escape detection is 'expensive'. It's a purely compile-time construct; once the compiler is done, that's it. No time is spent during runtime to figure out if things escape. The compiler itself realizes a certain object is never ever stored in any place that outlives its method.

    ReplyDelete
  12. Robert O'Callahan25 October 2007 20:13

    Reinier: you clearly haven't read what I wrote. And why don't you use the term "copying collector" like the rest of the world?

    ReplyDelete