Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Thursday 7 July 2016

Ordered Maps For Stable Rust

The canonical ordered-map type for Rust is std::collections::BTreeMap, which looks nice, except that the range methods are marked unstable and so can't be used at all except with the nightly-build Rust toolchain. Those methods are the only way to perform operations like "find first element greater than a given key", so BTreeMap is mostly useless in stable Rust.

This wouldn't be a big deal if crates.io had a good ordered-map library that worked with stable Rust ... but, as far as I can tell, until now it did not. I didn't want to switch to Rust nightly just to use ordered maps, so I solved this problem by forking container-rs's bst crate, modifying it to work on stable Rust (which meant ripping out a bunch of "unstable" annotations, fixing a few places that required unstable "box" syntax, and fixing some test code that depended on unboxed closures), and publishing the result as stable_bst. (Note: I haven't actually gotten around to using it yet, so maybe it's broken, but at least its tests pass.)

So, if you want to use ordered maps with stable Rust, give it a try. bst has a relatively simple implementation and, no doubt, is less efficient than BTreeMap, but it should be comparable to the usual C++ std::map implementations.

Currently it supports only C++-style lower_bound and upper_bound methods for finding elements less/greater than a given key. range methods similar to BTreeMap could easily be added, using a local copy of the unstable standard Bound type. I'm not sure if I'll bother but I'd accept PRs.

Update I realized the lower_bound and upper_bound methods were somewhat useless since they only return forward iterators, so I bit the bullet, implemented the range/range_mut methods, removed lower_bound/upper_bound and the reverse iterators which are superseded by range, and updated crates.io.

FWIW I really like the range API compared to C++-style upper/lower_bound. I always have to think carefully to use the C++ API correctly, whereas the range API is easy to use correctly: you specify upper and lower bounds, each of which can be unbounded, exclusive or inclusive, just like in mathematics. A nice feature of the range API (when implemented correctly!) is that if you happen to specify a lower bound greater than the upper bound, it returns an empty iterator, instead of returning some number of wrong elements --- or crashing exploitably --- as the obvious encoding in C++ would do.

Another somewhat obscure but cool feature of range is that the values for bounds don't have to be exactly the same type as the keys, if you set up traits correctly. ogoodman on github pointed out that in some obscure cases you want range endpoints that can't be expressed as key values. Their example is keys of type (A, B), lexicographically ordered, where B does not have min or max values (e.g., arbitrary-precision integers), and you want a range containing all keys with a specific value for A. With the BTreeMap and stable_bst::TreeMap APIs you can handle this by making the bounds be type B', where B' is B extended with artificial min/max values, and defining traits to order B/B' values appropriately.

Comments

Oliver Goodman
Nice one Robert! Thanks for the tip about the range bounds not needing to be in the (same) ordered set. This is a useful stop-gap while the rust team figure out what they need to do to stabilise BTreeMap, and preferable to switching to the nightly build for anyone who wants to get something published.