Sunday, 19 January 2020

Static Customization Of Function Signatures In Rust

Sometimes I have a big function that does a lot, and in new code I need to do almost the same thing, but slightly differently. Often the best approach is to call the same function but add parameters (often described as "options") to select specific variations. This can get ugly when the function has optional outputs — results that are only produced when certain options were passed in — because typically there is the possibility of an error when code looks for an output (e.g. unwraps a Rust Option) at a site that did not request it. It would be great if the compiler could check that you only use an output when you passed in the option to enable it. Fortunately, some simple Rust coding patterns let us achieve this.

Here's an example simplified from some Pernosco code. We have a function that pretty-prints a data structure at a specific moment in time, which takes an optional parameter specifying another time to render the change in values between the two times. If that optional parameter is specified, the function returns an additional result — whether or not there was any difference in the two values. The naive signature would look something like this:

fn print(&self, moment: Moment, other: Option<Moment>)
-> (String, Option<Difference>) {
  ...
  let difference = other.map(|moment| ...);
  ...
  (..., difference)
}
...
let (string1, difference1) = value.print(moment, None);
let (string2, difference2) = value.print(moment, Some(other_moment));
println!("{:?}", difference1.unwrap()); // PANIC
println!("{:?}", difference2.unwrap());
It is possible to misuse this function by passing None in other and then expecting to find a meaningful value in the second part of the result pair. We'd probably catch it in tests, but it would be good to catch it at compile time. There is also a small efficiency issue: passing None for other is a bit less efficient than calling a customized version of print that has been optimized to remove other and the Difference result.

Here's a way to avoid those problems:

trait PrintOptionalMoment {
  type PrintOptionalDifference;
  fn map<F>(&self, closure: F) -> Self::PrintOptionalDifference
     where F: FnOnce(Moment) -> Difference;
}
impl PrintOptionalMoment for () {
  type PrintOptionalDifference = ();
  fn map<F>(&self, closure: F) -> Self::PrintOptionalDifference
     where F: FnOnce(Moment) -> Difference {}
}
impl PrintOptionalMoment for Moment {
  type PrintOptionalDifference = Difference;
  fn map<F>(&self, closure: F) -> Self::PrintOptionalDifference
     where F: FnOnce(Moment) -> Difference { closure(*self) }
}
fn print<Opt: PrintOptionalMoment>(&self, moment: Moment, other: Opt)
-> (String, Opt::PrintOptionalDifference) {
  ...
  let difference = other.map(|moment| ...);
  ...
  (..., difference)
}
let (string1, ()) = value.print(moment, ());
let (string2, difference2) = value.print(moment, other_moment);
println!("{:?}", difference2);

Rust playground link.

This cleans up the call sites nicely. When you don't pass other, the "difference" result has type (), so you can't misuse it or cause a panic trying to unwrap it. When you do pass other, the "difference" result is not an Option, so you don't need to unwrap it. The implementation of print is basically unchanged, but now Rust will generate two versions of the function, and the version that doesn't take other should be optimized about as well as a handwritten function that removed other. (Unlike in C++, in Rust a () value does not take any space in a struct or tuple.)

If you're not familiar with Rust, the intuition here is that we define a trait PrintOptionalMoment that we use to mean "a type that is either nothing, or a Moment", and we declare that the "void" type () and the type Moment both satisfy PrintOptionalMoment. Then we make print generic over type Opt, which can be either of those. The PrintOptionalMoment trait defines an associated type PrintOptionalDifference which is the result type associated with each Opt that satisfies PrintOptionalMoment.

This approach easily extends to cover more complicated relationships between options and input and output types. In some situations it might be more trouble than it's worth, or the generated code duplication is undesirable, but I think it's a good tool to have.

Friday, 3 January 2020

Updating Pernosco To Rust Futures 0.3

The Pernosco debugger engine is written in Rust and makes extensive use of async code. We had been using futures-preview 0.2; sooner or later we had to update to "new futures" 0.3, and I thought the sooner we did it the easier it would be, so we just did it. The changes were quite extensive:

103 files changed, 3610 insertions(+), 3865 deletions(-)
That took about five days of actual work. The changes were not just mechanical; here are a few thoughts about the process.

The biggest change is that Future and Stream now have a single Output/Item instead of Item and Error, so if you need errors, you have to use an explicit Result. Fixing that was tedious, but I think it's a clear improvement. It encouraged me to reconsider whether we really needed to return errors at all in places where the error type was (), and determine that many of those results can in fact be infallible. Also we had places where the error type was Never but we still had to write unwrap() or similar, which are now cleaned up.

I mostly resisted rewriting code to use async/await, but futures 0.3 omits loop_fn, for the excellent reason that code using it is horrible, so I rewrote our uses of loop_fn with async/await. That code is far easier to read (and write) now. Now that the overall conversion has landed we can incrementally adopt async/await as needed.

It took me a little while to get my head around Pin. Pin adds unfortunate complexity, especially how it bifurcates the world of futures into "pinned" and "unpinned" futures, but I accept that there is no obviously better approach for Rust. The pin-project crate was really useful for porting our Future/Stream combinators without writing unsafe code.

A surprisingly annoying pain point: you can't assign an explicit return type to an async block, and it's difficult to work around. (I often wanted this to set an explicit error type when using ? inside the block.) Async closures would make it easy to work around, but they aren't stabilized yet. Typically I had to work around it by adding explicit type parameters to functions inside the block (e.g. Err).

Outside the main debugger engine we still have quite a lot of code that uses external crates with tokio and futures 0.1. We don't need to update that right now so we'll keep putting it off and hopefully the ecosystem will have evolved by the time we get to it. When the time comes we should be able to do that update more incrementally using the 0.1 ⟷ 0.3 compatibility features.

The really good news is that once I got everything to build, we had only two regressions in our (fairly good) test suite. One was because I had dropped a ! while editing some code. The other was because I had converted a warning into a fatal error to simplify some code, and it turns out that due to an existing bug we were already hitting that situation. Given this was a 4K line patch of nontrivial complexity, I think that's a remarkable testament to the power of Rust.