Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Monday 11 February 2019

Rust's Affine Types Catch An Interesting Bug

A function synchronously downloads a resource from Amazon S3 using a single GetObject request. I want it to automatically retry the download if there's a network error. A wrapper function aws_retry_sync based on futures-retry takes a closure and automatically reruns it if necessary, so the new code looks like this:

pub fn s3_download<W: Write>(
    client: S3Client,
    bucket: String,
    key: String,
    out: W,
) -> io::Result<()> {
    aws_retry_sync(move || {
        let response = client.get_object(...).sync()?;
        if let Some(body) = response.body {
            body.fold(out, |mut out, bytes: Vec| -> io::Result {
                out.write_all(&bytes)?;
                Ok(out)
            })
            .wait()?;
        }
    })
}
This fails to compile for an excellent reason:
error[E0507]: cannot move out of captured variable in an `FnMut` closure
   --> aws-utils/src/lib.rs:194:23
    |
185 |     out: W,
    |     --- captured outer variable
...
194 |             body.fold(out, |mut out, bytes: Vec| -> io::Result {
    |                       ^^^ cannot move out of captured variable in an `FnMut` closure
I.e., the closure can execute more than once, but each time it executes it wants to take ownership of out. Imagine if this compiled ... then if the closure runs once and writes N bytes to out, then the network connection fails and we retry successfully, we would write those N bytes to out again followed by the rest of the data. This would be a subtle and hard to reproduce error.

A retry closure should not have side effects for failed operations and should not, therefore, take ownership of out at all. Instead it should capture data to a buffer which we'll write to out if and only if the entire fetch succeeds. (For large S3 downloads you need parallel downloads of separate ranges, so that network errors only require refetching part of the object, and that approach deserves a separate implementation.)

Ownership types are for more than just memory and thread safety.