Thursday 25 October 2018
We have 85K lines of Rust code implementing the backend of our Pernosco debugger. To impose some modularity constraints and to reduce build times, from the beginning we organized our code as a large set of crates in a single Cargo workspace in a single GitLab repository. Currently we have 48 crates. This has mostly worked pretty well, but as the number of crates keeps increasing, we have hit some serious scalability problems.
The most fundamental issue is that many crates build one or more executables — e.g. command-line tools to work with data managed by the crate — and most crates also build an executable containing tests (per standard Rust conventions). Each of these executables is statically linked, and since each crate on average depends on many other crates (both our own and third-party), the total size of the executables is growing at roughly the square of the number of crates. The problem is especially acute for debug builds with full debuginfo, which are about five times larger than release builds built with debug=1 (optimized builds with just enough debuginfo for stack traces to include inlined functions). To be concrete, our 85K line project builds 4.2G of executables in a debug build, and 750M of executables in release. There are 20 command-line tools and 81 test executables, of which 50 actually run no tests (the latter are small, only about 5.7M each).
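To make the quadratic growth concrete, here is a rough back-of-the-envelope model. It is only a sketch: the function name and every parameter value (executables per crate, the fraction of the workspace each executable links, the size contributed per linked crate) are illustrative assumptions, not measurements from our project. The point is just that when both the number of executables and the amount of code linked into each one grow with the crate count, doubling the crate count roughly quadruples the total.

```rust
/// Rough estimate of the total size of all executables, in MB.
/// All inputs are illustrative assumptions, not measured values.
fn total_executable_size_mb(
    crates: u64,
    exes_per_crate: u64,      // assumed: tools plus test binaries per crate
    deps_percent: u64,        // assumed: share of the workspace each executable links
    mb_per_linked_crate: u64, // assumed: size contributed by each linked crate
) -> u64 {
    // Each executable statically links a share of all crates ...
    let linked_crates_per_exe = crates * deps_percent / 100;
    // ... and the number of executables also grows with the crate count.
    let exe_count = crates * exes_per_crate;
    exe_count * linked_crates_per_exe * mb_per_linked_crate
}

fn main() {
    for crates in [24u64, 48, 96].iter() {
        let size = total_executable_size_mb(*crates, 2, 50, 1);
        println!("{} crates -> ~{} MB of executables", crates, size);
    }
}
```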
The large size of these executables slows down builds and creates problems for our GitLab CI, because they have to be copied over the network between build and test phases. But I don't know what to do about the problem.
We could limit the number of test executables by moving all our integration tests into a single executable in some super-crate ... though that would slow down incremental builds because that super-crate would need to be rebuilt a lot, and it would be large.
We could limit the number of command-line tools in a similar way: combine all the tool executables into a super-tool crate that uses the "Swiss Army knife" approach, deciding what its behavior should be by examining argv. Again, this would penalize incremental builds.
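A minimal sketch of what that argv dispatch might look like is below. The tool names `dump` and `index`, the binary name `supertool`, and the `*_main` entry points are hypothetical placeholders, not our actual tools. It assumes each tool can be invoked either through a symlink named after the tool (so argv[0] selects it) or as the first argument to the combined binary.

```rust
use std::env;
use std::path::Path;

// Hypothetical entry points; in a real workspace these would live in the
// crates that currently build their own executables.
fn dump_main(args: &[String]) -> i32 {
    println!("dump called with {:?}", args);
    0
}

fn index_main(args: &[String]) -> i32 {
    println!("index called with {:?}", args);
    0
}

/// Choose a tool from argv and run it with the remaining arguments.
fn dispatch(argv: &[String]) -> Result<i32, String> {
    // Name the binary was invoked under, without directory or extension.
    let invoked_as = argv
        .first()
        .and_then(|a| Path::new(a).file_stem())
        .and_then(|s| s.to_str())
        .unwrap_or("");

    // If invoked via a symlink named after a tool, argv[0] selects the tool;
    // otherwise the first argument does.
    let (tool, rest) = match invoked_as {
        "dump" | "index" => (invoked_as, &argv[1..]),
        _ => match argv.get(1) {
            Some(t) => (t.as_str(), &argv[2..]),
            None => return Err("usage: supertool <tool> [args...]".to_string()),
        },
    };

    match tool {
        "dump" => Ok(dump_main(rest)),
        "index" => Ok(index_main(rest)),
        other => Err(format!("unknown tool: {}", other)),
    }
}

fn main() {
    let argv: Vec<String> = env::args().collect();
    match dispatch(&argv) {
        Ok(code) => std::process::exit(code),
        Err(msg) => {
            eprintln!("{}", msg);
            std::process::exit(2);
        }
    }
}
```

With this shape only one statically linked executable is produced for all the tools, at the cost that a change to any tool's crate relinks that one binary.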
Cargo supports a kind of dynamic linking with its dylib option, but I'm not sure how to get that to work. Maybe we could create a super-crate that reexports every single crate in our workspace, attach all tests and binary tools to that crate, and ask Cargo to link that crate as a dynamic library, so that all the tests and tools are linking to that library. This would also hurt incremental builds, but maybe not as much as the above approaches. Then again, I don't know if it would actually work.
Another option would be to break up the project into separate independently built subprojects, but that creates a lot of friction.
Another possibility is that we should simply use fewer, bigger crates. This is probably more viable than it was a couple of years ago, when we didn't have incremental rustc compilation.
I wonder if anyone has hit this problem before, and tried the above solutions or come up with any other solutions.