Saturday 18 December 2021
mold looks pretty cool, and a faster drop-in ld replacement is obviously extremely useful. But having no link step at all would be even faster. Why do native-code compilers write out temporary object files which then have to be linked together, anyway? Could we stop doing that and have compilers emit compiled translation units directly into a final executable file that the OS can execute directly --- a "zero-link" approach? I think we could ... in many cases.
The basic idea is to treat the final executable file (an ELF file, say) as a mutable data structure. When the compiler would emit an object file it instead allocates space in that executable file using a shared memory allocator, and writes the object code directly into that space. To make this tractable we'll assume we aren't going to generate optimal code in size or space; we're going to build an executable that runs "pretty fast", for testing purposes (manual or automated).
An obvious problem is how to handle symbol resolution, i.e. what happens when the compiler emits code for a translation unit that uses symbol A from some other unit --- especially if that other unit hasn't been compiled yet? Here's an option for function symbols: when A is used for the first time, write a stub for A to the final binary and call that. When a definition for A is seen, patch the stub to jump to the definition. Internally this will mean maintaining a parallel hashtable of all undefined symbols that all compiler instances can share efficiently.
For data symbols, instead of using a stub, we can emit a pointer that can be patched to point to the data definition. For complex constants, we might need to defer initialization until load time or emit code to initialize them lazily.
To challenge the design a bit more, let's think about why object files are useful.
Sometimes compilers emit object files for a project which are then linked into multiple different output binaries. True, but it's more efficient to link them once into a single shared library which is then loaded by each of those output binaries, so projects should just do that.
Compilers use object files for incremental compilation: when a translation unit hasn't changed, its object file can be reused. We can capture the same benefits with the zero-link approach: reuse the final executable and keep around its symbol hashtable; when an object file changes, release the object file's space in the final executable, and allocate new space for the new object file.
You can combine multiple object files into a static library and the linker will select the object files that are needed to satistfy undefined symbols. In many projects this feature is only used for "system libraries" --- a project's build system should avoid building project object files that will not be used in the final link. System libraries are usually dynamically linked for sharing reasons. When we really need to subset static libraries, we could link objects from those libraries into our final executable on-demand when we first see them being used.
Another issue is debuginfo (particularly important to me!) Supporting debuginfo would require extending DWARF5 debuginfo sections to allow their data to be scattered over the final executable.
There are lots of unresolved questions, enough that I wouldn't bet money on this actually being practical. But I think it's worth questioning the assumption that we have to have a link step for native binaries.
Update Zig can do something like this.