Saturday, 22 January 2005

Sharing binaries across multiple Mozilla trees with ccache

As I explained earlier, I want to have multiple Mozilla source tree checkouts, each with a different set of patches applied, and build Mozilla in each of these trees. Usually the patches are small, so most of the build products are the same in each tree. Doing a regular build in each tree therefore wastes both work and disk space.

ccache can partially overcome this problem by caching the results of compiler invocations so that when a compilation is performed that is "identical" to a previous invocation, the resulting object file will simply be fetched from the cache. So when you use ccache with multiple trees, compiling the same source file in different trees will do the compile in the first tree to be built, and then the other trees will simply get that object file from the cache. The theory's good but there are a few problems.
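The caching idea can be sketched in a few lines. This is a toy model, not ccache's actual code: the class name ToyCompileCache, the fake_compile stand-in and the choice of SHA-256 as the key are all assumptions made for illustration.

```python
import hashlib


class ToyCompileCache:
    """Toy model of a compiler cache keyed on preprocessed text and arguments."""

    def __init__(self):
        self._store = {}   # cache key -> object file bytes
        self.hits = 0
        self.misses = 0

    def compile(self, preprocessed_text, compiler_args, compile_fn):
        # Two invocations count as "identical" when the arguments and the
        # preprocessed text hash to the same key.
        payload = "\0".join(compiler_args) + "\0" + preprocessed_text
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        obj = compile_fn(preprocessed_text)
        self._store[key] = obj
        return obj


def fake_compile(text):
    # Hypothetical stand-in for running the real compiler.
    return ("OBJ:" + text).encode("utf-8")


cache = ToyCompileCache()
first = cache.compile("int x;", ["-O2"], fake_compile)    # tree 1: real compile
second = cache.compile("int x;", ["-O2"], fake_compile)   # tree 2: served from cache
```

After the two calls, the second tree has received the object produced for the first, and the compiler stand-in has run only once.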

The first problem is that ccache will only reuse an object file when the preprocessed source texts are identical. The preprocessed text includes the paths of all #include files in the compilation unit. Mozilla's build process usually specifies a number of #include files using absolute paths, which of course will differ per tree, causing ccache to treat the compilations as different. Most of the problems are eliminated if you specify --srcdir=.. in your .mozconfig, so that all sources are accessed relative to the build directory. The remaining problem is that currently the NSPR library is accessed via an absolute path; a patch to fix this is in bug 275790. With these changes, ccache works across trees.
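To see why absolute paths defeat the cache, here is a small sketch. It assumes gcc-style line markers (`# 1 "path"`) in the preprocessed text; the paths and the header contents are hypothetical, and SHA-256 merely stands in for whatever hash the cache uses.

```python
import hashlib


def preprocessed(include_path):
    # A gcc-style line marker naming the included file, followed by
    # (hypothetical) header contents that are the same in every tree.
    return '# 1 "{}"\nint example(void);\n'.format(include_path)


def cache_key(include_path):
    return hashlib.sha256(preprocessed(include_path).encode("utf-8")).hexdigest()


# Absolute paths differ per tree, so the preprocessed text and the key differ.
abs_tree1 = cache_key("/home/me/tree1/nsprpub/pr/include/nspr.h")
abs_tree2 = cache_key("/home/me/tree2/nsprpub/pr/include/nspr.h")

# Paths relative to the build directory are the same string in both trees.
rel_tree1 = cache_key("../nsprpub/pr/include/nspr.h")
rel_tree2 = cache_key("../nsprpub/pr/include/nspr.h")

print(abs_tree1 == abs_tree2)
print(rel_tree1 == rel_tree2)
```

The declarations are byte-for-byte identical in both trees; only the path embedded in the line marker changes the key.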

Another problem is deeper. The most efficient way to use ccache is to have it hardlink the cache file to the output, so that there's only one file on disk that's used everywhere (instead of having to make separate copies in each tree). Unfortunately, as the ccache documentation explains, hardlinking doesn't work well with multiple trees.

An example will illustrate. Suppose we have a source file F.c and two trees, 1 and 2, which build F.c to make F1.o and F2.o respectively. Suppose that we then link F1.o and F2.o to make final binaries B1 and B2. Suppose we modify F.c at 8am, then build tree 1 at 9am to make F1.o and B1. Then suppose we build tree 2 at 10am. Essentially, ccache makes F2.o by creating a hardlink to F1.o. To stop "make" from getting confused, it must make sure the last-modified-time on F2.o is later than when F.c was modified. It does this by stamping F2.o with the current time, 10am. But because this is the same file as F1.o, F1.o's last-mod-time is now also 10am. If you go back to tree 1 and run "make" again, it will unnecessarily relink B1 from F1.o because it looks like F1.o has changed, even though it hasn't. This is documented as a limitation of ccache with multiple trees, and it would be a fairly serious problem with Mozilla trees because unnecessary linking can take a long time.
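The shared timestamp is easy to reproduce. The sketch below assumes a filesystem that supports hardlinks; the directory layout and the stand-in times for "9am" and "10am" are made up for the demonstration.

```python
import os
import tempfile

BASE = 1_100_000_000          # arbitrary past instant, seconds since the epoch
NINE_AM = BASE + 9 * 3600     # stand-in for "9am"
TEN_AM = BASE + 10 * 3600     # stand-in for "10am"


def shared_mtime_demo():
    with tempfile.TemporaryDirectory() as root:
        tree1 = os.path.join(root, "tree1")
        tree2 = os.path.join(root, "tree2")
        os.mkdir(tree1)
        os.mkdir(tree2)
        f1 = os.path.join(tree1, "F.o")   # plays the role of F1.o
        f2 = os.path.join(tree2, "F.o")   # plays the role of F2.o

        # Tree 1 is built at 9am.
        with open(f1, "wb") as fh:
            fh.write(b"object code")
        os.utime(f1, (NINE_AM, NINE_AM))

        # Tree 2 is built at 10am: hardlink the object, then stamp it.
        os.link(f1, f2)
        os.utime(f2, (TEN_AM, TEN_AM))

        # Both names refer to one inode, so both report the new time.
        return os.stat(f1).st_mtime, os.stat(f2).st_mtime


mtime_f1, mtime_f2 = shared_mtime_demo()
```

Both values come back as the 10am stamp, which is exactly what makes tree 1's "make" think F1.o has changed.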

But I've found a way to resolve this. In the example, note that it's not actually necessary to update F2.o's timestamp, because F1.o (stamped 9am) is already newer than F.c (modified 8am), which F2.o depends on. Put another way, make will be happy as long as ccache ensures the object file's last-mod-time is newer than the last-mod-time of every file the object depends on. So I've hacked gmake to take a new option --last-dep-mtime; when this option is set, whenever gmake rebuilds a target, it sets the MAKE_LAST_DEP_MTIME environment variable to the timestamp of the most recently modified dependency (in seconds since the epoch). I've also hacked ccache to check for this environment variable and, when it's present, just ensure that the object file is newer than that time. In most of my cases it won't need to touch the object file at all.
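The timestamp rule on the cache side can be sketched as follows. This is not the actual patch: the function names are hypothetical, and stamping with the current time when the object is too old is an assumption about how "ensure newer" might be done.

```python
import os
import tempfile


def ensure_newer_than_deps(obj_path, environ=os.environ):
    """Touch obj_path only if needed. Returns True if the file was touched."""
    raw = environ.get("MAKE_LAST_DEP_MTIME")
    if raw is None:
        # Variable absent: fall back to always stamping with the current time.
        os.utime(obj_path, None)
        return True
    last_dep_mtime = int(raw)   # seconds since the epoch
    if os.stat(obj_path).st_mtime > last_dep_mtime:
        return False            # already newer than every dependency
    os.utime(obj_path, None)    # assumption: "now" is later than the dependency
    return True


def demo():
    base = 1_100_000_000        # arbitrary past instant
    eight_am = base + 8 * 3600
    nine_am = base + 9 * 3600
    half_past_nine = nine_am + 1800
    with tempfile.TemporaryDirectory() as root:
        obj = os.path.join(root, "F.o")
        with open(obj, "wb") as fh:
            fh.write(b"object code")
        os.utime(obj, (nine_am, nine_am))

        # Newest dependency is from 8am: the 9am object is left alone.
        untouched = ensure_newer_than_deps(
            obj, {"MAKE_LAST_DEP_MTIME": str(eight_am)})
        kept = os.stat(obj).st_mtime == nine_am

        # Newest dependency is from 9:30am: the object must be re-stamped.
        touched = ensure_newer_than_deps(
            obj, {"MAKE_LAST_DEP_MTIME": str(half_past_nine)})
        newer = os.stat(obj).st_mtime > half_past_nine
        return untouched, kept, touched, newer


result = demo()
```

In the first call the hardlinked object keeps its original time, so the other tree sees no change and does not relink.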

I will try to push these patches upstream to the gmake and ccache maintainers; they're very small code changes.

1 comment:

  1. Yay! My patch got checked in with me asking for review! That's a first ;)
    Thanks for the tip on hard linking, I'll try it when I find time to compile stuff again :P