Tuesday 18 January 2005
Over the years I've been hacking on Mozilla, I've wasted a lot of time grappling with the limitations of CVS. There are a few main issues for me:
- Frequently I need to collect all the changes I've made to a tree into a patch file, by diffing my tree against the trunk. CVS has to contact the server to do this, which takes a long time.
- CVS updates are slow.
- Managing many trees is painful, especially creating a tree with an initial checkout (very slow), building it, and keeping the trees up to date. This means that I tend to use only a few trees, usually just one. This is bad because various pieces of work get mixed up in a tree, and my patches or even checkins sometimes contain fragments from logically separate changes. Ugh.
Since Mozilla is my full-time job now, I'm investing some energy in tackling these problems. I looked at various "CVS replacement" programs, especially the version control systems that claim to interoperate with CVS and provide local branching. But for various reasons they're not suitable; the main reason is that none of them has been proven on a codebase as large as Mozilla's.
So I've decided to use rsync to maintain a mirror of the entire Mozilla CVS repository on my local machine. Checking out and updating many trees from it is very fast. Likewise, doing CVS diffs is now a local and very fast operation. But how costly is it to maintain a synchronized copy of the entire repository, given that it records all the changes ever made to 5 million lines of code over six years? Surprisingly, it's not bad at all. It's using about 2.6GB of disk space. It took about 30 minutes to pull down the first time, using rsync with compression. My office Internet link is pretty good, but I am in New Zealand. It takes about 20 seconds to resynchronize on a day when there aren't many checkins. Altogether I'm extremely pleased. My only fear is that a lot of people will start doing it and mozilla.org will have to restrict rsync access...
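For anyone who wants to try the same setup, here is a rough sketch of the commands involved, written as a small Python helper that only builds the command lines. The rsync source URL, the local path, and the module name are placeholders I made up for illustration; substitute whatever your project actually publishes.

```python
import shlex
from pathlib import Path

# Hypothetical values: the post does not give the real rsync source,
# local path, or module name, so these are placeholders.
REMOTE_REPO = "rsync://cvs-mirror.example.invalid/cvsroot/"
LOCAL_REPO = Path("/home/me/cvs-mirror")
MODULE = "mozilla"


def mirror_command(remote: str, local: Path) -> list[str]:
    """rsync invocation that pulls (or refreshes) the local repository mirror."""
    return [
        "rsync",
        "-a",        # archive mode: recurse, keep times and permissions
        "-z",        # compress data in transit
        "--delete",  # remove local files that no longer exist upstream
        remote,
        f"{local}/",
    ]


def checkout_command(local: Path, module: str) -> list[str]:
    """cvs checkout that uses the local mirror as the repository root."""
    return ["cvs", "-d", str(local), "checkout", module]


def diff_command() -> list[str]:
    """Unified diff of a working tree against the repository.

    No -d is needed: CVS records the root chosen at checkout in CVS/Root,
    so a tree checked out from the mirror diffs against the mirror.
    """
    return ["cvs", "diff", "-u"]


if __name__ == "__main__":
    for cmd in (
        mirror_command(REMOTE_REPO, LOCAL_REPO),
        checkout_command(LOCAL_REPO, MODULE),
        diff_command(),
    ):
        print(shlex.join(cmd))
```

Running the mirror command again later is what performs the resynchronization; rsync only transfers what has changed since the last run.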
Another thing I'm doing is using ccache to speed up builds and share build products across trees. Since in most trees only a few files are modified, building in each tree will mostly produce the same object files. ccache does a good job of recognizing when the same file is being compiled in different places and just copying the object file from its cache into the output directory. In fact you can do better just by hardlinking the file from the cache to the output, so you only have one copy of the output file shared by all your trees. There is one problem with ccache currently that I believe I have fixed... I'll write about it soon, once I've verified that my fix works.
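To illustrate the hardlinking idea, here is a toy sketch of a compile cache that hardlinks the cached object file into each tree. It is not how ccache is actually implemented: the key computation, the function names, and the fake "compiler" are all assumptions made for the example. (As far as I know, ccache itself exposes this behaviour through its `CCACHE_HARDLINK` setting; check the manual for your version.)

```python
import hashlib
import os
import tempfile
from pathlib import Path


def cache_key(source_text: str, compiler_args: tuple[str, ...]) -> str:
    """Hash the compiler arguments and source text into a cache key."""
    h = hashlib.sha256()
    h.update("\0".join(compiler_args).encode())
    h.update(b"\0")
    h.update(source_text.encode())
    return h.hexdigest()


def fetch_or_build(cache_dir: Path, key: str, output: Path, build) -> bool:
    """Place the object file at `output`; return True on a cache hit."""
    cached = cache_dir / key
    hit = cached.exists()
    if not hit:
        cached.write_bytes(build())  # stand-in for running the compiler
    if output.exists():
        output.unlink()
    os.link(cached, output)  # hardlink: no second copy of the data
    return hit


def demo() -> tuple[bool, bool, bool]:
    """Compile the same file in two trees; the second is a cache hit."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        cache, tree_a, tree_b = base / "cache", base / "tree_a", base / "tree_b"
        for d in (cache, tree_a, tree_b):
            d.mkdir()
        key = cache_key("int main(void) { return 0; }", ("-O2",))

        def build() -> bytes:
            return b"fake object code"

        hit_a = fetch_or_build(cache, key, tree_a / "main.o", build)
        hit_b = fetch_or_build(cache, key, tree_b / "main.o", build)
        shared = os.path.samefile(tree_a / "main.o", tree_b / "main.o")
        return hit_a, hit_b, shared


if __name__ == "__main__":
    print(demo())
```

The catch with hardlinks is that every tree shares the same underlying file, so anything that modifies an object file in place would affect all of them. They also only work when the cache and the trees live on the same filesystem.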