Friday 15 January 2010
More On Patch Division
Kyle Huey asked me to comment on my approach to breaking up work into a large number of small patches.
First a big disclaimer: I've only been doing this for a short time, but I know other communities have been doing it for a long time. I bet Linux kernel developers know far more about this than I do. Probably many Mozilla developers do too.
As always, I try to plan out the work as incremental steps in my head before I actually start writing code. This plan always needs modification after contact with the enemy, but it generally works. Each step is of course a new mq patch.
Sometimes I find that additional steps are needed. If I need to add new steps after the point I'm currently up to, there's nothing special to do. If I need to add new steps before the point I'm currently up to, it's easy to pop off mq patches and do the work in the right place.
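That pop-and-insert dance can be sketched as a tiny shell helper. This is just a sketch, assuming mq's behaviour that "hg qpop PATCH" pops patches until PATCH is the topmost applied one; the helper name and patch names are hypothetical:

```shell
# Sketch: insert a new step before the current point in an mq queue.
# $1 = existing patch to stop at, $2 = name for the new step.
# Assumes "hg qpop PATCH" pops until PATCH is the topmost applied patch.
insert_step() {
    hg qpop "$1" &&   # unapply everything above the target patch
    hg qnew "$2" &&   # start the new step at this point in the queue
    hg qpush -a       # reapply the later patches on top of it
}
```

In practice you'd edit files and run hg qrefresh between the qnew and the final qpush -a, fixing any conflicts qpush reports along the way.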
When I think I've completed a step, I often (but not always) do a build to make sure it compiles. However, I rarely test my changes until I've finished the work. Perhaps this makes me a bad person, but I find that test effort before that point is often wasted since I frequently go back and change previous patches as I learn things from writing later patches.
When I'm testing and fixing bugs, I accumulate many fixes as a diff in my working copy. Periodically (once a day?) I take the changes in that diff and distribute them to the patches that each change logically belongs to. For example, if I fix a bug in new code that was added by a patch, I apply the fix to that patch.
Once the code works, I then reread the patches (a sort of self-review). One thing I look for is large patches that could be broken up into smaller ones consisting of logically independent changes. (I especially focus on the largest patches, to try to minimize the maximum patch size.) Wherever breakup is easy to do (e.g. because the smaller patches don't overlap), I'll do it. If it's hard because patches overlap, I generally won't do it unless the gain in clarity seems large.
When patches can be broken up into really tiny fragments, I'm not sure what the correct minimum patch size is ... a patch consisting solely of many instances of the same mechanical change probably isn't worth breaking up, unless different reviewers are going to review each piece.
It's important for bisection searches that at each stage in the patch queue, the project at least builds. To verify this for large patch queues I just run a script overnight that does a loop of hg qpush ; make.
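That overnight loop might look something like this minimal sketch, assuming "make" is the right build command for the tree and that "hg qpush" fails once every patch is applied (which is what ends the loop):

```shell
# Sketch: verify that the tree builds at every stage of the patch queue.
# Starts from a fully popped queue; "make" is an assumption about the
# project's build command.
check_queue_builds() {
    hg qpop -a || return 1      # start from the bottom of the queue
    while hg qpush; do          # apply the next patch...
        make || {               # ...and make sure the tree still builds
            echo "build broke at $(hg qtop)" >&2
            return 1
        }
    done
    echo "all patches build"
}
```

A variant that's sometimes useful is to build only at the patches you suspect, but building at every stage is what makes later bisection trustworthy.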
When the code is ready to submit, I upload all the patches to the bug, numbering them with "Part NNN" to make it easier to refer to a particular patch. For large projects it's also helpful to publish the patches as a Mercurial repository using qcommit.
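Publishing the queue works because mq can keep .hg/patches under version control itself. A sketch, assuming the hg-1.1-era commands (hg qinit -c to create the versioned queue, hg qcommit to commit it); the function name and URL are placeholders:

```shell
# Sketch: version the patch queue itself so it can be published.
# The push URL is a placeholder, not a real server.
publish_queue() {
    hg qinit -c &&                               # make .hg/patches its own repo
    hg qcommit -m "snapshot of patch queue" &&   # commit the current patches
    ( cd .hg/patches && hg push "$1" )           # push the queue repo itself
}
```

Others can then clone the queue repo into their own .hg/patches and qpush the same series.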
If reviews take a while, as they often do for large projects, every so often (once a week to once a month) I'll refresh the patches to trunk (see below) and push the results to tryserver to make sure nothing's broken.
One tip: at least with Mercurial 1.1, DO NOT refresh to trunk using rebasing (e.g. hg pull --rebase) while you have mq patches applied. Instead, hg qpop -a, hg pull -u, and then hg qpush -a. For me, rebasing screws up my patches in various ways. Also I find that fixing conflicts during rebasing is far more tedious than fixing conflicts during qpush, partly because I have no idea what rebasing is actually doing, but mostly because as of 1.1 rebasing with mq seems to force me to fix the same conflicts over and over again.
Comments
(Don't get me wrong --- I love Mercurial!)
hg diff > ~/tmp/patch
hg revert --all
... open ~/tmp/patch in editor ...
... repeatedly cut out hunks and paste them into "patch -lp1" ...
... when I've accumulated all the hunks for a patch into my working copy, "hg qnew -f ..." ...
... repeat until all the hunks have been removed from ~/tmp/patch ...