In this third article, I will present some benchmark results for the systems I discussed before.
To get at least some measure of each VCS's performance, I used two synthetic benchmarks. Benchmark results cannot be transferred directly to real-life scenarios, of course, but they still provide valuable information on how a system behaves under stress. Comparability between the systems is also important, which means that all candidates had to perform the same tests under the same conditions. The test system was a VM running Ubuntu 8.10 (“Intrepid”), and the software versions used were SVK 2.0.2, darcs 2.1.0, monotone 0.42, Bazaar 1.10, Mercurial 1.1.2, and Git 1.6.1. I used the somewhat older SVK 2.0.2 simply because version 2.2 resisted my installation attempts so stubbornly that I eventually gave up and took the version provided by the Ubuntu software repository. However, I do not expect 2.2 to behave noticeably differently from the tested version.
The first benchmark simulates a linearly growing repository. A directory is sequentially filled with 4000 files, and every file is changed five times. Every addition and every change is recorded by the revision control system, leading to 24,000 revisions (4000 × 6) in the repository. Ideally, the time for each check-in would be constant. No candidate achieved this goal, although Git came very close. Apart from timing, the repository size is also measured. Since the file changes only append data, there is little compressible redundancy, so the repository size is expected to grow linearly. However, it should not be significantly larger than the checkout at any given point (ideally, it should be smaller).
In the second test, the time taken for a rather large check-in is measured. Since a whole list of files has to be processed, at least a linear dependence on the number of files is expected. However, the time per file (the slope of that linear relation) should be as small as possible, and the VCS should absolutely not show any super-linear (e.g., quadratic or even exponential) behaviour.
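The article does not include its benchmark scripts, so the following is only a sketch of how the first benchmark could be driven. The `commit` callable is a hypothetical adapter for the system under test, the add-then-change ordering per file is an assumption (the text only says that files are added sequentially and each is changed five times), and the 675-byte chunk is chosen merely so that each file ends at the 4050 bytes mentioned in the repository-size discussion.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, List

# Hypothetical adapter: commit(workdir, message) must record *all* pending
# changes in workdir with the VCS under test (e.g. "git add -A" + "git commit").
Commit = Callable[[Path, str], None]


def run_growth_benchmark(
    workdir: Path,
    commit: Commit,
    n_files: int = 4000,
    n_changes: int = 5,
    chunk: bytes = b"x" * 675,  # assumption: 6 writes x 675 B = 4050 B per file
) -> List[float]:
    """Add files one by one, appending to each one n_changes times.

    Every write is followed by exactly one commit, so the result holds
    n_files * (1 + n_changes) check-in times (24,000 with the defaults).
    """
    timings: List[float] = []

    def timed_commit(message: str) -> None:
        start = time.perf_counter()
        commit(workdir, message)
        timings.append(time.perf_counter() - start)

    for i in range(n_files):
        path = workdir / f"file{i:05d}.txt"
        path.write_bytes(chunk)  # initial version of the file
        timed_commit(f"add {path.name}")
        for j in range(n_changes):
            with path.open("ab") as f:  # changes only ever append data
                f.write(chunk)
            timed_commit(f"change {j + 1} of {path.name}")
    return timings


if __name__ == "__main__":
    # Dry run with a no-op "VCS" to check the bookkeeping only.
    with tempfile.TemporaryDirectory() as tmp:
        times = run_growth_benchmark(Path(tmp), lambda d, m: None, n_files=10)
        print(len(times), "check-ins")
```

For Git, `commit` could call `subprocess.run(["git", "add", "-A"], cwd=workdir, check=True)` followed by `subprocess.run(["git", "commit", "-q", "-m", message], cwd=workdir, check=True)` in a directory initialised with `git init`; the other systems need their own equivalents.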
Dependence on Repository Growth
The first figure shows the results of the first benchmark concerning check-in time.

All systems show some increase in check-in time as the repository grows. SVK and darcs show the worst scaling behaviour, with darcs showing the strongest dependence on repository size: it starts as one of the fastest systems and then rapidly becomes slower as the repository grows. SVK is somewhat more stable, but its check-in time also increases significantly. When the repository contains 24,000 revisions, recording a change to a single file takes darcs as much as five seconds and SVK more than two. Bazaar and monotone fare somewhat better, but still show a serious slowdown. Mercurial shows a very weak dependence, with its check-in time increasing by 180 milliseconds over the whole range. Git shows almost no dependence at all: the mean time taken to record a change increases by only 20 milliseconds during the whole test. A truly impressive feat.
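The text reports a single "increase" per system without saying how it was derived from thousands of noisy per-commit timings. One plausible way (an assumption, not the author's documented method) is to compare the mean of the first and last few hundred check-ins in a list of per-commit timings; the 500-commit window below is an arbitrary choice.

```python
from statistics import mean
from typing import Sequence


def mean_increase(timings: Sequence[float], window: int = 500) -> float:
    """Mean of the last `window` check-in times minus the mean of the first.

    Averaging over a window smooths out per-commit noise before the two
    ends of the run are compared.
    """
    if window <= 0 or len(timings) < 2 * window:
        raise ValueError("need two non-overlapping windows of check-in times")
    return mean(timings[-window:]) - mean(timings[:window])
```

Comparing window means rather than the single first and last commit keeps one unusually slow or fast check-in from dominating the reported figure.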
If one looks at the repository size, the differences between the contestants become even more pronounced.

As mentioned before, both SVK and darcs use quite a lot of disk space. Still, I was very surprised to see just how much space SVK actually wasted: at the end of the benchmark, its repository was nearly 1.8 gigabytes in size. Remember, this test created just 4000 files, each 4050 bytes in size, so the final checkout is only about 15 megabytes (4000 × 4050 bytes ≈ 15.4 MiB). Now compare this number to the SVK repository size again.
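To put the SVK figure in perspective, the checkout size and the repository-to-checkout ratio follow directly from the numbers given in the text. The text does not say whether its megabytes and gigabytes are decimal or binary units, so both readings are computed; either way, SVK's repository is roughly 110 to 120 times larger than the data it tracks.

```python
N_FILES = 4000
FILE_SIZE = 4050  # bytes per file at the end of the benchmark

checkout_bytes = N_FILES * FILE_SIZE   # 16,200,000 bytes
checkout_mib = checkout_bytes / 2**20  # ~15.4 MiB, the "about 15 megabytes"


def overhead(repo_bytes: float) -> float:
    """Repository size as a multiple of the final checkout size."""
    return repo_bytes / checkout_bytes


# "1.8 gigabytes" may mean 1.8e9 bytes or 1.8 GiB; both readings are shown.
print(round(overhead(1.8e9)), round(overhead(1.8 * 2**30)))  # 111 119
```

The same function gives roughly 9× for darcs (140 megabytes) and roughly 3× for Bazaar and monotone (about 50 megabytes each).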
darcs performed much better than SVK, but still quite badly: it used about 140 megabytes of disk space to record all changes. I strongly suspect that its excessive memory and space usage are the main reasons why darcs performs so badly on larger repositories. I did not check this systematically in my tests, but I did notice that darcs used an awful lot of memory during the later parts of the benchmark (200 megabytes and more), which could not be mitigated even by regularly optimising the repository and creating checkpoint commits. Bazaar and monotone again behave quite similarly, both using about 50 megabytes for their repositories, with Bazaar again slightly ahead of monotone.
Mercurial and Git once again show that they are in a different league from the others. Mercurial's repository was only about 10 percent larger than the checkout, and Git's was even about 10 percent smaller. The peculiar “sawtooth” shape of the Git curve is caused by the fact that the system relies on regular repository optimisation. Therefore, the object database was repacked and pruned every 500 commits, which produces the noticeable bumps. Although these optimisation steps take time, they still reduce the overall time for the benchmark: with the optimisation steps disabled, Git takes about twice as long for the whole test (and uses about 100 megabytes more disk space).
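The periodic maintenance that causes the sawtooth can be sketched as a hook called after each commit. The exact commands the author ran are not given; `git repack -a -d` and `git prune` are the standard Git commands for repacking and pruning (`git gc` bundles similar steps), and `run_maintenance` is a hypothetical helper. With 24,000 commits and an interval of 500, maintenance runs 48 times over the whole test.

```python
import subprocess
from pathlib import Path
from typing import List

REPACK_INTERVAL = 500  # commits between maintenance runs, as in the benchmark

# Assumption: the article only says "repacking and pruning"; these are the
# standard Git commands for that, not necessarily the exact ones used.
MAINTENANCE: List[List[str]] = [
    ["git", "repack", "-a", "-d"],  # pack all objects, delete redundant packs
    ["git", "prune"],               # remove unreachable loose objects
]


def maintenance_due(commit_number: int, interval: int = REPACK_INTERVAL) -> bool:
    """True after every `interval`-th commit (commit numbers start at 1)."""
    return commit_number > 0 and commit_number % interval == 0


def run_maintenance(repo: Path, commit_number: int) -> None:
    """Repack and prune `repo` if this commit completes an interval."""
    if maintenance_due(commit_number):
        for cmd in MAINTENANCE:
            subprocess.run(cmd, cwd=repo, check=True)
```

Keeping the schedule in a separate predicate makes it easy to switch the maintenance off, which is how the "twice as long without optimisation" comparison could be reproduced.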
Dependence on Check-in Size
In the next benchmark, I tested how the candidates' check-in time depends on the size of a check-in. I did not expect many surprises here, but SVK managed to provide one nonetheless.

To show the SVK result, I actually had to add an inset, and a logarithmic one at that. Adding 2000 files to an empty repository took SVK a whopping 23 minutes and more (nearly 1400 seconds). Among the other systems, darcs was again the slowest at about 5.5 seconds, while Git managed the task in about 300 milliseconds. monotone took about one second, Bazaar about 1.9 seconds, and Mercurial about 2.5 seconds. None of these values is truly catastrophic, but Git shows that there is plenty of room for improvement. I was somewhat surprised by Mercurial's comparatively poor result, which contrasts with the rest of its benchmark results.
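A single 2000-file check-in cannot by itself reveal whether a system scales linearly or worse, which is what the setup of this test asks about. Timing several check-in sizes and fitting the slope on a log-log scale is one common way to check (my addition, not part of the original benchmark); a fitted exponent clearly above 1 signals super-linear behaviour.

```python
import math
from typing import Sequence


def scaling_exponent(sizes: Sequence[float], seconds: Sequence[float]) -> float:
    """Least-squares slope of log(time) against log(number of files).

    If time ~ c * n**k, the slope is k: about 1 for linear behaviour,
    about 2 for quadratic. Exponential growth has no fixed k; its fitted
    slope keeps rising as larger check-ins are added to the data.
    """
    if len(sizes) != len(seconds) or len(set(sizes)) < 2:
        raise ValueError("need timings for at least two different sizes")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Timings exactly proportional to the file count give 1.0, and timings proportional to its square give 2.0; real measurements are noisier, so sizes spread over at least an order of magnitude make the fit more meaningful.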
Dependence on File Count
In a final test, I measured the time it took to add a single file right after the multi-file commit of the previous test. Thus, the only thing that varied between the steps was the number of files recorded in the preceding commit, while the history length was always the same (i.e., one earlier revision).
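The second and third tests can be combined into one loop, sketched below under the same caveats as before: `make_repo` and `commit` are hypothetical adapters (for Git, something like `git init`, and `git add -A` plus `git commit`), the file contents are arbitrary, and each size gets its own fresh repository so that the single-file commit always follows exactly one earlier revision.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical adapters for the VCS under test.
MakeRepo = Callable[[Path], None]     # e.g. run "git init" in the directory
Commit = Callable[[Path, str], None]  # record all pending changes


def _timed(commit: Commit, workdir: Path, message: str) -> float:
    start = time.perf_counter()
    commit(workdir, message)
    return time.perf_counter() - start


def bulk_then_single(
    sizes: Iterable[int], make_repo: MakeRepo, commit: Commit
) -> Dict[int, Tuple[float, float]]:
    """For each n: time one commit of n new files in a fresh repository,
    then time one more commit that adds a single file.

    Returns {n: (bulk_seconds, single_seconds)}. The single-file commit
    always has exactly one earlier revision, so only the file count varies.
    """
    results: Dict[int, Tuple[float, float]] = {}
    for n in sizes:
        with tempfile.TemporaryDirectory() as tmp:
            workdir = Path(tmp)
            make_repo(workdir)
            for i in range(n):
                (workdir / f"file{i:05d}.txt").write_text(f"file {i}\n")
            bulk = _timed(commit, workdir, f"add {n} files")
            (workdir / "extra.txt").write_text("one more file\n")
            single = _timed(commit, workdir, "add a single file")
            results[n] = (bulk, single)
    return results
```

Using a fresh temporary directory per size keeps the runs independent, at the cost of re-creating the repository each time.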

Mercurial and Git showed more or less no dependence on the size of the previous commit, and the other systems showed only small changes. The SVK measurement did not produce a meaningful result, so I did not include it here. Interestingly, the increase is roughly the same for darcs, monotone and Bazaar (about 250 milliseconds). This indicates that the strong slowdown darcs showed in the first test is mainly correlated with the number of revisions in the repository, while for monotone and Bazaar the increase seems to be caused in roughly equal parts by the number of files and the number of revisions.
Conclusion
Although the benchmarks I ran are rather simple, they produced some interesting results. First of all, they confirmed that neither SVK nor darcs is well suited for larger projects, whereas monotone and Bazaar scale at least well enough to handle medium-sized projects (a few thousand files and perhaps 10,000 to 20,000 revisions) without major problems. However, only Git and Mercurial show (almost) flat curves with respect to repository and history size, which makes them clearly the best suited for large projects. Furthermore, the measurements showed that Git's legendary speed is no myth: I have never seen a faster system, although Mercurial shows that other projects can at least come very close.

