DVCS Round-up: One System to Rule Them All?--Part 3

In this third article, I will present some benchmark results for the systems I discussed before.

In order to get at least some measure of a VCS's performance, two synthetic benchmarks were used. Of course, benchmark results cannot be transferred directly to real-life scenarios, but they still provide valuable information on how a system acts under stress. Equally important is the comparability of values between the systems, which means that all candidates had to perform the same tests under the same conditions. The test system was a VM running Ubuntu 8.10 (“Intrepid”), and the software versions used were SVK 2.0.2, darcs 2.1.0, monotone 0.42, Bazaar 1.10, Mercurial 1.1.2, and Git 1.6.1. The reason for the somewhat older version 2.0.2 of SVK was simply that 2.2 proved so resistant to my installation attempts that I gave up after a while and just used the version provided by the Ubuntu software repository. However, I do not expect 2.2 to behave noticeably differently from the tested version.

The first benchmark simulates a linearly growing repository. A directory is sequentially filled with 4000 files, and every file is changed five times. Every change is recorded by the revision control system, leading to 24,000 revisions in the repository (4000 additions plus 20,000 modifications). Ideally, the time for each check-in is constant. However, no candidate achieved this goal, although Git comes very close. Apart from timing, the repository size is also measured. Since data is only appended during the file changes, there is little compressible redundancy and the repository size is expected to grow linearly. However, it should not be significantly larger than the checkout size at any given point (ideally smaller). In the second test, the time taken for a rather large check-in is measured. Since a list of files has to be processed here, at least a linear dependence on the number of files is expected. However, the proportionality factor should be as small as possible, and the VCS should absolutely not show any super-linear (e.g., quadratic or even exponential) behavior.
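The first benchmark can be sketched roughly as follows. This is a minimal illustration and not the script used for the measurements: the function name `run_growth_benchmark`, the `commit` callback, the payload written to each file, and the order of operations (all files added first, then changed in rounds) are assumptions. The callback stands in for whatever command the tested system uses to record a change.

```python
import os
import tempfile
import time


def run_growth_benchmark(workdir, commit, n_files=4000, n_changes=5, chunk=b"x" * 64):
    """Fill workdir with files, change them by appending, and time every check-in.

    `commit` is a hypothetical callback that takes the path of the changed
    file and records the change in the system under test.
    Returns a list of per-revision wall-clock times in seconds.
    """
    timings = []

    def timed_commit(path):
        start = time.perf_counter()
        commit(path)
        timings.append(time.perf_counter() - start)

    paths = []
    # Phase 1: add the files one by one, one revision per new file.
    for i in range(n_files):
        path = os.path.join(workdir, f"file{i:05d}.txt")
        with open(path, "wb") as fh:
            fh.write(chunk)
        paths.append(path)
        timed_commit(path)

    # Phase 2: change every file n_changes times; data is only appended.
    for _ in range(n_changes):
        for path in paths:
            with open(path, "ab") as fh:
                fh.write(chunk)
            timed_commit(path)

    return timings


if __name__ == "__main__":
    recorded = []
    with tempfile.TemporaryDirectory() as tmp:
        # A no-op stand-in for the commit; a real run would invoke the VCS here.
        times = run_growth_benchmark(tmp, recorded.append, n_files=10, n_changes=5)
    print(len(times))  # 10 additions + 10 * 5 changes = 60 revisions
```

With the default parameters the loop produces 4000 + 4000 × 5 = 24,000 timed check-ins, which matches the revision count given above.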

Dependence on Repository Growth

The first figure shows the results of the first benchmark concerning check-in time.

Commit time vs. repository size

All systems show some kind of increase during repository growth. SVK and darcs show the worst scaling behaviour, with darcs showing the strongest dependence on repository size. It starts as one of the fastest systems and then becomes rapidly slower as the repository grows. SVK is somewhat more stable, but the check-in time increases significantly, as well. When the repository contains 24,000 revisions, adding a change in a single file takes darcs as much as five seconds, and SVK still more than two. Bazaar and monotone are somewhat better, but still a serious slowdown can be seen. Mercurial shows a very weak dependence, with check-in time increasing by 180 milliseconds over the whole range. Git shows almost no dependence at all: the mean time taken to record a change only increases by 20 milliseconds during the whole test. A truly impressive feat.

If one looks at the repository size, the differences between the contestants become even stronger than already seen.

Space on disk vs. repository size (number of commits)

As I have already said, both SVK and darcs use quite a lot of space on the disk. However, I was still very surprised to see just how much space SVK really wasted: the repository size at the end of the benchmark test was nearly 1.8 gigabytes. Remember, this test created just 4000 files (each with a size of 4050 bytes). Thus, the checkout in the end is about 15 megabytes in size. Now compare this number to the SVK repository size again.
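The comparison can be made explicit with a little arithmetic. The sketch below uses only the figures quoted above; treating "gigabytes" as decimal units and 4050 bytes as the final size of each file are assumptions, and the names are illustrative.

```python
N_FILES = 4000
FILE_SIZE_BYTES = 4050     # size of each file at the end of the test (assumed final size)
REPO_SIZE_BYTES = 1.8e9    # "nearly 1.8 gigabytes", decimal units assumed


def checkout_size_bytes(n_files=N_FILES, file_size=FILE_SIZE_BYTES):
    """Total size of the working copy in bytes."""
    return n_files * file_size


def overhead_factor(repo_bytes, checkout_bytes):
    """How many times larger the repository is than the checkout."""
    return repo_bytes / checkout_bytes


if __name__ == "__main__":
    checkout = checkout_size_bytes()
    print(checkout)                                            # 16200000
    print(round(checkout / 2**20, 2))                          # 15.45 (MiB)
    print(round(overhead_factor(REPO_SIZE_BYTES, checkout)))   # 111
```

Under these assumptions the SVK repository is on the order of a hundred times larger than the data it tracks.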

darcs performed much better than SVK, but still quite badly: about 140 megabytes of disk space were used to record all changes. I strongly suspect its excessive memory and space usage are the main reasons why darcs performs this badly when managing larger repositories. I did not really check for this in my tests, but I did notice that darcs used an awful lot of memory during later parts of the benchmark (200 megabytes and more), which could not be mitigated even with regular optimisation and creation of checkpoint commits. Bazaar and monotone again show quite similar behaviour, both using about 50 megabytes for their repositories, with Bazaar again being a bit better than monotone.

Mercurial and Git once again show that they belong in a different league than the others. Mercurial's repository size was only about 10 percent larger, and Git's even about 10 percent smaller, than the checkout. The reason for the peculiar “saw tooth” shape of the Git curve is that the system relies on regular repository optimisation. Therefore, every 500 commits a repacking and pruning of the object database was performed, resulting in the noticeable bumps. However, even though these optimisation steps of course take time, the overall time taken for the benchmark still decreased: disabling the optimisation steps results in Git taking about twice as long for the whole test (and allocating about 100 megabytes more on the hard disk).
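A periodic optimisation step of this kind could look roughly like the sketch below. The exact commands used in the benchmark are not stated; `git repack -a -d` followed by `git prune` is an assumption, as are the helper names `needs_maintenance` and `run_maintenance`.

```python
import subprocess

REPACK_INTERVAL = 500  # commits between optimisation runs, as described above

# Assumed commands; the text only says "repacking and pruning".
MAINTENANCE_COMMANDS = (
    ("git", "repack", "-a", "-d"),
    ("git", "prune"),
)


def needs_maintenance(revision, interval=REPACK_INTERVAL):
    """Return True when the given 1-based revision number completes an interval."""
    return revision > 0 and revision % interval == 0


def run_maintenance(repo_dir, runner=subprocess.run):
    """Run the assumed maintenance commands inside repo_dir.

    `runner` is injectable so the logic can be exercised without Git installed.
    """
    for command in MAINTENANCE_COMMANDS:
        runner(list(command), cwd=repo_dir, check=True)
```

In a benchmark loop one would call `run_maintenance(repo_dir)` whenever `needs_maintenance(revision)` is true, which happens 48 times over 24,000 revisions.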

Dependence on Check-in Size

In the next benchmark test, the candidates' dependence on the size of a check-in was tested. I did not expect many surprises here, but SVK managed to deliver one nonetheless.

Commit time vs. commit size

I actually had to use not only an inset to show the SVK result, but a logarithmic one at that. Adding 2000 files to an empty repository took SVK nearly a whopping 1400 seconds, or over 23 minutes. Among the rest, darcs was again the slowest with about 5.5 seconds, while Git managed the task in about 300 milliseconds. monotone took about a second, Bazaar 1.9 seconds, and Mercurial 2.5 seconds. None of these values is truly catastrophic, but Git shows that there is much room for improvement. I was somewhat surprised at the comparatively bad result of Mercurial, which is a bit of a contrast to the rest of its benchmark results.
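One way to check a series of such measurements for super-linear behaviour is to fit a straight line to the logarithm of the check-in time against the logarithm of the check-in size; the slope estimates the scaling exponent. The sketch below is an illustration under that assumption, not the analysis used for the figure, and the sample data in it is synthetic rather than measured.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    A value near 1 suggests linear scaling, a value near 2 quadratic scaling.
    Sizes and times must be positive, and at least two distinct sizes are needed.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic data for illustration only, not results from the benchmark.
    sizes = [250, 500, 1000, 2000]
    linear = [0.001 * n for n in sizes]
    quadratic = [1e-6 * n * n for n in sizes]
    print(round(scaling_exponent(sizes, linear), 2))     # 1.0
    print(round(scaling_exponent(sizes, quadratic), 2))  # 2.0
```

Exponential growth does not yield a constant exponent in such a fit; it would show up as a slope that keeps rising as larger sizes are included.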

Dependence on File Count

In a final test, I checked the time it took to add a single file right after the multi-file commit from the previous test. Thus, in every step the only difference was the number of files already recorded in the previous step, while the history length was always the same (i.e., 1).

Commit time vs. number of files under VCS control

Mercurial and Git showed more or less no dependence on the size of the previous commit, and the other systems only small changes. The SVK measurement did not produce a meaningful result, so I did not include it here. Interestingly, the increase is more or less the same for darcs, monotone and Bazaar (about 250 milliseconds). The result indicates that the strong slowdown which darcs showed in the first test is mainly correlated to the number of revisions in the repository, while in the cases of monotone and Bazaar the time increase seems to be caused in roughly equal parts by the number of files and the number of revisions.

Conclusion

Although the benchmarking tests I ran are rather simple, they showed some interesting results. First of all, they confirmed that neither SVK nor darcs is well suited for larger projects, whereas monotone and Bazaar scale at least well enough to handle medium-size projects (featuring a few thousand files and maybe 10,000 to 20,000 revisions) without major problems. However, only Git and Mercurial show (almost) flat curves with respect to repository and history size, meaning that they are definitely the best suited for large projects. Furthermore, the measurements showed that Git's legendary speed is no myth either: I have never seen a faster system, although Mercurial shows that other projects can at least come very close.

 

minor clarifications on Darcs
Submitted by kowey on Thu, 02/25/2010 - 13:22.

I did notice that darcs used an awful lot of memory during later parts of the benchmark (200 megabytes and more), which could not be mitigated even with regular optimisation and creation of checkpoint commits

I just thought I should clarify about optimisation and the creation of checkpoints. The darcs optimize command does some operations on the inventory (list of patches) which makes pulling and pushing patches faster and also reduces the size of patch bundles created during darcs send. Checkpoint patches are now obsolete, but when they were still in place, they were used for creating (the also obsolete) "partial" repositories for a faster darcs get operation. Neither would have helped much with memory use, I'm afraid. But thanks for taking the trouble to explore these options!

While I'm at it, we're hard at work on improving Darcs performance and scaling. We're starting to develop a benchmarking infrastructure that lets us track our progress. It's going to take a lot more feverish hacking for it to become really informative but I think we're starting to get a toehold on the problem.

The imminent Darcs 2.4 release should make some repository-local operations faster, but I suspect we'll need at least one or two more releases before we can make some real headway in other areas. We're really excited about the prospect of a much faster Darcs in the future (one step at a time..).

Missing Statistics
Submitted by d3x0r on Wed, 11/04/2009 - 04:33.

You missed an important statistic. In the case of Mercurial and Monotone, these are decentralized, and information needs to be propagated to peers also. If we were in a solitary cell, the above statistics would be meaningful, but when you have to interchange data with either your peers or a server, there are some painful benchmarks you didn't mention.

  • Mercurial is sensitive to the number of files in a repository when checking for changes to synchronize between two systems.
  • Monotone is sensitive to the number of revisions in a repository when checking for changes to synchronize between two systems.

    A mercurial repository with 2700 some files takes 22 seconds, even if there are only 3 revisions. A monotone repository with 29,000 revisions takes only around 15 seconds to synchronize (and this is across 208 branches, file count irrelevant [but it's a LOT]).

    (And I have to comment somewhere, might as well be here)


    There are features that monotone has that no other version system has, such as the ability to merge a branch into a sub-directory of another branch. You may then maintain smaller branches which merge into the tree as a whole, an advantage of having a single repository for all workspaces. Revisions history made on the limbs is kept intact during such propagation, rather than cherry-picking changes from one and dropping them in another.

  • Re: Missing Statistics
    Submitted by rmfendt on Wed, 11/04/2009 - 10:04.

    A mercurial repository with 2700 some files takes 22 seconds, even if there are only 3 revisions.

    Well, I am sorry, but I cannot reproduce your results. A quick test with Mercurial 1.3.1 in my test VM (with the host dealing with quite some load at the same time) yielded the following results:

    • Less than 4 seconds for committing 5000 completely random 1k files.
    • Less than 2 seconds for registering/committing 1000 file changes.
    • about 3 seconds for creating a local clone.
    • about 7-11 seconds for creating a remote clone (which is of course more expensive due to network protocol and actually copying data instead of linking).
    • 1 second for (locally) pulling the aforementioned 1000-file changeset.
    • 1.3 seconds for a remote pull.
    • 2-3 seconds for pulling/updating in one go.

    So, the most expensive operation is the remote clone, which is of course expensive (and would be in monotone as well), since it involves the actual creation of 5k files. Depending on machine load and cache state, the time for this varies quite a lot of course, but is never unreasonably high, considering that even just creating 5k files using "dd" takes quite a bit longer than that.

    Pulling changes remains (on my machine at least) more or less an O(1) operation as far as file and/or change count is concerned, being more or less only dependent on the actual number of bytes transferred. So my mileage differs quite a lot from yours, though I currently have no idea why.

    Edit: Actually, I have to amend my previous results. Mercurial 1.3.1 on Windows seems to be noticeably slower than on Linux/Unix systems. A remote clone takes up to 30 seconds, up to 50 if it includes a simultaneous working directory update. Registering/committing 1k changes takes 4-5 seconds, pulling/updating about 5 seconds as well.

    This behaviour seems to be due to file creation being significantly more 'costly' on Windows systems, which could be the reason for monotone being much faster in this case (since the repository is one monolithic file as far as the operating system is concerned). This would also explain that pulling a set of changes remains more or less O(1) (albeit taking 2-4 times longer than on Linux), and that cloning time nearly halves when disabling the automatic working directory update. It might also be worth noting that creating a 'naked' local clone (no checkout, and using file links instead of copies) only takes 4-5 seconds, which is actually a case in point.

    There are features that
    Submitted by jnareb on Wed, 11/04/2009 - 08:26.

    There are features that monotone has that no other version system has, such as the ability to merge a branch into a sub-directory of another branch. You may then maintain smaller branches which merge into the tree as a whole, an advantage of having a single repository for all workspaces. Revisions history made on the limbs is kept intact during such propagation, rather than cherry-picking changes from one and dropping them in another.

    That is not true. Git also has the ability to merge a branch into a subdirectory of another branch: it is called a subtree merge in Git.

    It is used by Git itself to merge git-gui development, which is done in separate repository.

    file count dependence in Bazaar
    Submitted by jelmer on Wed, 01/28/2009 - 15:56.

    The file count dependency in Bazaar is caused by the fact that it stores an "inventory" for each revision that lists what versions of what files are present in that revision. At the moment, the inventory is always parsed/created in its entirety, which obviously has consequences for performance.

    There is some work going on to fix this, code-named "brisbane-core" that will hopefully land in one of the next few versions of Bazaar.

    Primary source data, if convenient...
    Submitted by kfogel on Wed, 01/28/2009 - 15:33.

    It would be great if you could post the scripts you used to set up and run these experiments. (Some systems' performance can vary quite a bit depending on how the initial repository or checkout was set up, for example.)

    Thanks!

    -Karl

    Scripts used
    Submitted by rmfendt on Sat, 01/31/2009 - 16:30.

    The script I used was primarily a quickly-hacked-together Python script. It is not well documented but should be readable. If you are really interested, I can send you a copy the next time I am in reach of the respective machine (i.e., Monday).

    could we (darcs team) have those scripts too?
    Submitted by kowey on Thu, 02/25/2010 - 13:11.

    Those are really nice (informative) graphs.

    Could we have a copy too? I'm on gmail as eric.kow, but maybe it'd be worthwhile to just post them on the web or a wiki somewhere (in whatever state they are). I'd like to forward these to the darcs-users mailing list too if you're OK with this.

    We've been interested in measuring things in terms of scaling factors and also doing comparative benchmarks (issue1538) for a while, and you've done it!

    Scripts used
    Submitted by nagyv on Tue, 02/10/2009 - 22:09.

    Hi,

    could you send me your scripts as well, please

    you can find me at viktor dotted nagy { T } gmail

    thanks

    Re: Scripts used
    Submitted by kfogel on Sat, 01/31/2009 - 16:42.

    Yes, thanks. My email is "kfogel {_AT_} red-bean.com".

    Terminology
    Submitted by jnareb on Wed, 01/28/2009 - 00:46.

    By check-in you mean the act of creating a new commit (i.e. the action named 'commit' in most distributed version control systems), don't you?

    Re: terminology
    Submitted by rmfendt on Wed, 01/28/2009 - 06:41.

    Yes, by "checking in" a change I meant the process of saving a changeset in the VCS. In most systems this is called a "commit", in darcs it's called "record". Sorry for the confusion.