A recent development in the field of version (or revision) control software is the emergence of distributed systems. These discard the notion of a central repository where all project history is kept, and replace it with a network of several repositories that are synchronised regularly. In this model, a developer has a complete repository on his hard disk, which is synchronised with the outside whenever deemed necessary. This way one can commit early and often without fear of breaking the others' code, and network load is reduced as well, since no network connection is needed for every small change. In practice, of course, there will be a central repository after all, but this is a decision made by the developers and not dictated by the system.
Working distributed has other advantages, too. For example, backups become a no-brainer, since every developer has the complete project history. The classic dilemma of who should have write access to the main project repository becomes significantly less problematic, since the maintainers simply pull the changes from a few people they trust, who in turn (hopefully) have done likewise beforehand. Linus Torvalds referred to this as a 'network of trust', which is actually the same principle that underlies any security and encryption software.
So, while distributed version control has several advantages, one problem remains: we have to choose a system that fits our needs. At the time of this writing, at least half a dozen systems compete for the developers' affections, and this series will hopefully provide you with a basis for your decision.
Synopsis:
- The BitKeeper Story
- SVK: The Odd Kid in the Playground?
- monotone: Security First
- Outlook: What's Coming Up Next
In this review, we will take a look at six different revision control systems, namely Git, Mercurial, darcs, monotone, Bazaar (which is used by the Ubuntu project), and SVK (which is based upon Subversion). All six systems are distributed, and we will take a look at the different workflows supported (or enforced) by them. One problem of CVS that is definitely a thing of the past is the handling of binary files: none of the systems tested had any problems with them. Also, renaming files (one of SVN's former selling points) is a commodity these days. Interestingly, the contestants can be divided into two groups according to their age. SVK, monotone and darcs were all started in 2003, whereas Bazaar, Git and Mercurial are all projects that started in early 2005. Therefore, in a certain sense the latter three are 'second-generation' DVCSs.
However, one important property is scaling, which is no moot point: the Linux kernel, for example, consists of tens of thousands of files, and its main repository shows well over a hundred thousand changesets between April 2005 and early 2009. I work in a company with one main product, and even our repository contains nearly 50,000 revisions and literally hundreds of thousands of files. Every software project can become larger than initially expected, and this is especially true for open source projects, so it is vital that the revision control system is able to keep up. Unfortunately, benchmarking revision control systems is no easy task. However, I will provide at least some synthetic benchmark figures in a later part of this article series.
The Birth of Very Different Twins: The BitKeeper Story
In 2002, Linus Torvalds caved in to the pressure from the community to finally adopt some sort of source control mechanism for the Linux kernel. Since the existing alternatives (namely CVS and SVN) did not fit his requirements, Torvalds finally settled on a closed source solution: BitKeeper by BitMover Inc. This decision was controversial at best, and many argued that BitKeeper's advantages were not worth the risk of becoming dependent on a piece of closed source software. Notable criticism came from GNU founder Richard Stallman, arguing that especially a flagship project like Linux should avoid using proprietary tools. However, Torvalds did not give in, essentially asserting that he would use either BitKeeper or no source control at all.
BitMover CEO Larry McVoy tried to calm down the developers by offering a 'free' version of BitKeeper to all those who wanted to participate in free software projects. This version was only free in the sense of 'free beer', but many developers actually started using it. Others, including prominent kernel developer Alan Cox, did not. Things pretty much stayed this way until 2005, when Andrew Tridgell from the Samba project tried to write his own BitKeeper client which would allow him to view the complete history of a BK project. This feature, albeit essential to any form of revision control software, was one that BitMover reserved for its commercial license. Larry McVoy claimed that Tridgell had 'reverse engineered' his software, and announced that BitMover would withdraw the 'free' version of BitKeeper. Tridgell later demonstrated his technique (namely, connecting to a BK server and typing 'help'), but the horse had already left the barn. Even had BitMover not actually revoked the free client license, it would not have made much of a difference, since the problem of risk and dependence on proprietary components had become very obvious once more. Thus the predictions by Stallman and others came true, and the Linux kernel developers were left without version control again.
Torvalds later explained that he examined several revision control systems looking for a BitKeeper alternative, but once again none could satisfy his needs. The only candidate he deemed worth a second look was the monotone system, but it did have significant performance and scaling problems at the time, which pretty much disqualified it for a project as huge as the Linux kernel. Torvalds decided he could "write something better than anything out there in two weeks" and later claimed, "and I was right." The result of this was Git, one of the systems we will take a closer look at in this review.
More or less at the same time another kernel developer, Matt Mackall, came to pretty much the same conclusions as Torvalds, and also decided to write his own revision control software, resulting in the Mercurial system. Although the implementation details of the two systems are very different (the first versions of Git depended very much on POSIX and consisted of many small tools written in C, while Mercurial was implemented in Python), the concepts about how revision control ought to work are eerily similar. One could view Git and Mercurial as an example of parallel software evolution, and had Mackall started only a few weeks earlier, maybe Git would not exist nowadays. In any case, the shared starting point, similar concepts and similar age definitely mark the two systems as special.
SVK: The Odd Kid in the Playground?
Let's start our VCS roundup with a relatively unknown version control system: SVK. This software was created in 2003 by a developer called Chia-Liang Kao, and is currently maintained by Best Practical Solutions (who are also Kao's current employer). SVK is based upon Subversion, and inherits more or less all of its strengths and weaknesses. It is advertised as an independent distributed revision control system that only uses SVN for its infrastructural needs. However, the dependence goes a lot further, and one should perhaps rather call SVK an add-on for Subversion, albeit a relatively powerful one.
In fact, SVK is not even really a completely distributed system, after all. It is an SVN mirroring system with offline capabilities. So, it does enable you to work 'locally' even when not connected to a network. However, a central SVN repository is still needed in order to share your work with the outside. While such a repository will usually exist anyway, there is another catch here: SVK is not capable of synchronising against another SVK repository directly. In fact, SVK repositories are always local to the user, and the only way to make your changes available to another developer is by publishing them on an SVN repository. Therefore, one of the most attractive properties of 'really distributed' systems, namely the ability for groups of developers to work together and easily share their work with each other, is not readily provided.
One can emulate this work mode by using a server-based SVN repository (rather than the usual local one) as SVK's backend, but everyone who has set up and maintained a Subversion server in the past will probably agree that this is neither an 'easy' nor a 'quick' solution. So, most of the time, a small group of developers wishing to work on a branch will just create that branch on the central SVN server, which is then mirrored and tracked by SVK. On the developers' systems, SVK makes this branch available using a short-cut name in its own 'depot'. Every developer can then make a local copy, work and commit in this local branch, pull changes from the SVN source and merge back on occasion. However, this has few advantages over just using the SVN repository directly.
One definite advantage is the ability to work offline. Another is the ability to use short-cut names for long SVN addresses, although its value is somewhat lessened by the fact that most people do not use SVN directly anyway: there is integration in various IDEs, and under Windows there is TortoiseSVN, of course. None of these is available when using SVK, so you are stuck with the command line interface. SVK also offers advanced merging (e.g., star merge) and, starting with 2.2, commands to work with named branches. However, this is poorly documented and requires that you adhere precisely to Subversion's default directory conventions.
Which brings me to the disadvantages. First of all, SVK feels rather slow, at least as far as the responsiveness of the front end is concerned. Although it is claimed that SVK is actually faster than SVN when working with larger changesets, I cannot imagine that working with large repositories will be a really joyful experience. Rename tracking is as poor as it is in SVN, so one should be very careful when moving things around in the repository tree. To call the documentation 'poor' would not be a mere understatement, it would be a downright lie: the documentation is nothing less than atrocious. You have somewhat helpful short texts integrated in the software, but other than that, you are more or less on your own. The website is a poorly managed wiki, in which the task of finding answers ranges from difficult to impossible. Last but not least, SVK is implemented in Perl, with lots of package dependencies. Installing 2.0 on Ubuntu 8.04 pulled in a total of about 30 packages, and the 2.2 binary would not work at all (the win32 version worked, though).
In summary, if you are already working with a central SVN repository and are in dire need of a solution to work offline on your laptop, SVK might be of interest to you, although other systems offer interoperability solutions too (e.g., git-svn). Apart from that, I cannot really recommend it to anyone. It is an interesting idea, but unfortunately not much more.
monotone: Security First
monotone is a very interesting DVCS. It is one of the older systems in this test, having been around for roughly the same time as SVK and darcs (i.e., since early 2003). Each user of monotone has his or her own private monotone 'database' which functions as the repository, much like in SVK. This means that one has to check out the data into a workspace area to do any actual work. However, in contrast to SVK, all developers can synchronise their databases against each other. This is especially useful, for example, for a small group working on a new feature that might break the main branch, allowing them to easily share work with each other without going through the main repository. A distinguishing feature of monotone is its security model: every developer is required to create an RSA key pair, which allows precise control over access rights. The other systems rely more on external security measures like firewalls and VPNs to make sure only authorised people can access the repository.
monotone has a small but very active community, and a few larger open source projects are using it, apparently quite happily, most notably Xaraya and the Pidgin messenger. Performance problems appear to be largely a thing of the past, making monotone a viable choice at least for small to medium-sized projects (although I would still probably not put the Linux kernel under monotone control). Throwing large binary files at it was no problem at all, and it felt fast and responsive when I worked with it. monotone is also pretty space efficient, producing a database even slightly smaller than the sum of the files added (most of which were already compressed). This is in stark contrast to, e.g., darcs, which blew up the repository size by over 100MB when adding a 50MB file to the workspace.
In general, monotone is a joy to work with, if you accept the additional complexity of databases and RSA key pairs. Especially for really small projects this can be a bit of a nuisance. Synchronisation is network-based in general, so one monotone instance always has to act as 'server' for the second to synchronise against it. Usually this won't matter since one sets up the system only once. However, in a truly distributed workflow every developer has to receive the public key of every other developer, which can become tiresome.
The system itself is very powerful. Directories are first-class objects, and good rename tracking is supported, as is the automatic creation of unnamed branches. This last feature in particular sets monotone apart from, e.g., darcs, and deserves a little bit of explanation. Consider two or more developers working on the same tree: as each of them commits to his or her own monotone database, each one technically creates a separate branch. monotone preserves this structure during synchronisation, which often results in two or more 'strands' of revisions in the same database. The ends of these strands are called 'heads', and as long as you follow a strictly sequential development model there will always be just one.
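The notion of heads can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not monotone's actual data structures: the revision ids (`r1`, `alice`, `bob`), the `Graph` type and the `heads` function are all hypothetical names chosen for illustration. Each revision simply records its parents, and a head is any revision that no other revision names as a parent.

```python
from typing import Dict, List, Set

# Hypothetical toy model: each revision id maps to the ids of its parents.
Graph = Dict[str, List[str]]


def heads(graph: Graph) -> Set[str]:
    """Return all revisions that no other revision lists as a parent."""
    all_parents = {p for parents in graph.values() for p in parents}
    return set(graph) - all_parents


# Strictly sequential development: one strand, therefore one head.
sequential: Graph = {"r1": [], "r2": ["r1"], "r3": ["r2"]}

# Two developers each commit on top of r3 in their own database.
# After a sync, both strands live side by side in the same graph.
after_sync: Graph = {**sequential, "alice": ["r3"], "bob": ["r3"]}

print(sorted(heads(sequential)))
print(sorted(heads(after_sync)))
```

In this sketch the sequential history has a single head, while the synchronised history has two, one per developer.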
However, in the case of parallel development, you will end up with several heads in your monotone database after a sync, which then usually need to be merged. This allows for a much safer merge process, since every developer commits every single change before the merge. Should conflicts arise, the merge can therefore be completely reversed. Other systems, most notably CVS and Subversion, merge directly in the working directory. But their 'update, then commit' principle comes with the risk that the merge might not be completed automatically and cannot be reverted either, leaving you stuck. This is definitely no problem with monotone, which places all revisions in a so-called 'directed acyclic graph' (DAG). The technique has proven to be so powerful that all the newer systems in this review (Bazaar, Git, Mercurial) work the same way.
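Why a merge in such a graph is safe can be sketched in a few lines of Python. Again, this is only an illustrative toy model, not monotone's implementation or command set: `merge`, `undo_merge` and the revision ids are hypothetical. The point is that a merge merely adds one new revision with two parents, so the committed revisions it combines are never modified, and discarding the merge restores the previous state exactly.

```python
from typing import Dict, List, Set

# Hypothetical toy model: each revision id maps to the ids of its parents.
Graph = Dict[str, List[str]]


def heads(graph: Graph) -> Set[str]:
    """Return all revisions that no other revision lists as a parent."""
    all_parents = {p for parents in graph.values() for p in parents}
    return set(graph) - all_parents


def merge(graph: Graph, left: str, right: str, merged_id: str) -> Graph:
    """Return a new graph with a merge revision whose parents are two heads."""
    if left not in heads(graph) or right not in heads(graph):
        raise ValueError("both revisions must be heads")
    new_graph = dict(graph)
    new_graph[merged_id] = [left, right]
    return new_graph


def undo_merge(graph: Graph, merged_id: str) -> Graph:
    """Drop a merge revision nothing builds on yet; its parents stay intact."""
    if merged_id not in heads(graph):
        raise ValueError("only a head can be dropped safely")
    return {rev: parents for rev, parents in graph.items() if rev != merged_id}


# Two committed strands that diverge after r1.
diverged: Graph = {"r1": [], "alice": ["r1"], "bob": ["r1"]}

merged = merge(diverged, "alice", "bob", "m1")
print(sorted(heads(merged)))      # a single head remains after the merge

restored = undo_merge(merged, "m1")
print(restored == diverged)       # the original two-headed state is back
```

Contrast this with merging in the working directory: there, uncommitted local changes are mixed with incoming ones, so there is no earlier revision to fall back on.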
monotone is available in pre-packaged form for several Unix and Linux flavors, for MacOS and for Windows. Support for several different merge tools is available, as is at least one graphical user interface (guitone). Work on several other interfaces is underway. All things said, the only real gripe I have with monotone is the relative complexity of its setup, although it is still an absolute joy compared to SVK (and well documented, in contrast to the latter).
Outlook: What's Coming Up Next
In the next part of the article series, I will complete the round-up by taking a closer look at darcs, Bazaar, Git and Mercurial. Especially the latter three are quite popular, being used by Ubuntu (Bazaar), the Linux kernel (Git) and the Mozilla project (Mercurial). The third article will include a detailed performance comparison of the systems in different scenarios. So I hope things will remain interesting.

