Keith Packard discusses why robust repository formats are important for any version control software. This is an important point when evaluating any version control system, as you do not want a disk failure, combined with a fragile way of storing the tracked information, to corrupt all your data.

One sad point, as he’s a person I certainly admire for his work on free software, is that, while praising the way git handles that problem, he unfairly bashes Mercurial (among other VCSs). The comparison he makes is not a fair one: store size is measured with packed git against standard Mercurial, while repository robustness is judged with unpacked git against standard Mercurial.

I believe this is more of a handwaving issue than any real concern about how git or Mercurial are doing the job.

Mercurial uses a compact representation of data, with separate revlog files for each tracked file, for the manifest and for the changelog, all of which are append-only. Because writes are append-only, the changes in each new revision don’t touch the data of previous revisions. That makes you as safe as you can be in any other system with respect to writes, and the space usage is very good.
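To make the append-only argument concrete, here is a minimal sketch of an append-only log in Python. It is not Mercurial’s revlog format: the record layout (a length and CRC32 header followed by the payload) and the names `append_record` and `read_records` are assumptions made up for illustration. It only shows the general property that a write torn halfway through the newest record leaves every earlier record readable.

```python
import io
import struct
import zlib

# Hypothetical record header: payload length and CRC32 of the payload.
HEADER = struct.Struct(">II")


def append_record(log: io.BytesIO, payload: bytes) -> None:
    """Append one record at the end of the log; earlier bytes are never touched."""
    log.seek(0, io.SEEK_END)
    log.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.write(payload)


def read_records(data: bytes) -> list:
    """Return every complete record, stopping at the first torn or corrupt one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, checksum = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # incomplete or damaged tail: ignore it
        records.append(payload)
        offset = start + length
    return records


log = io.BytesIO()
append_record(log, b"rev 0")
append_record(log, b"rev 1")
append_record(log, b"rev 2")

# Simulate a write that failed partway through the last record.
torn = log.getvalue()[:-3]
print(read_records(torn))
```

In this toy model the reader simply drops the damaged tail, so the cost of a failed append is at most the revision that was being written.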

To achieve similar space efficiency, git needs to pack the repository data. This is done by rewriting the repo, and the operation (repack) has to be repeated from time to time.

If the atomic append-only writes to the manifest and revlog files in Mercurial can be considered dangerous, then repacking is even more so, as it forces a rewrite of all the repo data, multiplying the chance of a failure.
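A rough way to see why writing more data multiplies the chance of a failure is a toy model in which every byte written fails independently with the same small probability. The function name, the per-byte probability and the two sizes below are all hypothetical values picked for illustration, not measurements of git or Mercurial, and the model says nothing about how either tool recovers from a failed write.

```python
import math


def failure_probability(bytes_written: int, p_per_byte: float) -> float:
    """Chance of at least one failed byte, assuming independent failures.

    Computes 1 - (1 - p) ** n in a numerically stable way; requires 0 <= p < 1.
    """
    return -math.expm1(bytes_written * math.log1p(-p_per_byte))


P_PER_BYTE = 1e-12               # assumed failure probability per byte written
APPEND_SIZE = 4 * 1024           # hypothetical small append for one new revision
REPACK_SIZE = 100 * 1024 * 1024  # hypothetical full rewrite of a repository

append_risk = failure_probability(APPEND_SIZE, P_PER_BYTE)
repack_risk = failure_probability(REPACK_SIZE, P_PER_BYTE)
print(f"append: {append_risk:.3e}, repack: {repack_risk:.3e}")
```

Under these assumptions the risk grows roughly in proportion to the number of bytes written, which is the intuition behind the comparison above.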

So, if corruption can happen on a faulty write, it will hit git (unpacked) or Mercurial in the same way; but any time you pack your repo in git you are risking your data, and if the write fails you can corrupt the repository.

That said, one can ponder whether this unlikely failure scenario matters that much since, in distributed systems, a lot of other repos act as full backups of the canonical repo. Furthermore, both systems use atomic operations, so safety should be more than good enough.

A short reply to Keith’s post has been written by Matt Mackall, Mercurial’s main author, and detailed information about the repository format and other design details for Mercurial can be found on the project’s wiki.