You could have invented git (and maybe you already have!)

I believe I may have said this before, but it bears saying again: a big thank you to all of you who have posted insightful comments, both here and on Hacker News, in response to my recent article on git and its followup.  I say without a shadow of exaggeration that I’ve learned more about git from these comments than from anything else I’ve ever read about it.  (Yes, a couple of the comments were borderline abusive, but I think three bad ‘uns out of 300 is beating the averages pretty handily.)

Anyway: among many excellent comments, this long one by Weavejester @ Hacker News was the best of them all.  As I read this, I felt a grin creeping slowly across my face, and at one point I literally laughed out loud as I realised not just what he was doing, but how well he was doing it.  With his explicit permission, I am reposting it here in its entirety, because it deserves a much wider audience.  (The title of this post is also Weavejester’s, based on the title of an article about monads.)

Here we go:

Mike, Git seems unintuitive because you don’t have a good grasp of what it does behind the scenes. Imagine trying to get to grips with a Unix shell, if you had no concept of files or directories. In such a scenario, even a simple command like “cat” would seem incomprehensible.

If you’ll indulge me, I’d like to propose a thought experiment.

Designing a patch database

Consider you’re responsible for administering a busy open source project. You get dozens of patches a day from developers and you find it increasingly difficult to keep track of them. How might you go about managing this influx of patch files?

The first thing you might consider is how do you know what each patch is supposed to do? How do you know who to contact about the patch? Or when the patch was sent to you?

The solution to this is not too tricky; you just add some metadata to the patch detailing the author, the date, a description of the patch and so forth.

The next problem you face is that some patches rely on other patches. For instance, Bob might publicly post a patch for a great new scheduler, but then Carol might post a patch correcting some bugs in Bob’s code. Carol’s patch cannot be applied without first applying Bob’s patch.

So you allow each patch to have parents. The parent of Carol’s patch would be Bob’s patch.
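To make the dependency idea concrete, here is a minimal Python sketch. The patch labels and the apply_order helper are hypothetical, invented purely for illustration, and the sketch assumes the parent links never form a cycle.

```python
# Hypothetical patch labels mapped to the labels of their parents.
parents = {
    "bob-scheduler": [],                  # Bob's patch depends on nothing
    "carol-bugfix": ["bob-scheduler"],    # Carol's patch needs Bob's first
}

def apply_order(patch, parents, seen=None):
    """Return the patches to apply, parents first, ending with `patch`."""
    if seen is None:
        seen = []
    # Visit every parent before the patch itself.
    for parent in parents[patch]:
        apply_order(parent, parents, seen)
    # Skip patches already scheduled through another path.
    if patch not in seen:
        seen.append(patch)
    return seen

order = apply_order("carol-bugfix", parents)
print(order)  # ['bob-scheduler', 'carol-bugfix']
```

Asking for Carol's patch pulls in Bob's automatically, which is all the parent link is for.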

You’ve solved two major problems, but now you face one final one. If you want to talk to other people about these patches, you need a common naming scheme. It’s going to be problematic if you label a patch as ABC on your system, but a colleague labels a patch as XYZ. So you either need a central naming database, or some algorithm that can guarantee everyone gives the same label to the same patch.

Fortunately, we have such algorithms; they’re called one-way hashes. You take the contents of the patch, its metadata and parents, serialize all of that and SHA1 the result.

Three perfectly logical solutions, and ones you may even have come up with yourself under similar circumstances.
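The naming scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not git's actual object format: the patch_id function, its field names and the JSON serialization are all made up for the example. The only point is that the same contents, metadata and parents always produce the same label.

```python
import hashlib
import json

def patch_id(diff, author, date, description, parents):
    """Label a patch by hashing its contents, metadata and parent labels."""
    record = {
        "diff": diff,
        "author": author,
        "date": date,
        "description": description,
        "parents": list(parents),
    }
    # sort_keys makes the serialization stable, so everyone who holds the
    # same patch computes the same bytes and therefore the same label.
    serialized = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha1(serialized).hexdigest()

# Made-up example data.
bob = patch_id("+ schedule()\n", "Bob", "2010-04-01", "New scheduler", [])
bob_again = patch_id("+ schedule()\n", "Bob", "2010-04-01", "New scheduler", [])
carol = patch_id("- bug\n+ fix\n", "Carol", "2010-04-02", "Fix scheduler bugs", [bob])

print(bob == bob_again)  # True
print(carol == bob)      # False
```

Because Carol's label is computed from Bob's label, it also pins down exactly which version of Bob's patch she built on.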

Merging patches

Under this system, how would a merge be performed? Let’s say you have two patches, A and B, and you want to combine them somehow. One way is to just apply each in turn to your source, fix any differences that can’t be automatically resolved (conflicts), and then produce a new patch C from the combined diff.

That works, but now you have to store A, B and C in your patch database, and you don’t retain any history. But wait! Your patches can have parents, so what if you created a ‘merge’ patch, M, with parents A and B?

A   B
 \ /
  M

This is externally equivalent to what you did to produce C: patches A and B are applied to the source code, and then you apply M to resolve the differences. M will contain both the differences that can be resolved automatically, and any conflicts we have to resolve manually.
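A merge patch is then just an ordinary patch that happens to have two parents. The Python sketch below assumes a hypothetical Patch record and history helper, neither of which comes from any real tool. It shows that A and B stay reachable from M, so no history is lost.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patch:
    label: str
    changes: str
    parents: tuple = ()

def history(patch):
    """Collect the labels of every patch reachable through parent links."""
    found = set()
    stack = [patch]
    while stack:
        current = stack.pop()
        if current.label not in found:
            found.add(current.label)
            stack.extend(current.parents)
    return found

a = Patch("A", "changes from A")
b = Patch("B", "changes from B")
# M holds the automatic resolutions plus any manual conflict fixes.
m = Patch("M", "resolutions for combining A and B", parents=(a, b))

print(sorted(history(m)))  # ['A', 'B', 'M']
```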

Having solved your problem, you write the code for your patch database and present the resulting program to your colleague.

A user tries to merge

“How do I merge?” he asks.

“I’ve written a tool to help you do that,” you say, “Just specify the two patches you want to combine, and the tool will merge them together.”

“Um, it says I have a merge conflict.”

“Well, fix the problem, then tell the system to add your file to the ‘merge patch’ it’s making.”

Your colleague dutifully hacks away, and solves the conflict. “So I’ve fixed the file,” he says, “But when I tell it to ‘commit file’ it fails.”

“Remember, this is a patch database,” you reply, “We’re not dealing with files, we’re dealing with patches. You have to add your file changes to your patch, and then commit the patch. You can’t commit an individual file.”

“What? That’s not very intuitive,” he grumbles, “Hey! I’ve added the file to the patch, but it tells me the merge isn’t complete!”

“You need to add all of the files that have differences that were automatically resolved as well.”

“Why?!”

“Because,” you explain patiently, “You might not like the way those files have been changed. It needs your approval that the way it’s resolved the differences is correct.”

“Why do I have to re-commit everything my buddy has made?” he complains, “Seriously, I want to just commit one file. What the hell is up with your system?”

So that’s it — sneaky old Weavejester has not only tricked me into designing git, but got me defending its design to my dumb-ass colleagues who don’t Get It.

Where do I go from here?  I am not truly sure.  I need to give this some time to sink in, and blog about something else for a while.  But I think one distressingly likely outcome is that I’m going to buy the book [amazon.com, amazon.co.uk, free online version], learn git properly and then start alienating all my friends by telling them all, in the most patronising possible manner, that they’re thinking about version control all wrong and it’s really change control.

Ah, poop.  I feel like C. S. Lewis must have felt when he famously wrote “In the Trinity Term of 1929 I gave in [...] perhaps that night, the most dejected and reluctant convert in all England.”

27 responses to “You could have invented git (and maybe you already have!)”

  1. Once again, I applaud you for your integrity in this journey. Expressing your frustrations and learning from the responses is a trait that most software developers don’t have these days. Most of us just want to bitch and moan and get pissy when someone doesn’t like our complaining and tries to help us.

    I’m truly interested in seeing your journey continue to unfold – even if (or especially if) you decide that git isn’t the right tool for you. This series of posts and all of the follow ups and responses from the blogging community is a tremendous wealth of knowledge and insight into the trials and tribulations of learning a powerful, yet cryptic tool.

    Keep it up! :)

  2. Carlos Licea

    Hey, you finally got to the paradigm change I faced when I first got into Git. Once you grasp that, it is so much better. In fact Git is a game changer.

    Wait till you get to git rebase --interactive. You’ll love the power it gives you. It’s freaking awesome.

  3. I was still hoping to read how mercurial fared against your scrutiny.

  4. Well done Weavejester, you should expand on that and get it into the foreword for one of the Git books. Seriously.

    I always kinda could see this about Git, that there was some underlying symmetry to it all.

    What confirmed it for me was the core tutorial, but I never quite got around to getting this all the way in my head, just a bit much content:

    http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/core-tutorial.html

    What I’ve just found recently, in my usual habit of needing to turn to Google every time I want to do something with Git I haven’t done in the last week or so, is Git Ready. It’s all the same stuff, but in more digestible little chunks with extensive explaining:

    http://www.gitready.com/

    I fear though that this won’t help either. My memory simply doesn’t hold that much. But maybe, maybe I’ll get the core architecture wedged in there a bit better.

    I think I’ll start with refspecs….

  5. Mike, as a relative newcomer to Git, I’m finding this series particularly useful. Personally I’ve found Git to be a breath of fresh air, but have learnt much from your posts and the various comments so far.

    Keep going!

  6. Mike, if you are on a mac and want to try a nice GUI for Mercurial take a look at MacHg. http://jasonfharris.com/machg. There is a video screen cast of how it works on the site so you should be able to get up and started really quickly. If you are familiar with CVS it should be pretty straightforward…

  7. Pingback: Good explanation abour Git. | markjeee.com

  8. Funny that I’m coming across this series of posts just as I’m starting to seriously learn git. I agree that the problem is really just a terrible interface. It was designed by an engineer after all.

    It always feels to me like there need to be some sensible defaults or shortcuts for the common cases. Are there any tools out there that take this approach?

  9. Carlos Licea

    @poloteg the http://www.gitready.com/ site takes that approach. (Only telling you how to do things, with tips on the mix)
    Also, I’ve heard about tig, never used it though.

    http://www.gitready.com/advanced/2009/07/31/tig-the-ncurses-front-end-to-git.html

  10. There are two things about git: one is the different philosophy in “change control vs version control”, and that seems reasonable and indeed an enhancement over e.g. SVN. This explains some quirks, but not all.

    The other thing is that IMHO it exposes way too much internal behaviour to users, and I think unnecessarily so. Why can I even bring my repository into a state where it has a “decapitated HEAD” (paraphrasing)? Why does it have those ~140 different commands?

    When I first used Subversion, I somehow had the impression it was a nice versioned file system, and someone might build a really nice version control system on top of that. I feel similar about git – it is a really nice patch database, and someone could build a really nice distributed version control system on top of it.

  11. re: Martin Probst

    I feel exactly the same way. I love the git core concepts, repository model, etc.

    But the UI / working copy model, starting with index and other leaky abstractions needs a redesign from scratch.

    For me, it should be much closer to Mercurial and SVN.

  12. Nathan Myers

    I guess you could say that git is the C++ of change management systems. Everybody hates a zillion things about it, most especially including the people who use it most, but there’s no substitute when you have something serious and difficult to do.

  13. Marco Rogers

    It seems there is a clear opening for someone to create a nice interface wrapper for git. I’ve picked up some nice one-off scripts along the way like git-info http://inquirylabs.com/blog2009/2008/06/12/git-info-kinda-like-svn-info/ and this one that makes my prompt respond to git repos http://asemanfar.com/Current-Git-Branch-in-Bash-Prompt

    But it would be awesome to get a well designed unified layer over the low-level raw git commands. Suitable for streamlining everyday work. But you could still drop down into the nitty-gritty when necessary. The trick is that it takes someone with a deep understanding of git to pull that off correctly.

  14. Marco Rogers said “But it would be awesome to get a well designed unified layer over the low-level raw git commands.”

    I think there are already several of these out there. The problem is that, well, there are several of them. In principle, I applaud the idea that git provides a toolset that anyone can use to build a version-control application, just as I applaud the way X11 provides a toolset that anyone can use to build a desktop. But just as the X11 situation has led to a balkanisation between Gnome, KDE and some lesser contenders, so I fear we’re going to end up with a swathe of kinda-similar-but-different git front-ends; and that anyone who learns one of them will know just enough to be dangerous in any of the others.

    The bottom line is that for most practical purposes, the responsibility for creating a UI lies with the party that creates the tool. And of course Linus and his acolytes did create a UI — just a rather nasty one. From here on out, I think layers on top can only be a complement to what’s beneath: in other words, you can’t learn a third-party git UI instead of git, not really; but you may find it useful to learn one as well as git — as a complement to, and a set of shortcuts for, what lies beneath. (My guess is that most long-term git users end up building a bunch of their own private wrapper scripts whether they initially intend to or not.)

  15. I learned git and got very interested in it by watching the Tekpub videos on the subject. Maybe that would give you a great deal of insight into how git works. I wouldn’t have touched it unless I had watched those.

  16. Thanks for the ProGit book link. It was really helpful.

    After having read more than half of it, I still have the following questions:

    1. The book talks about what the staging area is and claims that it is useful. I haven’t yet discovered at all ‘what’ makes it useful.

    If there is no clear argument for it, I would be using the command-line option “-a” mentioned at ProGit that stages and commits in one shot. Some clouds remain though. In this case how do I diff between modifications that are not staged with the last commit. The book only talks about comparing mods to staged files, and comparing staged files to the last commit, but no diff going all the way from mods in your working tree to the last commit!

    2. I am looking to find a good GUI for GIT. The link below talks about some 20+ of them (also gives the feature matrix).

    https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools

    Here is what the tool must support:

    a. Be available on all major platforms including Windows.

    b. Must support diff/merge across file and directory renames. TortoiseGit does not. I couldn’t make GitExtensions do it either, even though it is supposed to work.

    Plus, such a tool should preferably be able to compare trees as well as individual files.

    Any recommendations??

    Per the feature matrix at the above link, Git-Cola sounds to be good. I’ll definitely give it a try. Cannot test-drive all 23 of them!

  17. Mike, please delete my previous copy of this comment because I wasn’t signed into WordPress when I submitted it.

    The Git Parable is another retro-origin story with a different funny twist. It’s by Tom Preston-Werner, who wrote Jekyll, the software that “powers” the blog that the Pro Git book is in.

    Most people try to teach Git by demonstrating a few dozen commands and then yelling “tadaaaaa.” I believe this method is flawed. Such a treatment may leave you with the ability to use Git to perform simple tasks, but the Git commands will still feel like magical incantations. Doing anything out of the ordinary will be terrifying. Until you understand the concepts upon which Git is built, you’ll feel like a stranger in a foreign land.

  18. Carlos Licea

    @Alok I’m sure there are other uses, but it is so useful to untangle your commits:
    http://tomayko.com/writings/the-thing-about-git

    http://web.elctech.com/2009/01/21/untangle-your-git-commits-with-git-add-patch/

  19. @Carlos,

    Thanks for the links. This was helpful.

    I found myself convinced about the utility of staging for untangling. Even though it still seems that most of the time you do not need staging but in some cases you may.

    But then I found the comment from “Mark on Tuesday, April 08, 2008 at 08:25 AM” on the first article very interesting.

    Anyways thanks again for the links.

  20. Hey, thanks for that link on CS Lewis which brought me to theonology blog. Cool stuffs.

  21. My pleasure, r2d2! There are a lot of different things that I love (including programming, Buffy, sushi, Sushi and C. S. Lewis) and I sometimes worry that by writing about all of them I might be turning some people off. Then I think, what the heck, it’s my blog, I’ll write about what I like. But when I find that my favourite combination of flavours works for someone else, too, that is particularly gratifying.

  22. Pingback: Links for 2010-05-17

  23. Pingback: Länksprutning – 26 October 2010 – Månhus

  24. The Git Parable (cited by Steve above) really should have been titled ‘You could have invented git’ – in fact Google brought me here while I was looking for it.

    I see git as a ‘patch building’ tool. Rebase is the best thing since sliced bread.

  25. I was looking for git tutorials and as the author of the monad tutorial you cite I thought I’d search on “You could have invented git…”. I didn’t actually expect to get a hit. But I did, and it’s pretty useful. Thanks!

  26. It strikes me that people are being intimidated into using a product which is not the right choice for their setting.

  27. To elaborate on that, this is a story of a guy for whom Git is solving a problem he didn’t have, who raises serious and genuine concerns that are also shared by many others, and has had a group of people round on him to push him into using it.

    Some have been downright abusive. Bullying. Others have been passively aggressive while some have bent over backwards with help, but with the goal in mind that their efforts will recruit another Git convert.

    What is this? Church of Gitentology? It’s very worrying.
