Git Branches Considered Harmful

pieterh wrote on 09 May 2012 21:49

One of git's great features is how easy it makes branches. Almost all git projects use branches, and the selection of the "best" branching strategy is like a rite of passage for an open source project. Vincent Driessen's git-flow is maybe the best known. It has 'base' branches (master, develop), 'feature' branches, 'release' branches, 'hotfix' branches, and 'support' branches. Many teams have adopted git-flow, which even has git extensions to support it. However, in this article I'll argue that public git branches are harmful, based on experience and evidence, and propose a branch-free approach, based on forks.

Background

Let me start with my credentials. My first open source project was Libero from 1991. I wrote Xitami, a popular open source web server, and killed that in 2001. I wrote most of OpenAMQ, the first AMQP implementation. I founded and steered the ZeroMQ community and have maintained its stable releases for years. If there is one thing I know really well, it's how to build excellent software.

Git is a revolution, especially when combined with github. In the last year or two, the github/git combination has become a key tool for organizing teams, and building processes like C4 and PC3 that are (as far as I know) the first reusable contracts of their kind.

Here is a section of PC3 that will shock some people:

  • The project SHALL have one branch ("master") that always holds the latest in-progress version and SHOULD always build.
  • The project SHALL NOT use topic branches for any reason. Personal forks MAY use topic branches.
  • To make a stable release someone SHALL fork the repository by copying it and thus become maintainer of this repository.
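
As a minimal sketch, the two branch rules above could be checked mechanically. The Python below assumes you already have the list of branch names in the shared repository (for example, from the output of `git branch -r`). The function name `pc3_branch_violations` is hypothetical and is not part of PC3 itself.

```python
def pc3_branch_violations(branch_names):
    """Return a list of human-readable violations of the PC3 branch rules.

    `branch_names` holds the branches of the *shared* project repository.
    Personal forks are not checked, since PC3 lets them use topic branches.
    """
    violations = []
    names = set(branch_names)

    # Rule 1: the project SHALL have one branch called "master".
    if "master" not in names:
        violations.append('missing required branch "master"')

    # Rule 2: the project SHALL NOT use topic branches, so anything
    # other than "master" in the shared repository is a violation.
    for name in sorted(names - {"master"}):
        violations.append(f'unexpected branch "{name}"')

    return violations
```

Calling `pc3_branch_violations(["master"])` returns an empty list, while a git-flow style repository with `develop` or `hotfix` branches would report one violation per extra branch.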

To be clear, it's public branches in shared repositories that I'm talking about. Using branches for private work, e.g. to work on different issues, appears to work just fine.

The PC3 text is not accidental. This section came from trial-and-error, mainly in the ZeroMQ community. Originally, when Martin Sustrik and I (the pragmatic core developers) started using forks instead of branches for ZeroMQ's stable versions, many people reacted with shock and horror. Today, people have a less emotional response. Tomorrow, I think it'll be clear that branches were, in fact, an entirely wrong approach inherited from the days of Subversion and monolithic repositories.

More profoundly, the branches vs. forks argument is really a wider design vs. evolve argument about how to make software optimally (both PC3 and C4 fully embrace the "evolve" approach). I may address that wider argument in a future article.

To make my argument here, I'll look at a number of criteria, and compare branches and forks in each one.

Complexity

The simpler, the better.

There is no inherent reason branches are more complex than forks. However, git-flow uses five types of branch, whereas PC3 uses two types of fork (development and stable) and one branch (master). The circumstantial evidence is that branches lead to more complexity than forks. For naive users, it is definitely easier to learn to work with many repositories and no branches.

Learning Curve

The smoother the learning curve, the better.

Evidence definitely shows that learning to use git branches is complex. For some people this is OK. For most developers, every cycle spent learning git is a cycle lost on more productive things. I've been told several times, by different people, that I do not like branches because I "never properly learned git". That is fair but it is a criticism of the tool, not the human.

Cost of Failure

The lower the cost of failure, the better.

Branches demand more perfection from developers since mistakes potentially affect others. This raises the cost of failure. Forks make failure extremely cheap since nothing that happens in a fork can affect others not using that fork.

Upfront Coordination

The less need for upfront coordination, the better.

You can do a hostile fork. You cannot do a hostile branch. Branches depend on upfront coordination, which is expensive and fragile. One person can veto the desires of a whole group. In the ZeroMQ community, for example, we were unable to agree on a git branching model for a year. We solved that by using forking instead. The problem went away.

Scalability

The more you can scale a project, the better.

The strong assumption in all branch strategies is that the repository is the project. But there is a limit to how many people you can get in agreement to work together in one repository. As I explained, the cost of upfront coordination can become fatal. A more realistic project scales by allowing anyone to start their own repositories, and ensuring these can work together. A project like ZeroMQ has dozens of repositories. Forking looks more scalable than branching.

Surprise and Expectations

The less surprising, the better.

People expect branches and find forks to be uncommon and thus confusing. This is the one aspect where branches win. However, it's also a reason for sticking to FORTRAN and COBOL. We do not refuse innovation just because it's surprising.

Economics of Participation

The more tangible the rewards, the better.

A fully free process like PC3/C4 lets people organize around problems. Most organizations are not ready for such a radical management approach. But even a top-down approach needs people to feel rewarded for their work. Branches don't act like "product" but like "discrete variations of product". People have less interest in contributing to a discrete variation. Whereas everyone wants their name on a successful product. So the economics of branches are worse than the economics of forks.

Robustness in Conflict

The more a model can survive conflict, the better.

Like it or not, people fight over ego, status, belief. If your organizational model depends on agreement, you won't survive the first real fight. Branches do not survive real arguments and fights. Whereas forks can be hostile, and still benefit all parties. And this is indeed how free software works. Score one for forks, zero for branches.

Guarantees of Isolation

The stronger the isolation between production code and experiment, the better.

People make mistakes. I've seen experimental code pushed to mainline production by error. I've seen people make bad panic changes under stress. But the real fault is in allowing two entirely separate generations of product to exist in the same protected space. If you can push to random-branch-x you can push to master. Branches do not guarantee isolation of production critical code. Forks do.
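
To make the isolation argument concrete, here is a toy model in Python. It assumes, as the paragraph above does, that write access is granted per repository rather than per branch. The `Repository` class and the user and repository names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A toy repository: write access applies to every branch in it."""
    name: str
    writers: set = field(default_factory=set)
    branches: dict = field(default_factory=dict)

    def push(self, user, branch, commit):
        # Assumption: permissions are per repository, not per branch.
        if user not in self.writers:
            raise PermissionError(f"{user} cannot push to {self.name}")
        self.branches.setdefault(branch, []).append(commit)


# Branch model: one shared repository holds experiment and production.
shared = Repository("project", writers={"alice", "bob"})
shared.push("bob", "random-branch-x", "experiment")
shared.push("bob", "master", "experiment pushed by mistake")  # nothing stops this

# Fork model: production lives in its own repository with its own writers.
stable = Repository("project-stable", writers={"alice"})
bobs_fork = Repository("bob/project", writers={"bob"})
bobs_fork.push("bob", "master", "experiment")  # affects only the fork
```

In this model, an attempt by `bob` to push to `stable` raises `PermissionError`, so a mistake in his fork cannot reach the production repository without someone else accepting it.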

Visibility

The more visible our work, the better.

Forks have watchers, issues, a README, a wiki. Branches have none of these. People try forks, build them, break them, patch them. Branches sit there until someone remembers to work on them. Forks have downloads and tarballs. Branches do not. When we look for self-organization, the more visible and declarative the problems, the faster and more accurately we can work.

Conclusions

Git branches are, in my experience and in shared repositories, harmful. It is better to work with a branch-free process that uses forks for stabilization. This comes from some years of trial and error on a wide range of projects. We have systematically found forks to be cheaper and safer and easier than branches. Branch-free processes like C4 and PC3 are real, and they work, in anger, both on closed source and open source projects. The only downside of a branch-free process seems to be that it shocks people with previous git experience. This is a passing effect, in our experience.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License