pieterh wrote on 25 May 2012 17:01
The C4 process we adopted some time back for ZeroMQ (except our dependency on Jira), CZMQ, and some other projects, looks like it's working as planned. But the drama of science lies in the extremes. If we really can reduce change latency to almost zero using C4, how does this affect how we deliver stable releases?
First, what do I mean by "change latency" and why is this important? Without diving too deep into the theories of collective intelligence and problem solving, C4 asserts that the optimal design for a software development team is two halves, one that sets challenges, and one that solves them. While pair programming puts two drivers behind the wheel, it gives no guarantee that they drive in the right direction. The challenge/solution pairing gives one person, or group, the task of defining the right directions, and another the task of driving as fast as possible in that direction.
C4 drives a constant stream of such challenges, written as issues that capture problems. As long as the people writing the issues have some empathy with real users, or are real users, the product moves accurately in the right direction.
So each answer to a well-expressed problem comes in as a change to the software product. About 10-20% of these changes will be wrong, badly done, or solving problems that are themselves "wrong". The usual approach with software development is to accumulate a large number of changes, and then weed out the errors one by one, until the code is stable enough for production use.
Even the very best programmers will make significant numbers of mistakes. They don't always appear as bugs; they can be too-complex designs, solutions to problems no-one cares about, features that don't work as expected, functionality that people want which is "impossible" due to other design decisions, and so on. Even a really excellent library like ZeroMQ has had many of these mistakes in it. Often we end up choosing a piece of software just because the good outweighs the bad. None, it seems, can be perfect.
However, with C4 we apply changes directly to the master version and we look for a fast yes/no answer to the question, "was that a good change?" We don't use branches because they slow down the time it takes to get this answer. We don't over-discuss pull requests for the same reason. In fact we're doing whatever it takes to reduce the round-trip time from "I have identified a problem and propose solution X" to "Yes, solution X makes sense and it seems to work fine" or "solution X is insane and I've made a patch that reverts it".
With ZeroMQ we're now getting change latencies of a few hours.
What does this mean for stable releases? I'm not yet certain but what seems to be emerging is that the git master is perfect 80-90% of the time, and one patch away from perfection the remaining 10-20% of the time. The remaining issues in ZeroMQ are all in old code, none in new code. This is highly significant.
Let's think this through again. Historically, the only reason for making stable releases was that we had lots of changes that took months or years to fully validate. I'm now sure, with evidence, that making large changes is both unnecessary and sub-optimal. Instead, with C4 we aim for a steady stream of small iterative changes, where each can be validated or rejected very rapidly. Thus, we have removed the need for stabilization.
C4 is also a formal contract for ongoing interoperability. That is, we do not break old APIs or protocols. In the old days we happily broke these, and bumped the version number. That didn't help anyone, in fact. People stayed away from ZeroMQ/3.0 and ZeroMQ/4.0 because interoperability is more important than new features.
We will make a stable release candidate of ZeroMQ/3.1 soon. But I'm curious to see how far that stable candidate diverges from master. We'll continue to make formal releases for obvious reasons: people expect this, they do not trust github masters, they need well-labeled packages for internal deployment. But really, over time, I think the whole concept of "stable releases" will disappear, and be replaced by "latest master that passed all tests".
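To make "latest master that passed all tests" concrete, here is a minimal sketch in Python. It is not how ZeroMQ picks releases; the `Commit` record, the `latest_green` function, and the sample history are all hypothetical, and it assumes some test runner has already recorded a pass/fail result for each commit on master.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Commit:
    sha: str            # hypothetical commit identifier
    tests_passed: bool  # assumed result of the full test run for this commit


def latest_green(history: Sequence[Commit]) -> Optional[Commit]:
    """Return the newest commit whose tests all passed, or None if there is none."""
    # history is assumed to be ordered oldest first
    for commit in reversed(history):
        if commit.tests_passed:
            return commit
    return None


# Illustrative data only: two good changes, then one bad change awaiting a revert.
history = [
    Commit("a1", True),
    Commit("b2", True),
    Commit("c3", False),
]
release = latest_green(history)
```

With this history the "release" is simply the commit just before the bad change, and it moves forward again as soon as a revert or fix lands and passes.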