Parallel Development Strategies for Software Configuration Management
Tom Bret, Confluence Systems Ltd, in association with MKS Inc (www.mks.com)
Software project managers routinely face the challenge of developing parallel configurations of software assets. If notidentified and planned for in advance, the complexity introduced by paralleldevelopment can derail even an otherwise well-managed project.
This article describes business situations where parallel development is necessary and examines strategies forconfiguration management in each situation. "Patterns" are identifiedwhich allow different projects with similar characteristics to be managed in arepeatable way, and make it possible to carry forward a body of knowledge andexperience from one project to another. Patterns may be combined or nested todeal with complex scenarios. Parallel development potentially needs to bemanaged at multiple levels within a software asset repository, recognising thatdifferent products and components may have their own independent development lifecycles.
What is Parallel Development?
Many software projects, especially in their early stages, follow a strictly linear development progression in which eachsuccessive version of the software is derived from, and increments, the previousversion. The configuration management of such a project is straightforward,since in general there is just one "latest and greatest" version ofthe software, which forms the context in which all development work takes place.
Parallel development occurs when there is a need for separate development paths to diverge from a common starting point, sothat there is no longer a single "latest and greatest" version, butinstead two or more concurrent "latest" configurations (often called variants)where new development is carried on. Also implicit is the potential need for thedivergent development paths to converge again. This means that any strategy fordevelopment branching should also take into account the process for merging.
Why Parallel Development?
There are many reasons why parallel development may be necessary for a project. These include:
- Release preparation
- Post-release maintenance (segregated from new development)
- Tailored or customer-specific software
- Segregation of work by different development teams or individuals
- Segregation of work on different features
- Deployment of different software variants into different environments
Macro and Micro Parallelism
Parallel development can occur at the level ofa single file or other configuration item (micro level) or at the level of anoverall project configuration (macro level).
Consider the case cited above regarding"segregation of work by different development teams or individuals".If different developers need to make changes to the same configuration item fordifferent purposes, this can often be handled by short-term branching at themicro level. The second and subsequent developers to complete their work wouldbe responsible for merging to maintain the integrity of all changes to thatconfiguration item, but still within a single overall development context.
In the case of changes involving more than asmall number of configuration items and/or timescales longer than a few hours,parallelism at the micro level would rapidly become unworkable due to the largenumber of ad-hoc merges; teams or individuals waiting for completion of eachother’s merges; and confusion over the merge status and correct"latest" version of each configuration item. In this situation itmakes much more sense for each team to work with its own different"latest" configuration of an entire project (macro-level parallelism).Behind the scenes, branching of individual configuration items still takesplace, but a good software configuration management (SCM) tool should handlethis (more or less) transparently to the user. Merging would be deferred until asuitable development milestone, when a batch merge of all the desired changesmay be performed to result in a single "latest and greatest" versiononce more, forming the basis for the next development cycle.
Brad Appleton  refers to this as "file-oriented" and "project-oriented" branching, and summarises as follows:
"Most VC [version control] tools supporting branches do so at the granularity of a lone file or element. … This is called file-oriented branching."
"But branching is most conceptually powerful when viewed from a project-wide or system-wide perspective; the resultant version tree reflects the evolution of an entire project or system. We call this project-oriented branching. "
Fundamental for the capability of a tool to support macro-level branching is the capability to save and reproduce successiveversions of the entire project configuration on any branch, known as baselines or checkpoints.
The remainder of this article will deal with macro-level branching and the business situations to which it can be applied,using a modern SCM tool with good support for macro or project-oriented branching.
Frequently Experienced Problems
The assertion, made earlier, that "the complexity introduced by parallel development can derail even an otherwisewell-managed project" is backed up by anecdotal evidence of problemsexperienced on software projects where branching and merging has been poorly planned or poorly controlled.
- Project granularity: failure to allow for a separate lifecycle of sub-components
Product released with immature component code, or component changes interfere with product release cycle
- Failure to segregate maintenance fixes from new feature development
New features "leak" into production code when not fully QA’d and/or not paid for by customers
- Failure to segregate custom code from base product
Custom features accidentally released when not paid for and/or in violation of confidentiality
- Over-complex branching model
Excessive amount of effort spent in merging and maintaining branches; developer confusion about which branch they should work on
- Failure to track compatible sets of changes and apply them consistently during merge
Broken build or non-working features after merge
- Failure to anticipate the need for parallel development and to plan early
Inappropriate tool selection and/or repository design becomes "locked in" in the early stages of the project and only becomes a problem when it is too late to remedy without excessive cost, effort and/or delay
Careful and appropriate design of the branching model is critical to success – as expressed by Mario Moreira :
"A branch and merge strategy should be designed prior to creating branches to first ensure that branching is needed and secondly to ensure that the branch structure under consideration will work for the project."
Tool selection can also be critical, since development teams may be forced into one or more of the above problem situationsby constraints of the configuration management tools they are using. Tool selection is typically done early in the development project lifecycle, and itis important to think ahead, since branching requirements may typically onlyarise later in the lifecycle: for example, once post-release maintenance comes into play after the first production release.
Introducing SCM "Patterns"
A key factor to help achieve a successful design is the use of patterns. A pattern is an establishedengineering solution for a certain class of problems: for example, in civilengineering, the "river crossing" class of problems may lend itself tosolution patterns such as "bridge", "ferry" or"tunnel". The engineer will be guided to a choice of pattern byparameters such as width, depth and flow speed of the river, type and density ofthe expected traffic, and so on. In both choosing a pattern and subsequentexecution of the solution, the engineer can draw on a body of prior experience with these patterns.
Most readers will be familiar with the well-established concept of patterns in software engineering. The use ofpatterns is becoming established in Software Configuration Management  thoughless mature than in software systems analysis and design, it is still a very useful and instructive approach.
We now discuss various patterns that can be applied in different situations where parallel development occurs. It is worthnoting that these patterns may – indeed should, be adapted and/or combined(with each other or with other SCM patterns) to achieve the best solution fit with a particular set of development project constraints and requirements.
The treatment of patterns in this article is somewhat simplified compared with Berczuk & Appleton , ; whereappropriate, cross-references are given to the Berczuk & Appleton nomenclature .
Pattern: Branch per Release
[Berczuk & Appleton: Release Prep Codeline]
Probably the most frequently encountered motivation for parallel development, mentioned several times already in thisdiscussion, is the need to segregate post-release maintenance from ongoingdevelopment targeted on future releases. Taking this a logical step further, wecould (or indeed should) segregate the Pre-release fixes in the same way. Theoptimum point at which to diverge a release configuration from the ongoing"mainline" of development would be at the point of "featurefreeze", when the decision is taken not to include any more features in therelease being planned. From this point onward it becomes important that nofurther feature-enhancement code leaks into the release and that only changes necessary to fix existing features are applied to that configuration.
A convenient simplification is to continue post-release maintenance on the same development path as the release preparation(rather than using a separate release branch) – the successive releasecandidates, releases and patch releases simply being incremental checkpoints along the same development path.
In the "project history" diagram above, we see that all the main feature changes are implemented on the project"trunk". (This corresponds to the Berczuk & Appleton Mainlinepattern.) A checkpoint has been taken on completion of each change and labelled with the change number (change 101, 102, 103).
In parallel with this, fixes during release preparation and post-release maintenance have been made on release branches thatdiverge from the trunk at each feature freeze checkpoint. Each checkpoint on thebranches would correspond to a release candidate, beta or patch build. (Theillustration has been simplified and in real life there would probably be many more checkpoints, fixes and changes!)
A criticism sometimes levelled at this pattern is that fixes need to be made in more than one place: any fixes made on arelease branch are likely also to be needed on the trunk (mainline), andpossibly on other release branches if multiple releases are active. This is true but is mitigated by the following factors:
- The fixes which need to be merged will generally be relatively small code changes and therefore easy to merge
- The (generally much larger) feature changes do not need to be merged at all
- In practice, some of the fixes will have been rendered obsolete by other changes on the trunk and will not be required
Usually it is better (both for quality and productivity) to design for smaller merges, since merging can be atime-consuming process and bears the risk of introducing errors.
Pattern: Branch per Change
[Berczuk & Appleton: Task Branch]
The motivation for using a "Branch per Change" pattern has already been discussed above under "Macro andMicro Parallelism". To summarise, when different development teams areengaged in large and complex changes that may involve conflicting changes to thesame configuration items, it may be desirable to isolate them from each otherusing different development paths. This means they are each able to check-ininterim versions of their code without adversely affecting each other (as mightoccur if they were both using the same "mainline"). Merging back tothe trunk, or mainline, is deferred until each change is complete and the mergeis "batched", meaning that all the related code changes are merged atthe same time so that the mainline remains consistent. A good SCM tool shouldhave the capability to track which code changes are related to each other and also to initiate the batch merge of those changes.
This time in the project history diagram we see, diverging from the "Release 1.0" checkpoint on the trunk, twoseparate branches for "Change-101" and "Change-102". Oncompletion of each change, the code is merged back to the trunk, resulting in anew checkpoint ("merged 101" etc). After feature freeze and use of arelease preparation branch, the cycle restarts from "Release 2.0" and "Change-103".
The main difficulty with using this pattern is that the amount of code to be merged is much larger than with "Branch perRelease"; this is because each merge involves the entire code for some newfeature, rather than just a bug fix where the code changes are often quite small.
Also, if this pattern is rigidly applied – for example, by restricting developer access on the trunk so that developers areforced to work on branches – then it imposes a large procedural overhead oneven a very small and simple change. This could have an adverse effect onproductivity (due to the disproportionate amount of time spent performing mergesand verifying the merge results) as well as causing a proliferation of branchesthat could lead to confusion. Arguably, for most development teams, it would bepreferable to create task branches only for changes of long duration or whichare known to conflict with other changes already in progress, and to allow simpler changes to be done on a single shared "mainline".
Anti-Pattern: Good Code on Trunk
If a pattern describes a known engineering solution to be emulated, an "anti-pattern" could be defined asdescribing a known engineering pitfall to be avoided! Consider the –superficially quite convincing – concept of only allowing "goodcode" (i.e. passed by some QA process) on the project trunk. This is oftenfound in conjunction with the "Branch per Change" pattern and in fact is illustrated in the "Branch per Change" diagram, above.
The problems here are:
- The excessive merge workload and the effect this has on developer productivity;
- Over-restrictive process interfering with progress of the development team;
- Additional QA overhead; and
- The fact that merge outputs are generally not reliable anyway, certainly requiring verification and often necessitating rework.
Looking again at the diagram, we see that Change-101 has been completed first and may be merged straightforwardly back tothe trunk. Change-102, when completed, cannot be merged without eitherlosing the Change-101 updates or introducing non-validated merge resultsonto the trunk. To comply with the "Good Code on Trunk" principle, theonly option is first to merge the Change-101 code to the Change-102 branch,repeat the QA process (verifying that both the Change-101 and Change-102features still work) and then to perform a second merge onto the trunk. Worsestill, no other changes may be merged to the trunk while this second merge ispending, because this would invalidate the QA for Change-102, so the projectprogress could be held up for possibly several days while this is ongoing. If weimagine a large project with say fifteen parallel change branches instead of two, we can see that the extra effort and complexity would become overwhelming.
Pattern: Branch per Environment
[No direct equivalent in Berczuk & Appleton]
A situation often encountered, especially with software systems being developed for "in-house" use, is therequirement to migrate software into successive different environments.Typically this starts with a development environment (sometimes severaldifferent ones, allowing for segregation between work by different teams) andmoves on into system test, UAT (user acceptance test) and production environments.
By modelling each environment with a parallel "development" branch in the software repository, it is possible toharness the CM tool to assist with:
- Packaging and deployment of migrations between environments;
- Status accounting of current configuration in each environment;
- Access control to different environments; and
- Traceability of changes as they move through the environments (for instance, so that testers which fixes and features are available for test).
Composite and Nested Patterns
The patterns presented here are abstractionsand simplifications of situations encountered in real-life software developmentprojects, and represent only a fraction of the possible patterns that could beidentified. When designing a configuration management implementation, it isimportant to maintain flexibility to match the needs of the project, to usepatterns for guidance but not to be constrained by them, and to adapt or combinethem as appropriate. The following are some examples of how this might be approached.
Branch per Release nested at component level. A code repository is organised as a number of products each made up from somecombination of components. Products and components have their own independentrelease cycles and each is controlled using a Branch per Release pattern.Different products (or successive releases of the same product) may utilise different releases of the same component.
Branch per Release adapted for tailored customer code. A similar structure to the above couldbe used to manage the delivery of tailored versions of a core product. In thiscase the core product would be treated as a component (or set of components),and customer-specific code would also be segregated into components. A"container" project for each customer would not hold any actual code,but just a set of pointers to the correct versions of the core and customcomponents. (This depends on the use of an SCM tool with appropriate support forcomponent inclusion or "sharing".) The design objective here should beto avoid a proliferation of customer-specific branches within the actual components themselves.
Composite of Branch per Release with Branch per Environment. The development of a software systemfollows a Branch per Release pattern and each release is deployed intosuccessive environments using a Branch per Environment pattern. Each release branch spawns its own sub-tree of environment branches.
Composite of Branch per Environment with Branch per Change. A Branch per Environment pattern isused, but additionally a Branch per Change pattern is superimposed on thedevelopment environment. Development is not a single branch, but a sub-tree of change branches.
It is hoped that this article has encouraged software project managers and SCM practitioners to take a fresh look at theissues surrounding parallel development, and the engineering solutions by whichthey can be addressed. Careful analysis of the project requirements at an earlystage, coupled with appropriate tool selection and a disciplined developmentprocess, can avoid many of the difficulties which have traditionally beset branching and merging within software repositories.
To conclude - some key points to keep in mind:
- Plan early
- Use patterns
- Get the right tools
- Merge little & often
- Batch merges of related changes
- Avoid excessive complexity
- Be creative!
- Brad Appleton (Motorola Network Solutions Group) et al
Streamed Lines: Branching Patterns for Parallel Software Development
- Mario Moreira (Fidelity Investments Systems Company)
ABCs of a Branching and Merging Strategy
CM Crossroads News, November 2003
- Steve Berczuk with Brad Appleton
Software Configuration Management Patterns: Effective Teamwork, Practical Integration
- Steve Berczuk with Brad Appleton
Software Configuration Management Patterns Reference Card
MoreConfiguration Management Knowledge
- Lean Configuration Management
- Build Patterns to Boost your Continuous Integration
- Continuous Delivery Using Build Pipelines With Jenkins and Ant
- Open Source Software Configuration Management Tools