This article was originally published in the Spring 2009 issue of Methods & Tools
When Good Architecture Goes Bad
Mark Dalgarno, mark @ software-acumen.com
Software Acumen, www.software-acumen.com
Every developer eventually encounters it at some stage in his or her career – the code that no one understands and that no one wants to touch in case it breaks. Sound familiar?
But how did the software get that bad? Presumably no one set out to make it like that? The answer is that the software is suffering from Software Erosion – the constant decay of the internal structure of a software system that occurs in all phases of software development and maintenance.
At the architectural level, Software Erosion is seen in the divergence of the software architecture as-implemented from the software architecture as-intended. Note that by the architecture as-intended I don’t mean the initial planned architecture of the software system. Software architectures should evolve over time – this is to be expected as new requirements emerge – so the intended architecture is your current conception of what the architecture should be. With software erosion what we’re talking about are unintended modifications or temporary violations of the software architecture.
The problem with software erosion is that its effects accumulate over time to result in a significant decrease in the ability of a system to meet its stakeholder requirements. Unless you actively take steps to pinpoint and stop software erosion, it will gradually creep up on you and make further changes to the software significantly harder and less predictable. In the worst case it could lead to the cancellation of the project or, for particularly significant projects, the closure of the business.
Types of Software Erosion
To begin to tackle software erosion you need an understanding of how it typically shows itself. Common types of software erosion include:
A well-known example of software erosion was highlighted in a reverse-engineering experiment on two separate versions of ANT some years ago. ANT V1.4.1 (11 October 2001) and ANT V1.6.1 (12 February 2004) were reverse-engineered and the results were compared.
At the time, ANT was built in three layers; from the top down these were taskdefs, ant and utils. In the earlier version these layers were well separated and the ant layer was monolithic but small. In the later version the ant layer was still monolithic but had become very large, making it harder to understand and work with. More problematically, a new upward dependency had been introduced from the lower-level ant layer to the top-level taskdefs layer.
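A layering violation of this kind can be detected mechanically once the layers and the dependencies between modules are known. The following is a minimal sketch in Python, assuming a hypothetical mapping from module names to layers and a hand-written dependency list. The module names are illustrative assumptions, not taken from the ANT code base, and a real check would extract the dependencies from the code itself.

```python
# Layers listed from top to bottom, as described for ANT above.
LAYERS = ["taskdefs", "ant", "utils"]
RANK = {name: i for i, name in enumerate(LAYERS)}  # 0 = top layer


def upward_dependencies(module_layer, dependencies):
    """Return the dependencies that point from a lower layer to a higher one.

    module_layer: dict mapping module name -> layer name (hypothetical data)
    dependencies: iterable of (source_module, target_module) pairs
    """
    violations = []
    for source, target in dependencies:
        source_rank = RANK[module_layer[source]]
        target_rank = RANK[module_layer[target]]
        # A lower layer has a larger rank, so a smaller target rank is "upward".
        if target_rank < source_rank:
            violations.append((source, target))
    return violations


# Illustrative module names only; they are assumptions, not real ANT classes.
module_layer = {
    "CopyTask": "taskdefs",
    "Project": "ant",
    "FileUtils": "utils",
}
dependencies = [
    ("CopyTask", "Project"),   # downward: allowed
    ("Project", "FileUtils"),  # downward: allowed
    ("Project", "CopyTask"),   # upward: ant layer depends on taskdefs layer
]

violations = upward_dependencies(module_layer, dependencies)
print(violations)
```

Run regularly, for example as part of a build, a check like this would flag an upward dependency when it is introduced rather than years later.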
These types of erosion problems lead to code that is hard to understand, hard to modify and hard to test. But how do you know whether you’re suffering from software erosion?
Are you suffering from Software Erosion?
Perhaps the first thing to observe is that most projects will suffer from software erosion at some stage unless there is a conscious effort to pinpoint and stop such erosion. Even projects that are relatively short-lived can suffer from it. One example I have heard about involved a software project that had to be scrapped after only 6 months because it had already eroded badly.
There are some common things you can look out for when deciding how badly your software is suffering from software erosion:
At a detailed level, software erosion results in problems such as code living in the wrong place, layering violations (as seen above in the ANT example), complex cycles, insufficient decomposition, big packages etc.
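Of these symptoms, dependency cycles are among the easiest to look for automatically. The sketch below is one possible approach, assuming the package dependencies have already been extracted into a simple dictionary. It uses a depth-first search to report one cycle if any exists. The package names are hypothetical.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of packages, or None if acyclic.

    graph: dict mapping package name -> list of packages it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in graph}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so we have closed a loop.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in graph:
        if state[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical package graph: persistence depends back on core.
packages = {
    "ui": ["core"],
    "core": ["persistence"],
    "persistence": ["core", "util"],
    "util": [],
}
print(find_cycle(packages))
```

The recursive search is adequate for a sketch; a very deep dependency graph would call for an iterative version.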
Costs of Software Erosion
It can be hard to measure the cost of software erosion and convey this cost to non-technical people who often have to sanction work to stop software erosion. Even though software erosion causes reduced productivity, reduced quality and increased time-to-market, no one specific point of erosion causes these effects in isolation. Rather, it is the effect of multiple points of erosion that combine and reinforce each other.
However, a study by the US Air Force Software Technology Support Centre (STSC) attempted to put some rough measure on the costs of software erosion. The researchers took two versions of a mature software system (50k LOC) and asked two different teams to perform the same maintenance task (adding approx. 3k of code) on their respective version. Version 1 was an existing system suffering software erosion. Version 2 was the same system but with the architecture restructured to remove erosion.
The results were staggeringly different. Team 1, working on Version 1, needed over twice as long as Team 2 to complete this relatively short task. Furthermore, Team 1’s results contained more than eight times as many errors as the work submitted by Team 2, working on Version 2. Erosion in even a small system such as this had the potential to lead to significant problems when the software was maintained.
Causes of Erosion
By now you should have some clues as to how software erosion comes about. It does not arise purely spontaneously. Software Erosion comes about through change.
Pressure for change comes from a variety of sources. The need to add new features to a product to help persuade people to buy it; changes to the environment within which the software is deployed, e.g. to support different networking or GUI standards; and technical changes, such as the desire to adopt new coding standards, all have an impact on the software. Where the initial vision for the software doesn’t allow for change, such erosion effects will be seen very quickly.
Software Erosion is also known as software decay or code rot and by similar terms. However, these don’t adequately capture the notion that it is forces external to the software that are ultimately the cause of problems within the software. Erosion is not something that just happens to the code without someone actively making such changes. This is why I feel that the notion of software erosion more adequately describes this gradual wearing down of the ability to work effectively with the software.
The needs of the business can also contribute to software erosion. Even though deliberately eroding your software causes bigger problems down the line, it may be in the best interest of the business to do this for some short-term gain. The problems build up quickly, however, if the business does this repeatedly without spending time to refactor the eroded code. Every developer is familiar with the ‘quick-fix’ that becomes a permanent feature.
Real-World Examples of Software Erosion
How bad is this problem in practice? In 2007-08 I decided to investigate this question by running a number of workshops at different software events in the UK and by engaging in some discussions with some software practitioners further afield.
At every workshop I ran, participants spoke about many different examples of systems suffering software erosion:
In every workshop all but a few people were either working on projects that had eroded quite badly or had worked on such projects in the past.
Case Study - Outsourcing of a 1MLOC C/C++ system
I outline below a real-world case study in order to get you thinking further about software erosion. My recommendation is to spend 10-15 minutes (either on your own or with a colleague who is also reading this article) thinking about the questions before proceeding to the discussion.
Case Study Project History
A company developed a software system over a number of years. Six years ago the software was transferred to a company-owned outsourcing centre in India where it has been developed since that time. At the time of the transfer the organisation believed that the architecture of the system as intended was well documented and matched what was implemented.
The software is critical and cannot be thrown away easily.
Over time more staff were added to the project to maintain a steady flow of new features. The company has a similar product that is maintained and evolved by 5 developers whereas the Indian department now has 50 developers.
The company recently compared the amount of work done by these two teams and assessed that they delivered roughly the same amount of work.
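The size of the gap can be made explicit with a rough calculation. This assumes, as the company's assessment suggests, that both teams delivered the same amount of work over the same period; the symbol W for that amount of work is introduced here purely for illustration.

```latex
\[
\frac{\text{output per developer (5-person team)}}
     {\text{output per developer (50-person team)}}
= \frac{W/5}{W/50} = \frac{50}{5} = 10
\]
```

On that assumption, each developer on the smaller team delivered roughly ten times as much as each developer on the larger one, although the two teams work on different products and so the comparison is only approximate.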
Present Situation
Acting on this difference in productivity, the company compared the architecture from 6 years ago (when the outsourcing took place) against the architecture of the current code and found that many parts of the system have dependencies that were not intended.
The intended architecture was documented, so in theory all involved personnel could have compared actual to as-intended architecture. The initial architecture was probably appropriate for the current system (so it's a good architecture that has gone bad).
The company now intends to bring part of the software back under control in Germany while leaving part under control in India.
Questions
Think about whether it is credible that software erosion led to this significant decrease in productivity. What do you think of the company's proposed solution?
Discussion
The software has been developed over a number of years; the team, their development processes, tools and technologies may have changed during this time. Given this, we can reason that the software had probably been modified a lot before it was handed over, and so conclude that the architecture may already have eroded by the time of handover.
There was a major personnel change 6 years ago when the project was handed over. The two different organisations will have different cultures, knowledge & skills. It is not clear that these differences will be lessened just because both organisations are part of the same multi-national. This could lead to further erosion.
We also have to consider the reasons for the switch and the way the switch took place. Did the organisation cut costs on the project when the software was handed over? Was there a backlog of work on the project that it was felt the new team could tackle sooner or better? How was the handover done? Did they redeploy the existing team elsewhere or did they fire them? Were people from the old team made available to help people from the new team get up to speed? How much time was the new team given to learn about the software before having to start modifying it? If there was no effective handover and insufficient time allowed for the new team to learn the architecture and the code base then erosion is more likely to have occurred.
We’re told that the ‘Software is critical and cannot be thrown away’. We’re also told that there’s been a steady flow of new features. Both of these indicate that changes have taken place and will continue to take place, implying that erosion could be present. This is confirmed by the assessment that there are a lot of unintended dependencies in the architecture as-implemented.
My belief is that it is credible that architectural decay contributed to the team’s problems but that it cannot be untangled from other issues.
Stopping Software Erosion
Stopping software erosion requires management commitment. If managers are only interested in the short-term viability of their software projects then it is hard for developers to get the time and make the effort to tackle the problem. This does not excuse developers from doing what they can to fight erosion but will inevitably make their struggle less effective.
If management commitment is present then the following outline pattern can be used to stop software erosion. How you implement the pattern depends on what tools you have available, what domain your project lies in, how mature the erosion problem is etc.
Stopping Software Erosion – a Pattern
Stopping Software Erosion – Cultural Factors
As noted above, if top management doesn’t support the fight against software erosion then developers have their work cut out to stop erosion. With management support you can create a culture where stopping erosion is valued. This culture is likely to have characteristics such as an emphasis on regular refactoring, clear assignment of responsibilities, sharing of architectural knowledge and work, and frequent communication across the whole group.
In ‘Designing Maintainability in Software Engineering: a Quantified Approach’, Tom Gilb describes one team’s ‘Green Week’ – one week set aside each month to focus on improving their software’s maintainability. This proved more successful for the team than their earlier one-day-a-week approach and had the added benefit of making the development team feel empowered.
A few words on rewriting
Before I wrap up I’d like to say a few words about software rewrites. As I noted earlier, pressure from development teams to rewrite software commonly manifests itself when that software has eroded. In the worst case the development team uses the excuse of a possible future rewrite to delay refactoring work on the software. When this occurs, the software continues to erode until it reaches a state where working with it becomes very difficult. Even if a rewrite could once have been avoided by taking action, the result is that a rewrite becomes inevitable due to the negligence of the team.
As a developer, when faced with a decision about rewriting some software you should always ask yourself whether you are planning to rewrite it for the right reasons. Is it because you cannot make the software maintainable or is it to get rid of code you haven’t tried hard enough to refactor or code that someone other than you has written? Worse still, is it just to get some hot new technology onto your CV?
As a manager, ask yourself whether you can afford a rewrite. Do you have the right people with the right skills available for the right length of time? Do you understand the risks of new tools and technologies? Do you understand what you have to build? Are you rewriting the software or building something brand new? Worse still, how long will it be before your competitors catch up? In the Doomsday scenario, can your organisation handle the total failure of the rewriting project?
If you’re about to risk an expensive and lengthy rewrite of your software, are you really sure that you’ve exhausted every approach to fighting software erosion in your current code base?
Summary
Any successful software system is likely to evolve. Unless preventative work is undertaken the software will erode. As the software erodes the cost and risk of further development rises. It’s rarely too early to start fighting software erosion. The costs of software erosion start to bite very quickly once it sets in.
There are lots of different things that can be done to stop software erosion – you (just) need to work out what the best value approach is for your particular project. If you are a manager then create a culture where fighting software erosion is encouraged and supported. If you don't do this then no one will care about erosion. If you are an architect or developer then educate yourself about the different causes of erosion and the different approaches for fighting it. If you’re interested in finding out more, or sharing your ideas on stopping software erosion, then please get in touch.
References
See http://www.stsc.hill.af.mil/crosstalk/2005/11/0511SangalWaldman.html for more information on the Ant case study and http://codefeed.com/blog/?p=98 for a brief early Ant project history.
General Background Reading:
Lehman's laws of software evolution: M M Lehman, J F Ramil, P D Wernick, D E Perry, W M Turski, "Metrics and Laws of Software Evolution - The Nineties View", Fourth International Software Metrics Symposium (METRICS '97), 1997, p. 20
Refactoring in Large Software Projects: Performing Complex Restructurings Successfully, Martin Lippert, Stephen Roock, Wiley 2006