Septic code: why some large IT projects never go into production
A common pattern in the failure of large IT software projects is “the Never-Ending Story”, which I described back in 2000 (PDF) as follows:
The client contracts with the manufacturer to develop and install a system. The project starts. The completion date slips. It keeps slipping. Each time the adjusted delivery date approaches, the project slips yet again. At some point, one of three things happens: the manufacturer/vendor abandons the project; the client cancels the project; or the manufacturer delivers a system that the client terms wholly inadequate and unacceptable. In some cases, the effort has gone on for years, with millions of dollars spent and little to show for it.
As an IT consultant and as a consulting/testifying expert, I have been asked many times to review such projects. Several factors can cause this pattern, but there is one in particular that I don’t see discussed that much: septic code. By that, I mean that some portion of the source code created to date is so bad and has such a negative impact on other code that relies upon it that the project will never stabilize until that source code is cut out of the project and thrown away, and brand new source code meeting professional standards is created in its place.
Now, I’m not just talking about poorly written or poorly designed code — I’ve seen that all my professional career, and in some cases have had to rewrite such code to fix bugs or add features. I’m talking about code that is so bad and yet so critical for the project that the project will never stabilize, will never move towards production, until that code is removed wholesale. The first project where I encountered that was a major critical-systems re-engineering effort at a large corporation, back in the late 1990s. I actually reviewed the project three times, roughly once a year, and each time my findings were pretty much ignored. The last time I reviewed the project, I wrote the following, among other things, in a memo to the corporate executive over the project (I’ve changed the names here):
QUALITY OF WORK AND EFFORT
ISSUE: Several consultants said — and the rest pretty much agreed — that far too many of the deliverables, artifacts, and activities (e.g., algorithms, source code, system configuration, design/architecture documents, testing, defect tracking, scheduling etc.) are substantially below any acceptable professional standards and represent a profound threat to FUBAR ever going into production.
EXAMPLES: The code base is very fragile. A lot of it is bad old code that BigFirm didn’t have time to rewrite two years ago, but now is five times its original size and even worse. One consultant said he took a code listing, picked pages at random, and found problems on every page he selected. There is pervasive hard coding of what should be adjustable parameters or at least meaningfully named constants (e.g., # of [key items] hard-coded throughout with the literal value ‘3’, a constant named ‘ninety_eight’). Builds take all night. App releases don’t run acceptably, if at all, in a production environment. Developers check in files that won’t even compile.
RISKS: The FUBAR project keeps being touted as a world-class development team, but it is not producing world-class, or even minimally-professional, results. This already shows up in the project delays and quality issues of the releases to date. What the team is producing will not only be very difficult to support and modify, it will in all likelihood be unusable, resulting in a complete failure of the FUBAR project.
My findings and recommendations were again ignored, and the project was shut down completely a year or so later, at a loss of over $10 million.
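To make the hard-coding complaint in the memo concrete, here is a minimal sketch of the pattern and one common alternative. It is purely illustrative: the function names, the `Settings` class, and the `key_item_count` parameter are hypothetical and are not taken from the FUBAR code base.

```python
from dataclasses import dataclass

# Anti-pattern: a constant named after its value says nothing about its
# purpose, and the literal 3 is buried inside the logic.
ninety_eight = 98


def total_weight_hardcoded(weights):
    # The item count is the bare literal 3; changing it means hunting
    # down every copy of that literal across the code base.
    return sum(weights[i] for i in range(3))


# Alternative: one meaningfully named, adjustable parameter.
@dataclass(frozen=True)
class Settings:
    key_item_count: int = 3  # hypothetical name for "# of [key items]"


def total_weight(weights, settings=Settings()):
    # The count comes from a single place and can be overridden per call.
    return sum(weights[:settings.key_item_count])
```

With the second version, raising the number of key items is a one-line change (for example, `Settings(key_item_count=4)`) instead of a search through every file for the literal.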
A more dramatic example was a project where I led a team that spent three months assessing the production readiness of its code from a quality assurance (QA) point of view. The project — at a Fortune 50 corporation, and involving two big-name consulting firms — had been originally forecast at two years and $180 million. When I was brought in, it had been going on for four (4) years, and the client had spent over $500 million. And yet, as it turned out, there was still not a single application or system that was ready to go into production.
What are the causes of septic code? Largely, it is the use of unqualified software engineers and architects. I believe that one of the reasons there is, and has always been, violent disagreement over whether there is a shortage of IT engineers is that a lot of people are looking at it the wrong way. They tend to assume that the issue is simply one of numbers: how many people are working or seeking work as IT engineers, perhaps with some established level of education or certification. What they should be asking is: how many of those people are actually qualified and talented? The answer: not nearly enough, and too small a fraction to meet industry needs.
I wrote about this in BYTE Magazine back in January 1996:
The conclusion I have reluctantly come to after more than 20 years of software development is this: Excellent developers, like excellent musicians and artists, are born, not made. The number of such developers is a fixed (and tiny) percentage of the population. Thus, the absolute number of such developers grows very slowly. At the same time, the demand for them expands rapidly due to the world’s increasing use of, and reliance on, software.
The situation is worse than it appears. Some of these innately talented people never go into the computer industry. Many who do never develop their full potential. Others become prima donnas, demanding large salaries and extreme benefits. Or they become “cowboy programmers,” shooting from the hip and holding teams, projects, or entire companies hostage. A few burn out and leave the field. Of those left, only a fraction meets the requirements for your project.
This is not to slight the decent, talented software engineers, the ones who study hard and work hard at developing and maintaining their skills. Indeed, if not for them, we wouldn’t have a software industry at all. But even they can’t meet the demand, and their efforts are undermined by the mediocre (or worse) programmers. (“The Real Software Crisis”, BYTE, January 1996).
In other words, the IT engineer gap is filled by less-qualified and unqualified personnel who are able to find and keep jobs. But in the end, bad coders produce septic code, code bad enough that, as with the FUBAR project above, it can never really be massaged enough to be put into production, but instead needs to be excised and discarded.
That brings up the next question: why are unqualified IT engineers hired in the first place? Several reasons, actually. First, it is a challenging and often slow-moving process to find, assess, and hire top-notch engineers. People who look great on paper and interview strongly, and who even can answer brain-teasers well, can turn out to be a bust during actual development cycles. My own hiring/screening experience led me to one simple heuristic: the best predictor of success for a given candidate is the strong recommendation of someone whose opinion you trust and who has worked with that person. Sadly, that option is not available for many of the candidates you might interview.
Second, human resources (HR) departments often eliminate talented engineers because of check-list screening (“Must have an MS in CS”, “Must have X years of language Y”, etc.). This eliminates some of the best people out there, while letting through people whom you just shouldn’t hire.
Third, those doing the hiring may not be all that qualified themselves. As Steve Jobs used to say (as summarized by Guy Kawasaki):
…Steve believed that A players hire A players—that is people who are as good as they are. I refined this slightly—my theory is that A players hire people even better than themselves. It’s clear, though, that B players hire C players so they can feel superior to them, and C players hire D players. If you start hiring B players, expect what Steve called “the bozo explosion” to happen in your organization.
Fourth, some organizations just have trouble attracting talented and qualified people, due to lower salaries, less interesting work, less attractive work environment, and so on. I think this is why so many failed or troubled projects occur in state and federal government agencies (cf. here, here, and here for very recent examples). Even (or especially) in those cases when the bulk of development is being done by an outside firm, the lack of in-house talent stymies an accurate assessment of the work being produced by the outside consultants.
Fifth, I believe our entire approach to staffing and running IT development teams is misguided and counter-productive. I think we should approach it the way a professional sports team approaches its roster, rather than with the “infinite number of interchangeable code monkeys” mindset favored by much of industry and most of government.
Solutions for your own organization? Do a better job of assessing the people you already have. Keep the great ones; get rid of the bad ones. Hire more great ones. No, it’s not easy, simple, or cheap — there’s a reason why my BYTE article nearly 20 years ago was titled “The Real Software Crisis”. But building and keeping the right team beats having a multi-year, multi-million-dollar project failure.