Remember Conway’s Law
Some years ago, I was called in to lead a team of three other people in reviewing a major project at a Fortune 50 corporation. This project, which I’ll call QUBE, was a major end-to-end re-engineering of that firm’s mission-critical systems, intended to replace all the existing legacy systems. The QUBE project was supposed to have taken just two (2) years and cost under $200 million; when I was called in, it had been going on for four (4) years and had reportedly consumed around $500 million at that point. My initial charge was to determine what minimal quality assurance (QA) efforts would be needed to get this system into production. We spent three months full-time on site, reviewing documents and (where it existed) source code, and conducting extensive interviews with personnel on both the IT and business sides of the organization.
There’s a lot I could (and may yet) write about what that review uncovered, but what I want to touch on here is the following observation, quoted verbatim (but redacted) from the executive summary of our final report (which I wrote):
The root cause [of project difficulties] lies in QUBE’s organization, which is divided up into numerous vertical and horizontal silos within and among the major streams (…) and cross-stream efforts (…), leading to a complex balkanization of testing efforts. The resulting barriers to communication, coordination and cooperation will almost certainly undermine any effort to impose an improved CORE test strategy (including the recommendations in this report) unless an appropriate test/QA structure is laid on top of QUBE.
While the focus of my review was on QA, my concurrent finding was that the architecture of QUBE was likewise balkanized. In fact, the image that came to my mind again and again was that of the old Windows 3-D pipes screensaver, shown above. There was no conceptual unity, no natural divisions of functionality — it was a lot of intersecting pipes and silos, but even the silos were often divided up among different groups.
Which gets us to Conway’s Law, which I have known of for decades because Fred Brooks mentions it in The Mythical Man-Month:
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.
Put more simply, a system’s architecture is a reflection of the organization that built it. Brooks, after citing (and naming) Conway’s Law, goes on to explain:
It is a consequence of the fact that two software modules A and B cannot interface correctly with each other unless the designer and implementer of A communicates with the designer and implementer of B. Thus the interface structure of a software system necessarily will show a congruence with the social structure of the organization that produced it.
The byzantine architecture of QUBE was a direct reflection of how responsibilities had been divided up (or seized upon, for political reasons) by various individuals, groups, departments, and divisions within the organization.
The logical inference from Conway’s Law is that once you have determined the architecture of the system to be developed, you need to organize the development team to reflect that architecture. And since (as Conway notes) the first architecture is seldom the best, the development organization may need to change as the architecture is refactored and modified. On the other hand, if the overall project gets chopped up and parceled out based on politics, turf wars, and external bidding, the system you build (if you can get it working) will reflect that.
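One way to make that inference concrete is to compare the interfaces the architecture requires against the communication paths the organization actually has. What follows is only a minimal sketch under stated assumptions: the module names, team names, and the function `uncovered_interfaces` are hypothetical illustrations, not anything taken from the QUBE review.

```python
def uncovered_interfaces(module_deps, module_owner, team_links):
    """Return module interfaces whose owning teams have no communication link.

    module_deps:  list of (module_a, module_b) pairs that must interface
    module_owner: dict mapping each module to the team that owns it
    team_links:   list of (team_x, team_y) pairs that regularly communicate
    """
    # Treat communication links as unordered, so (x, y) matches (y, x).
    links = {frozenset(pair) for pair in team_links}
    gaps = []
    for a, b in module_deps:
        team_a, team_b = module_owner[a], module_owner[b]
        # An interface inside a single team needs no cross-team link.
        if team_a != team_b and frozenset((team_a, team_b)) not in links:
            gaps.append((a, b))
    return gaps


# Hypothetical example data, for illustration only.
module_deps = [("billing", "orders"), ("orders", "inventory"), ("billing", "ledger")]
module_owner = {
    "billing": "finance",
    "orders": "sales",
    "inventory": "ops",
    "ledger": "finance",
}
team_links = [("finance", "sales")]

gaps = uncovered_interfaces(module_deps, module_owner, team_links)
print(gaps)
```

Each pair the function reports is an interface the design needs but the organization has no channel for, which is where Conway’s Law predicts trouble.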
Food for thought.
About the Author: bfwebster
Webster is Principal and Founder at Bruce F. Webster & Associates, as well as an Adjunct Professor for the BYU Computer Science Department. He works with organizations to help them with troubled or failed information technology (IT) projects. He has also worked in several dozen legal cases as a consultant and as a testifying expert, both in the United States and Japan. He can be reached at 303.502.4141 or at bwebster@bfwa.com.
Comments (11)
I’ve been reflecting on Conway’s law recently, as a result of a discussion about Agile principle #11: “The best architectures, requirements and designs EMERGE from self-organizing teams.”
So, I’ve been asking, “Given an Agile self-organization, what kind of systems can we expect?”
I think the dominant pattern will resemble reality TV, where coalition-building trumps technical integrity. Interestingly, this is not new, as your case study (and Brooks’ observation) indicates – plus ca change…
My good friend Phil Armour has long advocated that any large system development project should first start with an organizational design which should temper technical commitments. Bill Curtis’ 1988 report on how this plays out is still the single best analysis of big teams that I’ve read.
A key insight is that a viable and articulated vision of the problem and solution must be reached early on and sustained as the organization and target evolve – without that, failure is highly likely. “A field study of the software design process for large systems” http://dl.acm.org/citation.cfm?id=50087.50089&coll=DL&dl=GUIDE&CFID=252274394&CFTOKEN=64678477
Great post – thanks.
Bob
Thanks for the insightful comment and the link, Bob — always a pleasure to have you come visit. ..bruce..
I have seen Conway’s Law in action all too many times. I was in a similar situation in the late 1990s. I was at the time an independent consultant on retainer to Microsoft out of their NYC office. A major client, who shall remain nameless (its initials were ML before being bought up in 2008) had been working on a project for five years and it was in complete disarray, a billion dollars over budget and three years overdue. It was not a “rocket-science” type of project and competitors had implemented similar systems with no trouble.
Each development group worked independently and never communicated with others. They took great pride in using tools, languages, hardware, and methodologies nobody else was using (there were, for example, grid widgets from seven different vendors being used by different teams). I had actually been offered a contract to manage a “reusability lab” on the project years earlier. Part of my job would have been to standardize all third-party tools, etc., and provide for re-use of internally developed classes and libraries. When I was interviewed by a VP, my first question was what kind of support could I expect from above in order to force these cowboy departments to go along with a single set of standards. She looked crestfallen and said “none.” It seems nobody actually owned the project and each year the director would use the sheer size of the project on his CV to get an executive VP job with Smith Barney or Deutsche Bank, so there was nobody at the top with any skin in the game. My response to the first offer was to thank her for lunch and take a contract in Florida. Little did I know…
After 5 years, ML appealed to MS for help and mentoring (they were a big MS shop and thought that every problem would be solved by the next release of Exchange!?!?). I led a team of 8 who joined the project to reverse engineer the existing midden heap and try to refactor it into a working system – one that would at least work long enough to be rolled out and give the project managers time to change their names and leave the state.
This project was Conway’s Law QUBED (pardon the pun, I couldn’t resist). We wrote up a lengthy analysis and I had to stand in front of a conference table full of executive wallahs. I started out (off the record) by saying that my best advice was that they go to the CEO, CIO, and CFO and say “Boy. This was a wonderful proof-of-concept demo. Now that we have learned so much, it’s time to sit down and start putting together coherent requirements for the actual system.” In the end, the project dragged on for two more years, consuming $2 billion, 800 developers, and 7 years. It was only ended when an external auditing company, whose name I won’t mention (AA), told them to either kill the project or roll it out as is. They chose to roll it out and, after a few months, it faded into legend.
I have seen this (though not quite to this degree) time and again, particularly over the past 15 to 20 years, and have concluded that the entire American corporate (and political) structure and culture from end to end has so decompensated as to become one giant example (or at least a REALLY bad caricature) of Conway’s Law.
I am lucky that I have been consulting remotely with an advanced digital video team in Shanghai for the past year or so (they are about 10 to 15 years ahead of cable and FiOS – Can you say 1440p, children? I knew you could.), working with some extraordinary developers, improving my Shanghainese pronunciation, and comforting myself with the fact that, should worst come to worst in the next couple of months, I already have an offer of a full-time position with them either here or moving to China at a compensation more than double that which Comcast and Verizon have offered me.
最诚挚的问候和好运 (Zuì chéngzhì de wènhòu hé hǎo yùn: sincerest regards and good luck). Okay, okay, Google Translate helped me get the Chinese characters right.
John (Yuēhàn) Novack
Bob,
I have the sneaking suspicion that Scrum represents the final victory of Marketing in their internal war with Development. Now, they can keep changing the specs right up until rollout with much less kickback from Development. I have spent time on forums and talking to some of the best ScrumMasters in existence and they seem to have several common talking points.
1. Agile was originally intended for smaller, not rigidly time-limited, not mission-critical projects or for enhancing and maintaining existing systems.
2. I can count on the fingers of one hand (with the thumb left to spare) the number of ScrumMasters and other Agile experts who believe that anybody but a true ScrumMaster guru can scale up the methodology to enterprise system level without some disastrous results – especially since Scrum requires that every team member be an exceptional developer, cross-trained in the skills of her teammates (“Soon, Good, Cheap – pick any two.” Guess which.) and communication degrades at higher levels as the number of Scrum teams increases.
3. As projects become larger, with a more and more complex communications “tree” of ScrumMasters, they often break down into classic stovepipe development that only becomes obvious when they try to integrate.
4. Far less documentation (or at least “accurate” documentation) of the final product is produced as each sprint can modify the ultimate product, further inhibiting later developers’ ability to maintain, debug, and enhance the project.
5. There are advantages to at least some baselined BDUF, combined with iteration, a strong change control policy, and staged deliveries, that seem to produce a higher overall quality system. If nothing else, QA can become involved at the start with baselined requirements, use cases, and early reviews of architecture and design – and begin to draw up their test plan and test scripts right from the beginning.
John: great comments. And, yes, I ended up in a very similar situation with the ‘QUBE’ project, viz., presenting my findings to a group of about 25 people: the executive VP over the project and two or three layers of project management from the various groups and organizations. The EVP asked me what I’m sure he thought were some crushing questions, all of which I had answers for (I had been doing expert witness work for several years and had been cross-examined by excellent attorneys in open court; he couldn’t lay a finger on me). The EVP ended up walking out about halfway through my presentation and never returning to the room; I was told later that he was trying to find and delete all printed and electronic copies of my report.
The remaining two dozen or so people could be divided roughly into three groups by their reactions: a small number who, like the EVP, were very angry with what I was saying; a number who were quietly nodding their heads in agreement; and the remainder, who mostly looked shell-shocked and horrified to find out just how bad things were.
Very interesting discussion. Got me thinking… The evaluation of a project in progress (using “progress” loosely) is itself a project — a one-time-only “system”, if you will — that is also subject to Conway’s law. An evaluation project differs from a “system” in that, being OTO, there are no maintenance considerations; and there is no testing. Sometimes the evaluation team is hired to implement their recommendations, perhaps to prove the efficacy thereof on pain of losing their fee.
I think the design of a system must include a recognition of data that is given, fixed, externally defined, not subject to alteration or redefinition versus data that is internally defined. The “given” data is available to any “module” in the system (subject to security) without any “communication” considerations — just read it from the database. The internally defined data should be clearly noted as such. If only one module wants it, there is no communication problem. If multiple modules want it, then they either agree to compute it themselves (repetitious but safe), or they agree to let one module compute it and everyone else uses it, even if the computation is changed by the master module. (Is it standard now to automatically notify all owners of accessing modules when the definition (computation) of the data being accessed has changed?) The point is to isolate and minimize the “communication” required.
The issue of simplification for the end-user (or customer) can be pushed all the way to the end-user interface at which point the EU can actually specify his desired presentation. The internal modules need not agree on data definitions solely for the purpose of consistency in the end-user view.
Finally, the issue of “ownership” of data by a certain business/systems segment (e.g. silo) should be confined to “responsibility” for the existence and accuracy of the data. Security and accessibility should be controlled elsewhere. Grouping several related modules (subsystems) under one manager has advantages that top out at about 50 staff and disappear with a larger team. It is best to have the organizational communication structure be flexible enough to be dictated by the system design, not the other way around.
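The idea above of letting one module own a computed value, and notifying every accessing module when its definition changes, could look roughly like the following. This is a minimal sketch and not a description of any standard mechanism: the class `DerivedField`, the module names, and the `net_price` example are all hypothetical.

```python
class DerivedField:
    """An internally defined data item computed by one owning module."""

    def __init__(self, name, owner, compute):
        self.name = name
        self.owner = owner          # module responsible for the definition
        self.version = 1            # bumped each time the definition changes
        self._compute = compute
        self._subscribers = {}      # accessing module -> callback

    def subscribe(self, module, on_change):
        # Register an accessing module to be told about redefinitions.
        self._subscribers[module] = on_change

    def redefine(self, compute):
        # Only the definition changes; every subscriber is notified.
        self._compute = compute
        self.version += 1
        for on_change in self._subscribers.values():
            on_change(self.name, self.version)

    def value(self, record):
        return self._compute(record)


# Hypothetical usage: "pricing" owns net_price; two other modules read it.
notices = []
net_price = DerivedField(
    "net_price", owner="pricing",
    compute=lambda r: r["gross"] - r["discount"],
)
net_price.subscribe("invoicing", lambda n, v: notices.append(("invoicing", n, v)))
net_price.subscribe("reporting", lambda n, v: notices.append(("reporting", n, v)))

record = {"gross": 100, "discount": 10, "tax": 5}
before = net_price.value(record)
net_price.redefine(lambda r: r["gross"] - r["discount"] + r["tax"])
after = net_price.value(record)
```

The accessing modules never negotiate the definition with each other; the only communication required is the change notice from the owner.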
Very timely, to come across this post just as HHS is adding Verizon to the team because the health care web site is late / broken / confusing / etc. What have we come to as a nation when the President has to go make a Rose Garden speech about a web site? But that musing aside, the best distillation of Conway’s law may be from Edward Tufte: “Design recapitulates bureaucracy.”
Hi guys,
I know I’m late to the game here; but I would love to know what kind of systems were being built in the stories you’ve been telling. It would give me some context as to how complex the projects were. Especially the project that John mentioned that was not “rocket science”.