Latest column up: problems with distributed development

Sorry I haven’t posted much lately; I actually have a few posts in draft status, but I’m currently in Dallas, poring over hundreds of pages of source code listings (Z8 assembler, anyone?) and haven’t had a chance to finish up any of them. In the meantime, here’s my latest Baseline column on the challenges of a (geographically) distributed software development project. Part II will be on techniques to help make such an effort successful; feedback is always welcome.

[Response to comments -- WordPress for some reason won't recognize that I'm signed in and let me post directly in comments myself]

Yurri writes:

It’s true that managing a distributed team is much more challenge than having all your crew at the one office. It’s definitely true but obvious also.

To you and me, perhaps, but not to many organizations, large and small. Such organizations still operate (consciously or not) on the assumption that IT engineers are interchangeable components, which includes a naive belief that it really doesn’t matter where all the IT engineers are located as long as you have enough of them. If you don’t believe me, consider how many organizations still consider it perfectly feasible to have a joint offshore/domestic software project.

The only thing that i can’t agree in this article is that oil prices play main role in distributed software development expansion. Tickets cost still remain minor part of relocation expenses as I can expect.

I must not have been clear enough. The sharp rise in gas prices encourages telecommuting: having IT engineers work from home, rather than driving into work each day. The rise in airline ticket prices also discourages having distant engineers fly for meetings as often as they should. I actually fully agree that the rise in airline ticket prices is relatively minor compared to (a) hotel and meals costs, and (b) the benefits of having all the engineers get together, but I also know that many corporations often use minor expenses as a reason to deny something. Think about it: how many organizations refuse to buy their IT engineers up-to-date development systems and tools, despite the fact that the cost of such computers and tools is a tiny fraction of the engineers’ salaries and the lost-opportunity cost of having IT projects delayed?

John writes:

[key factors include team size, talent of the engineers, team cohesion -- go read his comment below]

I agree with all your observations. I’ve had distributed development work, and I’ve also had it cause real problems — and those factors pretty much were the difference. And, yes, I’ll be writing about that in the next column. ..bruce..


The decline in computer science students (part 2)

I previously discussed the up-and-down cycle of college enrollment in computer science and related fields. More accurately put, there have been two large peaks in computer science enrollment: one in the mid- to late 1980s (which happens to be when I was teaching CS at Brigham Young University) and another right around the turn of the 21st century. Here’s the CRA chart I included in that previous post:

Back in 1985-87, while I was teaching at BYU, I mentioned to my friend Wayne Holder, one of the finest software engineers I’ve ever known, that students at BYU could no longer simply declare their major to be Computer Science; instead, they had to take certain prerequisites, apply to the CS department, and be accepted. Wayne thought that was too complicated. He suggested that the prospective candidate be put into a room with (a) a bowlful of money and (b) some really nifty hardware and software. The candidate could then choose either to grab a handful of money and leave or to hang out and play with the computer gear; those who chose the latter would be admitted to the program.

I think Wayne was dead on, and this article in Computerworld (hat tip to Slashdot) tends to support that, though the survey quoted is from the United Kingdom rather than the United States:

Responses from nearly 2,000 undergraduates across the UK showed that most students think the IT sector has a bright future with good prospects for highly paid jobs.

But over 60% of non-computing students do not wish to enter the sector because they think it will be boring.

I’ve written before that talent is a key factor in IT personnel issues, and only a small portion of the general population appears to be talented in IT. People who have little or no aptitude for IT are likely to find it boring at best and confusing at worst.

However, that natural aversion to IT has been overcome at least twice in the last 30 years. The first time was in the mid-1980s and was largely a response to the explosive growth of the personal computer industry, led by Apple, IBM and Microsoft, but including many, many firms making both hardware and software. I wrote for BYTE Magazine back then, and individual issues of BYTE ran anywhere from 300 to nearly 600 pages, due to the sheer volume of ads. My observation as a CS instructor at BYU was that many of our students had come into the program thinking they were going to become rich and/or famous, like Steve Jobs or Bill Gates. They viewed computer science the same way my fellow undergrads a decade earlier had looked at law or med school. Hence the tremendous run-up in CS enrollment, not just at BYU but all across the United States.

Then came the First Tech Crash, which hit around 1988 (helped along, if not outright triggered, by the stock market crash in October 1987) and lasted into 1991 or so. Large numbers of hardware and software companies went out of business, and the personal computer market pretty much narrowed down to IBM and a small number of IBM PC clone manufacturers, with Apple treading water (at best). The chart above shows how CS enrollment mirrored that crash. By the early 1990s, the joke in the IT industry was: “Do you know what the status symbol of the 90s is? A job.”

CS enrollment nationwide was pretty flat from 1991 to 1997, and down at a level that you’d have to go back to 1981 to match. Most likely, people going into computer science at that time were — like me, all the way back in 1974 — going into it because we liked the field, not because we thought we’d be rich.

By 1998, however, the “dot-com boom” had become visible enough to start driving CS enrollment up again. There was an enormous demand for software engineers, with a lot of venture capital to back it up: news articles reported programmers being recruited out of high school, and CS graduates were getting large salaries and signing bonuses. Beyond that was the vision of the “nerd lottery” (to use Bruce Henderson’s phrase): dot-com startups would go public, and many of the startups’ employees (right down to receptionists) would walk away multi-millionaires. Mainstream corporations tried to get in on the dot-com boom as well, starting various e-commerce and internet initiatives.

In just about this same time period, the Year 2000 (Y2K) problem got everyone’s attention, and even those organizations, both commercial and governmental, that kept the dot-com craziness at arm’s length found themselves having to do exhaustive testing and remediation of their IT systems from top to bottom. Business and government in the United States would end up spending $110 billion on Y2K remediation, all in just a few years.

As the chart shows, CS enrollment skyrocketed again, nearly tripling from 1997 to 2003, largely due to the combination of these two factors. Unfortunately for those students, Y2K remediation largely finished up almost at the same time the Second Tech Crash (or “Dot Com Crash”) started, namely March 2000. The NASDAQ stock index peaked at its all-time high value of 5048.62 on March 10, 2000, a 100% increase over what it had been just a year earlier. (Stop and think about that: what if the Dow Jones Industrial Average were to hit 24,000 a year from now?) It was a classic bubble, and now it was popping, or at least deflating; the NASDAQ index currently trades at less than half that value. (Note that the DJIA is up roughly 20%, and was up over 30% earlier this year, from its value on that same date eight years ago.)

This tech crash was far more brutal than the first one. The IT employment marketplace was flooded with massive numbers of IT engineers who were no longer needed, one way or the other, and even talented IT engineers had a hard time getting visibility over the sheer number of warm bodies out there.  But it took a while for that feedback to get back into the colleges and universities; enrollment continued to climb until about 2003 but appears to have been slumping since then (see the chart above) and could actually drop back nearly to where it was when I graduated with my own CS degree some 30 years ago.

In other words, the real issue isn’t why CS enrollment is declining; the question is why did it ever climb so high in the first place? And it’s pretty clear that it tracks the two major bubbles of the past 30 years: the personal computer boom in the mid-1980s and the dot-com/Y2K boom of the late 1990s. After each bubble deflates, CS enrollment sinks back to its “natural” level, based on the distribution of IT-related talent and inclination in the general population.

The problem, however, as I first noted over 12 years ago, is that this “natural” level isn’t enough to supply sufficient IT talent for successful IT development and deployment in all the businesses, vendors, government agencies and other organizations that need it.

In my opinion, there is no shortage of IT engineers — particularly not after the vast numbers drawn into the industry due to Y2K and the dot-com boom — there’s just a shortage of talented ones. This is why you get conflicting claims and statistics about “personnel shortages” in the IT industry (cf. here vs. here, as well as the battle over raising the limit on H-1B visas and the offshoring debate).

The various attempts to “boost” CS enrollment at colleges and universities will have only a small effect on that talent shortage; for the most part, they will likely bring additional people into the IT industry who lack the talent or inclination to do well there. In other words, they won’t solve our IT problems at all.  ..bruce..


Issue: metrics for tester productivity?

In response to my Baseline columns on metrics (Part 1, Part 2, and Part 3), I received the following e-mail:

I read your column with great interest as I’m involved on an IT project to measure productivity. May I ask you a quick question? Are there any mature metrics that can measure tester productivity improvement month by month and accurate to 1%?

Here’s the response I sent back:

Well, for starters you have to define what you mean by “tester productivity.” Number of test scripts run? Number of defects found?  Number of defects closed? Number of defects reopened? (And do you weight the “defects found/closed/re-opened” by criticality and/or severity?) Number of reported defects replicated? Number of hard-to-replicate, yet critical/severe defects that can now be replicated (and thus fixed)? Some combination (possibly a weighted function) of all of the above?
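To make the "weighted function" idea concrete, here is a minimal sketch of what such a composite score might look like. Everything in it is an assumption for illustration: the severity weights, the `script_weight` and `reopen_penalty` parameters, and the choice to treat reopened defects as a penalty are all hypothetical, and picking different values would rank the same month differently. That is rather the point: the number only means something once you have decided what you value.

```python
from dataclasses import dataclass, field

# Hypothetical severity weights, chosen only for illustration.
SEVERITY_WEIGHTS = {"critical": 5.0, "major": 3.0, "minor": 1.0}


@dataclass
class MonthlyTestStats:
    """One month of raw counts for a test team (illustrative fields only)."""
    scripts_run: int
    defects_found: dict = field(default_factory=dict)     # severity -> count
    defects_reopened: dict = field(default_factory=dict)  # severity -> count


def weighted_defects(counts):
    """Sum defect counts, weighting each count by its severity."""
    return sum(SEVERITY_WEIGHTS[sev] * n for sev, n in counts.items())


def productivity_score(stats, script_weight=0.1, reopen_penalty=2.0):
    """Combine the raw counts into one number.

    Assumption: scripts run and defects found count in the team's favor,
    while reopened defects count against it. Other teams may disagree.
    """
    found = weighted_defects(stats.defects_found)
    reopened = weighted_defects(stats.defects_reopened)
    return script_weight * stats.scripts_run + found - reopen_penalty * reopened


june = MonthlyTestStats(
    scripts_run=200,
    defects_found={"critical": 2, "major": 5, "minor": 20},
    defects_reopened={"major": 1},
)
print(productivity_score(june))
```

Changing `reopen_penalty` or the severity weights changes the score without any change in what the testers actually did, which is why the definition has to come before the measurement.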

In other words, what is it exactly that you’re trying to accomplish? To make your testing team more effective? More efficient? To shorten the test cycle? To spend less on testing? To close more defects (and defer fewer open ones) for each system release? To have fewer defects discovered after a system release? Jerry Weinberg says that “quality is value to some person.” Who are the people you’re worrying about, what qualities — functionality, performance, reliability, etc. — do they value, and to what extent?

Once you’ve defined all that, there still remains the question as to whether you can measure that to a 1% accuracy (or even a 10% accuracy) month over month, and still preserve any meaning in that measurement. It’s possible (and common) in metrics to have “false accuracy”  — you believe you’re actually measuring something to a certain precision, but you’re mostly just reading random or insignificant noise at that level.
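A rough way to see the "false accuracy" problem is to ask how much a monthly count would wobble by chance alone. The sketch below assumes defect discoveries behave like independent random events (a Poisson model), which is a simplifying assumption and not something any real test team is guaranteed to satisfy. Under that model, the relative noise in a count of n is about 1/sqrt(n). The function names are hypothetical.

```python
import math


def relative_noise(expected_count):
    """Relative standard deviation of a Poisson count: 1 / sqrt(n)."""
    return 1.0 / math.sqrt(expected_count)


def min_count_for_precision(target):
    """Smallest expected count whose chance-level noise is at or below target."""
    return math.ceil((1.0 / target) ** 2)


# How noisy is a monthly count of 25, 100, or 400 defects?
for n in (25, 100, 400):
    print(f"{n} defects/month: about {relative_noise(n):.0%} noise")

# How many defects per month before a 1% change is distinguishable from noise?
print(min_count_for_precision(0.01))
```

Under these assumptions a team logging around 100 defects a month sees roughly 10% month-to-month variation from chance alone, so a reported 1% improvement at that volume is well inside the noise.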

Finally, we come back (as always) to Weinberg’s law of metrics: that which can be measured can be fudged (or exploited). For example, read this story over at the Daily WTF:  The Defect Black Market.

Hope this is of some help, though I tend to doubt it. :-)

Thoughts from the rest of you?  ..bruce..


Latest column: IT project metrics, part 3

My newest Baseline column is up: “Lies, Damned Lies, and Project Metrics (part 3)”. In it, I wrap up my discussion on IT project metrics, outlining a possible approach using instrumentation and heuristics.  Go check it out.  ..bruce..


Anatomy of a runaway IT project

The following document is the actual text — carefully redacted — of a memo I wrote some time back [i.e., several years ago] after performing an IT project review; names and identifying concepts have been changed to preserve confidentiality (and protect the guilty). The project in question was a major IT re-engineering effort for a mission-critical system; at the time I did this review, the project had been going on for several years and had cost millions of dollars; it would eventually be canceled and the work products abandoned. The memo itself provides an interesting glimpse into just how a major IT project can go so far off the tracks that nothing useful is ever delivered.

Note that the “ABC” consultants were a small part of the overall project team and had been brought in relatively late by “BigFirm” in an attempt to get the “FUBAR” project into production; they neither initiated nor managed the project. [NOTE for those of you who have written or done Google searches: "Bob Winsom", like all the other names in the memo as transcribed below, is a pseudonym.]

========================================

CONFIDENTIAL MEMORANDUM — EYES ONLY

Over the past two weeks, I’ve conducted confidential off-site group interviews with all of the ABC consultants working on the FUBAR project. I did this at [ABC manager's] request, after a few of these consultants spoke privately about FUBAR with him. The feedback was consistent and raises serious doubts about whether the FUBAR project, as currently pursued, can ever yield a successful production deployment.

This report groups those comments into several broad areas. It is relatively unfiltered and extremely direct (no withholding). It represents the private comments of ABC consultants who have little to gain and possibly much to lose by being so blunt. These are not the whinings of purists picking nits. These are the grounded assessments of top-notch IT professionals who have among them a century or two of experience bringing projects to completion — particularly those involving [specific IT] technology — and who are down in the FUBAR trenches every day.

QUALITY OF WORK AND EFFORT

ISSUE: Several consultants said — and the rest pretty much agreed — that far too many of the deliverables, artifacts, and activities (e.g., algorithms, source code, system configuration, design/architecture documents, testing, defect tracking, scheduling etc.) are substantially below any acceptable professional standards and represent a profound threat to FUBAR ever going into production.

EXAMPLES: The code base is very fragile. A lot of it is bad old code that BigFirm didn’t have time to rewrite two years ago, but now is five times its original size and even worse. One consultant said he took a code listing, picked pages at random, and found problems on every page he selected. There is pervasive hard coding of what should be adjustable parameters or at least meaningfully named constants (e.g., # of [key items] hard-coded throughout with the literal value ‘3’, a constant named ‘ninety_eight’). Builds take all night. App releases don’t run acceptably, if at all, in a production environment. Developers check in files that won’t even compile.

RISKS: The FUBAR project keeps being touted as a world-class development team, but it is not producing world-class, or even minimally-professional, results. This already shows up in the project delays and quality issues of the releases to date. What the team is producing will not only be very difficult to support and modify, it will in all likelihood be unusable, resulting in a complete failure of the FUBAR project.

PROJECT PLANNING AND EXECUTION

ISSUE: Project planning and execution is all too often poor or missing completely. Milestone dates, usually unrealistic if not impossible, are based on political considerations or wishful thinking, not bottom-up grounding. Necessary and useful activities are delayed or canceled with the justification “We have no time for that”, but the project phase ends up taking as long as or longer than if the activities had been carried out. Dates are set, but nobody scrambles until the last minute. Risks are not actively tracked or managed.

EXAMPLES: Count how many times FUBAR ever produced a production-quality deliverable on anything close to a scheduled date. Even the current effort probably requires a year to get something into production, but the schedule says four months. Managers create work tasks, but then never track progress or completion. One ABC consultant created a risks document; Bob Winsom [BigFirm's FUBAR project manager] took it over, and no one has seen it since.

RISKS: FUBAR is massively late. You lose credibility and influence.

QUALITY ASSURANCE AND PROCESS

ISSUE: Quality assurance appears to be a low-priority concept within the FUBAR project. In the opinion of several consultants, the current person in charge of it is not particularly strong or competent. There appears to be a systemic inability to establish good testing criteria and methods to gauge FUBAR’s progress toward production. What software lifecycle process is in place is often not followed. No independent group or person acts as the ‘gatekeeper’ to judge acceptability and control release into production.

EXAMPLES: [Key process] calculation — the core of BigFirm’s business and profits — was being (and may still be) done incorrectly in FUBAR; it had never been previously checked for correctness through all these years. Likewise, performance expectations have been based on the presumption of FUBAR distributed over multiple systems, processors, and threads, yet no one ever tested to see if those implementations would work until recently — and they didn’t. The build environment needs to be overhauled. The defect tracking process is poor, particularly the practice of writing up defects not against the current release but the release in which the defect is scheduled to be fixed — so as to keep the number of defects down for the current release.

RISKS: BigFirm leaves itself open to potential liabilities, not to mention crippling its own core business. In the meantime, the effort to transition directly into the Rational Unified Process (RUP) is not being given sufficient time and will likely grind development to a halt.

ARCHITECTURE

ISSUE: FUBAR doesn’t have a viable, consistent architecture. The irony is that FUBAR itself is not a big, complex problem; it is a relatively straightforward problem that has been made big, complex, and possibly unsolvable in the current implementation. Initiatives to rearchitect are started, abandoned due to “schedule pressure”, then restarted months later.

EXAMPLES: The project has gone through a series of architects, who have either left or been asked to leave. While they are here, they usually are neither listened to nor given the authority to be an architect. Technical decisions are made by people lacking the background, such as Bob Winsom, who may fancy himself an architect and was quoted as saying, “I haven’t found an architect I like yet.”

RISKS: FUBAR never stabilizes enough to go into production for any length of time, or if it does, proves to be extremely difficult to support or enhance.

APPLICATION PERFORMANCE

ISSUE: FUBAR was never properly architected and designed for the performance required. There is a current effort to increase performance after the fact, but the implementation makes that impossible. To make things worse, developers are having to scale the performance of and debug a seriously flawed application at the same time, making it very hard to stabilize the application.

EXAMPLES: Two consultants rewrote the 140,000 lines of [original obscure language] into 4200 lines of Java. The Java version runs as fast on a laptop PC as the original version runs on a high-powered UNIX server.

RISKS: Despite heroic efforts (that will probably make the application even more difficult to modify and support) and lots of hardware, FUBAR will probably reach some fraction of the currently-desired performance (possibly as little as 15% to 20% of that required, possibly as much as 80%) and then go no further.

STAFFING

ISSUE: Many of the people involved in FUBAR — developers, testers, team leads, managers — lack the qualifications, insight, or experience to make FUBAR a success. The project is overstaffed for the actual complexity of FUBAR, possibly for political reasons (i.e., promotion/stature based on the number of people supervised).

EXAMPLES: Many of the examples listed above reflect this problem.

RISKS: This problem leads in part to all the issues previously listed: poor quality of work, poor quality assurance, poor scheduling and delivery, poor architecture, poor application performance. Besides the potential failure of FUBAR itself, this issue tends to be self-intensifying — that is, the qualified people become frustrated and leave (or are hard to recruit in as replacements), while those less qualified or capable stay around. [A reference to the "Dead Sea effect" written many years ago.]

[BrandName team management approach] PRINCIPLES

ISSUE: Mid-level management tells the developers that mood, sincerity, and commitment are everything, and that with them “we can accomplish anything.” At the same time, the principle of granting sincerity appears to be used to justify a lack of accountability and consequences.

EXAMPLES: Repeated statements in team meetings, one-on-ones, and so on.

RISKS: Loss of credibility. Such assertions don’t hold water. I can be in a great mood and have a team of very sincere and committed people, but if we try to build a commercial airliner without the proper expertise, requirements, engineering, materials, and testing, the plane will crash and people will die, assuming it ever gets built and off the ground (which is extremely unlikely). The fallacy that software is somehow different is just that — a fallacy, and one that costs corporations millions (if not billions) of dollars a year in missed schedules and failed projects. When it comes to engineering, sincerity and commitment, while important, can never substitute for expertise and quality of work.

INTELLECTUAL HONESTY

ISSUE: There isn’t enough intellectual honesty within the FUBAR project. Managers reject or explain away bad news and real problems, looking instead for people who will tell them what they want to hear.

EXAMPLES: Several developers and team leads have sought to escalate these issues and concerns up the management chain, but the issues appear to always get stopped, usually at Bob Winsom. [The "thermocline of truth", with a very discrete boundary.] The FUBAR project is represented as something that has never been done before, and the staff as a world-class development team.

RISKS: The lack of intellectual honesty in project management is a form of codependency and enabling that is all too easy to fall into. Unfortunately, reality eventually intrudes, and when it does, you run the very real risk of looking dishonest or incompetent.

CLOSING REMARKS

As I said, this is a very blunt (and very confidential) memo. It represents the opinions, experiences, and observations of these ABC consultants, and there are undoubtedly points with which you take issue or disagree. Do not let that blind you to the fundamental reality that there are some profound problems and flaws with the FUBAR project that will not go away until the project team acknowledges and addresses them. Indeed, it will be hard enough to make them go away even then.
========================================

Kind of scary, isn’t it? The interesting part was that BigFirm had implemented, corporate-wide, a “team management” methodology (from an outside firm) that was based on “mood, sincerity, and commitment”. As an overall corporate management approach, it might well have been effective; I just don’t know. But BigFirm thought that it would also solve their IT problems.

Nope, it didn’t. ..bruce..

[Speaking of project failure -- I have my first three "Surviving Complexity" columns up at Baseline, talking about IT project metrics, why they're so tough to define, and one possible approach.]

[UPDATED 06/25/08: If you think the project above is bad, take a look at this one.]
