October 2003


Accidents Waiting to Happen

by Donald Christiansen


Why do complex systems fail when they shouldn’t? Why did the Challenger explode? Why did Columbia disintegrate? Why were large areas of the northeastern United States blacked out in 1965 and again in 2003?

Technology historians offer a pair of explanations: 1) Designers are not cognizant of what caused predecessor systems to fail; or 2) They are aware, but are willing to accept risks based on a succession of prior successes.

When the Tacoma Narrows suspension bridge (“Galloping Gertie”) collapsed in 1940 from the effects of aerodynamic phenomena, it was widely reported that the bridge had been perfectly designed for static loads; it was simply that unanticipated effects of wind caused its demise. In fact, an engineering professor at the University of Washington recalled in 1949 that 10 suspension bridges had been destroyed or severely damaged by wind between 1818 and 1889.

Brooklyn Bridge designer John Roebling noted in 1841 that “storms are unquestionably the greatest enemies of suspension bridges.” Fourteen years later he wrote, “the catalogue of disastrous [suspension bridge] failures is now large enough to warn against light fabrics, suspended to be blown down, as it were, in defiance of the elements.” So, historians believe, the Tacoma bridge disaster should have come as no surprise and could have been averted. Roebling, they say, learned from studying failures; the designers of the Tacoma Narrows Bridge did not. Were they unaware of the suspension bridge failures of the 1800s, or did they consider them too remote for serious consideration?

Shuttle Faults

In the case of the Challenger, a potential failure mechanism in the form of a propulsion-rocket joint seal that would deteriorate during launch was well known. But successive launches in which NASA “got by” culminated in the 1986 explosion in which the Challenger and its crew were lost. For many NASA engineers, the cause was easy to guess. In her post-accident analysis, author Diane Vaughan labeled it a case of “normalizing deviation.” That is, NASA engineers and management were lulled into complacency and some hubris through a string of well-publicized, successful launches. After each flight, they’d analyze the degree of joint degradation, but did not eliminate the fault mechanism. They accepted it as normal.

When the Columbia disintegrated in February 2003, the cause was found to be another known fault mechanism: chunks of foam insulation from the external tank impinging on the spacecraft. This time, 81 seconds into the launch, a chunk of foam destroyed enough of the vehicle’s thermal protection shield to cause burning and melting and ultimate break-up of the craft during re-entry. The lead accident investigator said his task force had determined that “NASA is not a learning organization. They do not learn from their mistakes.” There was no way to monitor any damage to the spacecraft during its mission, or to repair it in flight. The commission called for both in its final report.

Power Outages

After the great Northeast blackout of 1965, the public was assured that such an event could not happen again. Failures would be localized and the offending power company isolated quickly from the grid. That it happened again in 2003 was a surprise to the public and grist for politicians and the media, but perhaps not that much of a surprise to power engineers. They may have known where the weaknesses lay, but felt powerless in the face of any risk-benefit analysis to do much about it. At this writing, the cause of the 2003 blackout has not been fully determined. Some suggest the grid was not sufficiently automated, others that it was too automated.

There is a theory that catastrophes in large systems or major engineering projects occur cyclically. One study revealed a 30-year interval between major bridge disasters in the United States, an interval thought to be the result of “a communication gap between one generation of engineers and the next.” If valid, this suggests that only the information on successful designs is passed along from one generation to the next, while information about what didn’t work and why is not. We can guess why this might be so. We all like to be recognized for our successes, not our failures. Published papers seldom recount failed approaches to an ultimately successful design. Design and development today are commonly done in teams; the consensus vote may be to emphasize successes only. And corporate patent attorneys may prefer to suggest that the path to any successful invention be seen as straightforward, not strewn with miscues or failed approaches. Finally, a company may need to consider pending or potential lawsuits.

In his book Design Paradigms, Duke University engineering professor Henry Petroski encourages designers to study past failures. In contemplating new systems or even incremental steps forward in existing systems, the designer should ask how failure can occur and what design changes can obviate that failure mode without introducing another. Petroski warns that once a design becomes accepted, incremental design extrapolation in succeeding generations tends to be the norm and first principles tend to be overlooked. And design decisions that may prove critical are given to less experienced engineers, further fostering the generation gap mentioned earlier.

It strikes me that the likelihood of accidents waiting to happen and actually happening will only increase as our systems become more complex. There may be greater hope for “boundable” systems, like a bridge or a space vehicle, but I worry most about systems that are “universal” and expansive, yet consist of numerous, potentially fallible, semi-independent pieces, like a power grid or, worse yet, the Internet.

The burden will rest on future hardware and software designers to prove my projection wrong. I hope they can.


For more about design failures, see:

J. A. Roebling, “Remarks on Suspension Bridges . . .,” American Railroad Journal and Mechanics Magazine, Vol. 6, 1841.

F. B. Farquharson, “Aerodynamic Stability of Suspension Bridges, with Special Reference to the Tacoma Narrows Bridge,” University of Washington Structural Research Laboratory, 1949.

D. Vaughan, The Challenger Launch Decision, The University of Chicago Press, 1996.

T. E. Bell (ed.), “Special Report: Managing Risk in Large Complex Systems,” IEEE Spectrum, June 1989.

T. E. Bell and Karl Esch, “The Fatal Flaw in Flight 51-L,” IEEE Spectrum, February 1987.

H. E. McCurdy, “The Decay of NASA’s Technical Culture,” Space Policy, November 1989.

Columbia Accident Investigation Board Final Report, August 2003.

H. Petroski, Design Paradigms: Case Histories of Error and Judgment in Engineering, Cambridge University Press, 1994.



Donald Christiansen is the former editor and publisher of IEEE Spectrum and an independent publishing consultant.




Copyright 2003, The Institute of Electrical and Electronics Engineers, Inc.