
Could Enterprise 2.0 Have Prevented Apollo 13?

“Houston, we have a problem…”

While many now consider the determination and ingenuity displayed during the rescue of the Apollo 13 astronauts to be one of NASA’s finest hours, could the accident have been prevented altogether if NASA had had the benefits of the kind of robust internal social network made possible by today’s technology?

The “Problem”

While many people may know from the film that the “problem” the Apollo 13 crew experienced was the explosion of an oxygen tank in the service module, most don’t know the full story of exactly why that tank exploded.

In this excerpt from “Apollo: The Race to the Moon”, the authors explain the series of events that actually led up to the explosion.

“…In the end, NASA would find, this is what happened…In October 1968, when O2 Tank 2 used in Apollo 13 was at North American, it was dropped. It was only a two-inch drop, and no one could detect any damage, but it seems likely that the jolt loosened the “fill tube” which put liquid oxygen into the tank.

In March 1970, three weeks before the flight, Apollo 13 underwent its Countdown Demonstration Test…which involved loading all the cryos. When the test was over, O2 Tank 2 was still 92 percent full, and it wouldn’t de-tank normally—probably because of the loose fill tube. Because a problem in the fill tube would have no effect on the tank’s operation during flight, the malfunction was not thought to be relevant to flight safety.

After three unsuccessful attempts to empty the tank, it was decided to boil off the oxygen by using the internal heater and fan. This was considered to be the best procedure because it reproduced the way the system would work during flight: heating the liquid oxygen, raising its pressure, converting it to a gas, and expelling it through the valves and pipes into the fuel cells where, in flight, it would react with the hydrogen. So they turned on the tank’s heater.

A technician working the night shift on Pad 39A was assigned to keep an eye on the tank temperature gauge and make sure that it did not go over 85 degrees Fahrenheit. It was not really necessary that a human serve this function, because a safety switch inside the tank would cut off the heaters if the temperature went beyond the safety limit. And, in reality, the safety margin built into the system meant that the temperatures could go considerably higher than 85 degrees without doing any damage. But the precautions were part of NASA’s way of ensuring that nothing would go wrong.

After some time, the technician noticed that the temperature had risen to 85 degrees, but all he had been told was that anything in excess of 85 degrees was a problem, so he let the heater run—about eight hours, in all. No one had told him that the gauge’s limit was 85 degrees. That’s as high as it could measure. Thus the technician could not tell that the temperatures inside the tank were actually rising toward a peak of approximately 1,000 degrees Fahrenheit, because the safety switch had failed.

It had failed because of one small but crucial lapse in communication.

Eight years earlier, in 1962, North American had awarded Beech Aircraft a subcontract to build the cryo tanks for the service module. The subcontract specified that the assembly was to use 28-volt D.C. power. Beech Aircraft in turn gave a small switch manufacturer a subcontract to supply the thermostatic safety switches, similarly specifying 28 volts. In 1965, North American instructed Beech to change the tank so that it could use a 65-volt D.C. power supply, the type that would be used at KSC during checkout. Beech did so, neglecting, however, to inform its subcontractor to change the power specification for their thermostatic safety switches. No one from Beech, North American, or NASA ever noticed this omission.

On all the Apollo flights up through 12, the switches had not had to open. When the tanks were pressurized with cryogens hundreds of degrees below zero, the switches remained cool and closed. When, for the first time in the history of the cryo tanks, the temperature in the tanks rose high enough to trigger the switch—as O2 Tank 2 emptied—the switch was instantaneously fused shut by the 65-volt surge of power that it had not been designed to handle. For the eight hours that the heaters remained on, the Teflon insulation on the wires inside the cryo tank baked and cracked open, exposing bare wires.

On the evening of April 13…[when the cryo tank was stirred], some minute shift in the position of two of those bare wires resulted in an electrical short circuit, which in turn ignited the Teflon, heating the liquid oxygen. About sixteen seconds later, the pressure in the O2 Tank 2 began to rise. The Teflon materials burned up toward the dome of the tank, where a larger amount of Teflon was concentrated, and the fire within the tank, fed by the liquid oxygen it was heating, grew fierce.

In the final four seconds of this sequence, the pressure exceeded the limits of the tank in about eleven microseconds, slamming shut the reactant valves on Fuel Cell 1 and Fuel Cell 3. Then the Teflon insulation between the inner and outer shells of the tank caught fire, as did the Mylar lining in the interior of the service module.

The resulting gases blew out one of the panels in the service module. That explosion also probably broke a small line that fed a pressure sensor on the outside of the O2 Tank 1, opening a small leak.

Once the service module panel blew out…”

Lessons Learned

One of the lessons that NASA learned from the Apollo 13 accident, and many other failures, was the importance of communication within such a complex organization. It was not the first time that the inherent complexity of such a large system, and the extended organization required to design, build and operate it, had led to a breakdown in communication that contributed to what was perceived as a “mechanical” problem.

In an effort to combat this complexity, over the last 50 years NASA has been at the forefront of developing and refining many of the best practices in a discipline known as Systems Engineering.

Systems Engineering

Among other things, systems engineers develop and implement formal organizational structures and processes that are specifically designed to improve the flow of information within a technical organization.

They organize “Integrated Product Teams” (IPTs) that bring together specialists from different areas to work on designs from a more holistic perspective.

They organize and conduct formal design reviews that expose the technical work done by a small group of specialists to the wisdom and experience of a wider group of engineers.

They establish a structure of engineering review boards that use formal processes to ensure the technical integrity of the design.

NASA currently requires all of its programs to adhere to a set of formal requirements contained in NASA Procedural Requirement (NPR) 7123.1A, “NASA Systems Engineering Processes and Requirements,” as part of its efforts to prevent the next Apollo 13. It also provides all of its engineers with a set of detailed guidelines, NASA/SP-2007-6105 Rev1, “NASA Systems Engineering Handbook,” for understanding and executing the processes required by the NPR.

Can We Do Better?

While these formal organizational structures and processes have been extremely successful in “flattening” (if not reducing the size of) NASA’s technical organization and fostering the kind of cross-functional communication required to prevent failures like Apollo 13, they are still far from perfect.

One problem is that they are costly to implement. The “technical bureaucracy” needed to foster this type of communication is not cheap, and the formal processes take time to execute, which can slow the pace of a large project. Given the alternative (another Apollo 13), many of these additional costs, in both time and money, have come to be accepted as a cost of doing business.

Another problem is that, despite the demonstrated effectiveness of this approach at reducing the potential for another Apollo 13-type failure, the possibility for a “process escape” always exists. In most cases it’s the things you didn’t think of that bite you.

Enterprise 2.0

While there will probably always be a need for the additional “rigor” provided by many of the formal organizational structures and processes associated with NASA’s approach to systems engineering, could the additional informal communication provided by Enterprise 2.0 have been enough to break the chain of events that led to Apollo 13?

Attachment: NASA Lessons Learned Ltr (Feb 09).pdf

Comments

Joe Sanchez

“No one had told him that the gauge’s limit was 85 degrees.”

#e20 and its communication channels cannot guarantee that the right message gets to the right people at the right time. It still comes down to knowing what information is needed by whom and when.

If anything, without identifying the EEOI (the essential elements of information that are needed) and the requisite filters, #e20 could potentially result in an overload of information.

Matthew B. Strickland

@ Stephen – It is, there’s a link in the line before the excerpt starts.

@Joe – Good points. The formal systems that are in place are intended to do just that (get the right info to the right people at the right time).

The concern is that in such a complex environment there will always be details that the conscious “planners” either aren’t aware of or never thought of (like the disconnect between the capability of the temperature gauge and the intent of the measurement it was being used to make), or that simply get missed by the formal processes (like the “ripple” effect of the requirements change).

The question is whether the additional peer-to-peer communication associated with a technologically enabled social network would help supplement the formal processes and catch some of these problems. I don’t really know the answer.

I suspect it might help supplement and “fill in the gaps” of some of the more formal processes, but you bring up an excellent point in that additional things could be missed due to either “information overload” or the new informal channels “watering down” the effectiveness of the more formal channels.

Bill Brantley

Great post, Matthew, but I think the Challenger and Columbia accidents answered your last question. You can find elements of Enterprise 2.0 in both accidents, but the real problem is not the informal communication; it is that NASA is not a learning organization. From Volume One of the Columbia Accident Investigation Board report (p. 9):

“Cultural traits and organizational practices detrimental to safety were allowed to develop, including: reliance on past success as a substitute for sound engineering practices (such as testing to understand why systems were not performing in accordance with requirements); organizational barriers that prevented effective communication of critical safety information and stifled professional differences of opinion; lack of integrated management across program elements; and the evolution of an informal chain of command and decision-making processes that operated outside the organization’s rules.

This report discusses the attributes of an organization that could more safely and reliably operate the inherently risky Space Shuttle, but does not provide a detailed organizational prescription. Among those attributes are: a robust and independent program technical authority that has complete control over specifications and requirements, and waivers to them; an independent safety assurance organization with line authority over all levels of safety oversight; and an organizational culture that reflects the best characteristics of a learning organization.”

You are correct that additional informal communication will help, but the real benefit of an Enterprise 2.0 solution for NASA is to support its transformation into a learning organization. There are a lot of great folks in project management, program management, and knowledge management working in NASA, so you have the makings of a great transformation into a learning organization. But (from someone looking from the outside in), top management needs to elevate the work of these folks.

Matthew B. Strickland

@ Bill – Thanks for the comments, but I’m not sure I agree with your contention that NASA is not a learning organization.

There are numerous examples, including the Columbia accident, where NASA took the time to investigate the root causes and implement significant changes as a result (isn’t that learning?).

As an engineering-based culture this is a normal part of how we operate. We constantly design things, test them, analyze the results, and then learn from that experience. We capture “Lessons Learned” and attempt to apply them to other programs. We place high priorities on internal communication and professional development.

As someone who already considers themselves part of a “learning organization,” I readily agree that we can always improve, but I’d be interested in your thoughts on what areas you think we need to work on.

Bill Brantley

@ Matthew – I base my conclusions on Mahler and Casamayou’s (2009) book Organizational Learning at NASA: The Challenger and Columbia Accidents. As you contend, NASA discovered and implemented many lessons learned from the Challenger accident, but those lessons seemed to have been forgotten by the time of the Columbia accident (p. 6).

According to the authors, the obstacles to NASA’s learning abilities are “time pressures on shuttle processing, managerial turbulence, a weak safety organization, gaps in information processing, center rivalries, and lack of scientific or technological novelty in the failed systems” (p. 167). NASA has the elements to be a great learning organization, but these efforts are subject to the incompatible priorities that come with changes in administration (such as Goldin’s emphasis on “faster, better, cheaper” management (p. 187)).

The authors also point to a fundamental problem in NASA’s culture in that employees are not free to express concerns about safety or management policies (pp. 201-202). You remember the story of the NASA employees who pressed to have an in-orbit inspection of Columbia’s tiles but were ultimately turned down because of the publicity concerns of such a request?

I don’t believe this is an issue with NASA alone; it affects all Federal agencies. The political and budget constraints, plus the constant shifting of priorities, prevent any agency from becoming a learning organization. The authors chose to study NASA because it is so open and well documented, but I bet that if they had chosen to study any other agency they would have found similar issues.

I am a big fan of NASA’s APPEL Program, which is a model for knowledge sharing. As NASA winds down the Shuttle program and moves to whatever is next, it is going to suffer another shock to its organizational memory. I hope programs like APPEL are there to capture the knowledge, because even if NASA does go to a heavy-lift vehicle using Shuttle technology, a lot of that knowledge is walking out the door right now.

Reference:
Mahler, J.G., & Casamayou, M.H. (2009). Organizational learning at NASA: The Challenger and Columbia accidents. Washington, DC: Georgetown University Press.

Matthew B. Strickland

@ Bill – Thanks for the reference. I’ll definitely have to check it out.

If I understand correctly, your main point is that while NASA is certainly capable of learning from its experience, it isn’t always successful at actually applying those lessons to the problem at hand. In other words, they’re better at capturing the knowledge than applying it. Fair point.

One of our Engineering Managers actually refers to them as “experiences” rather than “lessons learned” to highlight the fact that they’re never really “learned” until they get you to actually do something differently in your daily work.

I agree that Federal agencies have some unique qualities that present different challenges to developing and maintaining themselves as effective learning organizations.

Any thoughts (from anyone) on how to get better at applying lessons learned (especially the organizational ones) in the context of a Federal agency?

Bill Brantley

“Any thoughts (from anyone) on how to get better at applying lessons learned (especially the organizational ones) in the context of a Federal agency?”

I have two beginning steps for you:

The first step is to build a culture of trust where people can point out problems and offer solutions without fear of reprisal. Stop blamestorming and shooting the messenger, and encourage everyone to freely express their concerns, no matter their rank or position in the organization.

The second step is to build a culture of innovation where people are recognized and rewarded for helping to improve the processes and procedures of the organization. That requires top management to be completely open about how the organization works and to encourage employees to learn more than just their own part in the processes.