“Houston, we have a problem…”
While many now consider the determination and ingenuity displayed during the rescue of the Apollo 13 astronauts to be one of NASA’s finest hours, could the accident have been prevented altogether if NASA had had the benefit of the kind of robust internal social network made possible by today’s technology?
While many people may know from the film that the “problem” the Apollo 13 crew experienced was the explosion of an oxygen tank in the service module, most don’t know the full story of exactly why that tank exploded.
In this excerpt from “Apollo: The Race to the Moon”, the authors explain the series of events that actually led up to the explosion.
“In March 1970, three weeks before the flight, Apollo 13 underwent its Countdown Demonstration Test…which involved loading all the cryos. When the test was over, O2 Tank 2 was still 92 percent full, and it wouldn’t de-tank normally—probably because of the loose fill tube. Because a problem in the fill tube would have no effect on the tank’s operation during flight, the malfunction was not thought to be relevant to flight safety.
After three unsuccessful attempts to empty the tank, it was decided to boil off the oxygen by using the internal heater and fan. This was considered to be the best procedure because it reproduced the way the system would work during flight: heating the liquid oxygen, raising its pressure, converting it to a gas, and expelling it through the valves and pipes into the fuel cells where, in flight, it would react with the hydrogen. So they turned on the tank’s heater.
A technician working the night shift on Pad 39A was assigned to keep an eye on the tank temperature gauge and make sure that it did not go over 85 degrees Fahrenheit. It was not really necessary that a human serve this function, because a safety switch inside the tank would cut off the heaters if the temperature went beyond the safety limit. And, in reality, the safety margin built into the system meant that the temperatures could go considerably higher than 85 degrees without doing any damage. But the precautions were part of NASA’s way of ensuring that nothing would go wrong.
After some time, the technician noticed that the temperature had risen to 85 degrees, but all he had been told was that anything in excess of 85 degrees was a problem, so he let the heater run—about eight hours, in all. No one had told him that the gauge’s limit was 85 degrees. That’s as high as it could measure. Thus the technician could not tell that the temperatures inside the tank were actually rising toward a peak of approximately 1,000 degrees Fahrenheit, because the safety switch had failed.
It had failed because of one small but crucial lapse in communication.
Eight years earlier, in 1962, North American had awarded Beech Aircraft a subcontract to build the cryo tanks for the service module. The subcontract specified that the assembly was to use 28-volt D.C. power. Beech Aircraft in turn gave a small switch manufacturer a subcontract to supply the thermostatic safety switches, similarly specifying 28 volts. In 1965, North American instructed Beech to change the tank so that it could use a 65-volt D.C. power supply, the type that would be used at KSC during checkout. Beech did so, neglecting, however, to inform its subcontractor to change the power specification for their thermostatic safety switches. No one from Beech, North American, or NASA ever noticed this omission.
On all the Apollo flights up through 12, the switches had not had to open. When the tanks were pressurized with cryogens hundreds of degrees below zero, the switches remained cool and closed. When, for the first time in the history of the cryo tanks, the temperature in the tanks rose high enough to trigger the switch—as O2 Tank 2 emptied—the switch was instantaneously fused shut by the 65-volt surge of power that it had not been designed to handle. For the eight hours that the heaters remained on, the Teflon insulation on the wires inside the cryo tank baked and cracked open, exposing bare wires.
On the evening of April 13…[when the cryo tank was stirred], some minute shift in the position of two of those bare wires resulted in an electrical short circuit, which in turn ignited the Teflon, heating the liquid oxygen. About sixteen seconds later, the pressure in the O2 Tank 2 began to rise. The Teflon materials burned up toward the dome of the tank, where a larger amount of Teflon was concentrated, and the fire within the tank, fed by the liquid oxygen it was heating, grew fierce.
In the final four seconds of this sequence, the pressure exceeded the limits of the tank in about eleven microseconds, slamming shut the reactant valves on Fuel Cell 1 and Fuel Cell 3. Then the Teflon insulation between the inner and outer shells of the tank caught fire, as did the Mylar lining in the interior of the service module.
The resulting gases blew out one of the panels in the service module. That explosion also probably broke a small line that fed a pressure sensor on the outside of the O2 Tank 1, opening a small leak.
Once the service module panel blew out…”
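The voltage mismatch described in the excerpt is, at heart, a requirement change that never reached every part it affected. As a rough illustration only, here is a minimal Python sketch of the kind of cross-check that would flag it. The `Component` class, the `find_voltage_mismatches` function, and the part names are hypothetical and invented for this example; only the 28-volt and 65-volt figures come from the excerpt, and nothing here represents a tool NASA or its contractors actually used.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A part in the assembly (hypothetical model for illustration)."""
    name: str
    supplier: str
    rated_volts_dc: float  # voltage the part was specified to handle


def find_voltage_mismatches(components, supply_volts_dc):
    """Return the names of parts rated below the supply voltage."""
    return [c.name for c in components if c.rated_volts_dc < supply_volts_dc]


# The tank was updated for 65-volt ground power in 1965 ...
tank = Component("O2 tank heater assembly", "Beech Aircraft", 65.0)
# ... but the switch specification stayed at the original 28 volts.
switch = Component("thermostatic safety switch", "switch subcontractor", 28.0)

mismatches = find_voltage_mismatches([tank, switch], supply_volts_dc=65.0)
print(mismatches)  # ['thermostatic safety switch']
```

The check itself is trivial. The hard part, as the rest of this post argues, is making sure everyone in the chain knows the supply voltage changed in the first place.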
One of the lessons that NASA learned from the Apollo 13 accident, and many other failures, was the importance of communication within such a complex organization. It was not the first time that the inherent complexity of such a large system, and the extended organization required to design, build and operate it, had led to a breakdown in communication that contributed to what was perceived as a “mechanical” problem.
In an effort to combat this complexity, over the last 50 years NASA has been at the forefront of developing and refining many of the best practices in a discipline known as Systems Engineering.
Among other things, systems engineers develop and implement formal organizational structures and processes that are specifically designed to improve the flow of information within a technical organization.
They organize “Integrated Product Teams” (IPTs) that bring together specialists from different areas to work on designs from a more holistic perspective.
They organize and conduct formal design reviews that expose the technical work done by a small group of specialists to the wisdom and experience of a wider group of engineers.
They establish a structure of engineering review boards that use formal processes to ensure the technical integrity of the design.
As part of its efforts to prevent the next Apollo 13, NASA currently requires all of its programs to adhere to the formal requirements contained in NASA Procedural Requirements (NPR) 7123.1A, “NASA Systems Engineering Processes and Requirements”. It also provides all of its engineers with a set of detailed guidelines, NASA/SP-2007-6105 Rev1, “NASA Systems Engineering Handbook”, for understanding and executing the processes required by the NPR.
Can We Do Better?
While these formal organizational structures and processes have been extremely successful in “flattening” (if not reducing the size of) NASA’s technical organization and fostering the kind of cross-functional communication required to prevent failures like Apollo 13, they are still far from perfect.
One problem is that they are costly to implement. The “technical bureaucracy” needed to foster this type of communication is not cheap, and the formal processes take time to execute, which can slow the pace of a large project. Given the alternative (another Apollo 13), many of these additional costs, in both time and money, are now accepted as a cost of doing business.
Another problem is that, despite the demonstrated effectiveness of this approach at reducing the potential for another Apollo 13-type failure, the possibility of a “process escape” always exists. In most cases, it’s the things you didn’t think of that bite you.
While there will probably always be a need for the additional “rigor” provided by many of the formal organizational structures and processes associated with NASA’s approach to systems engineering, could the additional informal communication provided by Enterprise 2.0 have been enough to break the chain of events that led to Apollo 13?