Transterrestrial Musings  





Biting Commentary about Infinity, and Beyond!


Airlines Are Fail Operational--NASA Is Only Fail Safe

As I pulled into Titusville last week to the news that the launch had been scrubbed due to a sensor failure, I had similar thoughts to the following from George William Herbert, posted at sci.space.policy today, but he wrote them down, and I didn't:

"Something has been nagging me since the current round of hydrogen depletion sensor problems started on Discovery's launch attempt, and I haven't seen any good comments come up on the newsgroups or other commentary, so I'm going to launch it out there.

The Shuttle design was intended to be highly reliable and to have multiple redundant sensors and systems in most key areas. By and large, other than structural items where it's hard to have another whole heatshield under the first one, they have had good success with redundancy covering flight faults and avoiding nasty aborts and the like.

There is a key difference to be seen between the behaviour last week trying to launch Discovery, though, and what typically happens with say a large 747 jetliner and its typical operational cycle.

Airliners have what's called a Minimum Equipment List. This covers a set of systems that have to be operational in order for the vehicle to safely depart on a flight. The MEL is usually designed so that a number of minor faults are tolerated, and in areas where a fault would cause the aircraft to have to stay and be repaired, where possible an extra set of redundancy is applied so that if four units are needed for safe suitably redundant flight operation, five are installed, and the MEL is four. One sensor or navigation system or whatever can be completely broken, and the required flight safety level is still met with the remaining units.

Airliners are designed that way because it costs serious money when they can't depart on time... either they have to be repaired in a hurry, which means lots of technicians at each airport and lots of expensive spare parts stocked everywhere (plus, a long enough operating cycle to accomplish the repairs in), or you have to scramble to find another plane to shift to the flight whose aircraft is down with a gripe, and then shift another plane to cover for the one you grabbed, and so on.

Shuttle was designed with an adequate level of systems redundancy for safety considerations, in most systems. It was not designed with an adequate level of systems redundancy for operational considerations. The cost per day of a Shuttle sitting on the pad, the ops crews and the control room crews and the costs of a rollback and destacking are all very significant. The opportunity cost of not being able to fly on time is also not at all a minor issue, with Shuttle's life span limited by a currently hard deadline and too many ISS flights remaining to get done between now and then.

Redundancy is often described in "N+1" or "N+2" or "2N" terms; shorthand for one or two more units than are required for safe operation, or twice as many as are required. MEL logic really goes to a different level. We should really be looking to "(N+1)+1", or both safety redundancy and an operational redundancy margin. Defining the safety redundancy factor as the N plus or multiplied by whatever, we can then define an operational redundancy factor, consisting of some margin on top of the minimum safety requirements. In shorthand, let's say O for Operational Factor = (required safety factor including margins), or for example O = N+1. The operability factor would then be, for example, O+1 or O+2, with the additional operability margin depending on the maintainability of the parts.

Future reusable spacecraft and their operators generally already have a clue about these issues, but it bears repeating in public to make the point. The capsules I am working on should not have to be destacked and disassembled if one out of a set of four units fails while we're on the pad; either there should be a fifth, or three should be adequate for safe flight including safety margins, and listed in the MEL. The same should go for any other manned orbital project.

Not every system can be made this redundant, but as Discovery is showing, there are many systems for which safety dictated enough redundancy that adding an operability margin on top of that would have not been that difficult. Two wires in the shuttle/tank interface, one more sensor unit, a few pounds of payload capacity lost... and how many millions of dollars lost destacking Discovery the first time, and in this launch delay now?

Thin margins kill costs."

[Copyright 2005, by George William Herbert]
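
Herbert's shorthand is easy to play with in a few lines of code. What follows is only a rough sketch in Python: the function names are invented here for illustration, and the unit counts are the ones from his airliner and capsule examples, not from any real Minimum Equipment List.

```python
def safety_minimum(n_required: int, safety_spares: int) -> int:
    """Herbert's O: units needed for safe flight, margins included (e.g. N+1)."""
    return n_required + safety_spares


def operability_margin(installed: int, minimum: int) -> int:
    """Units that can fail on the ground without holding the vehicle."""
    return installed - minimum


def can_dispatch(installed: int, failed: int, minimum: int) -> bool:
    """True if the units still working meet the safety minimum (the MEL figure)."""
    if failed < 0 or failed > installed:
        raise ValueError("failed must be between 0 and installed")
    return installed - failed >= minimum


# Airliner example from the quote: five installed, MEL of four.
print(can_dispatch(installed=5, failed=1, minimum=4))  # True: one unit can be broken

# Capsule example: four installed, all four required, one fails on the pad.
print(can_dispatch(installed=4, failed=1, minimum=4))  # False: destack and repair

# Either fix he proposes restores a margin of one unit.
print(operability_margin(installed=5, minimum=4))  # 1: add a fifth unit
print(operability_margin(installed=4, minimum=3))  # 1: or show that three are adequate
```

The point of keeping the safety minimum separate from the installed count is that the margin only exists once someone has committed to a number for the minimum.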

[Update a few minutes later]

Via Clark Lindsey, here's a good description of the sensor that failed from Bill Harwood.

I should also mention that there's a good discussion of the problems associated with troubleshooting this problem over at sci.space.policy. Some of the posters there are theorizing that it's a separation of an electrical conductor that only occurs at cryo temperatures (if so, it would likely be due to differential thermal expansion). They also point out the high costs of figuring out just where it's happening to the degree necessary to have confidence in flying again. And as always, it points out the fragility of the system, and the danger of relying on a single hardware concept for all of NASA's human exploration goals. Because this is an element of the external tank, which would be common to all Shuttle-derived heavy lifters, our ability to get to the Moon would be shut down until this issue was resolved.

Posted by Rand Simberg at July 17, 2005 06:21 PM
Comments

It also demonstrates a cost of using cryogenic propellants, the cost of putting sensors *inside* a large tank, and of using engines with pumps that explode if they are still running when a propellant line runs dry.

Posted by Paul Dietz at July 17, 2005 07:00 PM

Interestingly, according to the report I found at http://www.cnn.com/2005/TECH/space/07/18/shuttle.options/index.html, the sensor that failed was one of four installed on the external tank, of which only two are needed for the engines to function correctly. Using the analogy in this post, this system already has an (N+1)+1 margin, but shuttle managers are choosing to delay the launch as soon as one of the components fails. The story did mention that one option under consideration is launching with only three good sensors, which would effectively be a decision to launch with N+1 or the safety margin.

Of course, given that a failure in this particular subsystem means that either the SSMEs perform an inflight self-disassembly or we get to try those fun RTLS and TAL abort modes, it could well be that NASA has decided that N+2 is the acceptable safety margin here, in which case there's no operational margin. However, in my opinion this only highlights the problem with Shuttle, and perhaps with large manned space programs in general: because the flight rate is trivial, the vehicle never leaves the experimental stage, and we never figure out what is an essential safety margin and what is an operational margin. I've never heard of such a thing as an MEL for the Shuttle, and I suspect there isn't one, and failures are resolved on a case-by-case basis.

Posted by Jeff Dougherty at July 18, 2005 07:41 PM

