Transterrestrial Musings  





Biting Commentary about Infinity, and Beyond!


Yes, It's Fail Operational

There are a lot of misconceptions about NASA's decision to launch tomorrow even with another sensor failure like the one that caused the last attempt to be scrubbed almost two weeks ago. They're on full display by some of Alan Boyle's readers:

“It sounds like a lot of rationalization to me. It may not be a safety issue, but someone thought that all the redundancy was important, otherwise they would have just put in two or three sensors...”

or

“I do not agree with NASA's plans. The next mission into space should not have any ‘glitches’ or any other problems. I do not believe there is an imperative reason for NASA to launch the shuttle, so they should correct any problems before doing so. Not only would it be irresponsible of NASA to launch Discovery, but also it would be a national tragedy and embarrassment if anything were to happen to the shuttle or the fine men and women aboard.”

or

“What do I think of NASA's plan? Same as always. Say one thing, do another. If triple redundancy wasn't necessary, they wouldn't have designed it in. It is necessary. The fuel-gauge issue illustrates that not only has NASA not learned its lesson, but that it is incapable of learning the lesson. Requirements are not optional. They are requirements. That alone should decide the issue. Lacking that, we have seen the same faulty safety culture destroy two vehicles and their crews. We are using a decades-old spacecraft. That isn't a reason to be more tolerant of faults, it's a reason to be less tolerant! If they can't get the shuttle safely prepared, as I feel they can, without resorting to 'management decisions' which put the crew at risk, then the shuttle should not fly. End of story. If they want to automate the shuttle so no lives are at risk, fine.”

You get the picture. These people clearly don't understand the issues.

And note the very last one. This person seems concerned about nothing except whether or not lives are at risk--lives that are risked willingly, by people he doesn't know--and he (like all of the commenters quoted) seems completely indifferent to cost, or schedule, or whether or not we lose billions of dollars' worth of hardware.

A safe launch does not require all four sensors. If it did, then they'd need five, or six. Only one (as far as I know) is sufficient to do the job. That's what redundancy is all about.

The problem is (as I and George William Herbert noted a few days ago), at least as I understand it, that NASA originally designed the system to be fail operational, but in the hysteria over Challenger and Columbia changed the rules to declare it fail safe instead. In other words, it was originally designed with sufficient redundancy to allow launch with a single sensor failure, to allow operability, but they later (as a result of tightening things up after the disasters) changed the launch commit criteria to scrub under those circumstances, which is what happened on July 13th.

Any system as complex as the Shuttle, with as many components, must have adequate redundancy to allow safe operations with a failure of some components, because there are so many of them that some are statistically bound to fail, and if we mindlessly demanded perfection on every flight, we'd never fly. This is the airline philosophy, and it used to be NASA's, but they've gotten gun-shy, at least on this particular issue. But in making an a priori decision now to go with a failed sensor at launch, they're returning to a common-sense approach, for which the system was designed.
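A back-of-the-envelope sketch of why "some are bound to fail": assuming components fail independently with the same per-flight probability, the chance that at least one fails climbs quickly with the component count. The component count and failure probability below are illustrative assumptions, not Shuttle data.

```python
def prob_any_failure(n_components: int, p_fail: float) -> float:
    """Probability that at least one of n independent components fails."""
    return 1.0 - (1.0 - p_fail) ** n_components


# Hypothetical figures: 10,000 components, each failing once in 100,000 flights.
p_any = prob_any_failure(10_000, 1e-5)
print(f"Chance of at least one failure on a given flight: {p_any:.1%}")
```

Even with parts that individually almost never fail, those assumed numbers put a failure somewhere in the system on roughly one flight in ten, which is why a demand for zero faults at launch amounts to a demand not to launch.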

Now, here's the deal. This is a tough problem to troubleshoot, but the prevailing theory seems to be bad wiring. So they've come up with a clever solution. They're going to swap the wiring between the sensor that failed and another one. Now, either all of the sensors will check out fine tomorrow, in which case they shrug their shoulders and launch, per the new, stricter rules. Or else, they'll see a failure, except it will be on the sensor that they changed, which will mean that the wiring was the problem, and they now understand the issue. Under those circumstances, they can waive the rules (and this won't be a "last-minute decision," as I heard it reported this morning--it will be a well-thought-out one that they've been thinking about for days) to launch knowing that they still have fail-safe redundancy. The only circumstance that should cause a scrub tomorrow (other than weather, or finding some new problem) is some new unexpected failure associated with the sensor system, but an expected failure (confirming the wiring hypothesis) or no failure should allow a (safe, or as safe as any Shuttle launch can be) launch.
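The swap test amounts to a three-way decision. A minimal sketch follows; the sensor labels and the function name are hypothetical, and the real launch commit criteria are certainly more detailed than this.

```python
def launch_decision(failing_sensors, swap_partner):
    """Map the pre-launch sensor check onto the three outcomes in the post.

    failing_sensors: set of sensor labels that read incorrectly during the check.
    swap_partner:    label of the sensor now connected to the suspect wiring.
    """
    if not failing_sensors:
        # Everything checks out: launch under the existing, stricter rules.
        return "launch"
    if failing_sensors == {swap_partner}:
        # The fault followed the wiring, so the wiring hypothesis is confirmed.
        return "launch with waiver"
    # Any other pattern is a new, unexplained failure.
    return "scrub"


# Hypothetical labels: sensor "A" failed originally, its wiring now feeds "B".
print(launch_decision(set(), "B"))
print(launch_decision({"B"}, "B"))
print(launch_decision({"A"}, "B"))
```

The point of the swap is that it turns an ambiguous fault into one with a predicted signature: only a failure that does not match the prediction leaves the team without an explanation.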

Now as to these demands that NASA not launch until the sensor is fixed, how much are those making the demand willing to spend (noting that the money belongs to all of the taxpayers, not them individually)? And to what end?

Someone once said that when failure is not an option, success gets very expensive.

Right now, NASA's hypersafety philosophy has made spaceflight hyper expensive (though not particularly safe). Rather than unrealistically making failure not an option, we need to embrace the fact that failures will occur occasionally. What we have to do is make sure that failures aren't as expensive as they were in the case of Challenger and Columbia (and numerous other lesser NASA program failures). What that means is making it cheap to fail, which in turn means making it cost much less to make attempts. That won't happen until we develop much more robust systems, with much more activity. But investing further millions into Shuttle (not only in terms of money spent fixing things, but the costs of continued delay, which are substantial) in a futile effort to make it any safer than it currently is, is a fool's errand. We should have flown a couple years ago.

[Update, one hour before scheduled launch]

Jim Oberg has more.

[One more update, forty minutes before scheduled launch]

Yishai Mendelsohn has related thoughts.

Posted by Rand Simberg at July 25, 2005 02:29 PM
TrackBack URL for this entry:
http://www.transterrestrial.com/mt-diagnostics.cgi/4078

Listed below are links to weblogs that reference this post from Transterrestrial Musings.
A Useless Launch
Excerpt: Discovery had a successful launch. I have mixed feelings. I'm glad NASA pulled it off. I'm always proud of American...
Weblog: The American Mind
Tracked: July 26, 2005 06:05 PM
Shuttle Launch Successful, but not Perfect
Excerpt: During yesterday's launch, a camera on the external fuel tank recorded "what appeared to be a small fragment of tile coming from Discovery's underside on or near the nose gear doors," NASA said. A later image showed "an identified piece departing f...
Weblog: Chinese Adventure Blog
Tracked: July 27, 2005 03:26 AM
The Space Shuttle Should Be History 2
Excerpt: ... and maybe it is. The recent launch of Discovery was performed safely and smoothly, but now it appears that despite years of work and hundreds of millions of dollars of engineering Space Shuttle foam keeps falling off. HOUSTON, July 27 - NASA suspen...
Weblog: Michael Williams -- Master of None
Tracked: July 28, 2005 09:31 AM
The Space Shuttle Should Be History 2
Excerpt: ... and maybe it is. The recent launch of Discovery was performed safely and smoothly, but now it appears that despite years of work and hundreds of millions of dollars of engineering Space Shuttle foam keeps falling off. HOUSTON, July 27 - NASA suspen...
Weblog: Michael Williams -- Master of None
Tracked: July 28, 2005 09:41 AM
The Space Shuttle Should Be History 2
Excerpt: ... and maybe it is. The recent launch of Discovery was performed safely and smoothly, but now it appears that despite years of work and hundreds of millions of dollars of engineering Space Shuttle foam keeps falling off. HOUSTON, July 27 - NASA suspen...
Weblog: Michael Williams -- Master of None
Tracked: July 28, 2005 10:07 AM
Discovery
Excerpt: I remember a fateful Tuesday morning just before lunch being pulled from my class room by the principal of my school, way back on January 28th, 1986. Normally, being pulled out of a class by the principal is never good....
Weblog: The Crazy Rants of Samantha Burns
Tracked: August 1, 2005 07:20 AM
Comments

Msnbc (Oberg): Why NASA's making the right decision
Analysis, not wishful thinking, guiding shuttle officials
http://www.msnbc.msn.com/id/8700458/
COMMENTARY By James Oberg //NBC News space analyst
Special to MSNBC // 2:40 p.m. ET July 25, 2005


I'll be doing spots on msnbc.com all day tomorrow; if I touch my fingertips together, consider it a secret greeting to habzoners ...

Posted by Jim O at July 25, 2005 03:24 PM

Many years ago now I had the dubious pleasure of having to conduct an independent (of the Range Safety Office) range safety evaluation for operating the shuttle from KSC. The sole criterion for range safety purposes was the question of how many innocent people on the ground might be killed with the constellation of flight paths then under consideration. There was no concern for the astronauts in the calculations.

The shuttle configuration was changing pretty frequently, which added to the fun. At that stage, for example, there was a manned flyback first stage and the orbiter had only two engines.

As it turned out, the weak point from a safety standpoint was a failure of either of the orbiter engines to light at staging. For our failure rate data we decided to use a mix of missile component failure rate data and C-135 failure rate data. As it turned out, there was not as much variance as one might expect. A new (for spacecraft) consideration was the effects of redundancy on mission failure, as well as the definition of mission failure for range safety purposes, which revolved around the question of having an intact impact.

The flight profiles included north sun-synchronous and south polar launches, and launches on a number of azimuths that are not commonly used now. This combination of launch profiles and failure configurations had the consequence of dropping the fully fueled orbiter into downtown Detroit on the north sun-synchronous, into (I think) Bogota, Colombia on the south polar, and into Lagos, Nigeria on one of the easterly azimuths. We used a 360-65 to crunch the numbers. We modeled the population density of the whole world, with higher resolution near the Cape, and simulated all of the flight profiles with random failures at the assumed failure rates.

If I remember correctly (35 years ago, now) the probability of killing someone on the ground came out to about one in three hundred flights. We briefed the results all the way up to Chris Kraft.

I don't know how realistic our study was, but it was the best we could do at the time. I do note that the present shuttle has three engines, and none of the overland flight profiles have been used by the shuttle. Maybe we did some good, or maybe it just worked out that way for other reasons.

Posted by John F at July 25, 2005 05:23 PM

Well said, all. Fear is for weaklings. If you're gonna go, go boldly. As far as I'm concerned (and I'm only a concerned taxpayer), if the guys in the orange suits are ready and willing, then strap them in, light the candle, and wish them godspeed.

Kick ass, Discovery.

Posted by Dave G at July 25, 2005 07:13 PM

Keep in mind that the failures have not been consistent, in either sensor location or indication, since the first tanking test failure. After the first failure the tank was swapped and the failure repeated. The orbiter box was replaced and a different failure appeared.

The software will fail a sensor that shows dry on the first pass leaving three for a majority vote. Three failures, all dry, are needed for premature MECO. OTOH, three wet failures lead to depending on the MEC to prevent an SSME explosion.
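One way to read the voting scheme described above, as a minimal sketch. The sensor names, the function name, and the strict-majority rule are assumptions for illustration; the actual flight software logic is not given here.

```python
def cutoff_commanded(reads_dry: dict[str, bool], excluded: set[str]) -> bool:
    """Return True if a strict majority of the non-excluded sensors read dry.

    reads_dry: sensor name -> True if that sensor currently reads "dry".
    excluded:  sensors already failed out (e.g. read dry on the first pass).
    """
    active = [dry for name, dry in reads_dry.items() if name not in excluded]
    return sum(active) > len(active) / 2


# Hypothetical case: s1 read dry on the first pass and was voted out.
excluded = {"s1"}

# One more false "dry" among the remaining three is outvoted.
print(cutoff_commanded({"s1": True, "s2": True, "s3": False, "s4": False}, excluded))

# Two more false "dry" readings (three dry failures in all) win the vote.
print(cutoff_commanded({"s1": True, "s2": True, "s3": True, "s4": False}, excluded))
```

Under these assumptions a premature cutoff needs three sensors to fail dry in total: the one already voted out plus two of the remaining three.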

Posted by anon@jsc at July 25, 2005 08:49 PM

Your point about fail operational versus fail safe is a good one. But I think one of the biggest realizations about the Shuttle came after Challenger, when NASA began performing its first real probabilistic risk assessment for the Shuttle. These assessments were based on the experience of the military and nuclear industry, and brought home some startling realizations about redundancy.

As I understand it (since I'm below the mean age of the aerospace industry), back in the 1970s redundancy was chosen as a principle of Shuttle design. Use multiple systems when you can to get the reliability up. After all, if a component's failure probability is 1 in 100, then by having three of them you just made that 1 in 100^3, or 1 in 1000000... right?

In the 1980s, the probabilistic risk assessment community began to become aware of a phenomenon called 'common cause'. The failure rates for many systems designed with redundancy were far higher than predicted. Take our three components, each with 1 in 100 failure probability. Now if all three are assembled at the same time... maybe they were all assembled incorrectly? Or maybe they were all exposed to some environmental effect that has damaged all three of them? Suddenly your system failure probability has jumped from 1 in 1000000 to something like 1 in 200 or 300, simply because of this common cause effect. What knocks out one component may mean the other components will suffer from the same problem.
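To make the arithmetic concrete, here is a minimal sketch using the beta-factor model, a standard simplification from reliability engineering that is not named in this comment. It assumes a fraction beta of each component's failure probability comes from a shared cause that takes out all copies at once. The value beta = 0.4 is chosen purely to land in the "1 in 200 or 300" range quoted above; it is not a measured figure.

```python
def independent_failure(p: float, n: int) -> float:
    """All n redundant components fail, assuming full independence."""
    return p ** n


def beta_factor_failure(p: float, n: int, beta: float) -> float:
    """All n components fail when a fraction beta of p is a shared cause."""
    common = beta * p                       # one event disables every copy
    independent = ((1.0 - beta) * p) ** n   # remaining failures are independent
    return common + (1.0 - common) * independent


p, n = 0.01, 3  # 1-in-100 components, triple redundancy, as in the comment
print(f"independent:  1 in {1 / independent_failure(p, n):,.0f}")
print(f"common cause: 1 in {1 / beta_factor_failure(p, n, beta=0.4):,.0f}")
```

With these assumed inputs the shared-cause term dominates completely: the system fails about once in 250 rather than once in a million, and adding a fourth or fifth copy would barely move that number.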

Ouch. So much for redundancy.

This common cause effect, that all three systems might fail for the same reason, means that redundancy in itself is not really a good design principle. I think future spacecraft, whether private or government, will have to have systems designed with reliability in mind, instead of redundancy.

Posted by Gavin Mendeck at July 25, 2005 10:54 PM

"..expense gets very expensive"
i think it should be success gets very expensive

Posted by kert at July 26, 2005 01:16 AM

Great post. My thoughts exactly. There is always inherent risk. Ensuring quadruple redundancy is not the way to ensure a safe flight. It only makes things prohibitively expensive and heavy. NASA thought through all the issues and decided it is worth the risks to fly with only three sensors. And the astronauts themselves have agreed to put their own necks on the line, fully understanding the situation.

Posted by Yishai at July 26, 2005 06:30 AM

“If you are distressed by anything external, the pain is not due to the thing itself but to your own estimate of it; and this you have the power to revoke at any moment.”
~ Marcus Aurelius

Posted by Josh Reiter at July 26, 2005 06:57 AM

Engineers prefer to design things that fail in a forgiving way. What happens if all the sensors fail? How would it affect a launch? (Pretend the media didn't exist and some real cowboys were at the controls.)

These are basically gas gauges, I'm led to believe. They plan on using all the fuel, right? If the gas gauge on my car fails, the car is still safe. How is it with the shuttle?

The bothersome part is they suspect grounding. Is this a fire (explosion) hazard?

Posted by ken anthony at July 26, 2005 07:44 AM

Ken

No, they are not simple gas gauges as you think of them. They determine when the tanks are approaching empty and signal that the main engine pumps are to shut down. (I am not certain, but there may be a backup to shut down the pumps, though probably not as efficient as the sensors.)

The pumps can destroy themselves if they continue to operate without a fuel flow.

If the sensors fail off it could lead to a premature shutdown of the main engines; if they fail on it could lead to the destruction of the main pumps. Either way, it could make for a bad day.

Posted by michael at July 26, 2005 08:21 AM

Ken, you do have a nice theory. Michael is correct though. What you are missing is the operating temperature of the SSME. Just prior to launch, you can see gas flowing through tiny nozzles near the engine. This is fuel flowing from the tank to keep the engines cool. It continues through the entire launch profile. When the fuel runs out, the engine will continue to spin and find its own fuel without the benefit of coolant. At the operating temperatures, metal itself will burn easily as a fuel. The SSME will begin to consume itself. Most automobile engines don't operate at these temperatures, but even when they do, if you remove the air coolant or the liquid coolant, you'll end up with a seized engine when the pistons begin to melt (lost engine) or a rod will be thrown (flying debris potentially destroying other systems).

For the other discussion, reliability and redundancy are both necessary. For the advancement of science and technology, focusing on improving reliability typically leads to newer technology that has benefits elsewhere, but it is usually cheaper to solve problems using redundancy (except when redundancy leads to heavier launch vehicles to carry the redundant loads).

I agree with Rand on the question of just how much money we want to spend to make the fuel tank safer for a system that will be retired in 4 years. More specifically, how many more millions to make a fuel gauge (yes, very important - read above) more reliable for a single launch? I'm glad NASA decided to trust its redundancy and press forward.

Posted by Leland at July 26, 2005 09:41 AM

