Transterrestrial Musings

Biting Commentary about Infinity, and Beyond!

Armadillo's Prospects

I haven't commented on this, but the New Scientist has a fairly extensive story of Armadillo's bad weekend.

What do I think?

First of all, full disclosure. I'm working, as I write this, for one of Armadillo's competitors, on SBIR proposals. But it's a close-knit community, even among the competitors.

And having said that, I don't think it's a disaster for Armadillo. These kinds of things are going to happen along the way, as we start to understand how to develop operable and affordable space transports (a goal that has eluded both the military and NASA, almost half a century after the dawn of the space age). I also find it interesting (and I have to confess, somewhat amusing) that the failure was fundamentally a software failure, given the pedigree of the company that provided the funds that created the vehicle:

Post-crash analysis has revealed what went wrong – the automatic shutdown that should have triggered when Texel first touched down did not occur. That's because the computer was mistakenly told to expect a stronger signal from the touchdown sensor, beyond what it is actually capable of producing.
But the touchdown did have a big enough effect to jostle the onboard GPS unit that Texel relied on to track its motion. The disturbance caused faulty readings from the unit, confusing the vehicle.

"It thought that it was plummeting to earth very quickly, so it fired the engine to reduce the speed," Eaton told New Scientist. "Well, it actually wasn't going down, so this caused it to start going up very quickly." That is when Carmack triggered the manual shutdown.

People tend to focus on the hardware problems of building space vehicles, but the software problems are in many ways more daunting (particularly to the degree that software is used to reduce weight of the vehicle). And it's a major contributor to cost. Norm Augustine wrote a book more than a couple decades ago, in which he pointed out that the only way for aerospace companies to continue to increase the cost of aircraft (particularly for the Pentagon) without increasing the weight (and they were reaching weight limits that would keep the aircraft on the ground) was to add software.

Anyway, if it does turn out to be disastrous, it's because John hasn't transitioned this activity from a hobby, shared by friends, to a real business, with real employees and risk-management plans (hint, hint, it's one of the things I do). It's in fact a good test of the company (such as it is) and, as Henry Spencer notes in the New Scientist piece, it makes the race even more interesting.

And, such is the state of this nascent industry, may the best team still win. We need lots of successful companies to have a successful industry.

[Thursday morning update]

There's a lot of discussion in the comments section that seems to infer that I was recommending a more rigorous software verification process (e.g., ISO). I was not. When I was describing risk management, it related more to things like backup hardware, etc. I was not implying that this was preventable, at least not in that way.

Posted by Rand Simberg at August 22, 2007 02:32 PM

Comments

Interesting point about the hobby-->business transition. I've often wondered how non-aerospace companies handle software reliability/QA. The only data point I have came from the banking industry, where I was shocked how they hacked code in 'real time'.

Posted by Dick Stafford at August 22, 2007 03:09 PM

And in the end, we all re-learn the lesson that people knew for years: That making rockets is freakin' expensive no matter whether it's private industry or public, because in rocketry you learn through error, and rocket errors result in explosions.

Posted by DensityDuck at August 22, 2007 03:10 PM

I think risk management may be a bit far down the list of things Armadillo needs to implement.

Posted by Karl Gallagher at August 22, 2007 03:20 PM

There are huge volumes written and discussed around software quality, and it generally costs a small fortune - especially in safety critical software.

Getting accreditation to work as a subcontractor or sub contract engineering company on projects where there is a safety element is quite daunting.

Banking doesn't seem to have quite the standards I've seen in Telecoms and other areas. From speaking to my wife who used to work in banking IT projects, a lot of it is they test in trial environments then go live in parallel to the systems over a weekend and hack it to make sure the results are the same and then don't touch it.

In telecoms I've seen liquidated damages conditions which can leave software vendors open to covering the losses incurred by entire mobile phone networks should a handset fail, so the testing and failure cases become quite extreme. I suspect it's harder in aerospace. Airbus had quite a time ironing out all the bugs in the Airbus "die by wire" full envelope controls and they're still finding catastrophic bugs.

Posted by Dave at August 22, 2007 03:37 PM

I'm not necessarily saying they need to have a lot of controls in this early stage, and quality procedures don't ensure a zero failure rate. However, they do have their eyes set on things bigger than a lander prize. I also know if I were flying on a rocket with software controls that I'd be happier knowing the control systems weren't hacked out over a weekend by one person and with limited testing (statement in an attempt to make a point only).

As for my banking example, the only loss there was/would be money in the bottom line...nothing as immediately obvious or exciting as a rocket failure.

Posted by Dick Stafford at August 22, 2007 04:00 PM

I've always said the software can kill you just as fast as the hardware. And I wasn't even talking about aeronautics at the time. :S

Posted by Bryan Price at August 22, 2007 04:32 PM

It is a bit murky to call, but I would classify the failure as a sensor failure, rather than a software failure. I have changed the software so that it will deal with the sensor behaving that way in the future, but every time a new sensor failure mode comes up, it would be unfair to blame the software for not predicting it.

It is easy for some people to deride software as "hacked together" if it doesn't conform to an ISO development process, but that is almost always a sign of ignorance.

Real software, in the real world, is developed in an iterative fashion, and there is a strong correlation between productivity and the speed of iteration. It is possible to develop software in other ways, but the much touted space shuttle software development path is probably a full three orders of magnitude less efficient than something done in startup mode. Since that is still a small cost relative to the full space shuttle program, it might possibly have been justified, but it doesn't mean it is a good way to start from a clean sheet of paper.

I do find it interesting that there is a decent contingent in the NewSpace crowd that is fairly software phobic. Software is one of the biggest advantages we have today, and replacing physical parts with code is one of the most productive things you can do.

An interesting question for people, that highlights their beliefs about engineering: Would you rather fly on the maiden voyage of a rocket that was designed and built to the highest ISO / MIL specs, or a rocket that was built in a garage, but had made 100 successful flights in a row?

In that case, only a blithering idiot would think the ISO rocket was safer, but finding the exact break point is more revealing. What if the ISO rocket had three good flights under its belt? What if the garage rocket had ten good flights, but the previous airframe had exploded, resulting in an engineering change? Do you think your odds would be better if Space Ship One was pulled back out for a flight, or on the maiden flight of Ares I?

I have a very explicit strategy to run our program so that failure is acceptable, and iterate as fast as we can. We have backup vehicles for a reason, and the big guys used to understand that back in the 50's. We are flying again on Saturday, so this didn't even slow us down.

John Carmack

BTW, my real email address is rejected as "questionable content"...

Posted by John Carmack at August 22, 2007 06:18 PM

John,
Best wishes going fwd. As someone from high tech & software I could not agree more with you about ISO. Your physical redundant destruct systems worked as designed. A sensor misread was one of those things us designers don't plan for on the 1st iteration. Me, I'll fly the rocket with a strong success record and LOTS of iterations that smoke out the unexpected failure mechanisms vs a vehicle designed to the international standards. Failure IS an option and teaches us.

Posted by philw at August 22, 2007 06:34 PM

Thanks for the comments, John. I had no intention to deride software. I do think that its intelligent use will be one of the ways that we reduce weight and cost of space vehicles. Working only on second-hand reports, I'll accept your contention that it was a sensor failure, rather than one of software, per se. It's always a contentious issue, whether a failure like this is one of software or hardware.

Regardless, good luck for both prizes and business. As I said, the more the merrier for a healthy industry. I (and many others I'm sure) am greatly appreciative that you're putting your money on the line for the cause.

Posted by Rand Simberg at August 22, 2007 06:44 PM

John,

You bring up some good points and in the extreme I'd agree about the 100 flights on a 'garage' rocket vs. ~0 on a big development. As the numbers converge, I might think differently. I'd also look at the relative complexity of the vehicles and missions.

I didn't mean to imply that your system is 'hacked' together. I have no idea whether it is or isn't. Formal processes (ISO/SEI) tend to be suited more towards large systems with lots of contributors and a 'small' control system might not fit the same mold. And, you don't have to follow ISO/SEI processes to have a valid methodology.

Anyway around it, glad it wasn't too big a bump in the road. I really want to see you succeed!

Posted by Dick Stafford at August 22, 2007 06:55 PM

i work as an embedded coder. we dont follow any of the ISO processes, however, we do have a quite small but sharp QA team that keeps coming up with darn nifty automated testing schemes that weed out lots of bugs/unpredicted behavior that probably would even never come up in real world usage.
That said, all the careful code design, analysis and "paper" testing is still no replacement for real hardware testing.

Posted by kert at August 23, 2007 12:28 AM

I don't know if anyone here has bothered to read the posts at Slashdot about Armadillo's mishap, but if the types of people that have been posting there are common, I can't think of a better excuse to keep developmental progress a secretive affair. There's a lot of smug morons out there who don't seem to respect the fact that mishaps happen along the way to promising technology.

Posted by at August 23, 2007 03:47 AM

Posted by KG232 at August 23, 2007 03:48 AM

Posted by KG232 at August 23, 2007 03:50 AM

I looked around the Slashdot comments, and there were a lot of idiots, a couple of trolls, some "I've got no idea but I want to comment anyways" people and quite surprisingly a few rather well informed posters as well.

Also a Matthew Ross gave some insight to one of these "I don't believe cause there's no youtube video" guys quite nicely...

Rand, if you read John's explanation on arocket or the slashdot comments where it was also posted, you'd see that the root cause of the crash was not so much sw-related, as much as it was related to the fact that Pixel and Texel had different IMUs and the specs on their accelerometers were not the same...

John, Good Luck with the AST permit process for the Modular vehicle!

Posted by K.L. at August 23, 2007 05:06 AM

Thanks to Armadillo for such an open and interesting program. I cautiously venture the opinion (I also used to program embedded code) that this was not a software error but a purchasing/supply chain error. Both the software and the (new) sensor were operating to their specifications. A supplier part (number) change (assuming they do provide part numbers or some indication of change) may not have triggered a review of how it might affect the system. As for using GPS for vertical speed metering, how would that compare to using inertial measurements?

Posted by Lindsay at August 23, 2007 05:46 AM

they might want to revisit the decision to forgo the radar altimeter at some point in their plans.

Posted by kert at August 23, 2007 07:12 AM

A long time ago, software developers realized that it was virtually impossible to certify that code was error free. The amount of time required to exhaustively test all possible iterations of even some simple algorithms could run into the years or even centuries. On top of that, bugs can (and do) exist everywhere from the microcode inside of a CPU up through the compiler and related libraries* to the OS (if there is one in an embedded system). Since that realization, the focus of testing has been to find and correct errors. A successful test is one that finds an error. This recent Armadillo crash - as unfortunate as it was - found an error that is being corrected. It's similar to the error that killed the Mars Polar Lander several years ago.

*Back in 1990, I was a crew commander at the Cobra Dane intelligence radar in the Aleutians. One day, our old computer kept crashing when we tried to do a certain routine operation. It turned out that a recent software change had uncovered a bug that had existed in the FORTRAN compiler's math library since 1973 but had never been detected before. The right set of circumstances came together that day to unveil a bug that had lain dormant for 17 years.

Posted by Larry J at August 23, 2007 07:18 AM
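
As a rough illustration of the exhaustive-testing point in the comment above, here is a back-of-the-envelope sketch. The function name, the input width, and the test rate of one billion cases per second are all assumptions chosen for illustration; none of them come from the comment.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_test_years(input_bits, tests_per_second):
    """Years needed to try every bit pattern of an input of the given width."""
    cases = 2 ** input_bits
    return cases / tests_per_second / SECONDS_PER_YEAR


# Hypothetical case: a routine taking two 32-bit arguments (64 bits of input),
# tested at an assumed rate of one billion cases per second.
years = exhaustive_test_years(64, 1e9)
print(f"{years:.0f} years")
```

Under those assumptions the answer comes out at several centuries, which is why testing in practice aims to find errors rather than to prove their absence.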

After reading the article I was going to go on a tirade of software vs. control algorithm but what do you know, John Carmack addressed this himself, so the point is moot.

Point is, software error is when the software does not do what it is supposed to, and control/sensor error is when you face an event that your control algorithm was just not prepared to handle, like this case or the infamous slosh instability of Falcon 1. No ISO schmiso software perfection is going to help you there.

The worst part is that boundary though when you are not flying and you're not on the ground, lots of things can happen at unforeseen times. Ask the Mars Polar Lander.

Posted by Ian at August 23, 2007 07:33 AM

John

I completely agree with you on this one. The aerospace industry as a whole is very phobic when it comes to modern computers and control system software. The only real solution to the problem is exactly what you are doing, testing, testing, testing, and then some more testing.

Although I bet today you would also agree that configuration control is important!!

There is an old saying in the computer design world.

De cow jumped over De Moon, Defeat before Detail!

Keep jumping.

Dennis

Posted by Dennis Wingo at August 23, 2007 01:14 PM

by the way, in control systems, any sensor reading that can be directly or through derivative interpreted as out of realm of physical possibility, should either be discarded or median filtered.

i.e. if your IMU or GPS shows you are accelerating downwards at more than 1G while rockets are pointed down, or you are speeding up at greater acceleration than your rocket is actually capable of with empty fuel tanks, the reading is obviously bogus. getting more than a few of such readings is a good indicator for declaring this sensor malfunctioning and either downgrade its trust or switch it off completely and use your backup.

I've read a couple of postmortems on DARPA Grand Challenge where controls went bonkers because they were 100% trusting obviously malfunctioning sensors in unfavorable conditions. Had the control loop put less trust in those and used its backups, they would have made it. This failure mode was pretty frequent amongst the competitors.

i don't know if this applies here, i.e. maybe the GPS jolt that you saw at touchdown was in realm of possibility, but nevertheless i thought i'd throw it out here.

Posted by reader at August 23, 2007 01:42 PM
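
The plausibility check described in the comment above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Armadillo's code: the class name `GatedSensor`, the window size, the reject limit, and the acceleration bounds are all hypothetical. Readings outside the physically possible range are discarded, the accepted ones are median filtered, and a run of consecutive rejects marks the sensor as untrusted so the caller can fall back to a backup.

```python
from collections import deque
from statistics import median

G = 9.81  # standard gravity, m/s^2


class GatedSensor:
    """Reject physically impossible readings and median-filter the rest."""

    def __init__(self, lo, hi, window=5, max_rejects=3):
        self.lo = lo                        # smallest physically possible value
        self.hi = hi                        # largest physically possible value
        self.recent = deque(maxlen=window)  # last accepted readings
        self.max_rejects = max_rejects      # assumed limit, tune per sensor
        self.rejects = 0                    # consecutive rejected readings
        self.trusted = True                 # latches to False once tripped

    def update(self, value):
        """Feed one raw reading; return the filtered estimate (None if no data yet)."""
        if self.lo <= value <= self.hi:
            self.recent.append(value)
            self.rejects = 0
        else:
            self.rejects += 1
            if self.rejects >= self.max_rejects:
                self.trusted = False        # caller should switch to a backup
        return median(self.recent) if self.recent else None


# Hypothetical bounds on vertical acceleration (positive = up): no worse than
# free fall, no better than an assumed 3 g with empty tanks.
accel = GatedSensor(lo=-G, hi=3.0 * G)
estimates = [accel.update(raw) for raw in [0.0, 0.2, 0.1, -55.0, -60.0, -70.0]]
```

In this run the three bogus readings never reach the filter output, and after the third one `accel.trusted` is `False`. Whether the touchdown jolt in this flight would have fallen outside such bounds is an open question, as the commenter notes.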

Of course the industry prefers hardware! It's pretty difficult to SEU a bent pipe.

Posted by DensityDuck at August 23, 2007 04:32 PM
