Starliner Woes

It continues to look worse and worse for Boeing. I’m almost starting to wonder if it will ever fly. SpaceX can do the job for less money, and it may not be that long before either Dreamchaser or Starship is flying (though it’s not clear that the latter will be capable of docking with the ISS).

[Update a few minutes later]

Comments over there are (deservedly, IMO) brutal.

[Update Monday afternoon]

“We don’t know how many software errors we have.”

[Bumped]

51 thoughts on “Starliner Woes”

  1. On a positive note, this is why you do testing. Except the problem is NASA/Boeing test for success when doing large scale integration tests, rather than testing for failure. You find these problems with failures, and they aren’t a big deal when caught during a test.

    This shouldn’t be a big deal, except NASA/Boeing hyped this particular test as the last hurdle to launching Astronauts from US soil again. I understand why they did it in terms of politics and budget, but it certainly wasn’t for safety.

    If we had a rational space program interested in space and safety, we’d be purchasing more Falcon boosters and testing the hell out of Orion instead of waiting for SLS.

    1. It also wouldn’t be a big deal if they were capable of doing tests like this more than once. All of the things SpaceX has done along the way provide a lot of resiliency. Both companies lost a launcher but one company’s launcher was essentially free having been paid for several times by customers. Boeing budgets about half a billion dollars for a launch.

      SpaceX also has all of the recent and relevant experience of flying the cargo Dragon. If Dreamchaser ever gets going, they will be proving themselves in the same incremental way. Boeing can do it all but, to use a sports analogy, they can’t finish.

    2. “On a positive note, this is why you do testing.”

      Except these problems should have been caught on the ground, not after launch. I can’t see this as anything except a complete system engineering and system test failure by Boeing, and a complete oversight failure by their enablers at NASA who had the responsibility for buying off on Boeing’s processes and procedures.

      1. Based on the quickie conference call arranged in the wake of Eric Berger’s Ars Technica piece, you’ve nailed it. The Boeing guys actually copped to having serious deficiencies in their development/test processes and the NASA folks acknowledged that NASA’s oversight was also deficient for having failed to catch any of this.

        Doug Loverro, the new head of HEOMD, said that, in view of this Starliner kerfuffle, NASA will be looking for much greater oversight of lunar lander development. So, even without considering the schedule slippage histories of SLS and Orion, this new declaration pretty much put paid to even the slightest hope of meeting the Trump administration’s 2024 goal of a manned lunar landing via the usual NASA suspects.

        Making it to the Moon with people by 2024 is now entirely up to Elon Musk and SpaceX.

  2. And now we know why Boeing took that $410 million charge. They knew this was out there.

    “Comments over there are (deservedly, IMO) brutal.”

    Oh, it’s Eric’s combox. That was *inevitable*. And yes, deserved. Boeing’s systems software management seems to be even worse than we thought.

      1. Have you seen those ads on the Bombay craigslist? You can get world class coding for $9/hr. How could the accountants who run Boeing resist?

  3. SpaceX has pictures of the plastic version of BFR docking with ISS so I have no doubt they are planning to make it possible for the steel one.

  4. I have a feeling that a panel that took a long deep look at the control system might find some fundamental problems, similar to what the Apollo 1 investigation uncovered, but in software.

    The trouble is that if the controls approach was wrong, focused effort can turn a buggy bag of hacks into a less buggy bag of hacks, but it’s still a bag of hacks.

    1. FAA/AST has one truly world-class software system safety engineer, and I suspect that he will be devoting his life to vetting Boeing’s processes prior to the first commercially licensed commercial crew flight. It has nothing to do with crew survival, just public safety. And it may add further delay to commercial operations.

  5. Bridenstine and Boeing have just announced a public press conference on the Starliner investigation at 3:30pm, streamed on NASA TV.

    Feeling the heat?

    1. Just listened to that on YouTube.

      The panelists failed to directly answer the two obvious questions – which were asked more than once each:

      1) will a second OFT mission be required?

      2) will Starliner fly crew this year?

      Based on what was revealed though – especially the need to thoroughly review the entire software code base for Starliner – I think the answers to those two questions are overwhelmingly likely to be “yes” and “no,” respectively. The second uncrewed OFT mission, in fact, seems unlikely to fly this year.

      Based on Boeing’s track record of sloth in dealing with previous Starliner problems, I now think it quite probable – 80% or greater – that Crew Dragon 2 will fly, with crew, twice this year and once next year before Starliner does likewise.

    1. If you have to spend hundreds of millions to redo a test flight because you failed to meet test objectives, that’s pretty clearly a setback. This isn’t supposed to be an R&D program.

  6. I recall reading an article many years ago saying that congressional staffers had learned to pay close attention during testimony when a program was in trouble due to a software issue. Before that, software problems were thought easier to fix than hardware problems. As the software became orders of magnitude more complicated than what came before, fixing the problems became vastly more difficult.

    At its heart, Boeing is supposed to be an engineering company. However, it was turned over to MBA bean-counters, who in turn moved themselves to Chicago to be away from the people who actually build the company’s main products. This led to the epic mismanagement of the 787 (a good plane now but one that had massive development problems). They’ve had serious problems with the KC-46 tanker and the Air Force is very mad at Boeing. The 737 Max shows what happens when you don’t have engineers and skilled programmers involved in the software development. And now, we’re finding out about the problems with Starliner. What will be next? Were I a Boeing stockholder, I’d be pushing for a change in the corporate leadership.

    1. “The 737 Max shows what happens when you don’t have engineers and skilled programmers involved in the software development.”

      (My emphasis)

      The 737 Max MCAS fiasco had nothing to do with a lack of “skilled programmers”, and saying that is a tell that the speaker doesn’t know how these systems are designed. The software worked exactly as designed. The 737 Max MCAS is the outcome of a heavily flawed system engineering, system safety, redundancy management, flight control design, and airworthiness process, which is what led to the incorrect software requirements specifications that the software team appear to have correctly implemented and tested. During software qual testing, the requirements-based tests would have passed with flying colors – the SW met its requirements – but the FMET clearly wasn’t laid out properly or MCAS would have failed it; FMET isn’t based on the SRS.

      So the MCAS mess is far more fundamental than any supposed lack of “skilled programmers.”

      1. Boeing approved software on a critical system that only needed inputs from a single AoA sensor to drive the plane into heavy nose down pitch, resulting in two crashes and over 300 deaths. That hardly sounds like the decision of competent programmers and engineers. Some MBA beancounter probably got a bonus for that one.

        1. You didn’t understand my reply, partly because blockquotes here apparently don’t allow other tags :-(. I’m focusing on your claim about Boeing’s lack of “skilled programmers” in the 737 Max fiasco, which is the same thing lots of people say. But it’s not true; the engineering problems, of which there were a lot, lay elsewhere.

      2. The software was a disaster waiting to happen, and it didn’t take long. Completely incompetent from a pilot’s perspective. Even Boeing test pilots thought it was insane. Boeing has a history of flight automation: fight the automation and the automation will turn off. MCAS doesn’t turn off. It plays to win and will crash the aircraft to prove who’s boss. Like Airbus. This is a programming vs piloting issue.

        1. That’s not entirely true; the 737 has a pair of very large, very noisy manual trim control wheels that spin whenever the trim is adjusted by *any* means. It should have been very simple for pilots to determine that the autotrim was out of control, and manually stop it by gripping one of the trim control wheels (and if necessary, cranking it back by hand). This is an ancient procedure on 737s, and I do not know of any reason why it would not have worked in the Max crashes.

          That said, Boeing has so much egg on their face this last year that I seriously question whether they should remain in business without a major (and unlikely) change in culture and management.

          1. It occurs to me that the sound of the wheel should be quite different for nose-down versus nose-up directions. The pilot should “hear” which way the trim wheel is spinning because his normal assumption would be that the auto-trim is helping him, not fighting him.

      1. The unions have been unreasonable pains in the ass forever and moving to Chicago did nothing to change that. So I discount that as a decisive reason for the move.

        The Seattle city/county administration might have been an issue. It’s certainly no secret the current gang of socialist idiots running things have no consideration for businesses, large or small, within their demesne. Seattle was the first city to legislate a $15/hr. minimum wage, for example.

        1. All true but I don’t believe that Boeing has major facilities inside the Seattle City limits except for the Delivery facility at Boeing Field. 737 is in Renton, 777, 787 and large jets are at Paine Field in Everett.

          1. It isn’t the Seattle city limits that matter but those of King County of which Seattle is the county seat. The Seattle nexus of radical socialism is in the county government. Renton is just one of many smaller towns and cities within King County that have Boeing facilities and/or large concentrations of Boeing employees including Burien, Bothell, Auburn and Federal Way. It also includes Kent where Blue Origin has its HQ. King County, WA rivals San Francisco County, CA, Cook County, IL, and the five New York counties corresponding to the five boroughs of NYC in its lopsidedly left-wing political coloration.

  7. Since they’re getting an extra month of training time, they might as well transfer Nicole Mann to DM-2 as a passenger for the May launch to ISS. They can stay until either Starliner CFT arrives or they need to vacate the docking port for SpX/CRS-21 in August. The USCV-1 Dragon is scheduled for December, but can probably be moved up to November by trading places in the VV schedule with SpX/CRS-22. By then it should be clear whether Starliner is going to make it or not, and also if DreamChaser will be on schedule (for example, if they need to convert Cargo Dragons to Crew Dragons down the line).

  8. SpaceX had at least one serious on-orbit anomaly with CRS1. It ended up being a test-like-you-fly issue – a lesson they failed to fully internalize until after F9-20. They also had to upload a software change while on R-bar approach during the C2 mission. Boeing clearly needs to get much more methodical and flightlike with its software vetting, but let’s not pretend this is unique or unexpected for a test flight. Also, this has basically no parallels to MAX. That is a civilian program with completely different expectations and rules with some clear regulatory malpractice mixed in, likely performed by different teams with different legacy processes in different locations. I would avoid conjuring patterns where they are probably coincidence.

  9. Apart from any talk about “test to failure” before integration, this really was an integration problem. The thruster firing problem on the service module might well not have looked obviously bad, until integrated with the service module. It’s not a question of “this is why we test”; this WAS a test.

    And to whoever found the problem and fixed it during flight, that person has earned the title of steely-eyed missile man, at least for that day. Integration issues happen during integration, and operational problems happen during operations. Some get detected before they happen, some get fixed in flight, and some will bite you in the tailfeathers.

    SpaceX learned all of this the hard way; this issue is very similar to the third failed Falcon 1 launch, in terms of recontact after separation. What SpaceX also learned is “fail early and fail often.” I’ve long expected that Boeing, and Blue Origin, and ULA’s Vulcan, and Dreamchaser, will have integration and operational problems. It’s not enough to build a rocket. You have to learn to fly it.

    1. At some point I think it would be interesting to know more about the troubleshooting effort that uncovered the CSM bug. I’d be inclined to think that the earlier, erroneous thruster firings, where the control system was in the wrong mode, led engineers to focus on when various attitude/maneuvering control modes were enabled, looking for similar mistakes, and they found another one.

      If that’s the case, the occurrence of the first failure probably prevented the second bug from causing a loss of vehicle.

    2. I have nothing to add other than I agree it’s a systems engineering issue. Integration testing above unit test level is some of the hardest to do because oftentimes there is a lack of full definition of behavior. In computer design verification that is *sometimes* caught through constrained random test, but that is done when there is a lack of clarity in the definition of system behavior, usually because the design has been modularized and optimized to bring those modules on line ASAP (read in parallel) with integration and test left for later. Sometimes the integrated system exceeds the capability of the test & simulation software to conclude it in a “reasonable” amount of time. Welcome to the real world.

  10. There’s a news story at Ars Technica that says they had communications problems with the craft as well.

    “Boeing’s Vice President and Program Manager for Starliner, John Mulholland, also elaborated on a third problem: the inability of the ground to reliably communicate with the spacecraft in the minutes after launch. He said this did not appear to be a problem with the antenna or communications system on board the spacecraft, but rather a ‘high noise floor’ on the ground, which he attributed to frequencies associated with cell phone towers. He said this was just a preliminary finding.”

    I solved this problem for the 45th Space Wing a few years ago. I understood what was happening because of my prior experience developing cell phone infrastructure. Given what I know about this problem, I’m surprised they flew. It causes severe corruption of the telemetry data.

    Gerry Parker

    1. Gerry, how did you solve the issue for the 45th? Switch frequencies? Use helical polarization? Move base stations? Deploy shielding hacks for low azimuth signals? Will 5G millimeter deployments cause new trouble? The frequency spread for 5G millimeter deployment is pretty wide according to Wikipedia, anywhere from 24 GHz to 72 GHz. But it doesn’t take much to shield against it apparently.

    2. Given the long-time ubiquity of cell phone transmissions, I’m a bit gobsmacked that this seems to have been overlooked until an actual test flight. Boeing continues to more and more resemble Amateur Night at the Bijou.

  11. Since I doubt most people here have seen it, I think it is valuable to note that Doug Loverro took the time, in the wee hours of the morning, to reach out and reply at length (in the combox) to the question that Keith Cowing of NASA Watch meant to ask if he were called upon in the teleconference.

    Doug went out of his way to address it, substantively, and I think he deserves real credit for doing so.

    Since Keith did not get to ask his question on the air, I thought I’d try to answer it here. To remind, Here’s Keith’s Question:

    “Boeing launched a spacecraft designed to carry humans and discovered two fundamental software issues in flight. Now Boeing wants to launch people in that spacecraft the next time it flies. I have been reporting on software issues for another Boeing product – SLS. Add in 737 Max software problems and it would seem that Boeing has some major software weaknesses. Is there any overlap between software teams or management between Starliner and SLS (or 737 Max)? Since Boeing’s current software process has clearly failed after many years and billions of dollars spent, what do you need to do differently in order to get this whole software thing working properly again?”

    To break Keith’s question down to its pieces:

    1) Q: “Is there any overlap between software teams or management between Starliner and SLS (or 737 Max)?”

    Ans: For SLS and Starliner the answer is definitely “yes” in terms of management overlap (they both report to Jim Chilton). For software, it’s far more complex–much of SLS software is built in-house by NASA, not Boeing. That’s one of the reasons we are having the Starliner Independent Review Team brief the SLS (and Orion, and other HEO projects) team on their results. We want to make sure that they all hear and understand the insidious nature of these kinds of issues. For the record, Boeing is far from the only large space firm that suffers from this kind of software process failure. But the nature of these two issues and the number of times they were missed means a far more thorough review is warranted. As for overlap with the 737 MAX issue — that’s doubtful from a people perspective. But there could be a “process” overlap. We’ll need to investigate, so good question.

    2) Q: Since Boeing’s current software process has clearly failed after many years and billions of dollars spent, what do you need to do differently in order to get this whole software thing working properly again?

    Ans: The exact answer to this question is still being formulated and depends a bit upon the final root cause we determine. As we mentioned during the conference, we have asked the Independent review team (IRT) to go back and determine “why” these process failures happened. Is it because Boeing has a flawed process? Or because they have a good process that failed to be followed for some reason? But in general, the way we will be able to “get this whole software thing working properly” is that we will have to review the entire set of documentation for the system software and verify that similar missteps did not occur elsewhere. That means things like going back and reviewing the original requirement statements; the process by which that statement was converted to a logical subroutine; the way that subroutine was then coded; the way that code was then verified and checked off; etc. The software development process used for systems such as this is well understood and generates so-called “artifacts” (reports, other paperwork) which can be examined to determine where the above processes may have failed. For the two issues being discussed here, gathering that paper made it relatively easy to find the cause of the failure. Of course, we knew where to look and what to look for. For the entire software load and looking for problems we do not yet know exist, this will be far more difficult.

    Hope that takes away some of the sting of not being called.

    Link: http://nasawatch.com/archives/2020/02/boeing-really-n.html#comments

    1. I want to think this was a nice-guy thing to do, but I can’t help also seeing it as avoiding the answer in a press conference, and instead giving it in a blog combox read by few.

      1. Could be, but then that assumes that Loverro had control over the press conference duration, or who was called upon.

        (Not impossible, but unlikely, I think.)

        1. As I said, I’d like to think it’s a nice-guy thing, but I’m way more suspicious than I used to be. This is the head of the US manned space program we’re talking about, a former DoD official and 30-year USAF retiree. Why would he know (or care) about NASA Watch? Does he read these blogs? Is some flunky detailed to read them? Or, as in my personal experience, are there trolls who make it their business to post links in places where someone like Loverro would be more likely to see them? Etc., etc…

          (Once upon a time, a random rant I posted in my discussion group on sff.net was refuted, point by point, in order, in a public posting by a NASA official without ever referencing me, and then an anonymous someone left a link to the refutation in my discussion group. I did think it was funny anyone would bother.)

  12. I think Boeing is trying to make a system that integrates its software with DOS. The ISS has been up there since DOS was the way you ran satellite systems. Maybe they just need to hire interpreters who can make all this understandable.

  13. “We don’t know how many software errors we have.”

    Then do less in software. That’s actually a serious answer. The idea that software is cheap and easily fixed leads to careless design and specification. Which is ultimately the wide and well paved information superhighway to Hell.

    It’s amusing that people reject using formal methods that can actually give you an answer to the question for the same reason. It makes developing software too expensive.

    Pay me now or pay me later.

    I’m reminded of Knuth, and this xkcd cartoon:
    https://i.stack.imgur.com/1awUv.png

  14. Comparing Dragon 2 and Starliner design and operation makes Starliner look like one giant kludge IMO.
    I was looking at buying some Boeing stock but I think I’ll give it a miss.
