MAY 27/Integration & Testing/4 MIN READ

Schiaparelli's tests passed. The mission failed anyway

Dan Zaidenband

In October 2016, ESA's Schiaparelli lander believed it was below the surface of Mars while it was still 3.7 kilometers in the air. It cut its parachute, shut off its retrorockets, and fell. The following May, the inquiry board named the trigger: an inertial measurement unit had saturated during parachute inflation, and the saturation persisted for about one second, far longer than the mission's models had assumed. The scenario had been simulated. The duration had been guessed.

We bring up Schiaparelli not to relitigate the mishap, which has been written about exhaustively, but to point at what it shows about the rest of us. Schiaparelli's IMU saturation event was modeled in software-in-the-loop tests. The model had a parameter. The parameter was wrong by a factor of several. The mission's entry, descent, and landing (EDL) test suite passed every case it contained. The case it did not have was the one that mattered.

This is the shape of most off-nominal testing on space programs. We have read verification and validation (V&V) plans on half a dozen programs in the last year. In every one of them, the off-nominal scenario list looks coherent on paper. Parachute snatch loads ±20%. IMU bias drift up to spec sheet limits. Antenna gain pattern with ±2 dB ripple. Battery cell voltage one sigma below nominal. The numbers come from somewhere. They almost never come from a physics-based envelope of what the hardware can actually do. They come from the previous program's V&V plan, which copied them from the program before that.

The happy path is well-tested. The unhappy paths are guessed.

When we ask a flight software lead what the worst-case parachute snatch loads are, the answer is usually a specification with a margin number bolted on. When we ask where the margin number came from, the answer is usually "from heritage" or "from the contractor." When we ask what happens if the actual envelope is twice that, the answer is "that would be very bad." That is the shape of an untested off-nominal scenario being treated as a tested one because it has a number in a spreadsheet.

The reason this happens is not laziness. It is that the simulator that runs the nominal cases is also the simulator that runs the off-nominal cases, and the simulator only knows what its models know. If the IMU model is a noise generator with a saturation cap, the simulator can show what happens at the cap. It cannot show what happens above the cap, because nothing above the cap exists in the model. The team's intuition about the cap is the test envelope. The intuition is sometimes right.

Coverage is a function of what you can simulate

The bottleneck on off-nominal coverage is not test execution time. CI machines are cheap; nightly runs are cheaper. The bottleneck is whether the simulator can express the scenario at all. A team can run ten thousand Monte Carlo permutations of parachute drag coefficient against altitude, and if the IMU model does not respond to physical impacts on the lander, the run will never tell anyone how long the IMU stays saturated when the parachute snatches.
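A minimal sketch of that blind spot, in Python. Everything in it is hypothetical: the `InterfaceOnlyImu` class, the 30 g cap, the drag coefficient range, and the toy deceleration formula are stand-ins we made up for illustration, not any program's simulator. The point is only that the swept parameter never reaches the input the model lacks.

```python
import random


class InterfaceOnlyImu:
    """Hypothetical IMU model: Gaussian noise plus a hard cap.

    It has no input for mechanical shock, so a parachute snatch
    cannot reach it no matter how many runs are executed.
    """

    def __init__(self, cap_g=30.0, noise_g=0.05):
        self.cap_g = cap_g
        self.noise_g = noise_g

    def read(self, true_accel_g, rng):
        measured = true_accel_g + rng.gauss(0.0, self.noise_g)
        # Clip to the cap, as a saturating sensor model would.
        return max(-self.cap_g, min(self.cap_g, measured))


def run_monte_carlo(n_runs, seed=0):
    """Sweep drag coefficient and count samples that hit the IMU cap."""
    rng = random.Random(seed)
    imu = InterfaceOnlyImu()
    saturated_samples = 0
    for _ in range(n_runs):
        # Swept parameter; the range is illustrative only.
        drag_coefficient = rng.uniform(0.4, 0.8)
        # Toy steady deceleration in g. There is no snatch transient,
        # because nothing in this simulator generates one.
        decel_g = 10.0 * drag_coefficient
        reading = imu.read(decel_g, rng)
        if abs(reading) >= imu.cap_g:
            saturated_samples += 1
    return saturated_samples


print(run_monte_carlo(10_000))  # 0 saturated samples
```

Ten thousand runs, zero saturations, and a clean report. The result says nothing about the hardware; it says the model has no path from a snatch load to the sensor.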

This is where the simulator earns its keep, or doesn't. A simulator built around an interface model (message types, packet rates, bus voltages) can validate that subsystems talk to each other correctly under a wide range of bus conditions. It cannot validate that subsystems behave correctly when the physical input falls outside the modeling assumptions of any one component. A simulator built around physical behavior at the component level can do both, at the cost of being much harder to build. Most programs do not have a budget line for that simulator until the integration phase, which is the wrong end of the program.

The off-nominal scenarios you can run are an artifact of your simulator's fidelity. The off-nominal scenarios you should run are a function of physics.

What scenario coverage looks like in a workspace

The teams we have worked with that take this seriously share three habits.

They write the off-nominal envelope from physics, not from heritage. A parachute snatch load is not "±20% of nominal." It is a function of altitude, density, deployment velocity, and inflation dynamics, with worst-case values that come from the parachute test program, not from a previous mission's V&V matrix. When the spec sheet says the IMU saturates at 30 g, the test envelope goes to 60 g and to 120 g, because the simulator should be able to show what happens above the spec sheet, and the engineer who reviews the result should be told what the system does when reality exceeds the datasheet.
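As a rough illustration of the difference, the sketch below derives a snatch-load envelope from the standard peak opening load relation, F = Cx · q · (Cd·S) with dynamic pressure q = ½ρv², and compares it with a nominal point. The function names are ours, and every number (densities, velocities, drag area, opening load factors) is an invented placeholder, not data from any parachute test program.

```python
from itertools import product


def peak_opening_load_n(density_kg_m3, velocity_m_s, drag_area_m2, opening_factor):
    """Peak parachute opening load in newtons: F = Cx * q * (Cd * S)."""
    dynamic_pressure_pa = 0.5 * density_kg_m3 * velocity_m_s ** 2
    return opening_factor * dynamic_pressure_pa * drag_area_m2


def load_envelope(densities, velocities, drag_area_m2, opening_factors):
    """Return (min, max) load over every combination of the input ranges."""
    loads = [
        peak_opening_load_n(rho, v, drag_area_m2, cx)
        for rho, v, cx in product(densities, velocities, opening_factors)
    ]
    return min(loads), max(loads)


# Placeholder values for illustration only.
nominal = peak_opening_load_n(0.005, 450.0, 50.0, 1.3)
low, high = load_envelope(
    densities=[0.003, 0.005, 0.008],      # kg/m^3, assumed atmosphere spread
    velocities=[400.0, 450.0, 500.0],     # m/s, assumed deployment spread
    drag_area_m2=50.0,
    opening_factors=[1.1, 1.3, 1.6],      # assumed inflation dynamics spread
)

print(f"nominal load: {nominal:.0f} N")
print(f"envelope: {low / nominal:.2f}x to {high / nominal:.2f}x nominal")
```

With these made-up ranges the worst case lands well outside ±20% of nominal. Real ranges would come from the parachute test program; the habit being illustrated is that the envelope is computed from inputs, not copied as a percentage.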

They run scenarios that the spec sheet says are impossible, and audit the simulator's response. If the IMU model cannot represent anything beyond the cap, the test should fail with a model-coverage error, not pass with a clean trace. A simulator that silently clips its inputs has a coverage hole the test plan cannot see.
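One way to make that failure loud, sketched in Python. `ModelCoverageError`, `AuditedImuModel`, and `run_scenario` are hypothetical names, and the 30 g validated range is an assumed figure; the idea is simply that the model refuses inputs it was never validated for instead of clipping them.

```python
class ModelCoverageError(Exception):
    """Raised when an input falls outside the range the model was validated for."""


class AuditedImuModel:
    """Hypothetical IMU model that refuses to clip silently."""

    def __init__(self, validated_max_g=30.0):
        self.validated_max_g = validated_max_g

    def read(self, true_accel_g):
        if abs(true_accel_g) > self.validated_max_g:
            raise ModelCoverageError(
                f"input {true_accel_g} g exceeds validated range "
                f"of +/-{self.validated_max_g} g"
            )
        return true_accel_g


def run_scenario(model, accel_profile_g):
    """Run a profile through the model; report coverage gaps as failures."""
    try:
        for accel_g in accel_profile_g:
            model.read(accel_g)
    except ModelCoverageError as exc:
        return f"MODEL_COVERAGE_ERROR: {exc}"
    return "PASS"


print(run_scenario(AuditedImuModel(), [5.0, 20.0]))        # PASS
print(run_scenario(AuditedImuModel(), [5.0, 20.0, 60.0]))  # MODEL_COVERAGE_ERROR: ...
```

The second run does not tell anyone what the flight software does at 60 g. It tells them the simulator cannot answer the question, which is the finding the test plan needs.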

They keep the scenario library separate from the test runs. Scenarios are versioned artifacts that outlive a program. The fixed-attitude-with-saturated-IMU scenario that Schiaparelli should have run is a scenario every Mars lander should run. So is the spurious-touchdown-signal-at-leg-deployment scenario from Mars Polar Lander. So is the IMU-restart-during-powered-descent scenario from Beresheet. Off-nominal scenarios are mission-class intellectual property; programs should be importing them, not reinventing them.
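A scenario library along these lines could be as simple as the sketch below. The `Scenario` and `ScenarioLibrary` types, the field names, the `mars-lander` class label, and the saturation durations are all assumptions for illustration; a real library would more likely live in a versioned repository than in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scenario:
    """A versioned off-nominal scenario that can outlive one program."""
    scenario_id: str
    version: str
    mission_class: str
    physical_basis: str            # where the envelope comes from
    parameters: dict = field(default_factory=dict)


class ScenarioLibrary:
    """Hypothetical in-memory library keyed by (scenario_id, version)."""

    def __init__(self):
        self._scenarios = {}

    def add(self, scenario):
        key = (scenario.scenario_id, scenario.version)
        if key in self._scenarios:
            # Published scenarios are immutable; changes need a new version.
            raise ValueError(f"{key} already published; bump the version instead")
        self._scenarios[key] = scenario

    def for_mission_class(self, mission_class):
        return [s for s in self._scenarios.values()
                if s.mission_class == mission_class]


library = ScenarioLibrary()
library.add(Scenario(
    scenario_id="fixed-attitude-with-saturated-imu",
    version="1.0.0",
    mission_class="mars-lander",
    physical_basis="IMU rate saturation during parachute inflation",
    # Illustrative sweep values, not mission data.
    parameters={"saturation_duration_s": [0.01, 0.1, 1.0, 10.0]},
))

# A new program imports by mission class instead of rewriting the list.
imported = library.for_mission_class("mars-lander")
print([s.scenario_id for s in imported])
```

Keeping `physical_basis` on the record is the design choice that matters here: it forces each scenario to say where its envelope came from, so "from heritage" has to be written down to be used.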

The CDR problem

The hardest version of this conversation is at the critical design review (CDR), where a program lead has to show that the V&V plan is adequate. The current default is to count tests: 4,200 scenarios run, 98.7% pass rate, three open issues. The number is impressive and it tells the customer nothing about whether the scenario set covers the mission. A customer who has been burned before will ask which off-nominal cases are missing. A customer who has not been burned will accept the pass rate.

Programs should be asked, at CDR, to defend the coverage of their off-nominal scenarios as rigorously as they defend the pass rate. The coverage argument starts with physics, ends with traceability to a scenario library, and runs through every interface the simulator can express. The pass rate is the easy half. The coverage is the half that decides whether the mission survives the part nobody thought to model.
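To make the distinction concrete, here is a small sketch that reports pass rate and coverage side by side. The `coverage_report` function and the case identifiers are hypothetical; in practice the required list would be traced to the physics-derived envelope and the scenario library rather than typed in by hand.

```python
def coverage_report(required_cases, results):
    """Compare what was run against what the mission requires.

    required_cases: iterable of case ids the coverage argument demands.
    results: mapping of case id -> True (pass) or False (fail) for runs made.
    """
    required = set(required_cases)
    run = set(results)
    missing = sorted(required - run)
    pass_rate = sum(results.values()) / len(results) if results else 0.0
    coverage = len(required & run) / len(required) if required else 1.0
    return {"pass_rate": pass_rate, "coverage": coverage, "missing": missing}


# Hypothetical case ids for illustration.
required = [
    "imu_saturation_nominal_duration",
    "imu_saturation_extended_duration",
    "chute_snatch_high_dynamic_pressure",
]
results = {
    "imu_saturation_nominal_duration": True,
    "chute_snatch_high_dynamic_pressure": True,
    "nominal_edl": True,
}

report = coverage_report(required, results)
print(report["pass_rate"])  # 1.0
print(report["missing"])    # ['imu_saturation_extended_duration']
```

Every test that ran passed, and one required case never ran. A review that sees only the first number has no way to ask about the second.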

Schiaparelli ran its EDL tests. The tests passed. The duration of an IMU saturation flag was a parameter in a model. Most missions are one mis-set parameter away from the same story, and the way to find that parameter before launch is not to run the tests harder. It is to fix the simulator that decides what counts as a test.