MAY 27/Integration & Testing/4 MIN READ

"We tested it" isn't an answer at CDR

Dan Zaidenband

Last quarter, we sat in a CDR rehearsal where a flight software lead presented twelve months of test results — green, green, green — to a panel reviewer who asked one question: "Show me the test that proves requirement 4.2.7." The team needed eleven minutes to answer. Three of those minutes were spent finding a Jira ticket that pointed to a Confluence page that linked to a test report PDF that referenced a scenario folder by name. The tests existed. The trace did not. The reviewer added two action items, both of them re-runs, and the program ate another four weeks before it could close out integration.

This is the failure mode that V&V plans never quite catch. The team is not under-testing. The team is over-collecting and under-linking. A modern spacecraft program runs tens of thousands of scenarios across SIL, HIL, and integration. The pass rate is fine. What the team cannot produce, when a senior reviewer asks, is a clean chain from one requirement to the specific scenario, the specific assertion inside that scenario, and the specific artifact that demonstrates the requirement was met.

CDR is not asking what you think it is

The CDR question every program lead prepares for is "did your tests pass?" The CDR question every panel reviewer actually asks is "show me the evidence." Those are different questions, and they have different answers.

Pass rate is a summary statistic over a test corpus. Evidence is a graph: requirement → derived spec → ICD constraint → scenario → assertion → log file → signed-off result. A program can have a 99% pass rate and 30% trace coverage and not know it, because the missing 70% are requirements with no inbound link from any test in the corpus. Nobody fails those tests, because nobody wrote them. Nobody wrote them, because nobody noticed the requirement existed in a different document.
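To make the difference concrete, here is a minimal sketch that computes both numbers over the same toy corpus. The `RunRecord` type, its `verifies` field, and the `REQ-n` and `Tn` identifiers are all invented for illustration. The corpus is constructed so the two metrics land at 99% and 30%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One executed test. `verifies` holds the requirement IDs it links to (hypothetical field)."""
    test_id: str
    passed: bool
    verifies: tuple = ()


def pass_rate(runs):
    # Summary statistic over the corpus: says nothing about which requirements were touched.
    return sum(r.passed for r in runs) / len(runs)


def trace_coverage(requirement_ids, runs):
    # Share of requirements with at least one inbound link from any test in the corpus.
    linked = {req for r in runs for req in r.verifies}
    return len(linked & set(requirement_ids)) / len(requirement_ids)


def unlinked(requirement_ids, runs):
    # Requirements nobody fails, because nobody wrote a test for them.
    linked = {req for r in runs for req in r.verifies}
    return [req for req in requirement_ids if req not in linked]


# Toy data: ten requirements, one hundred runs that all pile onto three of them.
requirements = [f"REQ-{i}" for i in range(1, 11)]
runs = [RunRecord(f"T{i}", True, ("REQ-1",)) for i in range(50)]
runs += [RunRecord(f"T{i}", True, ("REQ-2",)) for i in range(50, 98)]
runs += [RunRecord("T98", True, ("REQ-3",)), RunRecord("T99", False, ("REQ-3",))]

print(f"pass rate: {pass_rate(runs):.0%}")
print(f"trace coverage: {trace_coverage(requirements, runs):.0%}")
print("unlinked:", unlinked(requirements, runs))
```

The pass rate never sees the seven unlinked requirements, because they contribute no rows to the corpus it is computed over.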

We wrote two weeks ago that Schiaparelli's tests passed and the mission failed anyway. The trace problem is the second half of that story. Even when the right scenario exists and runs, the evidence chain has to point at it, or the CDR cannot count it.

What an actual evidence chain looks like

Take a requirement, picked from a real Phase B SRR document we reviewed in March: "The spacecraft shall enter Safe Mode within 5 seconds of detecting a critical battery undervoltage event." One condition, one threshold, one timing budget.

A trace-complete answer to "prove you met this" needs five things, all of them addressable by ID, all of them living in the same workspace as the requirement itself:

The ICD entry that defines the battery undervoltage telemetry word, its threshold, and its update rate.
The flight software handler that compares the telemetry to the threshold and initiates the mode transition.
The simulation scenario that drives the bus model to produce that undervoltage condition.
The assertion inside the scenario that measures elapsed time from the telemetry event to the mode change, with the 5-second budget as a hard limit.
The signed test result, with timestamp, scenario revision, and reviewer.
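The five items above can be sketched as a single record with one slot per link. This is an illustration, not a schema from any particular tool: the `EvidenceChain` class, its field names, and every ID in the example are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidenceChain:
    """One requirement and the five addressable artifacts that prove it (all IDs hypothetical)."""
    requirement_id: str
    icd_entry_id: Optional[str] = None       # telemetry word, threshold, update rate
    fsw_handler_id: Optional[str] = None     # handler that compares and triggers the transition
    scenario_id: Optional[str] = None        # scenario that drives the undervoltage condition
    assertion_id: Optional[str] = None       # timing assertion with the 5-second limit
    signed_result_id: Optional[str] = None   # signed result: timestamp, scenario revision, reviewer

    def missing_links(self):
        # Names of the slots that have no ID yet, in declaration order.
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_complete(self):
        return not self.missing_links()


# A chain where only the first three links can be named by ID.
chain = EvidenceChain(
    requirement_id="REQ-SAFE-001",
    icd_entry_id="ICD-EPS-014",
    fsw_handler_id="FSW-EPS-UV-HANDLER",
    scenario_id="SCN-0112",
)

print(chain.is_complete())
print(chain.missing_links())
```

A record like this turns "prove you met this" into a check for empty slots rather than a search.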

Programs we see can usually produce three of the five from memory. They produce the fourth by grepping a repository nobody in the room owns. The fifth — the signed result — is sometimes a screenshot in a Slack thread. The CDR panel does not accept any of this, and it should not.

Why aerospace breaks the chain

The trace breaks for five specific reasons. We have watched each one independently destroy a TRR or CDR in the last eighteen months.

The first is that the ICD lives as a document, so every link into it has to be a section reference rather than an entry ID. When the document is renumbered, the link rots silently.

The second is that test scenarios are authored by hand, in a directory structure that mirrors the team's org chart rather than the requirements tree. A new requirement does not automatically suggest a missing scenario, because the directory has no idea what requirements exist.

The third is that assertions are encoded as ad-hoc Python expressions inside scenario files. The expression assert dt < 5.0 is correct logic and useless metadata. The CDR panel cannot tell which requirement that assertion is verifying without reading the scenario top to bottom.
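One way to keep the logic and add the metadata is to route the check through a small helper that records which requirement it verifies. A minimal sketch follows; the helper name `assert_within`, the `AssertionRecord` fields, and the IDs are assumptions for illustration, not an existing framework's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertionRecord:
    assertion_id: str
    requirement_id: str
    description: str
    measured: float
    limit: float
    passed: bool


def assert_within(assertion_id, requirement_id, description, measured, limit, log):
    """Same logic as `assert measured < limit`, but the outcome is recorded with its requirement ID."""
    record = AssertionRecord(
        assertion_id, requirement_id, description, measured, limit, measured < limit
    )
    log.append(record)  # record first, so a failure still leaves evidence behind
    if not record.passed:
        raise AssertionError(
            f"{assertion_id} ({requirement_id}): {measured} is not below {limit}"
        )
    return record


# Inside a scenario: dt is elapsed time from undervoltage telemetry to mode change.
assertion_log = []
dt = 3.2  # seconds, illustrative value
assert_within(
    "AST-0112-03",
    "REQ-SAFE-001",
    "Safe Mode entry after battery undervoltage",
    measured=dt,
    limit=5.0,
    log=assertion_log,
)
print(assertion_log[0].requirement_id, assertion_log[0].passed)
```

The comparison itself is unchanged. What changes is that a reviewer can look up the requirement from the assertion without reading the scenario.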

The fourth is that test results are post-processed into PDF reports for the milestone. The report is an artifact of the milestone, not of the test. By the time the PDF exists, the link from scenario run to specific requirement is the responsibility of whoever made the slides — usually one engineer, working from memory, two days before the review.

The fifth is that ICD changes do not invalidate downstream evidence automatically. A telemetry word's scale factor changes, the C struct updates, the scenario runs against the new value, and the old test report still claims compliance with the old number. The trace looks intact and is not.
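A simple guard against this, sketched under assumptions: each signed result stores a fingerprint of the ICD entry it ran against, and any result whose fingerprint no longer matches the current entry is flagged as stale. The dictionary layout, field names, and IDs below are hypothetical.

```python
import hashlib
import json


def fingerprint(icd_entry):
    """Stable hash of an ICD entry's content, independent of key order."""
    canonical = json.dumps(icd_entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stale_results(results, current_icd):
    """Return IDs of results recorded against an ICD entry that has since changed or vanished."""
    stale = []
    for result in results:
        entry = current_icd.get(result["icd_entry_id"])
        if entry is None or fingerprint(entry) != result["icd_fingerprint"]:
            stale.append(result["result_id"])
    return stale


# The ICD as it stood when the test was signed off.
icd_v1 = {
    "ICD-EPS-014": {"name": "BATT_BUS_V", "scale_factor": 0.01, "threshold_raw": 2400, "rate_hz": 1}
}
result = {
    "result_id": "RES-2291",
    "icd_entry_id": "ICD-EPS-014",
    "icd_fingerprint": fingerprint(icd_v1["ICD-EPS-014"]),
}

# Later, the scale factor changes.
icd_v2 = {"ICD-EPS-014": {**icd_v1["ICD-EPS-014"], "scale_factor": 0.02}}

print(stale_results([result], icd_v1))
print(stale_results([result], icd_v2))
```

The old result is not deleted. It is simply no longer allowed to claim compliance until the scenario is re-run against the new entry.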

Pass rate and trace density both count, but they measure different things, and only trace density survives a CDR panel.

What changes when the workspace owns the trace

This is the deliberate design choice in how we build Lynapse. A requirement has an ID. So does an ICD entry, a scenario, an assertion, a run. The trace is not assembled at milestone time — it is a property of the workspace, written down at the moment each artifact is created, and queryable in seconds. When a reviewer asks "show me the test that proves requirement 4.2.7," the answer is a single graph traversal, not a panic in a Slack channel.
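To illustrate what "a single graph traversal" can mean, here is a generic sketch over a plain adjacency dictionary. It is not a description of how Lynapse stores or queries its trace. The node IDs, the prefix convention (`REQ-`, `SCN-`, `AST-`, `RUN-`, `RES-`), and the function names are assumptions made for the example.

```python
def evidence_paths(graph, start):
    """Depth-first walk from a requirement, returning every path that ends at a leaf node."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        children = graph.get(path[-1], [])
        if not children:
            paths.append(path)
            continue
        for child in reversed(children):
            if child not in path:  # skip nodes already on this path to avoid cycles
                stack.append(path + [child])
    return paths


def is_complete(path):
    # Assumed convention: a chain counts only if it ends at a signed result.
    return path[-1].startswith("RES-")


# Hypothetical trace graph: each artifact points at the artifacts that support it.
graph = {
    "REQ-4.2.7": ["SCN-0112"],
    "SCN-0112": ["AST-0112-03"],
    "AST-0112-03": ["RUN-2291"],
    "RUN-2291": ["RES-2291"],
    "REQ-4.2.8": ["SCN-0140"],  # scenario exists, but nothing links onward from it
}

for req in ("REQ-4.2.7", "REQ-4.2.8"):
    for path in evidence_paths(graph, req):
        status = "complete" if is_complete(path) else "gap"
        print(status, " -> ".join(path))
```

The second requirement shows the useful failure case: the traversal returns a path, but it stops at a scenario, so the gap is visible before a reviewer finds it.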

The deeper effect is on what CDR conversations cover. A program that walks in with a trace report — coverage density per requirement, gaps highlighted, every assertion linked to its parent — does not spend the meeting defending its testing. It spends the meeting discussing the gaps, which is the conversation a CDR was designed for. The reviewers shift from "prove it" to "what about this?" That is a different program.

Implication

Stop scoring V&V plans by test count. Score them by trace density: the share of requirements with at least one inbound link from an executed, signed scenario. A program with 30,000 tests and 60% trace coverage is in worse shape than a program with 8,000 tests and 95% coverage. The first one is producing telemetry. The second one is producing evidence.

The next time a program lead asks how to prepare for CDR, the right question is not "what tests should we run." It is "for the requirements we already verified, can we name the specific assertion?" If the team cannot answer in under thirty seconds per requirement, the test corpus is doing the wrong job. Build the chain first. The tests are downstream of the trace, not the other way around.
