Correctness

Core Idea

Examples and diagrams on this page follow the shared Hypothetical Scenario.

Correctness is the degree to which software behavior satisfies a defined specification under expected operating conditions. Testing is one of the primary mechanisms for gathering evidence of correctness, but correctness itself is broader than testing activity. It covers behavior, data integrity, contract semantics, failure handling, and operational guarantees.

In the scenario platform, correctness is not only "returning recommendations." A response can be fast and still wrong if it violates budget constraints, uses stale inventory assumptions, or breaks ownership invariants. Correctness must therefore be engineered as a system property across modules, services, and delivery pipelines.

Conceptual Overview

Correctness Dimensions

A practical correctness model separates concerns into explicit dimensions:

functional correctness: outputs match expected business behavior
contract correctness: API and event semantics remain stable and valid
data correctness: state transitions preserve invariants and ownership rules
temporal correctness: behavior remains valid under concurrency, retries, and timing variation
fault-path correctness: degradation and recovery behavior preserve guarantees

Without these dimensions, teams over-index on "happy-path correctness" and miss failure-path defects.

Specification and Oracles

Correctness is only testable when expected behavior is explicit. Teams need high-quality oracles:

acceptance criteria with concrete input/output expectations
domain invariants with formal or semi-formal statements
interface contracts with stable request, response, and error models
operational constraints such as idempotency and timeout semantics

An oracle should be specific enough that a failure means one of two things: the system is wrong or the specification is wrong. Ambiguous oracles create test suites that pass while defects survive.
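
As an illustration, a domain invariant can be written as an executable check so that a failure points at a specific violated rule. The sketch below is a minimal example under stated assumptions: the `Recommendation` type, its fields, and the `budget_violations` helper are hypothetical names chosen for this page, not part of the scenario's actual model.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Recommendation:
    # Hypothetical shape; the real scenario model may differ.
    item_id: str
    price: Decimal


def budget_violations(
    recommendations: list[Recommendation], budget: Decimal
) -> list[str]:
    """Return human-readable violations; an empty list means the oracle holds."""
    violations = []
    total = sum((r.price for r in recommendations), Decimal("0"))
    if total > budget:
        violations.append(f"total {total} exceeds budget {budget}")
    for r in recommendations:
        if r.price < 0:
            violations.append(f"item {r.item_id} has negative price {r.price}")
    return violations
```

Returning named violations rather than a bare boolean keeps the oracle specific: when the check fails, the message states which rule was broken, which helps decide whether the system or the specification is wrong.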

Correctness in Distributed Systems

Distributed systems add failure modes that do not appear in local code units:

partial success across service boundaries
duplicated messages and out-of-order delivery
stale reads during eventual-consistency windows
retried commands that mutate state multiple times

A correctness strategy must include distributed guarantees such as idempotency keys, compensating flows, and explicit consistency expectations. This links directly to State and Data Modeling, Resilience and Recovery, and Correlation IDs.
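
The idempotency-key idea can be sketched as follows. This is a simplified, in-memory illustration: `ReservationService`, its method names, and the result strings are assumptions made for the example, and a production system would keep the key lookup and the state change in one durable, atomic transaction.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationService:
    # In-memory stand-ins for durable storage; all names are hypothetical.
    stock: dict[str, int]
    _results: dict[str, str] = field(default_factory=dict)

    def reserve(self, idempotency_key: str, item_id: str, quantity: int) -> str:
        # A retried command with a known key returns the recorded result
        # instead of mutating state a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        available = self.stock.get(item_id, 0)
        if quantity <= 0 or quantity > available:
            result = "rejected"
        else:
            self.stock[item_id] = available - quantity
            result = "reserved"

        # Record the outcome so later retries observe the same answer.
        self._results[idempotency_key] = result
        return result
```

With this shape, a test can replay the same command twice and assert that stock is decremented only once, which directly exercises the "retried commands that mutate state multiple times" failure mode.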

Correctness and Test Layers

No single test type can validate all correctness dimensions. A layered model is required:

Unit Testing validates local behavior deterministically
Smoke Testing validates deployment-level critical-path viability
Integration and Functional Testing validates cross-boundary behavior and user-facing workflows

Test depth should follow risk, not habit. A simple pure function rarely needs broad integration scenarios. A distributed payment or reservation workflow almost always does.

Determinism and Signal Quality

A correctness claim is weak when the tests behind it are flaky. Deterministic tests require controlled inputs, stable clocks, seeded or otherwise explicit randomness, and isolated dependencies. Signal quality also matters:

one behavior claim per test whenever practical
descriptive names that encode state and expectation
failure output that points to violated behavior, not framework internals

This reduces diagnostic time and increases trust in the suite.
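
A minimal sketch of these practices, assuming a hypothetical `pick_promotion` function: the clock and the random source are passed in as parameters, so each test controls them, makes one behavior claim, and carries a name that encodes state and expectation.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def pick_promotion(
    candidates: list[str],
    expires_at: datetime,
    now: Callable[[], datetime],
    rng: random.Random,
) -> Optional[str]:
    """Pick a random candidate unless the promotion window has closed."""
    if not candidates or now() >= expires_at:
        return None
    return rng.choice(candidates)


def test_returns_none_when_promotion_window_has_closed() -> None:
    fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    result = pick_promotion(
        ["a", "b"],
        expires_at=fixed_now - timedelta(seconds=1),
        now=lambda: fixed_now,  # stable clock instead of datetime.now()
        rng=random.Random(42),  # explicit, seeded randomness
    )
    assert result is None


def test_same_seed_yields_same_choice() -> None:
    fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    expires = fixed_now + timedelta(hours=1)
    first = pick_promotion(["a", "b", "c"], expires, lambda: fixed_now, random.Random(7))
    second = pick_promotion(["a", "b", "c"], expires, lambda: fixed_now, random.Random(7))
    assert first == second
```

Injecting `now` and `rng` is one design choice among several; patching or a dedicated clock abstraction can serve the same purpose, as long as the test, not the environment, decides the time and the random sequence.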

Correctness Under Change

Many correctness incidents are introduced during change rather than during initial implementation. A robust strategy includes:

regression protection on historical defect classes
compatibility checks for contract evolution
risk-based test selection in CI
post-incident test additions that prevent recurrence

Correctness is therefore not a one-time quality gate. It is a continuous engineering discipline tied to architecture and delivery.
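
A compatibility check for contract evolution can be as small as a schema comparison run in CI. The sketch below assumes a deliberately simplified schema representation (a mapping of field name to type name) and a hypothetical `breaking_changes` helper. It treats removed fields and changed types as breaking and added fields as compatible, which is an assumption that may not match every consumer's expectations.

```python
def breaking_changes(
    old_fields: dict[str, str], new_fields: dict[str, str]
) -> list[str]:
    """Compare two response schemas given as {field_name: type_name}."""
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            # Consumers reading this field would break.
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            problems.append(
                f"type changed: {name} {old_type} -> {new_fields[name]}"
            )
    # Fields present only in new_fields are treated as additive and allowed.
    return problems
```

A pipeline step could fail the build whenever the returned list is non-empty for an externally consumed contract, turning "compatibility checks" into an automated gate rather than a review-time convention.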

Computing History

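The Ariane 5 Flight 501 failure on 4 June 1996 is a classic correctness lesson. Inertial reference software reused from Ariane 4 converted a 64-bit floating-point value to a 16-bit signed integer, and under the new flight profile the value exceeded the representable range and raised an unhandled overflow exception. The assumption that this value would stay in range held for the earlier vehicle's trajectory but not for Ariane 5's. The incident showed that correctness cannot be inherited from previous systems without revalidation against current operating conditions.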

Source: European Space Agency (1996)

Quote

"Program testing can be used to show the presence of bugs, but never to show their absence!"

Source: Edsger W. Dijkstra, 1972

Practice Checklist

Define correctness dimensions before implementation begins.
Write behavior specifications with explicit input, output, and failure semantics.
Link every critical invariant to at least one automated verification path.
Treat flaky tests as correctness defects, not tooling noise.
Validate idempotency, retries, and timeout behavior in distributed workflows.
Add compatibility checks for every externally consumed contract.
Keep test names descriptive and behavior-oriented.
Review test strategy after incidents and architecture changes.
Track correctness debt explicitly (missing tests, unstable oracles, weak failure-path coverage).
Ensure release decisions include correctness signal review, not only throughput and latency.

Written by: Pedro Guzmán

See References for complete APA-style bibliographic entries used on this page.