Stochastic Solutions

Errors of Implementation

Reference tests are semantic regression tests

While most traditional software development now follows some form of test-driven development (TDD), most data-analysis and data-science development does not. TDDA’s notion of reference testing is an approach to testing analytical processes that is designed to go with the grain of analytical projects.

Why Analytical Pipelines are Different

Two things are true of almost every analytical data project:

The basic idea of reference testing is that once an analytical result is considered good enough to use, believe, or ship, the examples used to develop the analysis should be packaged up as tests that can be run again later, as illustrated below.

Figure: A canonical reference test. An analytical process has inputs coming in from the left and outputs being produced on the right. On the left are INPUTS, with an arrow pointing to a circle labeled ‘ANALYTICAL PROCESS’. The inputs are annotated ‘Record as reference inputs: datasets, parameters, ...’. Under the ANALYTICAL PROCESS circle, it says ‘Capture as script’. The outputs, after an arrow from the ANALYTICAL PROCESS, are annotated ‘Record as reference outputs: datasets, numbers, graphs, models, decisions, etc.’ Underneath the ANALYTICAL PROCESS, with its inputs and outputs, is step 4, annotated ‘Develop a verification procedure (diff) and periodically re-execute: do the same inputs (still) produce the same or equivalent outputs?’

A reference test verifies that when the process is run again, the same inputs produce equivalent outputs. Such tests should be run either under continuous integration or before an analytical pipeline is used again. Reference tests are a powerful means of noticing when a pipeline has rusted or suffered a regression, and when they are combined with good data validation throughout a pipeline and defensive programming, the robustness of analytical processes is much increased.
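As a minimal sketch of this idea, the example below uses only Python's standard `unittest` module rather than the tdda library. The function `analytical_process` and the constants `REFERENCE_INPUTS` and `REFERENCE_OUTPUTS` are hypothetical stand-ins: in a real project the process would be the captured script, and the reference inputs and outputs would normally live in files recorded when the result was accepted.

```python
import unittest


def analytical_process(records):
    """Hypothetical pipeline step: total and mean of the values in each group."""
    groups = {}
    for group, value in records:
        groups.setdefault(group, []).append(value)
    return {
        group: {"total": sum(values), "mean": sum(values) / len(values)}
        for group, values in groups.items()
    }


# Reference inputs, recorded when the result was judged good enough to use.
# (Assumption: kept inline here; normally these would be stored datasets.)
REFERENCE_INPUTS = [("north", 2.0), ("north", 4.0), ("south", 10.0)]

# Reference outputs, recorded from the accepted run of the process.
REFERENCE_OUTPUTS = {
    "north": {"total": 6.0, "mean": 3.0},
    "south": {"total": 10.0, "mean": 10.0},
}


class TestPipelineAgainstReference(unittest.TestCase):
    def test_reference_inputs_still_give_reference_outputs(self):
        # The verification procedure: same inputs, same outputs?
        actual = analytical_process(REFERENCE_INPUTS)
        self.assertEqual(actual, REFERENCE_OUTPUTS)
```

Saved in a test module, this can be run with `python -m unittest`, either in continuous integration or by hand before the pipeline is used again. The comparison here is exact equality, which is the simplest possible verification procedure.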

Support for Reference Testing

Our open-source tdda library provides two major kinds of support for reference testing:

Semantic Equivalence

Simple unit tests usually check some small, easy-to-compute result or behaviour that is identical for a given set of inputs, and many proponents of test-driven development place most of the emphasis on unit tests, while also advocating for some high-level tests of larger components, systems and integrations.

When we talk about semantic equivalence (or semantic equality) we mean that the result is the same in all meaningful respects, but may not be byte-for-byte identical to a reference result. When comparing tabular data, we have comparisons that allow specification of acceptable differences. These can include:

Similarly, when comparing textual data, examples of allowed variation include:

These sorts of semantic comparisons allow easy creation of powerful tests of outputs that would otherwise require significant custom code each time. Reducing the burden of test creation makes it more likely that analysts will actually write proper tests.
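To make the notion of semantic comparison concrete, here is a rough sketch in plain Python. It does not use or reproduce the tdda assertions. The helper names `rows_equivalent` and `text_equivalent`, and the particular tolerances shown (ignoring named columns, allowing small floating-point differences, masking text that matches a pattern), are assumptions chosen for illustration.

```python
import math
import re


def rows_equivalent(actual, expected, ignore_columns=(), rel_tol=1e-9):
    """Compare two lists of row dicts, skipping the ignored columns and
    allowing small relative differences between floating-point values."""
    if len(actual) != len(expected):
        return False
    ignored = set(ignore_columns)
    for row_a, row_e in zip(actual, expected):
        for key in (set(row_a) | set(row_e)) - ignored:
            if key not in row_a or key not in row_e:
                return False  # a column is missing on one side
            a, e = row_a[key], row_e[key]
            if isinstance(a, float) and isinstance(e, float):
                if not math.isclose(a, e, rel_tol=rel_tol):
                    return False
            elif a != e:
                return False
    return True


def text_equivalent(actual, expected, ignore_patterns=()):
    """Compare two texts line by line after masking anything that matches
    one of the given regular expressions."""
    def normalise(text):
        lines = text.splitlines()
        for pattern in ignore_patterns:
            lines = [re.sub(pattern, "<ignored>", line) for line in lines]
        return lines

    return normalise(actual) == normalise(expected)


# Illustrative data: the run date differs and the score has float noise.
expected_rows = [{"id": 1, "score": 0.3, "run_at": "2024-01-01"}]
actual_rows = [{"id": 1, "score": 0.1 + 0.2, "run_at": "2025-06-30"}]
same_table = rows_equivalent(actual_rows, expected_rows, ignore_columns=["run_at"])

date_pattern = r"\d{4}-\d{2}-\d{2}"
same_text = text_equivalent(
    "Run at 2025-06-30\nrows: 3", "Run at 2024-01-01\nrows: 3", [date_pattern]
)
```

Both comparisons succeed here even though neither pair is byte-for-byte identical, while a change in a meaningful value (a different `id`, or `rows: 4`) would still be reported as a difference.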

The tdda library also provides variations for comparing:

The assertions also write out in-memory versions and suggest diff commands when a test fails. These and related features make working with complex results much more convenient.
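The convenience described above can be pictured with a small sketch. This is not tdda's implementation: `check_string_against_file` is a hypothetical helper that compares an in-memory string with a reference file and, on a mismatch, saves the actual result to a temporary file and suggests a `diff` command for inspecting the difference.

```python
import os
import tempfile


def check_string_against_file(actual, ref_path):
    """Return (ok, message). On failure, write the in-memory result to a
    temporary file so it can be diffed against the reference file."""
    with open(ref_path, encoding="utf-8") as f:
        expected = f.read()
    if actual == expected:
        return True, ""
    # Keep the actual result on disk so the two versions can be compared.
    fd, actual_path = tempfile.mkstemp(prefix="actual-", suffix=".txt", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(actual)
    message = f"Results differ. Compare with:\n    diff {actual_path} {ref_path}"
    return False, message
```

Because the failing result is already on disk, the person investigating can paste the suggested command rather than re-running the process and capturing its output by hand.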

Gentest

The tdda gentest command-line tool writes tests for software in any language, and can create tests for things like

Gentest provides both a Wizard that steps through the process of creating a test for a script or program, with inputs and parameters, and a command-line tool that allows all options to be provided as parameters to a single command. Although Gentest was initially developed as a demonstrator for tdda’s reference-testing functionality, it has turned out to be remarkably powerful, particularly for test-driven document development (TDDD). Gentest works by running the code you provide, usually more than once. It examines your environment and variations in output to write tests that are often semantic rather than simple assertEqual statements. Sometimes the tests need a small amount of tweaking, but Gentest frequently writes better tests than people and usually gives a good starting point.
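The general idea of running code more than once and looking at what varies can be sketched as follows. This is not how Gentest itself is implemented and it does not call the tdda tool. The helpers `run`, `find_variable_lines`, and `matches_reference` are hypothetical, and the sketch makes the simplifying assumption that every run produces the same number of output lines.

```python
import subprocess
import sys


def run(command):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def find_variable_lines(command, runs=2):
    """Run the command several times and return the first run's output
    together with the indices of lines that differed between runs."""
    outputs = [run(command) for _ in range(runs)]
    reference = outputs[0]
    variable = set()
    for other in outputs[1:]:
        if len(other) != len(reference):
            raise ValueError("runs produced different numbers of lines")
        for index, (first, later) in enumerate(zip(reference, other)):
            if first != later:
                variable.add(index)
    return reference, sorted(variable)


def matches_reference(lines, reference, variable):
    """Check new output against the reference, skipping variable lines."""
    if len(lines) != len(reference):
        return False
    skip = set(variable)
    return all(
        new == ref
        for index, (new, ref) in enumerate(zip(lines, reference))
        if index not in skip
    )


# Illustrative command: the first line is stable, the second is random.
command = [
    sys.executable,
    "-c",
    "import uuid; print('Report'); print('id:', uuid.uuid4())",
]
reference, variable = find_variable_lines(command)
still_ok = matches_reference(run(command), reference, variable)
```

In this example the line containing the random identifier is flagged as variable, so a later run is accepted even though that line changes, while a change to the stable `Report` line would be caught. A generated test built this way is semantic in the sense used above: it checks what is expected to stay the same and tolerates what is known to vary.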

Work With Us

The tdda library is free and open source, so you can just use it, but if you are looking to improve the testing and robustness of analytical pipelines and processes, we can help, whether with training, review, auditing, or implementation.

Company number SC329851. Registered office: 16 Summerside Street, Edinburgh, EH6 4NU.
Copyright © Stochastic Solutions Limited 2007–2026.