Saturday, 16 October 2010

Tests are facts. Code is theory.

Programmers have turned to science to help resolve the software crisis. But they're doing it wrong.

Science envy

Programmers have science envy. We feel that, unlike much of our code, science works. Scientists have spent hundreds of years honing a methodology that helps them assimilate new knowledge and correct error, while we have spent decades frantically accumulating complexity that we can't handle. Strangely, scientific theories become more accurate over time, whereas software systems often decay.

The software industry has tried to learn from science and engineering's success. We call our programming degrees "Computer Science" and "Software Engineering", though they are neither. "Computer Science" students do almost no experiments. The "Software Engineering" concept of exhaustive up-front design has become so discredited that even those who can't imagine any other way feel obliged to pretend that they "don't do Waterfall".

Of course, science and engineering are just analogies when applied to programming. They are meant to be useful ways of imagining our profession, not to be literally true. But in their naive form, I don't think analogies between programming and science are very useful. If we want to benefit from scientific rigour, we need to be more rigorous in how we appropriate scientific concepts.

Scientific testing

Some software testers have used the scientific method as a way of framing their testing activities. For example, David Saff, Marat Boshernitsan and Michael D. Ernst explicitly cite Karl Popper and the scientific method in their paper on test theories. Test theories are invariant properties that a piece of code is claimed to possess. Saff et al. attempt to falsify them over a wide range of data points, using an extension to the JUnit testing framework.
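To make the idea concrete, here is a minimal sketch in plain Python. It is not the JUnit extension Saff et al describe; the function names and data points are hypothetical, chosen only to show a "theory" being checked against many inputs and falsified by a single counterexample.

```python
from itertools import product


def find_counterexample(theory, data_points):
    """Return the first data point that falsifies `theory`, or None."""
    for point in data_points:
        if not theory(*point):
            return point
    return None


# Hypothetical theory 1: integer addition is commutative.
def addition_commutes(a, b):
    return a + b == b + a


# Hypothetical theory 2: a sum is never smaller than its first operand.
# This is false whenever b is negative.
def sum_is_at_least_first_operand(a, b):
    return a + b >= a


# Every pairing of a few illustrative values, including a negative one.
DATA_POINTS = list(product([-2, 0, 3], repeat=2))

print(find_counterexample(addition_commutes, DATA_POINTS))              # None
print(find_counterexample(sum_is_at_least_first_operand, DATA_POINTS))  # (-2, -2)
```

Surviving every data point does not prove the first theory. It only means these inputs failed to falsify it, which is the Popperian point.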

I find the reciprocal of this approach useful when debugging. I start with a defect, form a theory as to its cause, then design a test to try to falsify that theory. If I suspect that the issue is caused by rogue JavaScript, I'll disable JavaScript and attempt to reproduce the issue. If I can, I've disproved my theory and I need to find another explanation. This helps me to eliminate false causes and gradually home in on the bug.

The problem with analogies that treat tests as theories and code as a phenomenon is that they tell us nothing about how to write code. The software under test is like gravity, a chemical reaction or the weather. It may or may not have an underlying structure and beauty, but any insights we gain during testing are inevitably after-the-fact.

Worse, they are static models. When software changes over time, the knowledge gathered through "scientific testing" may no longer apply. The scope of scientific testing is confined to a specific version of the software. For example, a tested and verified "theory" about the memory profile of an application may become invalid when a programmer makes a small change to a caching policy.

Tests are facts. Code is theory.

Science's strength is its ability to assimilate new discoveries. If we want to share in its success, a scientific model of software development needs to preserve science's adaptability.

We can go some way to achieving this by reversing the roles of testing and coding in the scientific testing model. Tests are facts. Code's role is as a theory that explains those facts as gracefully and simply as possible.

New requirements mean new tests. New tests are newly discovered facts that must be incorporated into the code's model of reality. Software can be seen as a specialised theory that attempts to embody what the stakeholders want the application to do.

How does that help us?

Once we accept that code is a theory, we are in a position to justify employing the most powerful weapon in science's armoury - Occam's razor. Our role is to write the simplest possible code that is consistent with the facts/tests/requirements. Whenever we have the opportunity to eliminate concepts from our code, we should.

Simple code isn't just cheaper. It's more valuable too, because it's easier to change and extend. We can justify this with reference to scientists' experience that the simplest theory is the most likely to survive subsequent discoveries.
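A small sketch may help here. Everything in it is hypothetical: the shipping-fee domain, the function names and the numbers are invented for illustration. Two pieces of code explain the same facts, but only the simpler one survives a newly discovered fact.

```python
# Facts: (order total, expected shipping fee). Purely illustrative numbers.
FACTS = [(10, 5), (49, 5), (50, 0), (120, 0)]


def shipping_by_lookup(total):
    """A 'theory' with one concept per fact: it memorises the known cases."""
    known = {10: 5, 49: 5, 50: 0, 120: 0}
    return known.get(total)  # None for any total it has never seen


def shipping_by_threshold(total):
    """A simpler 'theory': a single rule covers every known case."""
    return 0 if total >= 50 else 5


def explains(theory, facts):
    """True if the code (theory) is consistent with every fact (test)."""
    return all(theory(total) == fee for total, fee in facts)


# Both theories are consistent with the facts known so far.
print(explains(shipping_by_lookup, FACTS))     # True
print(explains(shipping_by_threshold, FACTS))  # True

# A new requirement arrives as a new fact.
NEW_FACTS = FACTS + [(75, 0)]
print(explains(shipping_by_lookup, NEW_FACTS))     # False
print(explains(shipping_by_threshold, NEW_FACTS))  # True
```

The simpler rule is not guaranteed to survive the next fact either. If an order of 75 had turned out to cost 5 to ship, the threshold theory would need revising too.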

As new requirements arrive and our understanding of the domain deepens, we have the opportunity to refactor. Refactoring isn't rework or throwing away effort. Refactoring is enhancing code's value by incorporating new knowledge about what we want our software to do. This could mean adding functionality or reducing complexity. Either makes the software as a whole more valuable.

Science celebrates refactoring. Each new piece of evidence clarifies scientists' understanding of phenomena and helps yield more useful theories. Often these refinements are small, but occasionally an Einstein will have an insight that supersedes Newton's laws of motion. Domain-driven design founder Eric Evans describes such pivotal moments on software projects as "breakthroughs".

Non-developers often assume an application is invariably more valuable with a feature than without it. Yet the example of special relativity allows us to explain otherwise. Newton's laws of motion are perfectly adequate for ordinary use. Unless we are interested in bodies moving close to the speed of light, it's not worth bothering with the additional complexity Einstein's theories bring.

If stakeholders are willing to accept that the application targets the common case and excludes troublesome edge cases, they will enjoy software that is simpler and therefore cheaper and more valuable. Sometimes, there is value in absent features. Always, there is value in simpler code.

Crave simplicity. Celebrate deletion. If science responded to new information by adding special cases then science would be in as big a mess as the software industry. As you incorporate new requirements, attempt to refine your code so that it remains flexible enough to accommodate tomorrow's requirements. Otherwise, your code will become less and less fit for its purpose, which is to provide business value.


When John Maynard Keynes was attacked for repeatedly revising his economic theories, he said, "When the facts change, I change my mind – what do you do, sir?" Take the same attitude with your code, but treat requirements and tests as your facts. And remember, your code is just your best approximation of what your stakeholders want it to be.


  1. Great Read.

    One thought.

    It's disheartening to think that the test "tools" themselves are also complex pieces of software that require considerable attention and tuning. To apply the "tests are facts" paradigm, your tools must be as dependable as gravity, the photoelectric effect etc. I haven't found this to be the case.

    In the physical sciences your tools are calibrated to dependable standards that are relatively fixed and unchanging. How can we do this with a complex set of software test tools that may also be "theories"?

  2. Good point.

    Unit testing gets around this somewhat by making individual tests as specific as possible.

    But many times I've encountered system tests failing and not been able to tell whether it's the tools, the tests or the code itself that's to blame.

  3. This would be all well and good if science really did progress according to Popper's principles, but it's well accepted in the scientific community that this is not the case: Kuhn's Structure of Scientific Revolutions presents a very different model which at the same time should have resonance with any experienced developer.

    Drawing a direct analogy we can consider a set of requirements to be a paradigm embodying a world view for the domain under investigation, programs meeting the requirements to be theories within that paradigm and the processes of coding, testing, debugging and refactoring to be the normal science which sustains the paradigm.

    A significant change of requirements represents a revolutionary step which makes the previous paradigm largely irrelevant.

    It's also important to bear in mind that, contrary to your statement above, normal science is very much about adding special cases to cope with new information: that is in fact one of the defining traits of normal science. The prevailing paradigm will have many anomalies and the work of normal science seeks to explain these in its terms no matter how convoluted such explanations may be.

    When a revolution in scientific understanding does occur it is in part because the accumulation of such special cases reaches the point at which their weight exceeds the tolerance of the prevailing paradigm, creating an opportunity for alternative world views which under other circumstances - regardless of their fundamental proximity to accurate domain knowledge - would be of insufficient demonstrable benefit for the accompanying phase transition.

    It is also important to bear in mind that a test is not an experiment, nor is it a fact. A test is a measurement. When performing experiments in the sciences it is usual to take many measurements and to consider these in aggregate as a means of discovering facts. In this sense the execution of a test suite represents an experiment as does each execution of a program.

    Tests are unfortunately bound by all the same problems that bedevil measurement in any other discipline. Is the right thing being tested? How can we validate that it's the right thing? How do we verify that the test is correctly implemented? Is the test even necessary or informative?

    To these considerations a suite of tests adds the complexities of Zeno's paradox, not to mention suffering the fundamental limitations imposed by Gödel.

    None of this is to dismiss testing as a useful tool in developing software or to rag on any particular software development methodology, just to point out that if we really aspire to writing code the way that scientists discover laws of nature we first have to speak the same language as scientists...

  4. Hm, you cite science, so I'll disagree on the "tests are facts" point. Tests are experiments. The (huge, IMHO) difference to facts is that they can have bugs themselves.