Dec 23, 2014 - 4 minute read - programming science

Programming Done Right Follows The Scientific Method

AKA Good Programmers Are Doing Real Science

There are a bunch of long-winded definitions of the scientific method, but I think a reasonably pithy description would be:

  1. Come up with a theory to explain an existing phenomenon.
  2. Use that theory to come up with a falsifiable prediction.
  3. Conduct experiments to prove or disprove the truth of the prediction.
  4. Independent parties repeat #3 and reproduce the results.

Now if the experiments at step three or four disprove the truth of the prediction, one of a few things can happen:

  1. Other related falsifiable predictions are in agreement, so the theory might only need adjustment.
  2. The results of the experiment indicate that the theory is fundamentally flawed. This can happen when results contradict base assumptions and/or presuppositions.
  3. The experiment itself is the problem.
  4. Fun statistics stuff.

Sound Familiar?

If you follow good programmer discipline, it should sound quite familiar. This is the story of good software testing. Ideally, you write code that does something. Your theory is that it works in a certain way. Then you write tests to prove or disprove that it works that way. Then you run the tests again and again in a continuous build system apart from your main development box so you get reproducible results. And in an open source project, random people on the internet run those same tests.
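To make the analogy concrete, here is a minimal Python sketch, assuming a hypothetical `median` function as the subject under test. The function plays the role of the theory, and each test function states one falsifiable prediction about it. The tests are plain `test_` functions of the kind pytest discovers, though any test framework would do.

```python
def median(values):
    """Hypothetical subject under test. The 'theory': for a non-empty list
    of numbers, this returns the middle value (or the mean of the two
    middle values when the length is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Each test is a falsifiable prediction derived from the theory.
def test_odd_length_returns_middle_value():
    assert median([3, 1, 2]) == 2


def test_even_length_averages_middle_values():
    assert median([4, 1, 3, 2]) == 2.5


def test_input_order_does_not_matter():
    assert median([1, 5, 3]) == median([5, 3, 1])
```

Running the suite is the experiment; a single red prediction is evidence against the theory, and rerunning it on a build server is the attempt at reproduction.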

Now when you have consistently reproducible green builds, you can feel good that your theory (assuming the experiments cover enough) is correct. But the “interesting” cases are when step three or four goes awry. As with experimental results, automated tests can fail to confirm the theory in a few ways:

  • You could get perfectly reproducible red builds. That is, the theory is consistently disproved by experiments.
  • It could build locally just fine, but fail on the continuous build or vice versa. That is, reproducibility fails based upon who does the experiment.
  • You could have a build that succeeds some/most of the time but fails indeterminately. That is, reproducibility fails randomly.

These happen for some of the following reasons:

  • The subject under test is just plain busted. That is, the theory is wrong.
  • The subject under test mostly works. That is, the theory is an approximation of truth.
  • Sometimes, as above, the test is just wrong. That is, the experiment is the problem.
  • Race conditions in the code. That is, the theory needs adjustment.
  • Underspecified environmental conditions. That is, the theory does not sufficiently describe all variables.
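A race condition is a case where the thread scheduler is a variable the test does not control. The sketch below simulates two threads with Python generators rather than real threads, so the interleaving can be chosen by hand; `increment` and `run` are hypothetical names. Each `yield` marks a point where a real scheduler could switch threads in the middle of a read-modify-write.

```python
def increment(shared):
    """One simulated 'thread' doing a non-atomic read-modify-write."""
    value = shared["count"]      # read
    yield                        # a real scheduler could switch threads here
    shared["count"] = value + 1  # write


def run(schedule):
    """Interleave two increments deterministically.

    `schedule` lists which thread (0 or 1) takes the next step.
    """
    shared = {"count": 0}
    threads = [increment(shared), increment(shared)]
    for i in schedule:
        next(threads[i], None)  # None once that thread has finished
    return shared["count"]


print(run([0, 0, 1, 1]))  # one thread finishes before the other starts
print(run([0, 1, 0, 1]))  # both read 0 before either writes: an update is lost
```

A test asserting that the count is 2 passes under the first schedule and fails under the second. With real threads the scheduler picks the interleaving, which is why such a test fails only occasionally; holding a lock such as `threading.Lock` across both the read and the write rules out the bad interleaving.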

Now depending on the nature of the failure, you can either:

  • Ignore the fact that the tests fail because you don’t believe in unit and integration testing as a valid thing.
  • Ignore the fact that the tests fail because admitting it would mean something fundamental in your code is wrong. Or perhaps somebody you don’t like wrote the test, or they adhere to a methodology you don’t believe is valid. Validity of the methodology aside, if the test is doing the right thing, the subject under test is still broken. I’m not talking about temporarily disabling a test for practical reasons, such as getting a build out on a system that won’t push a build while any test fails, when the failure is not an immediate worry. I’m talking about denial here.
  • Acknowledge the problem, but don’t address it right away. There are a number of valid reasons to do so, such as when it’s a rare flaky test and the underlying problem is not a high-risk fault.
  • Track down the problem in the code or test and fix it.
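Acknowledging a failure without fixing it yet can still be done in the open. A minimal sketch using Python's standard `unittest` module: the `unittest.skip` decorator disables a test but records the reason, so the known problem stays visible in test reports instead of quietly disappearing. The `apply_discount` function, the test names, and the skip reason are all hypothetical.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical subject under test."""
    return round(price * (1 - percent / 100), 2)


class DiscountTests(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(50.0, 10), 45.0)

    # Acknowledged, not yet fixed: the reason string is reported with the
    # skip, so the problem is tracked rather than denied.
    @unittest.skip("Known flaky on CI; underlying fault judged low-risk")
    def test_discount_service_round_trip(self):
        self.fail("placeholder for a flaky integration test")
```

Running `python -m unittest -v` lists the skipped test together with its reason, which keeps the difference between a deliberate, temporary skip and plain denial on the record.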

If you do the first two, you’re not testing; you’re being irrational. There’s a problem in your code that needs to be addressed. If you do the latter two, you, my friend, are doing real computer science.

So if you’re doing software development correctly, you are in fact exercising the scientific method, and by extension are doing real science. It may not be in the natural world, per se, but it is science just the same.