How Computers Broke Science – and What We Can Do to Fix It

Ben Marwick, University of Washington

Reproducibility is one of the cornerstones of science. Made popular by British scientist Robert Boyle in the 1660s, the idea is that a discovery should be reproducible before being accepted as scientific knowledge.

In essence, you should be able to produce the same results I did if you follow the method I describe when announcing my discovery in a scholarly publication. For example, if researchers can reproduce the effectiveness of a new drug at treating a disease, that’s a good sign it could work for all sufferers of the disease. If they can’t, we’re left wondering what accident or mistake produced the original favorable result, and we have reason to doubt the drug’s usefulness.

For most of the history of science, researchers have reported their methods in a way that enabled independent reproduction of their results. But, since the introduction of the personal computer – and the point-and-click software programs that have evolved to make it more user-friendly – reproducibility of much research has become questionable, if not impossible. Too much of the research process is now shrouded by the opaque use of computers that many researchers have come to depend on. This makes it almost impossible for an outsider to recreate their results.

Recently, several groups have proposed similar solutions to this problem. Together they would break scientific data out of the black box of unrecorded computer manipulations so independent readers can again critically assess and reproduce results. Researchers, the public, and science itself would benefit.

Computers wrangle the data, but also obscure it

Statistician Victoria Stodden has described the unique place personal computers hold in the history of science. They’re not just an instrument – like a telescope or microscope – that enables new research. The computer is revolutionary in a different way; it’s a tiny factory for producing all kinds of new “scopes” to see new patterns in scientific data.

It’s hard to find a modern researcher who works without a computer, even in fields that aren’t intensely quantitative. Ecologists use computers to simulate the effect of disasters on animal populations. Biologists use computers to search massive amounts of DNA data. Astronomers use computers to control vast arrays of telescopes, and then process the collected data. Oceanographers use computers to combine data from satellites, ships and buoys to predict the global climate. Social scientists use computers to discover and predict the effects of policy or to analyze interview transcripts. Computers help researchers in almost every discipline identify what’s interesting within their data.

Computers also tend to be personal instruments. We typically have exclusive use of our own, and the files and folders it contains are generally considered a private space, hidden from public view. Preparing data, analyzing it, visualizing the results – these are tasks done on the computer, in private. Only at the very end of the pipeline comes a publicly visible journal article summarizing all the private tasks.

The problem is that most modern science is so complicated, and most journal articles so brief, that an article cannot include the details of many important methods and decisions the researchers made while analyzing their data on their own computers. How, then, can another researcher judge the reliability of the results, or reproduce the analysis?