Virtual machines (VMs) provide a way for scientists to package not only scientific software, but also data, external dependencies, and even entire operating system configurations, enabling a faithful reproduction of the exact computing environment used to derive a particular result.

Are VMs, and related tools such as containers, actually a net positive for open science when used in this way? If the scientific software used to compute a result can only be run in one very particular computing environment, is that software still useful and reliable?

I use VMs to make my work reproducible. I don't have much informatics knowledge, so someone else sets them up for me. I am a statistician, so what follows is fairly specific to the kind of work I do. My workflow is:

  • I have one VM per project. It lives on a University server.
  • I keep all my code (R, Makefiles) and the manuscript (LaTeX, knitr, R Markdown) under version control (SVN or Git). This way I can work both on my laptop and on the VM.
  • Before submitting the manuscript to a journal, I make sure all analyses are run on the VM (make helps here). This ensures everything is run under the same conditions (same OS, same R version, same R package versions, ...).
  • When revising the paper, I go back to the VM and work there again. This way I avoid differences in the results caused merely by, e.g., a change in R package versions.
  • At any later time I can go back to the VM to retrieve intermediate results or run slightly different analyses.
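Running everything "under the same conditions" can also be checked mechanically. The following is a minimal sketch, written in Python rather than the R used in this workflow, that records a hypothetical environment "fingerprint" (OS, interpreter version, versions of chosen packages) and reports what changed between two runs. The function names and the package list are illustrative assumptions, not part of the setup described above; in R, the built-in `sessionInfo()` reports similar information.

```python
import json
import platform
from importlib import metadata


def environment_fingerprint(packages):
    """Record the OS, interpreter version and versions of selected packages.

    Hypothetical helper for illustration; a package that is not installed
    is recorded with version None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "packages": versions,
    }


def diff_fingerprints(saved, current):
    """Return {key: (saved_value, current_value)} for every entry that differs."""
    diffs = {}
    for key in ("os", "python"):
        if saved[key] != current[key]:
            diffs[key] = (saved[key], current[key])
    # Compare package versions, including packages present in only one run.
    for name in set(saved["packages"]) | set(current["packages"]):
        old = saved["packages"].get(name)
        new = current["packages"].get(name)
        if old != new:
            diffs[name] = (old, new)
    return diffs


if __name__ == "__main__":
    # Example: print a fingerprint that could be saved next to the results.
    fp = environment_fingerprint(["numpy", "pandas"])
    print(json.dumps(fp, indent=2))
```

Saving the fingerprint alongside the results at submission time and comparing it during revisions would flag, for example, an upgraded package before it silently changes a number in the paper.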

I don't think using VMs makes my work more open. It makes me more confident about my results, which helps me to be more open.

