Reproducible Research in Computational Science

2011 December 2 | Roger D. Peng
Computational science has led to exciting developments but has exposed limitations in evaluating published findings. Reproducibility is a minimum standard for assessing scientific claims, especially when full replication is not possible. The rise of computational science has enabled the collection of complex data, allowing researchers to engage more directly in science. However, replication is often impractical because of constraints on resources, time, and cost, especially in fields like environmental epidemiology.

Researchers have called for reproducible research as a standard, requiring that data and code be made available to others. This standard allows limited exploration of the data and code, which may be enough to verify the quality of scientific claims. The "R" kite-mark indicates that a knowledgeable individual has reviewed the code and data. While reproducibility does not guarantee quality, it is crucial for identifying computational errors and for building on findings.

A major barrier is the lack of a culture that requires reproducibility for all claims. Another barrier is the absence of an integrated infrastructure for sharing reproducible research. Current systems are ad hoc, and the resources available vary from field to field.

To improve reproducibility, both individual researchers and the scientific community can take steps. First, researchers should publish their code, which can be done at low cost. Next, cleaned-up code and data should be published in durable formats. Finally, a centralized repository (DataMed Central and CodeMed Central) could be created to store and link data, metadata, and code with publications. While change is slow, bringing reproducibility to the forefront and making it routine will make a difference. Developing a culture of reproducibility requires time and sustained effort from the scientific community.
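
As one illustration of the steps above (publishing code and data in durable formats, and linking them to a publication), the following is a minimal Python sketch, not a method described in the article. It records a SHA-256 checksum for every file in a project directory so that a reader can later check that the code and data they downloaded match what the authors released. The function names `build_manifest` and `verify_manifest`, the manifest layout, and the `publication` label are all hypothetical choices made for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, publication: str) -> dict:
    """Map every file under `root` to its checksum (hypothetical layout)."""
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    # `publication` is an illustrative free-text label, e.g. a DOI or title.
    return {"publication": publication, "files": files}


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose contents changed."""
    changed = []
    for rel, expected in manifest["files"].items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel)
    return changed


def manifest_to_json(manifest: dict) -> str:
    """Serialize the manifest as plain JSON text, a durable format."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

A researcher might call `build_manifest(Path("my_project"), "label for the paper")`, save the output of `manifest_to_json` alongside the released files, and let readers run `verify_manifest` on their own copy. The manifest is built before it is written to disk, so it does not list itself. Checksums only confirm that the files are unchanged; they say nothing about whether the analysis is correct, which matches the point above that reproducibility does not guarantee quality.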