Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data

13 Nov 2017 | Anuj Karpatne, Gowtham Atluri, James H. Faghmous, Michael Steinbach, Arindam Banerjee, Auroop Ganguly, Shashi Shekhar, Nagiza Samatova, and Vipin Kumar
The paper introduces Theory-Guided Data Science (TGDS), a paradigm that aims to integrate scientific knowledge into data science models so that data-driven models become more effective for scientific discovery. TGDS seeks to address the limitations of black-box data science models in scientific domains by using scientific theories and principles to improve model interpretability, generalizability, and scientific understanding. The authors outline five research themes in TGDS: the design of model families, guided learning algorithms, refinement of model outputs, hybrid models, and the augmentation of theory-based models with data science methods. They provide illustrative examples from several scientific disciplines, including hydrology, computational chemistry, and climate science, to demonstrate how TGDS can be applied to real-world problems. The paper emphasizes the importance of physical consistency and interpretability in data science models for advancing scientific knowledge and making more reliable predictions.
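As a rough illustration of the "guided learning algorithms" theme, the sketch below adds a physical-consistency penalty to an ordinary data-fit loss. It is not taken from the paper: the monotonicity rule (predictions must not decrease along an ordered input), the function names `monotonicity_violations` and `theory_guided_loss`, and the weight `lam` are all assumptions chosen for the example.

```python
def monotonicity_violations(predictions):
    """Amount by which each prediction falls below the one before it.

    Assumed theory (hypothetical): the modeled quantity is non-decreasing
    along the ordered inputs, so any drop counts as a violation.
    """
    return [max(0.0, prev - curr)
            for prev, curr in zip(predictions, predictions[1:])]


def theory_guided_loss(predictions, targets, lam=1.0):
    """Mean squared error plus a weighted physical-inconsistency penalty."""
    n = len(predictions)
    # Data term: how well the predictions match the labeled observations.
    data_term = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    # Theory term: mean squared size of the monotonicity violations.
    violations = monotonicity_violations(predictions)
    if violations:
        physics_term = sum(v ** 2 for v in violations) / len(violations)
    else:
        physics_term = 0.0
    return data_term + lam * physics_term


targets = [0.0, 1.0, 2.0]
consistent = [0.0, 1.0, 2.0]      # obeys the assumed monotonicity rule
inconsistent = [0.0, 2.0, 1.0]    # drops from 2.0 to 1.0, violating it

print(theory_guided_loss(consistent, targets))
print(theory_guided_loss(inconsistent, targets))
print(theory_guided_loss(inconsistent, targets, lam=0.0))
```

With `lam=0.0` the loss reduces to the plain data-fit term, so the weight controls how strongly a learning algorithm minimizing this loss would be steered toward physically consistent predictions.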