2015 June; 16(6): 321–332. doi:10.1038/nrg3920. | Maxwell W. Libbrecht, William Stafford Noble
The article provides an overview of machine learning applications in genetics and genomics, highlighting the potential of machine learning to assist in understanding large, complex datasets. It outlines the main categories of machine learning methods (supervised, unsupervised, and semi-supervised) and discusses the trade-offs between performance and interpretability. The authors emphasize the importance of incorporating prior knowledge, handling heterogeneous data, and addressing challenges such as imbalanced class sizes, missing data, and modeling dependencies among examples. They also explore strategies for feature selection and the integration of multiple data types. The review concludes by discussing the future of machine learning in genomics, emphasizing its growing importance with the increasing availability of large datasets.