Toil enables reproducible, open source, big biomedical data analyses

2017 April 11; 35(4): 314–316. doi:10.1038/nbt.3772 | John Vivian, Arjun Arkal Rao, Frank Austin Nothaft, Christopher Ketchum, Joel Armstrong, Adam Novak, Jacob Pfeil, Jake Narkizian, Alden D Deran, Audrey Musselman-Brown, Hannes Schmidt, Peter Amstutz, Brian Craft, Mary Goldman, Kate Rosenbloom, Melissa Cline, Brian O'Connor, Megan Hanna, Chet Birger, W James Kent, David A Patterson, Anthony D Joseph, Jingchun Zhu, Sasha Zaranek, Gad Getz, David Haussler, and Benedict Paten
Toil is portable, open-source workflow software for large-scale genomic data analysis in cloud and high-performance computing (HPC) environments. It addresses the difficulty of processing very large genomic datasets, which are often siloed and demand substantial computational resources. Toil supports common workflow languages such as CWL and WDL, which makes analyses reproducible and portable. Its performance optimizations include file caching, data streaming, and a leader/worker job-scheduling pattern, all of which reduce cost and improve efficiency. Toil runs on multiple cloud platforms and in HPC environments while supporting data privacy and security. The authors demonstrate Toil by processing over 20,000 RNA-seq samples from multiple studies, reporting significant cost and time savings compared with traditional workflows. The resulting meta-analysis is publicly available, illustrating Toil's potential for large-scale, reproducible biomedical data analysis.
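The leader/worker pattern mentioned above can be illustrated with a minimal sketch. This is not Toil's API or implementation: it uses only the Python standard library, and the names `run_workflow`, `jobs`, and `deps`, as well as the toy job names, are hypothetical and chosen for illustration. A single leader loop tracks which jobs are ready and hands them to a pool of workers, issuing successors as their dependencies complete.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_workflow(jobs, deps, max_workers=4):
    """Leader loop: dispatch each job once all of its dependencies have finished.

    jobs: dict mapping job name -> callable taking a dict of dependency results
    deps: dict mapping job name -> list of job names it depends on
    (Both structures are assumptions of this sketch, not Toil data types.)
    """
    results = {}
    pending = set(jobs)
    running = {}  # future -> job name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # The leader issues every job whose dependencies are complete.
            ready = [j for j in pending
                     if all(d in results for d in deps.get(j, []))]
            for name in ready:
                pending.remove(name)
                inputs = {d: results[d] for d in deps.get(name, [])}
                running[pool.submit(jobs[name], inputs)] = name
            if not running:
                # Nothing is running and nothing can start: cycle or missing job.
                raise ValueError("unsatisfiable dependencies: %s" % sorted(pending))
            # Block until at least one worker finishes, then record its result.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


# Toy RNA-seq-like workflow: two steps fan out from "align", then join in "report".
example_jobs = {
    "align": lambda inputs: "aligned-reads",
    "quantify": lambda inputs: "counts-from-" + inputs["align"],
    "qc": lambda inputs: "qc-of-" + inputs["align"],
    "report": lambda inputs: sorted(inputs.values()),
}
example_deps = {
    "quantify": ["align"],
    "qc": ["align"],
    "report": ["quantify", "qc"],
}
```

Calling `run_workflow(example_jobs, example_deps)` returns a dictionary of results keyed by job name. Keeping all scheduling state in one leader keeps the workers stateless, so in this sketch any worker can pick up any ready job.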