Biologists and computer scientists alike are grappling with big data. As data sets grow, researchers need ever more powerful computers to store and process them. High-throughput genomics in particular has driven an explosion of data: the European Bioinformatics Institute (EBI) stores some 20 petabytes of data, of which roughly 2 petabytes are genomic. That is still only a fraction of what CERN generates, but the challenges of storing, analyzing and sharing the data are much the same.
Data mining in biology is complicated by the heterogeneity of biological data, which spans genetic sequences, protein interactions and medical records drawn from very different kinds of experiments. Storing, analyzing and sharing data sets at this scale is far from simple: a single sequenced human genome runs to around 140 gigabytes, and comparing genomes demands more computing power than a personal computer, and more bandwidth than consumer file-sharing services, can supply.
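Some back-of-envelope arithmetic makes the scale concrete. The sketch below takes the article's figure of roughly 140 gigabytes per sequenced genome; the cohort size and the 50 Mbit/s broadband rate are illustrative assumptions, not figures from the article.

```python
# Rough sizes for genome data, assuming ~140 GB per sequenced human
# genome (the article's figure) and a hypothetical 100-genome cohort.
GENOME_BYTES = 140e9          # ~140 GB per genome (from the article)
COHORT = 100                  # assumed cohort size for comparison work

cohort_bytes = GENOME_BYTES * COHORT
print(f"100-genome cohort: {cohort_bytes / 1e12:.0f} TB")  # 14 TB

# Time to move just one genome over an assumed 50 Mbit/s home link:
home_bps = 50e6               # consumer broadband, bits per second
seconds = GENOME_BYTES * 8 / home_bps
print(f"One genome at 50 Mbit/s: {seconds / 3600:.1f} hours")  # 6.2 hours
```

At roughly six hours per genome on a good home connection, a 14-terabyte cohort is clearly beyond a laptop and a file-sharing account.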
Cloud computing is becoming increasingly important in big-data biology because it lets researchers tap computing power without buying and maintaining their own hardware. The EBI is developing Embassy Cloud, a cloud-computing component for the ELIXIR infrastructure that offers secure data-analysis environments. In an era of tight research funding, the ability to avoid hardware costs makes the cloud especially attractive.
However, cloud computing brings its own challenges, such as data being scattered across multiple clouds and the risk of loss or corruption. Researchers are also experimenting with ways to move large data sets quickly, such as fasp, a high-speed data-transfer protocol developed by Aspera that can be far faster over long distances than standard internet transfer protocols.
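A simple calculation shows why standard transfers struggle over long distances: a TCP connection's throughput is capped by its window size divided by the round-trip time, so high latency alone can throttle a fast link. The numbers below (a classic 64 KiB window, a 100 ms transatlantic round trip) are illustrative assumptions and say nothing about fasp's own internals, which are proprietary; they only show the bottleneck that protocols like fasp are designed to sidestep.

```python
# TCP throughput ceiling ~= window_size / round_trip_time.
# Assumed values for illustration; not measurements of any real link.
WINDOW = 64 * 1024            # bytes; classic TCP window without scaling
RTT = 0.100                   # seconds; e.g. a transatlantic round trip

ceiling_bps = WINDOW * 8 / RTT            # ceiling in bits per second
print(f"TCP ceiling: {ceiling_bps / 1e6:.1f} Mbit/s")  # 5.2 Mbit/s

genome_bits = 140e9 * 8                   # one ~140 GB genome
days = genome_bits / ceiling_bps / 86400
print(f"140 GB at that rate: {days:.1f} days")  # 2.5 days
```

Even on a gigabit link, that latency-bound connection would take days to move a single genome, which is why transfer tools sidestep vanilla TCP with window scaling, parallel streams or UDP-based protocols.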
Despite these challenges, the use of big data in biology keeps growing, and many researchers and institutions are building tools and platforms to cope. The Galaxy project, for example, lets scientists analyze their data and share software tools and workflows free of charge. The Biomedical Information Science and Technology Initiative (BISTI) funds both the development of new computational tools and the maintenance of existing ones.
In the pharmaceutical industry, companies like Merrimack Pharmaceuticals are using big data to find new drug candidates and understand cancer biology. They store their data and conduct analysis on their own computing infrastructure to keep the data private and protected.
The challenges of big data in biology are significant, but the opportunities are also vast. With the right tools and infrastructure, scientists can unlock new insights and discoveries that were previously impossible. The future of big data in biology is bright, but it requires continued investment in technology, training, and collaboration.