Overlay databank unlocks data-driven analyses of biomolecules for all

07 February 2024 | Anne M. Kiirikki1, Hanne S. Antila2,3, Lara S. Bort2,4, Pavel Buslaev5, Fernando Favela-Rosales6, Tiago Mendes Ferreira7, Patrick F. J. Fuchs8,9, Rebeca Garcia-Fandino10, Ivan Gushchin, Batuhan Kav11,12, Norbert Kučerka13, Patrik Kula14, Milla Kurki15, Alexander Kuzmin16, Anusha Lalitha16, Fabio Lolicato17,18, Jesper J. Madsen19,20, Markus S. Miettinen21,22, Cedric Mingham23, Luca Monticelli24,25, Ricky Nencini26, Alexey M. Nesterenko21,22, Thomas J. Piggott27, Ángel Piñeiro28, Nathalie Reuter21,22, Suman Samantray11,29, Fabián Suárez-Lestón10,28,30, Reza Talandashti21,22 & O. H. Samuli Ollila1,31
The article introduces the NMRlipids Databank, an overlay databank designed to make molecular dynamics (MD) simulation data accessible for data-driven and machine learning (ML) applications. The NMRlipids Databank is a community-driven, open-access resource that provides programmatic access to atom-resolution MD simulations of lipid bilayers. It addresses a common obstacle to developing AI-based tools in fields such as structural biology: the scarcity of training data in programmatically accessible formats.

The NMRlipids Databank is structured into three layers: the Data layer, the Databank layer, and the Application layer. The Data layer stores raw simulation data in publicly available locations, while the Databank layer contains metadata and universal naming conventions for molecules and atoms. The Application layer includes repositories and tools for further analysis, such as the NMRlipids Databank-GUI and the NMRlipids Databank-API.

Key features of the NMRlipids Databank include:

1. **Quality Evaluation**: Automatic ranking of lipid bilayer simulations by their agreement with experimental data, such as C-H bond order parameters and X-ray scattering form factors.
2. **Machine Learning Applications**: Use of the NMRlipids Databank as a training set for ML models that predict membrane properties, such as area per lipid and membrane thickness.
3. **Rare Phenomena Analysis**: Analysis of rare events, such as cholesterol flip-flops, which are difficult to study experimentally but are important for understanding membrane dynamics.
4. **New Fields Extension**: Extension of MD simulations to new areas, such as anisotropic water diffusion in membrane systems, which is relevant to pharmacokinetic modeling and MRI.
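Several of the quantities named in the key features can be made concrete with a short, self-contained Python sketch. This is not NMRlipids Databank code or its API: every function name below is hypothetical, and the choices of the z axis as the bilayer normal, a ±0.02 tolerance in the quality score, a ±1 nm leaflet threshold for flip-flops, and a single-lag diffusion estimator are illustrative assumptions. The order parameter follows the standard definition S_CH = ⟨(3cos²θ − 1)/2⟩, area per lipid divides the lateral box area by the number of lipids per leaflet, and the diffusion coefficients use the Einstein relation (MSD = 4Dt in the membrane plane, 2Dt along the normal).

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def order_parameter(ch_vectors: Sequence[Vec3]) -> float:
    """S_CH = <(3 cos^2(theta) - 1) / 2>, theta being the angle between a C-H
    bond vector and the bilayer normal (assumed here to be the z axis)."""
    total = 0.0
    for x, y, z in ch_vectors:
        cos2 = z * z / (x * x + y * y + z * z)
        total += 0.5 * (3.0 * cos2 - 1.0)
    return total / len(ch_vectors)


def quality_score(sim: Sequence[float], exp: Sequence[float], tol: float = 0.02) -> float:
    """Hypothetical score: fraction of carbons whose simulated S_CH lies
    within +/- tol of experiment (a stand-in, not the Databank's metric)."""
    hits = sum(1 for s, e in zip(sim, exp) if abs(s - e) <= tol)
    return hits / len(exp)


def rank_simulations(sims: Dict[str, Sequence[float]], exp: Sequence[float]) -> List[str]:
    """Order simulation IDs from best to worst by the hypothetical quality_score."""
    return sorted(sims, key=lambda sid: quality_score(sims[sid], exp), reverse=True)


def area_per_lipid(box_x: float, box_y: float, n_lipids: int) -> float:
    """Lateral box area divided by the lipids per leaflet
    (assumes a symmetric bilayer with n_lipids split evenly)."""
    return box_x * box_y / (n_lipids / 2)


def count_flip_flops(z_traj: Sequence[float], threshold: float = 1.0) -> int:
    """Count leaflet changes of one molecule from its z position relative to
    the bilayer center. Frames within +/- threshold keep the previous leaflet,
    so brief excursions to the center are not counted as flip-flops."""
    leaflet = 0  # 0 = not yet assigned, +1 = upper leaflet, -1 = lower leaflet
    flips = 0
    for z in z_traj:
        if z > threshold:
            new = 1
        elif z < -threshold:
            new = -1
        else:
            continue
        if leaflet != 0 and new != leaflet:
            flips += 1
        leaflet = new
    return flips


def anisotropic_diffusion(traj: Sequence[Sequence[Vec3]], lag: int, dt: float) -> Tuple[float, float]:
    """Einstein-relation estimates (D_parallel, D_perpendicular) at a single lag
    from unwrapped coordinates traj[frame][molecule] = (x, y, z):
    MSD_xy = 4 * D_par * t and MSD_z = 2 * D_perp * t."""
    msd_xy = msd_z = 0.0
    count = 0
    for t0 in range(len(traj) - lag):
        for (x0, y0, z0), (x1, y1, z1) in zip(traj[t0], traj[t0 + lag]):
            msd_xy += (x1 - x0) ** 2 + (y1 - y0) ** 2
            msd_z += (z1 - z0) ** 2
            count += 1
    tau = lag * dt
    return msd_xy / count / (4.0 * tau), msd_z / count / (2.0 * tau)
```

For example, `area_per_lipid(6.0, 6.0, 128)` returns 0.5625 (nm² if the box edges are in nm), and `count_flip_flops([5, 0.5, -5, -0.3, 4, 6])` returns 2. The dead zone around the bilayer center in `count_flip_flops` is a design choice: without it, thermal fluctuations of a molecule sitting near the midplane would be miscounted as repeated flip-flops.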
The article demonstrates the practical relevance of the NMRlipids Databank through several examples, including the selection of best-performing simulation models, prediction of multi-component membrane properties, and analysis of rare phenomena. The NMRlipids Databank is designed to be flexible and extendable, making it a valuable resource for researchers in various fields, particularly those working with biomolecules and membrane systems.