07 February 2024 | Anne M. Kiirikki, Hanne S. Antila, Lara S. Bort, Pavel Buslaev, Fernando Favela-Rosales, Tiago Mendes Ferreira, Patrick F. J. Fuchs, Rebeca Garcia-Fandino, Ivan Gushchin, Batuhan Kav, Norbert Kučerka, Patrik Kula, Milla Kurki, Alexander Kuzmin, Anusha Lalitha, Fabio Lolicato, Jesper J. Madsen, Markus S. Miettinen, Cedric Mingh, Luca Monticelli, Ricky Nencini, Alexey M. Nesterenko, Thomas J. Piggot, Ángel Piñeiro, Nathalie Reuter, Suman Samantray, Fabián Suárez-Lestón, Reza Talandashti & O. H. Samuli Ollila
The article introduces the NMRlipids Databank, a community-driven overlay databank, open to all, that provides programmatic access to high-quality, atom-resolution molecular dynamics (MD) simulations of lipid bilayers. Cellular membranes are complex and difficult to study experimentally. The databank's structured overlay format, which layers standardized metadata and analyses over the simulation data, gives efficient access to these simulations and so enables data-driven and machine learning (ML) analyses of membranes.

The databank contains 765 simulation trajectories with a total length of approximately 0.4 ms, covering a wide range of lipid compositions and membrane properties. Each simulation comes with metadata, an evaluation of its quality against experimental data, and analysis tools, so researchers can select the best simulations for a specific application. Access is provided through both a graphical user interface (GUI) and an application programming interface (API). The API facilitates the development of ML models and the analysis of rare phenomena, such as cholesterol flip-flops, that are seldom sampled often enough in any single simulation.

The databank has already been used to predict membrane properties, to analyze the anisotropy of water diffusion, and to extend MD simulations to new fields such as pharmacokinetics and magnetic resonance imaging (MRI). The overlay databank concept can be applied to other biomolecules and to other fields where data access is limited, promoting open collaboration and data sharing. The NMRlipids Databank thus demonstrates how overlay databanks can lower barriers to AI applications by providing accessible, standardized data for data-driven analyses.
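Selecting "the best simulations for a specific application" amounts to filtering entries by their metadata and ranking them by a quality score. The sketch below illustrates that idea on toy, in-memory records. The `SimulationEntry` fields, the `select_best` helper, and the convention that a higher quality score is better are all hypothetical assumptions for illustration. They are not the NMRlipids Databank schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one simulation; field names are illustrative only,
# NOT the NMRlipids Databank schema.
@dataclass
class SimulationEntry:
    sim_id: str
    composition: dict[str, float]  # lipid name -> molar fraction
    temperature_k: float
    length_ns: float
    quality: Optional[float]  # assumed: higher is better; None = not evaluated


def select_best(entries, required_lipids, min_length_ns=0.0, top_n=3):
    """Return up to top_n entries that contain every required lipid and are
    at least min_length_ns long, ranked by quality (unscored entries skipped)."""
    candidates = [
        e for e in entries
        if e.quality is not None
        and e.length_ns >= min_length_ns
        and all(e.composition.get(lipid, 0.0) > 0.0 for lipid in required_lipids)
    ]
    return sorted(candidates, key=lambda e: e.quality, reverse=True)[:top_n]


# Toy entries (made-up values, not real Databank records).
entries = [
    SimulationEntry("sim-A", {"POPC": 1.0}, 298.0, 500.0, 0.71),
    SimulationEntry("sim-B", {"POPC": 0.7, "CHOL": 0.3}, 298.0, 1000.0, 0.85),
    SimulationEntry("sim-C", {"POPC": 0.6, "CHOL": 0.4}, 310.0, 200.0, 0.62),
    SimulationEntry("sim-D", {"POPC": 0.5, "CHOL": 0.5}, 298.0, 800.0, None),
]

best = select_best(entries, ["POPC", "CHOL"], min_length_ns=100.0)
print([e.sim_id for e in best])  # ['sim-B', 'sim-C']
```

In this sketch, entries without a quality score are excluded rather than ranked last. This is one possible design choice. A real workflow might instead keep them and flag them as unevaluated.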
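Rare events such as cholesterol flip-flops benefit from pooling many trajectories, because each simulation may contain only a few such transitions. The sketch below counts leaflet-to-leaflet transitions of a single molecule from a per-frame z-coordinate measured relative to the bilayer centre. It uses a dead zone around the centre so that brief excursions are not counted. The reference atom, the threshold value, and the `count_flip_flops` helper are assumptions for illustration. This is not the analysis code used with the NMRlipids Databank.

```python
from typing import Sequence


def count_flip_flops(z_positions: Sequence[float], threshold: float) -> int:
    """Count leaflet-to-leaflet transitions of one molecule.

    z_positions: z-coordinate of an assumed reference atom (e.g. the
    cholesterol hydroxyl oxygen) relative to the bilayer centre, one per frame.
    threshold: half-width of a dead zone around the centre; frames with
    |z| <= threshold keep the previous leaflet assignment, so short
    excursions towards the centre are not counted as flip-flops.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    leaflet = 0  # +1 = upper leaflet, -1 = lower leaflet, 0 = not yet assigned
    flips = 0
    for z in z_positions:
        if z > threshold:
            current = 1
        elif z < -threshold:
            current = -1
        else:
            continue  # inside the dead zone: keep the previous assignment
        if leaflet != 0 and current != leaflet:
            flips += 1  # the molecule ended up in the opposite leaflet
        leaflet = current
    return flips


# Toy trajectory in nm (made-up values): upper -> lower -> upper.
trajectory = [1.8, 1.7, 0.3, -0.2, 1.6, 1.9, 0.5, -1.2, -1.7, -1.8, -0.4, 1.5]
print(count_flip_flops(trajectory, threshold=1.0))  # 2
```

Summing such counts over all cholesterol molecules and all trajectories, then dividing by the total simulated time per molecule, would give a rough rate estimate. This is where a databank spanning many simulations helps.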