Cooler: scalable storage for Hi-C data and other genomically-labeled arrays

February 21, 2019 | Nezar Abdennur and Leonid Mirny
Cooler is a storage format for Hi-C data and other genomically-labeled arrays, designed around the sparsity of genomic contact data. It is based on a sparse data model and uses HDF5 as its underlying storage layer. The format accommodates flexible data structures, including multiple resolutions and associated metadata. The cooler package, which provides a Python library and command line tools for creating, reading, and manipulating cooler data collections, is cross-platform, BSD-licensed, and available from the Python Package Index or bioconda.

The format enables efficient compression and supports both sequential and fast random access, which makes it suitable for out-of-core data processing algorithms. It has been adopted as a standard by the NIH 4D Nucleome Consortium. Cooler is particularly useful for large, high-resolution datasets because representing the data sparsely reduces storage requirements, which is critical for scalable storage and analysis. Multi-resolution storage allows interactive multiscale visualization. The cooler package is integrated into various genomic analysis tools and is compatible with a range of programming environments, including Python, Java, and R.
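To make the idea of a sparse data model for genomically-labeled arrays concrete, here is a minimal sketch in plain Python. It is not the cooler library's API and does not touch HDF5; the names `Bin`, `Pixel`, `make_bins`, `dense_to_pixels`, and `pixels_to_dense` are hypothetical, and the chromosome sizes and counts are made-up toy values. It assumes a symmetric contact matrix, so only nonzero entries of the upper triangle are kept, each labeled by the indices of two genomic bins.

```python
from collections import namedtuple

# Hypothetical record types for illustration only (not cooler's API).
Bin = namedtuple("Bin", ["chrom", "start", "end"])
Pixel = namedtuple("Pixel", ["bin1_id", "bin2_id", "count"])


def make_bins(chromsizes, binsize):
    """Tile each chromosome with fixed-width bins; the last bin may be shorter."""
    bins = []
    for chrom, length in chromsizes.items():
        for start in range(0, length, binsize):
            bins.append(Bin(chrom, start, min(start + binsize, length)))
    return bins


def dense_to_pixels(matrix):
    """Keep only nonzero upper-triangle entries of a symmetric matrix."""
    n = len(matrix)
    pixels = []
    for i in range(n):
        for j in range(i, n):
            if matrix[i][j] != 0:
                pixels.append(Pixel(i, j, matrix[i][j]))
    return pixels


def pixels_to_dense(pixels, n):
    """Rebuild the full symmetric matrix from the sparse pixel list."""
    matrix = [[0] * n for _ in range(n)]
    for p in pixels:
        matrix[p.bin1_id][p.bin2_id] = p.count
        matrix[p.bin2_id][p.bin1_id] = p.count
    return matrix


# Toy genome (made-up sizes) binned at a resolution of 100 bp.
chromsizes = {"chr1": 250, "chr2": 120}
bins = make_bins(chromsizes, 100)

# Toy symmetric contact matrix over those bins (made-up counts).
dense = [
    [4, 1, 0, 0, 0],
    [1, 3, 2, 0, 0],
    [0, 2, 5, 0, 1],
    [0, 0, 0, 2, 0],
    [0, 0, 1, 0, 6],
]
pixels = dense_to_pixels(dense)

print(f"{len(bins)} bins, {len(pixels)} stored pixels "
      f"instead of {len(dense) ** 2} dense cells")
```

In this toy case the saving is modest, but the number of stored pixels grows with the number of observed contacts rather than with the square of the number of bins, which is why a sparse representation matters at high resolution.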