Pairtools: From sequencing data to chromosome contacts

May 29, 2024 | Open2C, Nezar Abdennur, Geoffrey Fudenberg, Ilya M. Flyamer, Aleksandra A. Galitsyna, Anton Goloborodko, Maxim Imakaev, Sergey V. Venev
Pairtools is a flexible, high-performance suite of tools for processing 3D genome organization data, particularly from Hi-C and other chromosome conformation capture (3C+) protocols. The software provides modular command-line interface (CLI) tools that can be chained into data processing pipelines to extract, manipulate, and analyze chromosome contacts from sequencing data. Key features include parsing .sam alignments into Hi-C pairs, sorting and deduplicating pairs, and quality control (QC) tools. Pairtools supports a wide range of 3C+ protocols, including standard Hi-C, homolog- and sister-sensitive Hi-C, and single-cell Hi-C. It also includes protocol-specific tools for restriction-based protocols, haplotype-resolved contacts, and single-cell Hi-C.
Pairtools is implemented in Python and integrates with common data analysis libraries, making it a versatile foundation for 3C+ pipelines. Benchmarking against popular 3C+ data pipelines demonstrates its advantages in performance and flexibility. The software is freely available as open-source code and is integrated into the high-performance Nextflow-based pipeline *distiller*.
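
To make the parse, sort, deduplicate, and QC chain described above more concrete, here is a minimal toy sketch in plain Python. It is not Pairtools' implementation and does not use its API: the `Pair` record, the helper names (`flip`, `sort_pairs`, `dedup`, `stats`), and the exact-match duplicate rule are all assumptions made for illustration, and the input is assumed to be pairs that have already been extracted from alignments.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical record for one contact between two genomic positions."""
    read_id: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    strand1: str
    strand2: str


def flip(pair: Pair) -> Pair:
    """Order the two sides so that (chrom1, pos1) <= (chrom2, pos2)."""
    if (pair.chrom1, pair.pos1) > (pair.chrom2, pair.pos2):
        return Pair(pair.read_id, pair.chrom2, pair.pos2,
                    pair.chrom1, pair.pos1, pair.strand2, pair.strand1)
    return pair


def sort_pairs(pairs):
    """Flip every pair, then sort so identical contacts become adjacent."""
    return sorted(
        (flip(p) for p in pairs),
        key=lambda p: (p.chrom1, p.chrom2, p.pos1, p.pos2, p.strand1, p.strand2),
    )


def dedup(sorted_pairs):
    """Keep the first of each run of pairs with identical coordinates and strands.

    Assumption: only exact matches count as duplicates in this toy version.
    """
    unique = []
    for p in sorted_pairs:
        # p[1:] is everything except read_id; sorting made duplicates adjacent.
        if not unique or unique[-1][1:] != p[1:]:
            unique.append(p)
    return unique


def stats(pairs):
    """A tiny QC summary: total, same-chromosome (cis), and cross-chromosome (trans)."""
    cis = sum(p.chrom1 == p.chrom2 for p in pairs)
    return {"total": len(pairs), "cis": cis, "trans": len(pairs) - cis}


# Made-up example data: r1 and r2 describe the same contact in opposite orientation.
toy = [
    Pair("r1", "chr1", 500, "chr1", 100, "-", "+"),
    Pair("r2", "chr1", 100, "chr1", 500, "+", "-"),
    Pair("r3", "chr2", 50, "chr1", 900, "+", "+"),
    Pair("r4", "chr1", 300, "chr1", 800, "+", "-"),
]

unique_pairs = dedup(sort_pairs(toy))
summary = stats(unique_pairs)
```

In this sketch, flipping and sorting come first because they bring duplicate contacts next to each other, so deduplication only needs to compare each pair with the previously kept one. That is one reason a modular pipeline places sorting before deduplication.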