September 21, 2014 | Simon Anders, Paul Theodor Pyl and Wolfgang Huber
HTSeq is a Python library for processing high-throughput sequencing (HTS) data. It provides parsers for common data formats, classes to represent genomic data, and data structures for querying via genomic coordinates. It also includes htseq-count, a tool for preprocessing RNA-Seq data for differential expression analysis. HTSeq is open-source and available under the GNU General Public Licence.
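One common design for parsers of HTS files is to stream them, yielding one record at a time so that memory use does not grow with file size. The plain-Python sketch below illustrates that pattern for FASTQ. It is not HTSeq's own reader: `read_fastq` and `FastqRecord` are hypothetical names, and the sketch assumes four-line records with Phred+33 quality encoding.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ entry: read name, bases, and decoded Phred qualities (hypothetical type)."""
    name: str
    seq: str
    qual: list[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield FASTQ records one at a time (assumes 4-line records, Phred+offset encoding)."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Decode each quality character into an integer Phred score.
        yield FastqRecord(header[1:].rstrip("\n"), seq, [ord(c) - offset for c in qual])


# Usage with a small in-memory file (hypothetical data):
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n#5\n")
for rec in read_fastq(demo):
    print(rec.name, rec.seq, rec.qual)  # read1 ACGT [40, 40, 40, 40], then read2 GG [2, 20]
```

Because `read_fastq` is a generator, a caller can loop over a multi-gigabyte file while holding only one record in memory.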
HTSeq is designed to let users with moderate Python knowledge write their own scripts for HTS data analysis. Its GenomicArray class stores data that depend on genomic position, such as read coverage, and its GenomicArrayOfSets class stores annotation data in which features may overlap.

HTSeq also includes two stand-alone tools. htseq-qa assesses the quality of a sequencing run. htseq-count counts the reads overlapping each gene, producing the input for gene-level differential expression analysis. It counts only reads that can be assigned unambiguously to a single gene: reads whose alignment is not unique, or that overlap more than one gene, are left out.

Beyond read counting, HTSeq is used for a wide range of tasks, including read-coverage analysis. It is efficient enough to process large datasets, and it complements other bioinformatics tools. The library is documented with examples and tutorials. Its design focuses on flexibility: it handles complex data formats and offers a common interface to diverse data types, making it a general-purpose toolkit for processing and analyzing genomic data.
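Position-dependent data such as coverage usually change value at only a few positions along a chromosome, so they can be stored as a list of change points ("steps") instead of one value per base. The sketch below shows that idea for a single chromosome, similar in spirit to the step-wise storage HTSeq can use for GenomicArray. `StepVector` and its methods are hypothetical names, not HTSeq's API, and intervals are assumed to be 0-based and half-open.

```python
from bisect import bisect_right


class StepVector:
    """Piecewise-constant integer values over one chromosome (hypothetical sketch).

    starts[i] is where step i begins; values[i] holds from there up to
    starts[i + 1] (or the chromosome end). Initially one step of value 0.
    """

    def __init__(self, length: int):
        self.length = length
        self.starts = [0]
        self.values = [0]

    def _split(self, pos: int) -> int:
        """Ensure a step boundary at pos and return the index of the step starting there."""
        i = bisect_right(self.starts, pos) - 1
        if self.starts[i] != pos:
            self.starts.insert(i + 1, pos)
            self.values.insert(i + 1, self.values[i])
            i += 1
        return i

    def add(self, start: int, end: int, value: int) -> None:
        """Add value to every position in the half-open interval [start, end)."""
        i = self._split(start)
        j = self._split(end) if end < self.length else len(self.starts)
        for k in range(i, j):
            self.values[k] += value

    def __getitem__(self, pos: int) -> int:
        return self.values[bisect_right(self.starts, pos) - 1]

    def steps(self):
        """Yield (start, end, value) for each step; equal neighbours are not merged."""
        ends = self.starts[1:] + [self.length]
        return zip(self.starts, ends, self.values)


# Coverage of a hypothetical 1 kb chromosome from three hypothetical read alignments.
cov = StepVector(1000)
for start, end in [(100, 200), (150, 250), (400, 450)]:
    cov.add(start, end, 1)
print(cov[120], cov[175], cov[300])  # 1 2 0
print(list(cov.steps()))
```

Memory here scales with the number of steps rather than with chromosome length. List insertion keeps the sketch short but costs O(n) per split; a genome-wide array would hold one such vector per chromosome (and per strand, if stranded).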
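Storing a *set* of feature names in each step handles overlapping annotation naturally, and it also gives a direct way to implement the counting rule described for htseq-count. The sketch below is a simplified version of the rule htseq-count applies by default (its "union" mode): collect every gene the read touches and count the read only if exactly one gene is found. The bucket names mirror htseq-count's special counters. `SetStepVector`, `count_reads`, and the `(start, end, n_alignments)` read tuples are hypothetical; strand, spliced alignments, exon-level features, and paired-end reads are ignored.

```python
from bisect import bisect_right
from collections import Counter


class SetStepVector:
    """Piecewise-constant sets of feature names over one chromosome (hypothetical sketch)."""

    def __init__(self, length: int):
        self.length = length
        self.starts = [0]
        self.sets = [frozenset()]

    def _split(self, pos: int) -> int:
        """Ensure a step boundary at pos and return the index of the step starting there."""
        i = bisect_right(self.starts, pos) - 1
        if self.starts[i] != pos:
            self.starts.insert(i + 1, pos)
            self.sets.insert(i + 1, self.sets[i])
            i += 1
        return i

    def add(self, start: int, end: int, name: str) -> None:
        """Add feature `name` to every step inside [start, end)."""
        i = self._split(start)
        j = self._split(end) if end < self.length else len(self.starts)
        for k in range(i, j):
            self.sets[k] = self.sets[k] | {name}

    def features_in(self, start: int, end: int) -> set:
        """Union of the feature sets of all steps overlapping [start, end)."""
        i = bisect_right(self.starts, start) - 1
        found = set()
        while i < len(self.starts) and self.starts[i] < end:
            found |= self.sets[i]
            i += 1
        return found


def count_reads(genes: SetStepVector, reads) -> Counter:
    """Count each read toward a gene only if it hits exactly one gene."""
    counts = Counter()
    for start, end, n_alignments in reads:
        if n_alignments > 1:  # read aligned to several places: not unique
            counts["__alignment_not_unique"] += 1
            continue
        hits = genes.features_in(start, end)
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[hits.pop()] += 1
    return counts


genes = SetStepVector(1000)
genes.add(100, 300, "geneA")
genes.add(250, 500, "geneB")  # overlaps geneA on [250, 300)
reads = [(120, 170, 1), (260, 290, 1), (400, 450, 1), (600, 650, 1), (130, 180, 2)]
print(count_reads(genes, reads))
```

Here the read at 260 to 290 lies where both genes overlap and is reported as ambiguous, while the read with two alignments is excluded before any gene lookup.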
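A quality assessment of a sequencing run typically starts from per-position summaries of the reads. The sketch below computes two such summaries, base composition and mean Phred quality at each read position, for in-memory `(sequence, qualities)` pairs. It only illustrates the kind of summary involved; it is not htseq-qa's implementation, and `per_cycle_summary` is a hypothetical name.

```python
from collections import Counter


def per_cycle_summary(reads):
    """Return (base counts per position, mean Phred quality per position).

    `reads` is an iterable of (sequence, qualities) pairs with integer Phred
    scores; reads may differ in length, so later positions can have fewer reads.
    """
    bases: list[Counter] = []
    qual_sum: list[int] = []
    for seq, quals in reads:
        for pos, (base, q) in enumerate(zip(seq, quals)):
            if pos == len(bases):  # first read reaching this position
                bases.append(Counter())
                qual_sum.append(0)
            bases[pos][base] += 1
            qual_sum[pos] += q
    # Average over the number of reads that actually cover each position.
    mean_qual = [qual_sum[p] / sum(bases[p].values()) for p in range(len(bases))]
    return bases, mean_qual


composition, mean_q = per_cycle_summary([("ACG", [40, 30, 20]), ("AC", [30, 10])])
print(mean_q)  # [35.0, 20.0, 20.0]
```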