RNA-Seq gene expression estimation with read mapping uncertainty

Vol. 26 no. 4 2010, pages 493-500 | Bo Li, Victor Ruotti, Ron M. Stewart, James A. Thomson and Colin N. Dewey
The article presents a statistical model and inference methods for handling read mapping uncertainty in RNA-Seq data, a promising technology for measuring gene expression levels. The authors address the challenge of mapping short sequencing reads to a reference genome or transcript set, where a single read may map to multiple genes and isoforms. Previous methods either discard such multireads or allocate them heuristically, which leads to less accurate expression estimates. The proposed method uses a generative model to account for read mapping uncertainty and estimates each gene's expression level as the sum of the expression levels of its isoforms.
In simulations parameterized by real RNA-Seq data, the method is more accurate than previous methods. The model can accommodate non-uniform read distributions, and the simulations put the optimal read length for gene-level expression estimation at around 20–25 bases for the mouse and maize transcriptomes. The method is also applied to real mouse liver data, demonstrating its practical utility. The authors conclude that their method improves accuracy, especially for repetitive genomes such as maize, and suggest that sequencing technologies should focus on producing larger numbers of short reads rather than longer reads to achieve the highest accuracy in gene expression estimation.
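To make the idea of allocating multireads probabilistically more concrete, here is a minimal sketch of an expectation-maximization loop over read-to-isoform alignments. It is not the authors' implementation: it assumes uniform read start positions along each isoform, ignores sequencing errors and base qualities, and treats the isoform length as the number of possible start positions. The function names (`em_read_fractions`, `transcript_fractions`, `gene_fractions`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def em_read_fractions(read_hits, lengths, n_iter=200):
    """Estimate theta[i], the fraction of reads originating from isoform i.

    read_hits: list of lists; read_hits[n] lists the distinct isoforms
               that read n aligns to (a multiread has more than one).
    lengths:   dict mapping isoform id -> length in bases (positive).
    """
    isoforms = sorted(lengths)
    theta = {i: 1.0 / len(isoforms) for i in isoforms}  # uniform start
    mapped = [hits for hits in read_hits if hits]  # skip unaligned reads
    if not mapped:
        return theta
    for _ in range(n_iter):
        expected = defaultdict(float)
        for hits in mapped:
            # E-step: split the read across its candidate isoforms in
            # proportion to theta[i] / length[i] (assumed uniform starts).
            weights = [theta[i] / lengths[i] for i in hits]
            total = sum(weights)
            for i, w in zip(hits, weights):
                expected[i] += w / total
        # M-step: theta becomes the expected share of reads per isoform.
        theta = {i: expected[i] / len(mapped) for i in isoforms}
    return theta


def transcript_fractions(theta, lengths):
    """Convert read fractions to transcript fractions by length-normalizing."""
    scaled = {i: theta[i] / lengths[i] for i in theta}
    total = sum(scaled.values())
    return {i: value / total for i, value in scaled.items()}


def gene_fractions(tau, gene_of):
    """Gene-level expression as the sum over the gene's isoforms."""
    genes = defaultdict(float)
    for isoform, value in tau.items():
        genes[gene_of[isoform]] += value
    return dict(genes)


if __name__ == "__main__":
    # Toy data: 60 reads unique to "a", 20 unique to "c", 20 multireads.
    hits = [["a"]] * 60 + [["c"]] * 20 + [["a", "c"]] * 20
    lengths = {"a": 1000, "b": 1000, "c": 1000}
    theta = em_read_fractions(hits, lengths)
    tau = transcript_fractions(theta, lengths)
    print(gene_fractions(tau, {"a": "G1", "b": "G1", "c": "G2"}))
```

In this toy case the 20 multireads end up split in the same 3:1 ratio as the unique evidence, so gene G1 receives a share of about 0.75 and G2 about 0.25. Discarding the multireads would give the same ratio here but from fewer reads, while splitting them evenly would bias the estimate toward G2.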