DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing

08 July 2023 | Peng Ni, Fan Nie, Zeyu Zhong, Jinrui Xu, Neng Huang, Jun Zhang, Haochen Zhao, You Zou, Yuanfeng Huang, Jinchen Li, Chuan-Le Xiao, Feng Luo & Jianxin Wang
This study presents ccsmeth, a deep-learning method for detecting DNA 5-methylcytosine (5mC) at CpG sites using PacBio circular consensus sequencing (CCS) reads. The method achieves an accuracy of 0.90 and an area under the curve (AUC) of 0.97 for 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves correlations above 0.90 with bisulfite sequencing and nanopore sequencing using only 10× reads. In addition, the study develops a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation from CCS reads, and validates it on a Chinese family trio. ccsmeth and ccsmethphase are robust and accurate tools for detecting DNA 5-methylcytosines.

5mC is the most common form of DNA methylation and is involved in regulating many biological processes. In humans, most 5mCs occur at CpG sites and are associated with embryonic development, disease, and aging. Bisulfite sequencing (BS-seq) is the most widely used method for profiling 5mC. However, bisulfite treatment damages DNA, leading to DNA degradation and loss of sequencing diversity. Recently developed bisulfite-free methods, such as TAPS and EM-seq, offer more uniform coverage and higher unique mapping rates than BS-seq. These methods can be applied to both short-read and long-read sequencing.

Two major long-read sequencing technologies, PacBio single-molecule real-time (SMRT) sequencing and nanopore sequencing, can directly sequence native DNA without PCR amplification. DNA base modifications alter polymerase kinetics in SMRT sequencing and affect the electrical current signals near the modified bases in nanopore sequencing. DNA base modifications can therefore be detected directly from native DNA reads of either technology without extra laboratory techniques. For nanopore sequencing, computational methods for 5mC detection either apply statistical tests that compare the current signals of native DNA reads with those of an unmodified control, or use pre-trained models.
Previous studies have shown that methods using pre-trained models achieve high accuracy for 5mC detection from human nanopore reads.

PacBio CCS reads can also be used to detect 5mC at CpG sites. ccsmeth uses the kinetics features of CCS reads, inter-pulse durations (IPDs) and pulse widths (PWs), to detect 5mCpGs. It applies bidirectional gated recurrent units (GRUs) and attention neural networks to infer the methylation states of CpGs at both the read level and the genome-wide site level. The study also develops a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, validated on a Chinese family trio.
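
The relationship between read-level calls and site-level methylation frequencies, and the comparison of those frequencies with a reference such as BS-seq, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the input format (tuples of chromosome, position, and per-read methylation probability), the function names, the 0.5 threshold, and the simple counting rule. This summary does not describe how ccsmeth itself derives site-level frequencies, so the code should not be read as its implementation.

```python
from collections import defaultdict
from math import sqrt


def site_frequencies(read_calls, threshold=0.5, min_coverage=1):
    """Aggregate read-level calls into per-site methylation frequencies.

    read_calls: iterable of (chrom, pos, prob_methylated) tuples, one per
    CpG observation on a single read (hypothetical input format).
    A call counts as methylated when its probability is >= threshold.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated, total]
    for chrom, pos, prob in read_calls:
        site = (chrom, pos)
        counts[site][0] += int(prob >= threshold)
        counts[site][1] += 1
    # Keep only sites covered by at least min_coverage read-level calls.
    return {
        site: methylated / total
        for site, (methylated, total) in counts.items()
        if total >= min_coverage
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlate_with_reference(freqs, reference):
    """Correlate site frequencies with a reference over shared sites only."""
    shared = sorted(set(freqs) & set(reference))
    return pearson([freqs[s] for s in shared], [reference[s] for s in shared])
```

In this sketch, raising `min_coverage` trades the number of reported sites for less noisy frequency estimates, which is one reason correlations with a reference method are usually reported at a stated read depth.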