2013 | Jaina Mistry, Robert D. Finn, Sean R. Eddy, Alex Bateman and Marco Punta
The paper discusses challenges in homology search using HMMER3 and the issue of convergent evolution in coiled-coil regions. It evaluates HMMER3's ability to correctly assign homologous sequences to over 13,000 Pfam families. The study identifies families with problematic overlaps, where regions match multiple Pfam families not annotated as related. HMMER3's E-value estimates are less accurate for families with periodic compositional bias, such as coiled-coils. The results suggest that manually curated inclusion thresholds in Pfam are still necessary, especially for problematic families. The study highlights the need for new methods to correct for compositional bias.
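The difference between a single global E-value cutoff and curated, family-specific bit score thresholds can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Hit` record, the family names, the scores, and the 0.01 E-value cutoff are hypothetical and are not taken from the paper or from Pfam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    family: str       # hypothetical family name, not a real Pfam accession
    bit_score: float  # bit score of the match
    e_value: float    # E-value reported for the match


def passes_global_evalue(hit, max_e_value=0.01):
    """Accept a hit if its E-value is below one cutoff shared by all families."""
    return hit.e_value <= max_e_value


def passes_curated_threshold(hit, thresholds):
    """Accept a hit only if it reaches its family's curated bit score threshold.

    Families without a curated threshold are rejected rather than guessed at.
    """
    threshold = thresholds.get(hit.family)
    return threshold is not None and hit.bit_score >= threshold


# Hypothetical curated thresholds: the coiled-coil family is given a stricter one.
curated = {"FAM_GLOBULAR": 22.0, "FAM_COILED": 35.0}

hits = [
    Hit("FAM_GLOBULAR", 25.1, 1e-4),
    Hit("FAM_COILED", 27.3, 5e-4),
]

by_evalue = [h.family for h in hits if passes_global_evalue(h)]
by_curated = [h.family for h in hits if passes_curated_threshold(h, curated)]
```

In this toy data both hits pass the global E-value cutoff, but only the globular family's hit reaches its curated threshold, which mirrors the argument that problematic families need stricter, manually set limits.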
The paper benchmarks HMMER3's E-value estimates using Pfam. It defines an overlap as a sequence region that matches two or more Pfam families from different clans. Because families in different clans are not annotated as related, such overlaps are likely to indicate annotation errors. Families that overlap with multiple clans are enriched in predicted coiled-coil and transmembrane residues. The study lists the top 20 overlapping families, many of which contain coiled-coil or transmembrane regions. HMMER3's bias correction can help identify problematic families.
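The overlap definition above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the `Match` record, the `clan_of` mapping, the family and clan names, and the choice to count any shared residue as an overlap are assumptions. Families without a clan are treated here as their own singleton clan, which is also an assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Match:
    sequence: str  # identifier of the matched sequence
    family: str    # hypothetical family name
    start: int     # 1-based, inclusive
    end: int       # inclusive


def residues_shared(a, b):
    """Number of residues covered by both matches (0 if on different sequences)."""
    if a.sequence != b.sequence:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_overlaps(matches, clan_of):
    """Return pairs of matches that share residues and belong to different clans."""
    overlaps = []
    for a, b in combinations(matches, 2):
        if a.family == b.family:
            continue
        # Assumption: a family with no clan entry acts as its own singleton clan.
        clan_a = clan_of.get(a.family, a.family)
        clan_b = clan_of.get(b.family, b.family)
        if clan_a != clan_b and residues_shared(a, b) > 0:
            overlaps.append((a, b))
    return overlaps


# Hypothetical clan membership and matches.
clan_of = {"FAM_A": "CL_X", "FAM_B": "CL_X", "FAM_C": "CL_Y"}
matches = [
    Match("seq1", "FAM_A", 10, 120),
    Match("seq1", "FAM_B", 100, 200),
    Match("seq1", "FAM_C", 110, 180),
    Match("seq2", "FAM_C", 10, 50),
]

overlapping_pairs = [(a.family, b.family) for a, b in find_overlaps(matches, clan_of)]
```

FAM_A and FAM_B also share residues on seq1, but they sit in the same clan, so that pair is not reported; only pairs that cross a clan boundary count.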
The paper also discusses the use of E-value-based strategies for assigning overlapping domains to families, which produce similar overrepresentation results. The study concludes that the 'random sequence' null hypothesis in log-odds score tests is oversimplified. Non-homologous sequences may contain higher-order features like coiled-coils, which should be included in the null model. Pfam's manual curation of family-specific bit score thresholds helps minimize false positives, but better bias correction methods are needed for automatic searches. The study emphasizes the importance of addressing compositional bias in homology detection.
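The dependence of a log-odds score on the choice of null model can be shown with a toy calculation. This sketch is deliberately simplified and is not HMMER3's scoring: it uses a position-independent model over a hypothetical four-letter alphabet, whereas a profile HMM has position-specific emissions, and it captures only residue composition, not the periodic, higher-order structure of coiled-coils. All probabilities and the example sequence are invented for illustration.

```python
import math


def log_odds_bits(sequence, model_probs, null_probs):
    """Sum of per-residue log2(P(residue | model) / P(residue | null))."""
    return sum(math.log2(model_probs[r] / null_probs[r]) for r in sequence)


# Hypothetical family model enriched in L, E and K (a crude stand-in for bias).
model = {"L": 0.4, "E": 0.3, "K": 0.2, "G": 0.1}

# 'Random sequence' null: every residue is equally likely.
uniform_null = {"L": 0.25, "E": 0.25, "K": 0.25, "G": 0.25}

# Null that already accounts for the biased composition.
biased_null = dict(model)

# A sequence assumed to be unrelated to the family but with similar composition.
unrelated = "LEKLEELK"

score_uniform = log_odds_bits(unrelated, model, uniform_null)
score_biased = log_odds_bits(unrelated, model, biased_null)
```

Against the uniform null the unrelated sequence receives a positive score purely because of its composition, while against the biased null the score drops to zero. The paper's argument extends this idea from composition to higher-order features such as coiled-coil periodicity.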