Indexing and searching petabase-scale nucleotide resources

2024 June; 21(6): 994–1002. | Sergey A. Shiryev, Richa Agarwala
The paper introduces Pebblescout, a tool for indexing and searching large nucleotide resources such as the Sequence Read Archive (SRA) and GenBank, which together hold vast amounts of sequence data. Pebblescout builds its index by densely sampling the sequences, and its search function reports subjects (runs or assemblies) that share short sequence matches with a user query, with no false negatives for exact matches of at least 42 base pairs. The tool supports three search modes (Profile, Summary, and Detailed) that let users choose how much detail the results contain. The authors demonstrate Pebblescout's effectiveness through several applications, including finding specific genes, similar metagenomic runs, and viral sequences with mutations.
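The summary does not spell out Pebblescout's sampling scheme. As a minimal sketch of how dense sampling can guarantee that no exact match of at least 42 bp is missed, the Python below uses window minimizers, a swapped-in technique that is not necessarily what Pebblescout implements. The k-mer size `K = 25` and window `W = 18` are hypothetical values chosen so that `K + W - 1 = 42`: any 42-bp substring covers exactly one full window of `W` consecutive k-mers, so a subject and a query sharing that substring both sample the same k-mer from it. The names `minimizers`, `build_index`, and `search` are illustrative only.

```python
import random
from collections import defaultdict

# Hypothetical parameters (not taken from the paper): K + W - 1 = 42, so any
# 42-bp substring spans exactly one full window of W consecutive K-mers.
K = 25
W = 18


def minimizers(seq: str, k: int = K, w: int = W) -> set[str]:
    """Sample the lexicographically smallest k-mer from every window of w
    consecutive k-mers. Identical windows always yield the same sample, so two
    sequences sharing an exact substring of length >= w + k - 1 share at least
    one sampled k-mer. Sequences shorter than w + k - 1 yield no samples."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return {min(kmers[s:s + w]) for s in range(len(kmers) - w + 1)}


def build_index(subjects: dict[str, str]) -> dict[str, set[str]]:
    """Map each sampled k-mer to the IDs of the subjects (runs or assemblies)
    that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for subject_id, seq in subjects.items():
        for kmer in minimizers(seq):
            index[kmer].add(subject_id)
    return index


def search(index: dict[str, set[str]], query: str) -> dict[str, int]:
    """Return every subject sharing at least one sampled k-mer with the query,
    scored by the number of shared sampled k-mers."""
    hits: dict[str, int] = defaultdict(int)
    for kmer in minimizers(query):
        # .get() does not insert missing keys into the defaultdict index
        for subject_id in index.get(kmer, ()):
            hits[subject_id] += 1
    return dict(hits)


if __name__ == "__main__":
    rng = random.Random(0)

    def random_dna(n: int) -> str:
        return "".join(rng.choice("ACGT") for _ in range(n))

    shared = random_dna(42)  # planted exact match between the query and run_A
    subjects = {
        "run_A": random_dna(300) + shared + random_dna(300),
        "run_B": random_dna(642),
    }
    query = random_dna(100) + shared + random_dna(100)
    print(search(build_index(subjects), query))  # run_A is guaranteed to appear
```

The lexicographic ordering and the lack of reverse-complement handling are simplifications; practical minimizer schemes usually rank k-mers with a hash function and index canonical k-mers. The hit count returned by `search` is only a crude score and does not reproduce the Profile, Summary, or Detailed outputs.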
The authors compare Pebblescout with other tools such as MetaGraph and Sourmash, showing that it can find more relevant subjects and reduce the volume of sequence needed for analysis without sacrificing quality. The tool is available as a web service at <https://pebblescout.ncbi.nlm.nih.gov>, and the authors welcome user feedback to guide future improvements.