21 May 2024 | Camille Moeckel, Manvita Mareboina, Maxwell A. Konnaris, Candace S.Y. Chan, Ioannis Mouratidis, Austin Montgomery, Nikol Chantzi, Georgios A. Pavlopoulos, Ilias Georgakopoulos-Soares
This review provides an overview of k-mers and their applications in bioinformatics, focusing on their significance in genomic and proteomic data analyses. K-mers, defined as contiguous nucleotide or amino acid sequences of fixed length \( k \), have become essential tools in addressing the challenges posed by large and complex datasets in genomics and proteomics. The review highlights the advantages of k-mers in computational speed and memory efficiency, as well as their potential biological functionality. Key applications include k-mer counting and frequency analysis, sequence alignment, genome assembly, error correction, genome editing, comparative genomics, metagenomics, metaproteomics, and protein structure prediction. The review also discusses the utility of absent sequences, such as nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. The selection of \( k \) is crucial, balancing the need for accurate representation against computational efficiency. The review concludes by emphasizing the pivotal role of k-mers in advancing research and future breakthroughs in genomics and proteomics.
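To make the definitions above concrete, the following is a minimal Python sketch of sliding-window k-mer counting and of listing the k-mers that never occur in a sequence, which is the idea behind nullomers. It is an illustration only, not a method taken from the review: the function names `count_kmers` and `absent_kmers`, the toy sequence, and the choice of \( k = 2 \) are all assumptions made for the example, and true nullomers are defined relative to a whole genome rather than a short string.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in `sequence` with a sliding window.

    Hypothetical helper for illustration; not an API from the review.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n yields n - k + 1 overlapping windows
    # (none at all if the sequence is shorter than k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def absent_kmers(sequence: str, k: int, alphabet: str = "ACGT") -> list[str]:
    """List the k-mers over `alphabet` that never occur in `sequence`.

    Enumerates all len(alphabet)**k candidates, so it is only practical
    for small k; this is a toy stand-in for nullomer detection.
    """
    observed = count_kmers(sequence, k)
    candidates = ("".join(letters) for letters in product(alphabet, repeat=k))
    return [kmer for kmer in candidates if kmer not in observed]


# Toy input (assumed for the example, not real genomic data).
toy = "ACGTACGTGA"
counts = count_kmers(toy, 2)
missing = absent_kmers(toy, 2)

print(counts.most_common(3))
print(missing)
```

The sketch also hints at why the choice of \( k \) matters: the number of possible nucleotide k-mers grows as \( 4^k \), so larger values give more specific sequences but make exhaustive enumeration and storage increasingly expensive.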