2012 | Jonas S Almeida, Alexander Grüneberg, Wolfgang Maass and Susana Vinga
This paper presents a novel approach to sequence alignment using fractal MapReduce decomposition. The method, called Universal Sequence Map (USM), allows for the alignment of sequences without relying on dynamic programming. Instead, it uses a fractal representation technique known as Chaos Game Representation (CGR) to encode sequences as numerical coordinates. These coordinates are then used to calculate the length of the longest similar segment between two sequences, without the need for dynamic programming.
The USM approach is implemented using a browser-based application (webApp) and an open-source JavaScript library (usm.js). The webApp allows users to interact with the USM method, while the library provides the underlying functionality for sequence encoding, decoding, and distance calculation. The method is particularly effective for large-scale genomic data analysis, as it can be parallelized and distributed across multiple computing nodes.
The USM method is based on the concept of iterated maps, which are used to generate numerical coordinates for sequences. These coordinates are then used to calculate the similarity between sequences. The method is compared to traditional dynamic programming approaches and is shown to be more efficient and scalable for large datasets.
The paper also discusses the implementation of the USM method in JavaScript, highlighting the advantages of using a functional programming language for this type of computation. The method is demonstrated using a variety of examples, including the alignment of short sequences and full genomes. The results show that the USM method can be used to align sequences efficiently and accurately, even for very long sequences.
The study concludes that the USM method provides a scalable and efficient alternative to traditional sequence alignment methods. It is particularly well-suited for large-scale genomic data analysis, where the volume of data is too large for traditional methods to handle. The method is implemented in JavaScript, making it accessible and easy to use for a wide range of applications.This paper presents a novel approach to sequence alignment using fractal MapReduce decomposition. The method, called Universal Sequence Map (USM), allows for the alignment of sequences without relying on dynamic programming. Instead, it uses a fractal representation technique known as Chaos Game Representation (CGR) to encode sequences as numerical coordinates. These coordinates are then used to calculate the length of the longest similar segment between two sequences, without the need for dynamic programming.
The USM approach is implemented using a browser-based application (webApp) and an open-source JavaScript library (usm.js). The webApp allows users to interact with the USM method, while the library provides the underlying functionality for sequence encoding, decoding, and distance calculation. The method is particularly effective for large-scale genomic data analysis, as it can be parallelized and distributed across multiple computing nodes.
The USM method is based on the concept of iterated maps, which are used to generate numerical coordinates for sequences. These coordinates are then used to calculate the similarity between sequences. The method is compared to traditional dynamic programming approaches and is shown to be more efficient and scalable for large datasets.
The paper also discusses the implementation of the USM method in JavaScript, highlighting the advantages of using a functional programming language for this type of computation. The method is demonstrated using a variety of examples, including the alignment of short sequences and full genomes. The results show that the USM method can be used to align sequences efficiently and accurately, even for very long sequences.
The study concludes that the USM method provides a scalable and efficient alternative to traditional sequence alignment methods. It is particularly well-suited for large-scale genomic data analysis, where the volume of data is too large for traditional methods to handle. The method is implemented in JavaScript, making it accessible and easy to use for a wide range of applications.