Efficient Similarity Search In Sequence Databases

Efficient Similarity Search In Sequence Databases

| Rakesh Agrawal, Christos Faloutsos*, Arun Swami
The paper proposes an efficient method for indexing time sequences to process similarity queries. The authors use the Discrete Fourier Transform (DFT) to map sequences to the frequency domain, leveraging the fact that most practical sequences have strong amplitudes for only a few frequencies. Parseval's theorem ensures that the Euclidean distance in the time domain is preserved in the frequency domain. By indexing only the first few Fourier coefficients using $R^*$-trees, the method achieves efficient and accurate similarity searches. Experimental results show that the method outperforms sequential scanning, with performance gains increasing with the number and length of sequences. The proposed technique is robust and can be adapted to various similarity measures and multi-dimensional indexing methods.The paper proposes an efficient method for indexing time sequences to process similarity queries. The authors use the Discrete Fourier Transform (DFT) to map sequences to the frequency domain, leveraging the fact that most practical sequences have strong amplitudes for only a few frequencies. Parseval's theorem ensures that the Euclidean distance in the time domain is preserved in the frequency domain. By indexing only the first few Fourier coefficients using $R^*$-trees, the method achieves efficient and accurate similarity searches. Experimental results show that the method outperforms sequential scanning, with performance gains increasing with the number and length of sequences. The proposed technique is robust and can be adapted to various similarity measures and multi-dimensional indexing methods.
Reach us at info@study.space