| Kaushik Chakrabarti, Eamonn Keogh, Sharad Mehrotra, Michael Pazzani
This paper introduces a new dimensionality reduction technique called Adaptive Piecewise Constant Approximation (APCA) for indexing large time series databases. APCA approximates each time series by a set of constant value segments of varying lengths, minimizing individual reconstruction errors. Unlike previous techniques such as SVD, DFT, and DWT, which use a common representation for all items, APCA creates a representation tailored to each time series. The paper shows how APCA can be indexed using a multidimensional index structure and proposes two distance measures that exploit APCA's high fidelity for fast searching: a lower bounding Euclidean distance approximation and a non-lower bounding but very tight Euclidean distance approximation. These measures support both exact and approximate searching on the same index structure. The paper theoretically and empirically compares APCA to other techniques and demonstrates its superiority. APCA is efficient, with performance up to one to two orders of magnitude better than alternative techniques on real-world datasets. The paper also discusses the indexing of APCA, including the use of a multidimensional index structure and the definition of distance measures for APCA. The results show that APCA provides tighter lower bounds for distance measures compared to other techniques, making it suitable for efficient similarity search in large time series databases.This paper introduces a new dimensionality reduction technique called Adaptive Piecewise Constant Approximation (APCA) for indexing large time series databases. APCA approximates each time series by a set of constant value segments of varying lengths, minimizing individual reconstruction errors. Unlike previous techniques such as SVD, DFT, and DWT, which use a common representation for all items, APCA creates a representation tailored to each time series. The paper shows how APCA can be indexed using a multidimensional index structure and proposes two distance measures that exploit APCA's high fidelity for fast searching: a lower bounding Euclidean distance approximation and a non-lower bounding but very tight Euclidean distance approximation. These measures support both exact and approximate searching on the same index structure. The paper theoretically and empirically compares APCA to other techniques and demonstrates its superiority. APCA is efficient, with performance up to one to two orders of magnitude better than alternative techniques on real-world datasets. The paper also discusses the indexing of APCA, including the use of a multidimensional index structure and the definition of distance measures for APCA. The results show that APCA provides tighter lower bounds for distance measures compared to other techniques, making it suitable for efficient similarity search in large time series databases.