2007 | Jessica Lin · Eamonn Keogh · Li Wei · Stefano Lonardi
This paper introduces a novel symbolic representation of time series called SAX (Symbolic Aggregate approXimation). The representation allows both dimensionality and numerosity reduction, and it supports distance measures on the symbolic strings that lower bound the corresponding distance measures on the original time series. This lower-bounding property is particularly valuable: certain data mining algorithms can run on the compact, efficiently manipulated symbolic representation while producing results identical to those obtained from the original data. The method has two steps. The time series is first transformed into a Piecewise Aggregate Approximation (PAA), which reduces its dimensionality, and the PAA values are then discretized into a symbolic string that preserves the essential characteristics of the original data. Because of the lower bound, the symbolic representation is suited to data mining tasks such as clustering, classification, query by content, anomaly detection, and visualization. The paper also validates SAX experimentally on these tasks and compares it with existing methods. According to the reported results, SAX outperforms traditional methods in accuracy and efficiency, particularly on noisy or high-dimensional data, and it is effective at reducing the numerosity of the data.
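The two-step transformation described above (PAA, then discretization) and a lower-bounding distance on the resulting strings can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`znorm`, `paa`, `breakpoints`, `sax`, `mindist`) are hypothetical, and the sketch assumes details this summary does not state: the series is z-normalized first, the breakpoints are chosen so that each symbol is equally likely under a standard normal distribution, and the series length is divisible by the word length `w`.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale to zero mean and unit variance (assumed preprocessing step)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal frames."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes len(series) is divisible by w")
    step = n // w
    return [mean(series[i * step:(i + 1) * step]) for i in range(w)]


def breakpoints(a):
    """a - 1 cut points splitting N(0, 1) into a equiprobable regions (assumption)."""
    nd = NormalDist()
    return [nd.inv_cdf(i / a) for i in range(1, a)]


def sax(series, w, a):
    """Map a series to a word of length w over the first a lowercase letters."""
    cuts = breakpoints(a)
    word = []
    for value in paa(znorm(series), w):
        # Index of the region containing value: number of cut points <= value.
        word.append(chr(ord("a") + bisect.bisect_right(cuts, value)))
    return "".join(word)


def mindist(word1, word2, n, a):
    """Distance between two SAX words built from series of original length n.

    Identical or adjacent symbols contribute 0; otherwise the contribution is
    the gap between the breakpoints that separate the two symbols' regions.
    """
    cuts = breakpoints(a)
    total = 0.0
    for c1, c2 in zip(word1, word2):
        i, j = sorted((ord(c1) - ord("a"), ord(c2) - ord("a")))
        if j - i > 1:
            total += (cuts[j - 1] - cuts[i]) ** 2
    return math.sqrt(n / len(word1)) * math.sqrt(total)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = list(reversed(x))
    wx, wy = sax(x, w=4, a=4), sax(y, w=4, a=4)
    print(wx, wy)  # abcd dcba
    euclid = math.dist(znorm(x), znorm(y))
    print(mindist(wx, wy, n=len(x), a=4) <= euclid)  # True
```

In this sketch the word length `w` controls the dimensionality reduction and the alphabet size `a` controls the granularity of the symbols. The final comparison checks the lower-bounding property on one pair of series: the distance computed on the two words does not exceed the Euclidean distance between the z-normalized originals.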