15 June 2006 / Accepted: 10 January 2007 / Published online: 3 April 2007 | Jessica Lin · Eamonn Keogh · Li Wei · Stefano Lonardi
The paper introduces a novel symbolic representation of time series called SAX (Symbolic Aggregate approXimation). Unlike other symbolic representations, SAX allows for dimensionality and numerosity reduction, and it defines distance measures that lower bound the distances in the original time series space. This feature enables the use of efficient data structures and algorithms from text processing and bioinformatics, which are not applicable to real-valued representations. The authors demonstrate the effectiveness of SAX on various data mining tasks, including clustering, classification, query by content, anomaly detection, motif discovery, and visualization. Experimental results show that SAX can produce identical results to algorithms operating on the original data while being more efficient in terms of space and time. The paper also discusses the impact of SAX on the choice of parameters and provides a visual comparison with other common time series representations.
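To make the two central claims concrete (dimensionality reduction and a lower-bounding distance), here is a minimal sketch of the SAX pipeline as it is commonly described: z-normalize the series, average it into w frames (Piecewise Aggregate Approximation), then map each frame mean to one of a symbols using breakpoints that cut the standard normal distribution into equiprobable regions. The summary above does not spell out these steps, so the function names (`znorm`, `paa`, `breakpoints`, `sax`, `mindist`), the use of integer symbol indices instead of letters, and the restriction to series whose length is divisible by w are all assumptions made for illustration, not the authors' implementation.

```python
import math
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]  # constant series: nothing to scale
    return [(x - mu) / sigma for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of w equal-length frames.

    Assumption for brevity: len(series) must be divisible by w.
    """
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch requires len(series) % w == 0")
    step = n // w
    return [mean(series[i:i + step]) for i in range(0, n, step)]


def breakpoints(a):
    """Cut points that split the standard normal into a equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / a) for k in range(1, a)]


def sax(series, w, a):
    """Return the SAX word as a list of symbol indices in 0..a-1."""
    cuts = breakpoints(a)
    # A frame's symbol is the number of breakpoints at or below its mean.
    return [sum(1 for c in cuts if v >= c) for v in paa(znorm(series), w)]


def mindist(word_q, word_c, n, a):
    """Lower bound on the Euclidean distance between the two z-normalized
    length-n series that produced word_q and word_c."""
    cuts = breakpoints(a)
    total = 0.0
    for r, c in zip(word_q, word_c):
        if abs(r - c) > 1:  # equal or adjacent symbols contribute zero
            total += (cuts[max(r, c) - 1] - cuts[min(r, c)]) ** 2
    return math.sqrt(n / len(word_q)) * math.sqrt(total)
```

For example, `sax([1, 2, 3, 4, 5, 6, 7, 8], w=4, a=4)` yields `[0, 1, 2, 3]`: eight real values are reduced to four symbols from a four-letter alphabet. Calling `mindist` on that word and the word of the reversed series gives a value no larger than the Euclidean distance between the two z-normalized series, which is the lower-bounding property the summary refers to.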