Indexing and Querying XML Data for Regular Path Expressions

Roma, Italy, 2001 | Quanzhong Li, Bongki Moon
This paper presents XISS, a system for indexing and storing XML data that is built on a numbering scheme for elements and attributes. The scheme determines the ancestor-descendant relationship between any two elements or attributes in the XML hierarchy in constant time, and it is designed to accommodate future insertions. This makes it well suited to processing regular path expressions, which are commonly used in XML query languages.

XISS includes three major index structures: the element index, the attribute index, and the structure index. Together they support both value and structure searches, retrieving elements and attributes efficiently by name string and by structure.

The paper proposes three algorithms for processing regular path expressions: EE-Join, which searches paths between elements; EA-Join, which scans sorted elements and attributes to find element-attribute pairs; and KC-Join, which finds the Kleene closure of repeated paths or elements. EE-Join is particularly effective for long or unknown-length paths. In the reported experiments, the algorithms process XML queries with regular path expressions up to 10 times faster than conventional methods.

The paper also discusses the limitations of conventional query processing for regular path expressions, which often relies on tree traversals and can be inefficient for long or unknown-length paths. The proposed algorithms avoid these inefficiencies by using the numbering scheme and index structures to access the relevant elements and attributes directly, without traversing the XML hierarchy.
This approach significantly improves the performance of XML query processing.
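
The summary above does not spell out the numbering scheme, so the following is only a minimal sketch of how such a scheme could work. It assumes each node is labeled with an (order, size) pair during a preorder walk, so that a node's descendants fall inside the interval it reserves, with some unused numbers left over for later insertions. The names `Node`, `assign_numbers`, `is_ancestor`, `ancestor_descendant_join`, and the `slack` parameter are hypothetical. The join shown is a simplified stand-in that uses binary search over a list sorted by order; it is not the paper's EE-Join.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    order: int = 0  # position of the node in a preorder numbering
    size: int = 0   # width of the interval reserved for its descendants


def assign_numbers(node: Node, start: int = 1, slack: int = 2) -> int:
    """Label the subtree under `node`; return the last number it reserves.

    `slack` spare numbers are left at the end of every interval so that
    new nodes could be labeled later without renumbering (assumption).
    """
    node.order = start
    next_free = start + 1
    for child in node.children:
        next_free = assign_numbers(child, next_free, slack) + 1
    node.size = (next_free - 1 - start) + slack
    return node.order + node.size


def is_ancestor(a: Node, d: Node) -> bool:
    """Constant-time test: d lies strictly inside the interval owned by a."""
    return a.order < d.order <= a.order + a.size


def collect(root: Node, name: str) -> List[Node]:
    """Return nodes called `name` in preorder, i.e. sorted by `order`."""
    found = [root] if root.name == name else []
    for child in root.children:
        found.extend(collect(child, name))
    return found


def ancestor_descendant_join(
    ancestors: List[Node], descendants: List[Node]
) -> List[Tuple[Node, Node]]:
    """Pair each ancestor with the descendants inside its interval.

    `descendants` must be sorted by `order`; no tree traversal is needed.
    """
    orders = [d.order for d in descendants]
    pairs = []
    for a in ancestors:
        lo = bisect.bisect_right(orders, a.order)
        hi = bisect.bisect_right(orders, a.order + a.size)
        pairs.extend((a, d) for d in descendants[lo:hi])
    return pairs


# Small example document: book/chapter/section/figure.
fig1, fig2, fig3 = Node("figure"), Node("figure"), Node("figure")
section = Node("section", [fig1])
chapter1 = Node("chapter", [section, fig2])
chapter2 = Node("chapter", [fig3])
book = Node("book", [chapter1, chapter2])
assign_numbers(book)

# All (chapter, figure) pairs at any depth, as in a path such as chapter/_*/figure.
pairs = ancestor_descendant_join(collect(book, "chapter"), collect(book, "figure"))
```

Because the two input lists are only compared by their numbers, a query over a path of unknown length costs the same as one over a direct parent-child step, which is the property the summary attributes to the numbering scheme.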