DBXplorer: A System for Keyword-Based Search over Relational Databases

2002 | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das
DBXplorer is a system that enables keyword-based search over relational databases without requiring users to know the database schema. Through a web-based interface, users search for rows that contain the specified keywords. The system is built on a commercial relational database and web server: it is implemented using Microsoft SQL Server and IIS and, at the time of publication, was deployed on Microsoft's corporate intranet.

The system has two core components. A preprocessing step called "Publish" builds a symbol table, and a "Search" step uses it to retrieve matching rows. The symbol table leverages the database's physical structure: it stores keyword locations at column or cell granularity, depending on the physical design and on which indexes are available, and it is compressed to reduce space and improve performance. To search across multiple tables, DBXplorer generates SQL queries that join the relevant tables and select rows containing all keywords.

Both exact and generalized keyword matches are supported. DBXplorer handles generalized matches using techniques such as full-text indexing, prefix-based searching, and compression algorithms.

Experiments show that Pub-Col (column-level symbol tables) is more efficient than Pub-Cell (cell-level symbol tables) in both space and search performance, especially when the number of matching rows is large. Pub-Prefix, a method that uses traditional B+ tree indexes, provides efficient token matching for small-width columns. Overall, the design allows for scalability and efficient search performance.
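The Publish and Search steps described above can be illustrated with a minimal sketch. It is not DBXplorer's implementation: it uses SQLite from the Python standard library instead of Microsoft SQL Server, and the `authors`/`books` schema, the `TEXT_COLUMNS` mapping, the fixed join, and the function names `publish` and `search` are all assumptions made for illustration. The symbol table here is column-level and uncompressed.

```python
import sqlite3
from collections import defaultdict

# Hypothetical two-table schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES authors(id)
    );
    INSERT INTO authors VALUES (1, 'Jim Gray'), (2, 'Jeffrey Ullman');
    INSERT INTO books VALUES
        (10, 'Transaction Processing', 1),
        (11, 'Database Systems', 2),
        (12, 'Gray Codes', 2);
""")

# Assumed list of searchable text columns per table (fixed, trusted names).
TEXT_COLUMNS = {"authors": ["name"], "books": ["title"]}


def publish(conn):
    """Publish step: build a column-level symbol table, keyword -> {(table, column)}."""
    symbols = defaultdict(set)
    for table, columns in TEXT_COLUMNS.items():
        for column in columns:
            for (value,) in conn.execute(f"SELECT {column} FROM {table}"):
                for token in value.lower().split():
                    symbols[token].add((table, column))
    return symbols


def search(conn, symbols, keywords):
    """Search step: generate one SQL query over the joined tables that
    keeps only rows in which every keyword occurs in some known column."""
    clauses, params = [], []
    for keyword in keywords:
        locations = sorted(symbols.get(keyword.lower(), ()))
        if not locations:
            return []  # keyword occurs nowhere, so no row can match
        # LIKE is a substring test here, a simplification of token matching.
        alternatives = " OR ".join(f"{t}.{c} LIKE ?" for t, c in locations)
        clauses.append(f"({alternatives})")
        params += [f"%{keyword}%"] * len(locations)
    sql = (
        "SELECT books.title, authors.name FROM books "
        "JOIN authors ON books.author_id = authors.id WHERE "
        + " AND ".join(clauses)
    )
    return conn.execute(sql, params).fetchall()


symbols = publish(conn)
results = search(conn, symbols, ["gray", "transaction"])
print(results)  # [('Transaction Processing', 'Jim Gray')]
```

In this sketch the join between `books` and `authors` is hard-coded; a real system would have to work out which tables to join from the schema and from the symbol table lookups.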
The research also explores trade-offs between different symbol table designs and compression techniques, and evaluates the effectiveness of various approaches in different scenarios.
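To make the space side of the symbol table trade-off concrete, the sketch below builds a column-level and a cell-level symbol table over the same toy data and counts their entries. The `ROWS` data, the table name `books`, and the function names are hypothetical, and no compression is applied; the sketch only shows why storing one entry per cell tends to need more space than storing one entry per column.

```python
from collections import defaultdict

# Hypothetical toy table: row id -> {column name: text value}.
ROWS = {
    1: {"title": "database systems", "topic": "database"},
    2: {"title": "database tuning", "topic": "performance"},
    3: {"title": "query processing", "topic": "database"},
}


def build_pub_col(rows, table="books"):
    """Column-level symbol table: keyword -> {(table, column)}."""
    symbols = defaultdict(set)
    for values in rows.values():
        for column, text in values.items():
            for token in text.split():
                symbols[token].add((table, column))
    return symbols


def build_pub_cell(rows, table="books"):
    """Cell-level symbol table: keyword -> {(table, column, row_id)}."""
    symbols = defaultdict(set)
    for row_id, values in rows.items():
        for column, text in values.items():
            for token in text.split():
                symbols[token].add((table, column, row_id))
    return symbols


def entry_count(symbols):
    """Total number of (keyword, location) entries in a symbol table."""
    return sum(len(locations) for locations in symbols.values())


pub_col = build_pub_col(ROWS)
pub_cell = build_pub_cell(ROWS)
print(entry_count(pub_col), entry_count(pub_cell))  # 7 9
```

The gap widens as a keyword repeats in more rows of the same column: the column-level table still holds a single entry for it, while the cell-level table grows by one entry per row. The cost of the smaller table is that a column-level entry only names the column, so the matching rows still have to be located at search time, which is where index availability matters.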