PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric

PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric

October 1968 | DONALD R. MORRISON
PATRICIA is a flexible algorithm for storing, indexing, and retrieving information in a large file. It is economical in index space and reindexing time, requiring no rearrangement of text or index when new material is added. It is highly flexible in the variety of keys it can respond to and retrieves information efficiently, with computation bounded linearly on key length and occurrences, independent of library size. PATRICIA has been implemented in FORTRAN for the CDC-3600 and applied to large information-retrieval problems. The algorithm constructs an index for a binary-coded library, uses it to retrieve targets via keys, and modifies the index for changes. The index contains only numbers designating text or index locations. Retrieval of a key's occurrence has a computation bound linear in key length, while finding all occurrences is linear in their number. Adding a new item involves copying the text, determining existing parts, and updating the index with minimal effort. PATRICIA responds to keys present in the text, not the index, allowing natural formats. The text is unrestricted in format, and items can be stored in any order. Keys and targets need not be in one-to-one correspondence. The algorithm uses a binary alphabet, which simplifies storage and computation. The TWIN table is accessed frequently, so it is stored in fast-access memory. The START table links odd-numbered chains to the text, simplifying internal and external file management. The algorithm uses a numbering scheme for starts, ends, branches, chains, and twins, with odd numbers for starts and even for branches. The START table is used for indirect addressing, linking the TWIN table to the text. The updating algorithm, ADD p, efficiently finds where to branch, using FINDONE to maintain linear computational steps. The algorithm defines seven key terms: START, STOP, END, L-PHRASE, BRANCH, TWIN, and CHAIN. Each L-phrase is identified by a chain and length coordinate. Starts are designated by the librarian, and ends are determined by their position in the text. Branches and twins are determined by their extensions in the text. The algorithm uses a coordinate system to identify L-phrases, with chain and length coordinates. Starts are assigned odd numbers, ends and branches even. Twins are assigned based on branch extensions. The HEIGHT INDEX records the length of the longest member of each chain. The algorithm includes two key algorithms: FINDONE and FINDALL. FINDONE identifies twin left parts of a phrase, while FINDALL finds all ends that are right extensions of a phrase. These algorithms efficiently retrieve and update information in the library. The algorithm ensures compliance with Rule 1, which requires each end to have only one proper occurrence. Rule 2 ensures this by assigning unique identifiers to each book, preventing overlaps. The reverse library technique is an alternative for large libraries, allowing retrieval of multiple occurrences of common right parts without violating Rule 1.PATRICIA is a flexible algorithm for storing, indexing, and retrieving information in a large file. It is economical in index space and reindexing time, requiring no rearrangement of text or index when new material is added. It is highly flexible in the variety of keys it can respond to and retrieves information efficiently, with computation bounded linearly on key length and occurrences, independent of library size. PATRICIA has been implemented in FORTRAN for the CDC-3600 and applied to large information-retrieval problems. The algorithm constructs an index for a binary-coded library, uses it to retrieve targets via keys, and modifies the index for changes. The index contains only numbers designating text or index locations. Retrieval of a key's occurrence has a computation bound linear in key length, while finding all occurrences is linear in their number. Adding a new item involves copying the text, determining existing parts, and updating the index with minimal effort. PATRICIA responds to keys present in the text, not the index, allowing natural formats. The text is unrestricted in format, and items can be stored in any order. Keys and targets need not be in one-to-one correspondence. The algorithm uses a binary alphabet, which simplifies storage and computation. The TWIN table is accessed frequently, so it is stored in fast-access memory. The START table links odd-numbered chains to the text, simplifying internal and external file management. The algorithm uses a numbering scheme for starts, ends, branches, chains, and twins, with odd numbers for starts and even for branches. The START table is used for indirect addressing, linking the TWIN table to the text. The updating algorithm, ADD p, efficiently finds where to branch, using FINDONE to maintain linear computational steps. The algorithm defines seven key terms: START, STOP, END, L-PHRASE, BRANCH, TWIN, and CHAIN. Each L-phrase is identified by a chain and length coordinate. Starts are designated by the librarian, and ends are determined by their position in the text. Branches and twins are determined by their extensions in the text. The algorithm uses a coordinate system to identify L-phrases, with chain and length coordinates. Starts are assigned odd numbers, ends and branches even. Twins are assigned based on branch extensions. The HEIGHT INDEX records the length of the longest member of each chain. The algorithm includes two key algorithms: FINDONE and FINDALL. FINDONE identifies twin left parts of a phrase, while FINDALL finds all ends that are right extensions of a phrase. These algorithms efficiently retrieve and update information in the library. The algorithm ensures compliance with Rule 1, which requires each end to have only one proper occurrence. Rule 2 ensures this by assigning unique identifiers to each book, preventing overlaps. The reverse library technique is an alternative for large libraries, allowing retrieval of multiple occurrences of common right parts without violating Rule 1.
Reach us at info@study.space
[slides and audio] PATRICIA%E2%80%94Practical Algorithm To Retrieve Information Coded in Alphanumeric