This paper introduces a novel technique for literature indexing and searching in a mechanized library system, called "Probabilistic Indexing." The core concept is relevance, defined probabilistically to measure the likelihood that a document satisfies a given request. The technique allows a computer to compute a "relevance number" for each document, which quantifies its probable relevance to the request. The search result is an ordered list of documents ranked by their relevance numbers.
The paper discusses the challenges of conventional library systems, where indexing and retrieval hinge on judgments of semantic closeness between terms. It argues that statistical measures of this closeness can be defined and used to improve search accuracy. The library problem is recast as one of statistical inference: the request serves as a clue from which the system infers an ordered list of the documents most likely to satisfy the user's needs.
The paper outlines three parts: (a) the conventional approach to library systems, (b) the solution via Probabilistic Indexing, and (c) preliminary experiments. It explains that conventional indexing uses binary tags (yes/no) to identify document content, but this approach is limited by semantic noise. Probabilistic Indexing addresses this by assigning weights to index terms, allowing for a more nuanced assessment of relevance.
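To make that contrast concrete, here is a minimal sketch in Python of the two representations. The document names, terms, and weights are invented; the weight values only stand in for whatever estimates an indexer would assign.

```python
# A minimal sketch (invented data and names, not the paper's notation)
# contrasting conventional binary tagging with weighted indexing.

# Conventional indexing: a term either tags a document or it does not.
binary_index = {
    "doc_1": {"information_retrieval", "statistics"},
    "doc_2": {"statistics"},
}

# Probabilistic Indexing: each tag carries a weight, read as the probability
# that a user who wants this document would phrase a request with that term.
weighted_index = {
    "doc_1": {"information_retrieval": 0.9, "statistics": 0.4},
    "doc_2": {"statistics": 0.8},
}

def holds_tag(index, doc, term):
    """Binary lookup: does the document carry the tag at all?"""
    return term in index[doc]

def tag_weight(index, doc, term):
    """Weighted lookup: how strongly does the tag apply (0.0 if absent)?"""
    return index[doc].get(term, 0.0)

print(holds_tag(binary_index, "doc_1", "statistics"))     # True
print(tag_weight(weighted_index, "doc_1", "statistics"))  # 0.4
```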
The paper introduces the concept of a "relevance number," computed via Bayes' rule from a priori document probabilities and the conditional probabilities attached to index terms. It shows how these probabilities yield a relevance number for each document, enabling ranking by probable relevance. The technique also allows for automatic elaboration of search requests, expanding the search to include documents that would otherwise be excluded because of semantic noise.
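The ranking step can be sketched as follows. By Bayes' rule, P(D|I) = P(D) * P(I|D) / P(I), and since P(I) is the same for every document under a fixed request, ranking by the product P(D) * P(I|D) gives the same order. The corpus, weights, and a priori probabilities below are invented for illustration, and the sketch assumes single-term requests.

```python
# A minimal sketch of ranking by relevance number, assuming single-term
# requests. Ranking by prior * tag weight is proportional to P(D | I)
# because the term's marginal probability is constant across documents
# for a fixed request. All numbers are invented.

weighted_index = {
    "doc_1": {"information_retrieval": 0.9, "statistics": 0.4},
    "doc_2": {"statistics": 0.8, "probability": 0.7},
    "doc_3": {"information_retrieval": 0.3, "probability": 0.9},
}

# A priori document probabilities, e.g. estimated from past usage statistics.
prior = {"doc_1": 0.5, "doc_2": 0.3, "doc_3": 0.2}

def relevance_ranking(term):
    """Return documents ordered by decreasing relevance number for one term."""
    scores = {
        doc: prior[doc] * tags.get(term, 0.0)
        for doc, tags in weighted_index.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(relevance_ranking("information_retrieval"))
# [('doc_1', 0.45), ('doc_3', 0.06), ('doc_2', 0.0)]
```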
The paper discusses how to measure statistical closeness between index terms, and likewise between documents, using co-occurrence relationships and conditional probabilities. It introduces three measures of closeness: the conditional probability of one term given another, the inverse conditional probability, and a coefficient of association between attributes. These measures supply the heuristics used to improve search results.
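As an illustration, the sketch below estimates the two conditional probabilities and a simple coefficient of association from co-occurrence counts over a toy collection. The particular coefficient used here (joint probability minus the product of the marginals) is only a stand-in for the paper's definition, and the documents are invented.

```python
# A sketch of three closeness measures between index terms, estimated from
# co-occurrence counts in a toy collection of tagged documents. The
# coefficient of association below is one common choice, not necessarily
# the paper's exact formula.

docs = [
    {"information_retrieval", "statistics"},
    {"statistics", "probability"},
    {"information_retrieval", "probability", "statistics"},
    {"probability"},
]

def closeness(term_a, term_b, collection):
    n = len(collection)
    p_a = sum(term_a in d for d in collection) / n
    p_b = sum(term_b in d for d in collection) / n
    p_ab = sum(term_a in d and term_b in d for d in collection) / n
    return {
        "P(b|a)": p_ab / p_a if p_a else 0.0,  # conditional probability
        "P(a|b)": p_ab / p_b if p_b else 0.0,  # inverse conditional probability
        "association": p_ab - p_a * p_b,       # coefficient of association
    }

print(closeness("statistics", "probability", docs))
# {'P(b|a)': 0.666..., 'P(a|b)': 0.666..., 'association': -0.0625}
```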
The paper also describes extending the request language with numerical weights on index terms, giving the user more precise control over the search. It outlines a search strategy that combines basic selection, elaboration of the request, and the adjoining of closely related documents to produce an ordered list of the documents most relevant to the request.
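Putting the pieces together, a rough sketch of a weighted request with automatic elaboration might look like this. The discounting scheme, closeness values, and data are assumptions for illustration, and the adjoining of related documents is omitted.

```python
# A rough sketch of a weighted request with automatic elaboration. A request
# maps terms to user-supplied weights; elaboration adds statistically close
# terms, discounted by their closeness to the original term. Scoring reuses
# the relevance-number idea (prior * tag weight * request weight). All data
# and the discounting scheme are invented.

weighted_index = {
    "doc_1": {"information_retrieval": 0.9, "statistics": 0.4},
    "doc_2": {"statistics": 0.8, "probability": 0.7},
    "doc_3": {"information_retrieval": 0.3, "probability": 0.9},
}
prior = {"doc_1": 0.5, "doc_2": 0.3, "doc_3": 0.2}

# Term-term closeness, e.g. the conditional probabilities of the previous sketch.
closeness = {"information_retrieval": {"statistics": 0.6}}

def elaborate(request, threshold=0.5):
    """Add closely related terms to the request, discounted by closeness."""
    expanded = dict(request)
    for term, weight in request.items():
        for related, close in closeness.get(term, {}).items():
            if close >= threshold and related not in expanded:
                expanded[related] = weight * close
    return expanded

def search(request):
    """Rank documents against the elaborated, weighted request."""
    expanded = elaborate(request)
    scores = {
        doc: sum(prior[doc] * tags.get(t, 0.0) * w for t, w in expanded.items())
        for doc, tags in weighted_index.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(search({"information_retrieval": 1.0}))
# [('doc_1', 0.57), ('doc_2', 0.144), ('doc_3', 0.06)]
```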
The paper concludes with preliminary experimental results showing that Probabilistic Indexing improves search effectiveness by reducing the probability of retrieving irrelevant documents while increasing the probability of retrieving relevant ones. The technique thus provides a more accurate and efficient way to index and search documents in a mechanized library system.