This paper introduces a novel technique for literature indexing and searching in a mechanized library system, called "Probabilistic Indexing." The core concept is relevance, defined probabilistically to measure the likelihood that a document satisfies a given request. The technique allows a computer to compute a "relevance number" for each document, which quantifies its probable relevance to the request. The search result is an ordered list of documents ranked by their relevance numbers.
The paper discusses the challenges of conventional library systems, where indexing and retrieval hinge on judgments of semantic closeness between terms. It argues that statistical measures of this closeness can be defined and used to improve search accuracy. The library problem is recast as one of statistical inference: the request serves as a clue from which the system infers an ordered list of the documents most likely to satisfy the user's needs.
The paper outlines three parts: (a) the conventional approach to library systems, (b) the solution via Probabilistic Indexing, and (c) preliminary experiments. It explains that conventional indexing uses binary tags (yes/no) to identify document content, but this approach is limited by semantic noise. Probabilistic Indexing addresses this by assigning weights to index terms, allowing for a more nuanced assessment of relevance.
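To make that contrast concrete, here is a minimal sketch in Python of the two representations. The document names, terms, and weights are invented; the weight values only stand in for whatever estimates an indexer would assign.

```python
# A minimal sketch (invented data and names, not the paper's notation)
# contrasting conventional binary tagging with weighted indexing.

# Conventional indexing: a term either tags a document or it does not.
binary_index = {
    "doc_1": {"information_retrieval", "statistics"},
    "doc_2": {"statistics"},
}

# Probabilistic Indexing: each tag carries a weight, read as the probability
# that a user who wants this document would phrase a request with that term.
weighted_index = {
    "doc_1": {"information_retrieval": 0.9, "statistics": 0.4},
    "doc_2": {"statistics": 0.8},
}

def holds_tag(index, doc, term):
    """Binary lookup: does the document carry the tag at all?"""
    return term in index[doc]

def tag_weight(index, doc, term):
    """Weighted lookup: how strongly does the tag apply (0.0 if absent)?"""
    return index[doc].get(term, 0.0)

print(holds_tag(binary_index, "doc_1", "statistics"))     # True
print(tag_weight(weighted_index, "doc_1", "statistics"))  # 0.4
```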
The paper introduces the concept of a "relevance number," computed via Bayes' rule from a priori document probabilities and the conditional probabilities attached to index terms. It shows how these probabilities yield a relevance number for each document, enabling ranking by probable relevance. The technique also allows for automatic elaboration of search requests, expanding the search to include documents that would otherwise be excluded because of semantic noise.
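The ranking step can be sketched as follows. By Bayes' rule, P(D|I) = P(D) * P(I|D) / P(I), and since P(I) is the same for every document under a fixed request, ranking by the product P(D) * P(I|D) gives the same order. The corpus, weights, and a priori probabilities below are invented for illustration, and the sketch assumes single-term requests.

```python
# A minimal sketch of ranking by relevance number, assuming single-term
# requests. Ranking by prior * tag weight is proportional to P(D | I)
# because the term's marginal probability is constant across documents
# for a fixed request. All numbers are invented.

weighted_index = {
    "doc_1": {"information_retrieval": 0.9, "statistics": 0.4},
    "doc_2": {"statistics": 0.8, "probability": 0.7},
    "doc_3": {"information_retrieval": 0.3, "probability": 0.9},
}

# A priori document probabilities, e.g. estimated from past usage statistics.
prior = {"doc_1": 0.5, "doc_2": 0.3, "doc_3": 0.2}

def relevance_ranking(term):
    """Return documents ordered by decreasing relevance number for one term."""
    scores = {
        doc: prior[doc] * tags.get(term, 0.0)
        for doc, tags in weighted_index.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(relevance_ranking("information_retrieval"))
# [('doc_1', 0.45), ('doc_3', 0.06), ('doc_2', 0.0)]
```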
The paper discusses how to measure statistical closeness between index terms, and likewise between documents, using co-occurrence relationships and conditional probabilities. It introduces three measures of closeness: the conditional probability of one term given another, the inverse conditional probability, and a coefficient of association between attributes. These measures supply the heuristics used to improve search results.
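As an illustration, the sketch below estimates the two conditional probabilities and a simple coefficient of association from co-occurrence counts over a toy collection. The particular coefficient used here (joint probability minus the product of the marginals) is only a stand-in for the paper's definition, and the documents are invented.

```python
# A sketch of three closeness measures between index terms, estimated from
# co-occurrence counts in a toy collection of tagged documents. The
# coefficient of association below is one common choice, not necessarily
# the paper's exact formula.

docs = [
    {"information_retrieval", "statistics"},
    {"statistics", "probability"},
    {"information_retrieval", "probability", "statistics"},
    {"probability"},
]

def closeness(term_a, term_b, collection):
    n = len(collection)
    p_a = sum(term_a in d for d in collection) / n
    p_b = sum(term_b in d for d in collection) / n
    p_ab = sum(term_a in d and term_b in d for d in collection) / n
    return {
        "P(b|a)": p_ab / p_a if p_a else 0.0,  # conditional probability
        "P(a|b)": p_ab / p_b if p_b else 0.0,  # inverse conditional probability
        "association": p_ab - p_a * p_b,       # coefficient of association
    }

print(closeness("statistics", "probability", docs))
# {'P(b|a)': 0.666..., 'P(a|b)': 0.666..., 'association': -0.0625}
```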
The paper also describes extending the request language with numerical weights on index terms, giving the user more precise control over the search. It outlines a search strategy that combines basic selection, elaboration of the request, and the adjoining of closely related documents to produce an ordered list of the documents most relevant to the request.
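Putting the pieces together, a rough sketch of a weighted request with automatic elaboration might look like this. The discounting scheme, closeness values, and data are assumptions for illustration, and the adjoining of related documents is omitted.

```python
# A rough sketch of a weighted request with automatic elaboration. A request
# maps terms to user-supplied weights; elaboration adds statistically close
# terms, discounted by their closeness to the original term. Scoring reuses
# the relevance-number idea (prior * tag weight * request weight). All data
# and the discounting scheme are invented.

weighted_index = {
    "doc_1": {"information_retrieval": 0.9, "statistics": 0.4},
    "doc_2": {"statistics": 0.8, "probability": 0.7},
    "doc_3": {"information_retrieval": 0.3, "probability": 0.9},
}
prior = {"doc_1": 0.5, "doc_2": 0.3, "doc_3": 0.2}

# Term-term closeness, e.g. the conditional probabilities of the previous sketch.
closeness = {"information_retrieval": {"statistics": 0.6}}

def elaborate(request, threshold=0.5):
    """Add closely related terms to the request, discounted by closeness."""
    expanded = dict(request)
    for term, weight in request.items():
        for related, close in closeness.get(term, {}).items():
            if close >= threshold and related not in expanded:
                expanded[related] = weight * close
    return expanded

def search(request):
    """Rank documents against the elaborated, weighted request."""
    expanded = elaborate(request)
    scores = {
        doc: sum(prior[doc] * tags.get(t, 0.0) * w for t, w in expanded.items())
        for doc, tags in weighted_index.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(search({"information_retrieval": 1.0}))
# [('doc_1', 0.57), ('doc_2', 0.144), ('doc_3', 0.06)]
```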
The paper concludes with preliminary experimental results showing that Probabilistic Indexing improves search effectiveness by reducing the probability of retrieving irrelevant documents while increasing the probability of retrieving relevant ones. The technique thus provides a more accurate and efficient way to index and search documents in a mechanized library system.