November 1987 | G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais
The vocabulary problem in human-system communication refers to the difficulty users face in entering the correct words to access desired objects or actions in computer systems. This problem arises because people use a wide variety of words to refer to the same thing, making it challenging for systems to recognize the correct term. The study found that the probability of two people using the same term for an object is less than 0.20, indicating significant variability in word choice. This variability limits the effectiveness of design methodologies that rely on a single word for access, leading to high failure rates. An optimal strategy, unlimited aliasing, is proposed, which allows for multiple alternative terms, significantly improving success rates.
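To make the agreement figure concrete, the sketch below estimates the probability that two different people pick the same term for one object, given a list of the terms people offered. The response list is invented for illustration and the function name `agreement_probability` is an assumption; the study's own estimator and data may differ.

```python
from collections import Counter


def agreement_probability(terms):
    """Estimate the chance that two distinct, randomly chosen people used the same term."""
    counts = Counter(terms)
    n = len(terms)
    if n < 2:
        raise ValueError("need at least two responses")
    # Ordered pairs of distinct respondents who agree, over all ordered pairs.
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))


# Hypothetical terms offered by ten people for the same editing operation.
responses = ["delete", "remove", "delete", "erase", "kill",
             "cut", "delete", "omit", "remove", "drop"]
p = agreement_probability(responses)
print(f"pairwise agreement: {p:.3f}")
```

Even with one clearly favored word ("delete", used by three of ten people), pairwise agreement in this made-up sample stays below 0.10, which is the kind of spread that defeats single-word access.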
The study analyzed data from five different application domains, revealing that no single term can cover more than a small proportion of users' attempts. Designers often underestimate the problem, leading to insufficient alternate entries in databases or services, which creates barriers to effective use. The research suggests that rich, probabilistically weighted indexes or alias lists can improve success rates by factors of three to five.
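One possible reading of a "probabilistically weighted index" is a table that maps each observed term to every target users meant by it, with relative frequencies used to rank the system's guesses. The sketch below follows that reading; the function names, the observation pairs, and the target identifiers are all hypothetical and not taken from the study.

```python
from collections import defaultdict


def build_alias_index(observations):
    """Count how often each term was used for each target.

    observations: iterable of (term, target) pairs collected from users.
    """
    index = defaultdict(lambda: defaultdict(int))
    for term, target in observations:
        index[term.lower()][target] += 1
    return index


def guess_targets(index, term, k=3):
    """Return up to k (target, probability) guesses, most likely first."""
    targets = index.get(term.lower())
    if not targets:
        return []  # term never observed: no alias covers it yet
    total = sum(targets.values())
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(targets.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(target, count / total) for target, count in ranked[:k]]


# Hypothetical usage data: the same word can point at several targets.
observations = [
    ("remove", "delete_file"),
    ("erase", "delete_file"),
    ("remove", "delete_file"),
    ("remove", "uninstall_program"),
    ("kill", "stop_process"),
    ("erase", "delete_file"),
]
index = build_alias_index(observations)
guesses = guess_targets(index, "Remove")
print(guesses)
```

Keeping every observed alias rather than one canonical name raises the chance that a user's word is recognized at all, while the stored frequencies let the system order its guesses when a word is ambiguous.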
The study also highlights the precision problem, where a single word may refer to multiple objects, leading to ambiguity. However, the data show that unlimited aliasing can reduce this ambiguity by providing multiple alternative terms, allowing the system to make better guesses about the user's intent. The study concludes that extensive tables of word usage behavior are necessary to provide reasonable access for untutored users. The approach of unlimited aliasing is shown to be effective, with performance improvements depending on the domain and the amount of data collected. The study emphasizes the importance of understanding human behavior and variability in word selection to improve system design. The research also suggests that iterative design processes and the collection of multiple aliases can lead to significant improvements in system performance.