[slides and audio] Latent semantic analysis

This paper explores the technical considerations of using Latent Semantic Analysis (LSA) to assess student knowledge. The authors investigate four key questions: the role of technical vocabulary, optimal essay length, the suitability of the cosine measure, and the directionality of knowledge in high-dimensional space. They find that both technical and non-technical terms contribute equally to predicting student knowledge, suggesting that essays should be written in the student's own words. The optimal essay length is around 200 words, with shorter essays providing less predictive value. The cosine measure is the best single measure of semantic relatedness, but it does not indicate the directionality of knowledge (i.e., whether the essay is below or above the instructional text). To address the directionality problem, the authors propose using multidimensional scaling (MDS) to construct a Euclidean subspace that can distinguish between low- and high-knowledge individuals. Three methods for MDS are described, with Method 3 being the most effective. The paper concludes that the simple approach used in previous studies is justified, but further research is needed to address open questions, such as the role of different training corpora in LSA.
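The directionality point can be illustrated with a small sketch. The two-dimensional vectors below are invented for illustration and are not LSA vectors from the paper; the names `text`, `essay_a`, and `essay_b` are hypothetical. The sketch only shows that two different essay vectors can have the same cosine with the instructional text, so the cosine alone cannot say on which side of the text an essay lies.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors (assumed values, not taken from the paper).
text = [1.0, 1.0]     # stands in for the instructional text
essay_a = [1.0, 0.5]  # deviates from the text in one direction
essay_b = [0.5, 1.0]  # deviates from the text in the opposite direction

cos_a = cosine(essay_a, text)
cos_b = cosine(essay_b, text)

# Both essays are equally similar to the text by the cosine measure,
# even though they are different vectors.
print(round(cos_a, 4), round(cos_b, 4))
```

The two essays differ, yet the cosine treats them identically, which is the kind of ambiguity that motivates looking at additional structure such as an MDS-derived subspace.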

Using Latent Semantic Analysis to assess knowledge: Some technical considerations

Bob Rehder, M. E. Schreiner, Michael B. W. Wolfe, Darrell Laham, Thomas K. Landauer, and Walter Kintsch