OCTOBER 2002 | Giuliano Antoniol, Member, IEEE, Gerardo Canfora, Member, IEEE, Gerardo Casazza, Member, IEEE, Andrea De Lucia, Member, IEEE, and Ettore Merlo, Member, IEEE
The paper proposes a method to recover traceability links between source code and free-text documentation using Information Retrieval (IR) techniques. The authors argue that programmers use meaningful names for program items, and that these names can be used to associate high-level concepts with program concepts. They apply both a probabilistic and a vector space IR model in two case studies: mapping C++ classes to manual pages, and mapping Java classes to functional requirements. The results show satisfactory precision and recall, supporting the effectiveness of IR models in recovering traceability links. The method is compared against the UNIX grep utility, and a preliminary experiment involving students demonstrates its practical benefits. The paper also discusses the trade-offs between fixed and variable cut levels in document retrieval, and reviews related work in traceability link recovery and software engineering.
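To make the vector space idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a plain TF-IDF weighting with cosine similarity, uses the identifiers of a code component as the query, ranks documents against it, and applies a fixed cut level (keep the top k documents) before computing precision and recall. The requirement texts, the `REQ-*` names, the class snippet, and all function names are hypothetical, and the summary does not specify the paper's actual preprocessing or weighting scheme.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text and camelCase/snake_case identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return [p.lower() for p in parts]


def tfidf_vectors(docs):
    """Build TF-IDF vectors for a dict of document name -> token list."""
    n = len(docs)
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))  # document frequency: count each term once per doc
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = {
        name: {t: c * idf[t] for t, c in Counter(toks).items()}
        for name, toks in docs.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_tokens, vectors, idf):
    """Rank document names by similarity to the query, best first."""
    # Terms unseen in the document collection get weight 0 (assumption).
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query_tokens).items()}
    scored = [(cosine(q, v), name) for name, v in vectors.items()]
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against the true links."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical free-text requirements (illustrative only).
docs = {
    "REQ-1": tokens("The user can withdraw money from an account"),
    "REQ-2": tokens("The system prints a monthly statement"),
    "REQ-3": tokens("The user changes the password"),
}
vectors, idf = tfidf_vectors(docs)

# Hypothetical class whose identifiers form the query.
query = tokens("class AccountManager { void withdrawMoney(Account account, int amount) }")
ranked = rank(query, vectors, idf)

# Fixed cut level: keep the top k candidates, then score against known links.
k = 2
print(ranked[:k], precision_recall(ranked[:k], {"REQ-1"}))
```

With a fixed cut level, raising k can only keep recall the same or increase it, usually at the cost of precision. A variable cut level would instead keep every document whose similarity exceeds a threshold, so the number of candidates differs per class.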