Recovering Traceability Links between Code and Documentation

OCTOBER 2002 | Giuliano Antoniol, Member, IEEE, Gerardo Canfora, Member, IEEE, Gerardo Casazza, Member, IEEE, Andrea De Lucia, Member, IEEE, and Ettore Merlo, Member, IEEE
This paper presents a method based on information retrieval (IR) for recovering traceability links between source code and free-text documentation. The method uses the identifiers extracted from each source code component as a query to retrieve the documents relevant to that component. It rests on the assumption that programmers choose meaningful names for program items, so that identifier mnemonics capture application-domain knowledge.
Two IR models are applied. The probabilistic model ranks free-text documents by their probability of being relevant to a query, using a language model approach. The vector space model treats documents and queries as vectors in an n-dimensional space, where n is the number of indexing features, and ranks documents against a query by computing a distance function between the corresponding vectors.
The models are evaluated in two case studies: the first traces C++ classes of the LEDA library onto its manual pages, and the second recovers traceability links between the Java classes of a hotel management system and its functional requirements. The results, measured in terms of precision and recall, show that both models are effective, supporting the hypothesis that IR models provide a practical solution for semiautomatically recovering traceability links. The method is also compared with the grep UNIX utility, which it outperforms in both precision and recall, and its benefit to software engineers recovering traceability links is evaluated experimentally.
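To make the query-and-rank idea concrete, here is a minimal sketch of the vector space variant. It is not the authors' implementation: the summary only says that a distance function between vectors is used, so the tf-idf weighting and cosine similarity below are assumptions, as are the helper names (`split_identifier`, `rank_documents`) and the toy requirements. No stemming or stop-word handling is attempted.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [p.lower() for p in parts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_documents(identifiers, documents):
    """Rank documents against the query built from a component's identifiers.

    Assumption: tf-idf weights with cosine similarity; other weightings
    and distance functions are equally possible.
    """
    doc_tokens = {
        name: re.findall(r"[a-z]+", text.lower())
        for name, text in documents.items()
    }
    n_docs = len(doc_tokens)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for tokens in doc_tokens.values() for t in set(tokens))
    idf = {t: math.log(n_docs / df[t]) for t in df}

    def weigh(tokens):
        # Terms outside the document vocabulary are ignored.
        tf = Counter(t for t in tokens if t in idf)
        return {t: count * idf[t] for t, count in tf.items()}

    query = weigh([w for ident in identifiers for w in split_identifier(ident)])
    scores = {
        name: cosine(query, weigh(tokens))
        for name, tokens in doc_tokens.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical requirements and identifiers, for illustration only.
requirements = {
    "R1": "The clerk books a room for a guest and records the reservation.",
    "R2": "The system prints the invoice when the guest checks out.",
    "R3": "The manager updates the price list of the hotel.",
}
class_identifiers = ["bookRoom", "guestName", "Reservation"]
ranking = rank_documents(class_identifiers, requirements)
```

In this toy run the requirement that shares the most distinctive words with the identifiers is ranked first; a software engineer would then inspect the top of the list to confirm or reject each candidate link.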
The paper also discusses related work, including other methods for recovering traceability links between source code and documentation, and compares the effectiveness of different IR models. The probabilistic model achieves higher recall values with smaller cut values, that is, when fewer top-ranked documents are retrieved, while the vector space model reaches higher recall values only when a larger number of documents is retrieved. The paper concludes that IR models provide a practical solution for recovering traceability links between code and documentation.
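Precision and recall at a given cut value can be illustrated with a short sketch. It assumes the cut is the number of top-ranked documents retrieved for a query and scores a single query in isolation; the function name and the sample ranking are hypothetical, and how the paper aggregates scores across queries is not reproduced here.

```python
def precision_recall_at_cut(ranked, relevant, cut):
    """Precision and recall when only the top `cut` ranked documents are kept."""
    retrieved = set(ranked[:cut])
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking for one source code component.
ranked_docs = ["R1", "R4", "R2", "R3"]
true_links = {"R1", "R2"}

at_cut_1 = precision_recall_at_cut(ranked_docs, true_links, 1)  # (1.0, 0.5)
at_cut_3 = precision_recall_at_cut(ranked_docs, true_links, 3)
```

Raising the cut from 1 to 3 lifts recall from 0.5 to 1.0 while precision drops to 2/3, which is the trade-off behind comparing models by the cut value at which they reach a given recall.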