BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching

April 14-20, 2024 | Ling Jiang, Junwen An, Huihui Huang, Qiyi Tang, Sen Nie, Shi Wu, and Yuqun Zhang
BinaryAI is a binary-to-source software composition analysis (SCA) technique that improves the accuracy of binary source code matching and of downstream SCA tasks. It identifies reused third-party libraries (TPLs) in binary files through a two-phase binary source code matching approach that combines syntactic and semantic features. The first phase trains a transformer-based model to generate function-level embeddings, which enable the identification of similar source functions; a vector database makes retrieval of those functions efficient. The second phase leverages link-time locality and function call graphs to enhance the accuracy of function matching.
Evaluated on a large-scale dataset, BinaryAI outperforms state-of-the-art SCA techniques in both binary source code matching and TPL detection, reaching a precision of 85.84% and a recall of 64.98% in TPL detection.
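The embedding-based retrieval in the first phase can be illustrated with a minimal sketch. It is not BinaryAI's implementation: the three-dimensional vectors are made-up stand-ins for model-generated embeddings, cosine similarity is an assumed metric (the summary does not name one), the function names in `SOURCE_INDEX` are hypothetical examples, and a brute-force scan over a dictionary takes the place of the vector database.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_source_functions(
    query: Vector, index: Dict[str, Vector], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k source functions whose embeddings are most similar to the query.

    A real system would delegate this search to a vector database;
    the linear scan here is only for illustration.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: source-function name -> embedding.
# Real embeddings would come from the trained transformer-based model.
SOURCE_INDEX: Dict[str, Vector] = {
    "zlib::inflate": [0.9, 0.1, 0.0],
    "zlib::deflate": [0.7, 0.6, 0.1],
    "libpng::png_read_row": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of one function extracted from a binary.
binary_embedding: Vector = [0.88, 0.15, 0.02]

candidates = top_k_source_functions(binary_embedding, SOURCE_INDEX, k=2)
for name, score in candidates:
    print(f"{name}: {score:.3f}")
```

The candidates returned by such a retrieval step are what a second phase would then re-rank using additional structural evidence, such as link-time locality and function call graphs.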