2016 | Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein and Yves Le Traon
The paper introduces AndroZoo, a comprehensive dataset of over three million Android applications collected from various sources, including the official Google Play app market and other alternative markets. The dataset is designed to support research in areas such as malware detection, code recommendation, API usage, and application similarity. The authors detail the challenges and solutions involved in crawling and maintaining the dataset, including issues with HTML stability, protocol changes, and market restrictions. They also provide statistics on the dataset, highlighting the distribution of app sizes and the percentage of malware detected by antivirus products. The paper emphasizes the importance of reproducible research and outlines the conditions for accessing the dataset. AndroZoo has been used to conduct various studies, demonstrating its utility in advancing research in the Android ecosystem.