Automated Detection of Container-based Audio Forgery Using Mobile Crowdsourcing for Dataset Building

March 2024 | Homin Son, Sung Won Beak, Jae Wan Park
This research introduces a novel approach to detecting forged digital audio files: a cyclical system that uses mobile crowdsourcing to collect a comprehensive dataset of smartphone recordings. The study emphasizes the utility of metadata and file structure analysis and is designed to scale, reflecting an entrepreneurial mindset in creating adaptable and sustainable solutions. The researchers developed a mobile web-based prototype system that collects diverse audio data and automatically detects forgeries, showcasing their initiative and innovative thinking. They also conducted scenario-based testing to validate the methodology, a step that underscores the entrepreneurial value of practical validation. The approach has the potential to advance digital forensic practice by enabling broader detection of manipulated audio files, and it opens avenues for entrepreneurial ventures in digital security. The crowdsourced, diverse audio dataset facilitates the identification of forgeries and will be made publicly available as a resource for ongoing and future forensic investigations, encouraging entrepreneurial collaboration and knowledge sharing. The paper lays the groundwork for future research in expanding the scope of digital forensics, fostering technological innovation, and enhancing participatory models for data collection, all of which are essential elements of entrepreneurial ecosystems. It also highlights the need for practical, scalable solutions in digital forensics and demonstrates the effectiveness of analyzing metadata and file structures. The mobile web-based prototype system developed in this study is built on self-reliance, immediacy, and autonomy.
Our findings underscore the significance of a self-sustaining, cyclical system in enhancing participant engagement and ensuring the system's continuous growth. The false positive rate, that is, the rate at which genuine files are incorrectly identified as forgeries, is zero. However, because metadata and file structure can themselves be manipulated, the system is subject to false negatives, in which a manipulated file is incorrectly identified as authentic. Some manipulated voice files may therefore go undetected. Future work includes: expanding the dataset to cover a wider variety of smartphone models, recording applications, and recording settings; integrating content-based forgery detection techniques based on machine learning; advocating for the public sharing of this dataset; conducting real field testing and external validation with forensic institutions to enhance credibility and reliability; improving the UI/UX of the prototype system to broaden participation and raise the quality of the data collected; investigating the interoperability of the system with various technologies to increase its applicability; and exploring the social impact and implications of audio file counterfeiting, including safeguards against misuse of this forgery detection technology, in collaboration with experts in linguistics, psychology, and law. This research is expected to contribute significantly to digital forensics, especially by proposing a cyclical approach to collecting diverse audio data and automatically detecting forgeries using container-based methods.
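
The container-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that recordings use an ISO base media file format container (as in M4A or MP4 files), that a file's "structure" is the ordered list of its top-level box types, and that the crowdsourced dataset supplies a set of such layouts observed in genuine recordings. The names `top_level_boxes`, `is_consistent`, `make_box`, and the sample layouts are hypothetical.

```python
import struct


def top_level_boxes(data: bytes) -> list[str]:
    """Return the ordered top-level box types of an ISO BMFF byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Each box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop parsing
        boxes.append(box_type.decode("latin-1"))
        offset += size
    return boxes


def is_consistent(data: bytes, reference_layouts: set[tuple[str, ...]]) -> bool:
    """True if the file's box layout matches a layout seen in genuine recordings."""
    return tuple(top_level_boxes(data)) in reference_layouts


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a synthetic box for the demonstration below (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Hypothetical layout, standing in for one collected from a genuine recording.
reference_layouts = {("ftyp", "free", "mdat", "moov")}

original = (make_box(b"ftyp", b"M4A ") + make_box(b"free")
            + make_box(b"mdat", b"\x00" * 16) + make_box(b"moov"))
# An edit that re-saves the file with a different box order is flagged.
reordered_edit = (make_box(b"ftyp", b"M4A ") + make_box(b"moov")
                  + make_box(b"mdat", b"\x00" * 16))
# An edit that changes the audio payload but reproduces the layout is not.
same_layout_edit = (make_box(b"ftyp", b"M4A ") + make_box(b"free")
                    + make_box(b"mdat", b"\xff" * 16) + make_box(b"moov"))

print(is_consistent(original, reference_layouts))
print(is_consistent(reordered_edit, reference_layouts))
print(is_consistent(same_layout_edit, reference_layouts))
```

In this sketch the third file passes the check even though its audio payload differs, which mirrors the false-negative limitation noted above: a forgery that preserves or reconstructs the expected metadata and structure cannot be caught by structural comparison alone.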