[slides] The VIA Annotation Software for Images, Audio and Video

The VGG Image Annotator (VIA) is a lightweight, standalone, offline annotation tool for images, audio, and video. It runs in most modern web browsers without installation or setup. Users can define spatial regions in images or video frames and temporal segments in audio or video, and export the annotations to JSON or CSV for further processing. VIA is released under a BSD open source license, which makes it suitable for both academic and commercial use.

VIA has a minimalistic user interface and has been rigorously tested by an active open source community. It supports various shapes for spatial regions and different input types for textual descriptions. For large image datasets it offers a two-stage annotation process in which automatic annotation is followed by manual filtering and updating, and its image grid view enables efficient annotation and management of large groups of images. Temporal annotation lets users define and describe speech segments in audio or segments of video.

VIA has been widely adopted in academia and industry for tasks such as face tracking, speaker diarisation, and object detection in videos. It is built with HTML, CSS, and JavaScript and comprises over 9000 lines of code. It has been used more than a million times, and a large open source community contributes to its development. The software continues to evolve with new features such as collaborative annotation and plugins for advanced computer vision models. It is funded by the EPSRC programme grant Seebibyte.
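Because annotations can be exported to JSON for further processing, a short sketch of such processing may help. The example below is in Python and works on an inline sample rather than a real export. The field names (`filename`, `regions`, `shape_attributes`, `region_attributes`) and the helper functions `count_shapes` and `rect_boxes` are assumptions made for illustration, loosely modelled on a VIA-style export; the layout of an actual export should be checked before relying on them.

```python
import json
from collections import Counter

# Hypothetical sample export. The keys and field names are assumptions
# for illustration only and may differ from what a given VIA version writes.
SAMPLE_EXPORT = """
{
  "img001.jpg123456": {
    "filename": "img001.jpg",
    "size": 123456,
    "regions": [
      {"shape_attributes": {"name": "rect", "x": 10, "y": 20,
                            "width": 100, "height": 50},
       "region_attributes": {"label": "face"}},
      {"shape_attributes": {"name": "circle", "cx": 60, "cy": 60, "r": 15},
       "region_attributes": {"label": "eye"}}
    ]
  },
  "img002.jpg654321": {
    "filename": "img002.jpg",
    "size": 654321,
    "regions": []
  }
}
"""


def count_shapes(export_text):
    """Count spatial regions by shape name across all files in the export."""
    data = json.loads(export_text)
    counts = Counter()
    for record in data.values():
        for region in record.get("regions", []):
            counts[region["shape_attributes"]["name"]] += 1
    return counts


def rect_boxes(export_text):
    """Collect (filename, x, y, width, height) for each rectangular region."""
    data = json.loads(export_text)
    boxes = []
    for record in data.values():
        for region in record.get("regions", []):
            shape = region["shape_attributes"]
            if shape["name"] == "rect":
                boxes.append((record["filename"], shape["x"], shape["y"],
                              shape["width"], shape["height"]))
    return boxes


if __name__ == "__main__":
    print(dict(count_shapes(SAMPLE_EXPORT)))
    print(rect_boxes(SAMPLE_EXPORT))
```

The two helpers are kept separate so that a summary of the annotation effort (how many regions of each shape) can be computed independently of extracting boxes for a downstream task such as object detection.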

The VIA Annotation Software for Images, Audio and Video

9 Aug 2019 | Abhishek Dutta, Andrew Zisserman