19 Jul 2024 | Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge
OphNet is a large-scale, expert-annotated video benchmark designed for understanding ophthalmic surgical workflows. It addresses the limitations of existing datasets, which often suffer from small scale, lack of diversity in surgery and phase categories, and coarse-grained annotations. OphNet features 2,278 surgical videos spanning 66 types of cataract, glaucoma, and corneal surgeries, with detailed annotations for 102 unique surgical phases and 150 fine-grained operations. The dataset includes sequential and hierarchical annotations, enabling comprehensive understanding and improved interpretability. Time-localized annotations facilitate temporal localization and prediction tasks within surgical workflows. OphNet is about 20 times larger than the largest existing surgical workflow analysis benchmark, with approximately 285 hours of surgical videos. The dataset and code are available at: https://minghu0830.github.io/OphNet-benchmark/. The paper also describes the construction of the dataset, including data collection and preprocessing, and presents experimental results on tasks such as primary surgery presence recognition, phase and operation recognition, phase localization, and phase anticipation. The findings contribute to broader research on fine-grained, sequence-level video understanding in medical contexts.