19 Jul 2024 | Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge
OphNet is a large-scale, expert-annotated video benchmark for ophthalmic surgical workflow understanding. It contains 2,278 surgical videos (284.8 hours) covering 66 types of ophthalmic surgery: 13 cataract, 14 glaucoma, and 39 corneal. The videos are annotated with 102 surgical phases and 150 operations, with sequential and hierarchical annotations at the surgery, phase, and operation levels. The annotations are also time-localized, which enables temporal localization and prediction tasks within surgical workflows. OphNet is approximately 20 times larger than the largest existing surgical workflow analysis benchmark. The annotation was carried out by ten experienced ophthalmologists and five individuals with ophthalmic experience, to ensure high-quality, professional annotations. The benchmark supports phase and operation recognition, phase localization, and phase anticipation. It is used to evaluate several models, including I3D, SlowFast, X3D, MViT V2, X-CLIP, and ViFi-CLIP; the results show that ViFi-CLIP performs best in phase and operation recognition, with high accuracy on cataract and corneal surgeries. OphNet addresses limitations of existing surgical workflow analysis benchmarks, namely small scale, limited categories of surgeries and phases, and coarse-grained annotations. The dataset is valuable for developing intelligent systems for surgical workflow analysis, with applications in surgical documentation, education, and training.
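
To make the idea of hierarchical, time-localized annotations concrete, the following is a minimal Python sketch of how one annotated video might be represented. It is not OphNet's actual annotation format, which the summary does not specify: the class names, field names, video ID, labels, and timestamps are all hypothetical and chosen only for illustration. The sketch shows a surgery containing phases, each phase containing operations, and two simple queries that correspond to phase localization and phase anticipation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """A labelled time span within a video, in seconds (hypothetical schema)."""
    label: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class PhaseAnnotation(Segment):
    """A phase that may contain finer-grained operations."""
    operations: List[Segment] = field(default_factory=list)


@dataclass
class VideoAnnotation:
    """Surgery-level record: surgery type -> phases -> operations."""
    video_id: str
    surgery_type: str
    phases: List[PhaseAnnotation] = field(default_factory=list)

    def phase_at(self, t: float) -> Optional[str]:
        """Localization: label of the phase covering time t, if any."""
        for phase in self.phases:
            if phase.start <= t < phase.end:
                return phase.label
        return None

    def next_phase(self, t: float) -> Optional[str]:
        """Anticipation target: label of the first phase starting after t."""
        upcoming = [p for p in self.phases if p.start > t]
        return min(upcoming, key=lambda p: p.start).label if upcoming else None


# Illustrative data only; labels and timestamps are made up.
video = VideoAnnotation(
    video_id="case_0001",
    surgery_type="cataract",
    phases=[
        PhaseAnnotation("incision", 0.0, 40.0,
                        [Segment("corneal incision", 5.0, 30.0)]),
        PhaseAnnotation("capsulorhexis", 40.0, 95.0),
    ],
)

print(video.phase_at(50.0))
print(video.next_phase(10.0))
```

Keeping operations nested inside phases mirrors the surgery, phase, and operation hierarchy described above, while the start and end times on every segment are what make localization and anticipation queries possible.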