Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

2 Mar 2024 | MENGZHAO WANG†‡‡, Zhejiang University, China WEIZHI XU‡, Zilliz, USA XIAOMENG YI, Zhejiang Lab, China SONGLIN WU†, Tongji University, China ZHANGYANG PENG, Hangzhou Dianzi University, China XIANGYU KE, Zhejiang University, China YUNJUN GAO, Zhejiang University, China XIAOLIANG XU, Hangzhou Dianzi University, China RENTONG GUO, Zilliz, USA CHARLES XIE, Zilliz, USA
Starling is an I/O-efficient disk-resident graph index framework designed for high-dimensional vector similarity search (HVSS) on data segments. The framework addresses the challenges of in-memory indexes, which struggle with the substantial increase in main memory requirements as vector data scales up. Starling optimizes data layout and search strategy within the segment, incorporating an in-memory navigation graph and a reordered disk-based graph with enhanced locality. It also employs a block search strategy to minimize costly disk I/O operations during vector query execution. Through extensive experiments, Starling demonstrates superior performance, achieving 43.9x higher throughput and 98% lower query latency compared to state-of-the-art methods while maintaining high accuracy. The framework is particularly effective in handling large-scale vector datasets on a single machine, accommodating tens of millions of vectors with 128 dimensions, and offering over 0.9 average precision and top-10 recall rates. Starling's key contributions include a novel data layout that improves vertex utilization ratio and reduces search path length, effective block shuffling algorithms to enhance data locality, and a block search strategy with computation-specific optimizations.Starling is an I/O-efficient disk-resident graph index framework designed for high-dimensional vector similarity search (HVSS) on data segments. The framework addresses the challenges of in-memory indexes, which struggle with the substantial increase in main memory requirements as vector data scales up. Starling optimizes data layout and search strategy within the segment, incorporating an in-memory navigation graph and a reordered disk-based graph with enhanced locality. It also employs a block search strategy to minimize costly disk I/O operations during vector query execution. Through extensive experiments, Starling demonstrates superior performance, achieving 43.9x higher throughput and 98% lower query latency compared to state-of-the-art methods while maintaining high accuracy. The framework is particularly effective in handling large-scale vector datasets on a single machine, accommodating tens of millions of vectors with 128 dimensions, and offering over 0.9 average precision and top-10 recall rates. Starling's key contributions include a novel data layout that improves vertex utilization ratio and reduces search path length, effective block shuffling algorithms to enhance data locality, and a block search strategy with computation-specific optimizations.
Reach us at info@study.space
[slides and audio] Starling%3A An I%2FO-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment