June 2024 | YONGYE SU, YINQI SUN, MINJIA ZHANG, JIANGGUO WANG
Vexless is a serverless vector database system optimized for cloud functions, designed to efficiently handle bursty and sparse workloads. The system addresses three main challenges: sharding, communication overhead, and cold-start time. Vexless introduces a global coordinator (orchestrator) to assign workloads to cloud function instances based on available resources, uses stateful cloud functions to reduce communication overhead, and implements a workload-aware strategy to minimize cold-start times. Implemented using Azure Functions, Vexless achieves significant cost savings compared to cloud VM instances while maintaining or improving query performance and accuracy. The system uses constrained K-Means clustering for sharding, a communication mechanism with Azure Durable Functions and Azure Queue Storage for efficient message passing, and an adaptive scheduling algorithm to reduce cold-start times. Vexless is open-sourced and has been evaluated on various datasets and workloads, demonstrating its effectiveness in reducing costs and improving performance for vector similarity search. The system is designed to be generalizable to other cloud providers with similar services.
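To make the sharding idea concrete, the sketch below shows one simple way to run K-Means with a cap on cluster size, so that no shard outgrows what a single cloud function can hold. It is an illustration only: the greedy assignment step, the function name `constrained_kmeans`, and the `max_size` parameter are assumptions made for this example, not the algorithm or code used by Vexless.

```python
import math
import random


def constrained_kmeans(points, k, max_size, iters=10, seed=0):
    """Greedy size-capped K-Means: no cluster receives more than max_size points.

    Hypothetical helper for illustration; not taken from the Vexless code base.
    """
    if k * max_size < len(points):
        raise ValueError("k * max_size must be at least len(points)")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [-1] * len(points)
    for _ in range(iters):
        # Enumerate every (distance, point, cluster) pair, closest pairs first.
        pairs = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centroids)
        )
        sizes = [0] * k
        new_assignment = [-1] * len(points)
        # Give each point its nearest cluster that still has room.
        for _, i, j in pairs:
            if new_assignment[i] == -1 and sizes[j] < max_size:
                new_assignment[i] = j
                sizes[j] += 1
        if new_assignment == assignment:
            break  # assignments are stable, so stop early
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [points[i] for i in range(len(points)) if assignment[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Six points near the origin and two far away: plain K-Means would produce a
# 6/2 split, while the size cap forces two shards of four points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1), (0.1, 0.2), (0.3, 0.3),
          (10.0, 10.0), (10.1, 9.9)]
assignment, centroids = constrained_kmeans(points, k=2, max_size=4)
shard_sizes = [assignment.count(j) for j in range(2)]
```

The size cap trades some cluster tightness for predictable shard sizes, which matters when each shard must fit within a function instance's memory limit.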
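The coordinator's resource-based assignment can be pictured with a small placement routine. The sketch below is a hypothetical illustration, assuming that "available resources" means free memory per function instance and that shards are placed largest first on the instance with the most free memory. The function name `assign_shards`, the shard and instance identifiers, and all sizes are invented for this example and do not come from Vexless.

```python
def assign_shards(shard_sizes_mb, instance_free_mb):
    """Place each shard on the instance that currently has the most free memory.

    Hypothetical coordinator logic for illustration; not the Vexless orchestrator.
    """
    free = dict(instance_free_mb)  # copy so the caller's dict is not mutated
    placement = {}
    # Place the largest shards first so they are not squeezed out later.
    for shard, size in sorted(shard_sizes_mb.items(), key=lambda kv: -kv[1]):
        target = max(free, key=free.get)
        if free[target] < size:
            raise RuntimeError(f"no instance has room for shard {shard!r}")
        placement[shard] = target
        free[target] -= size
    return placement, free


# Made-up sizes in MB, purely for illustration.
shards = {"s0": 900, "s1": 600, "s2": 500}
instances = {"f0": 1500, "f1": 1200}
placement, remaining = assign_shards(shards, instances)
```

A real coordinator would also have to account for factors this sketch ignores, such as query load per shard and whether an instance is already warm.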