Rcmp: Reconstructing RDMA-Based Memory Disaggregation via CXL

January 2024 | ZHONGHUA WANG, YIXING GUO, KAI LU, and JIGUANG WAN, Huazhong University of Science and Technology, China; DAOHUI WANG, TING YAO, and HUATAO WU, Huawei Cloud, China
The article introduces Rcmp, a novel low-latency and highly scalable memory disaggregation system that combines RDMA and CXL. Memory disaggregation is a promising architecture for modern datacenters: it separates compute and memory resources into independent pools connected by ultra-fast networks, which improves memory utilization, reduces costs, and enables elastic scaling. However, existing RDMA-based solutions suffer from high latency and additional overheads, while CXL-based approaches are limited by physical distance and high cost.
Rcmp addresses these challenges in five ways: it provides global page-based memory space management, enables fine-grained data access, uses an efficient communication mechanism that avoids blocking, applies a hot-page identification and swapping strategy to reduce RDMA communication, and includes an RDMA-optimized RPC framework to accelerate transfers. The system is implemented as a user-level architecture with simple APIs and is evaluated using micro-benchmarks and a key-value store running YCSB benchmarks. Results show that Rcmp achieves 5.2× lower latency and 3.8× higher throughput than RDMA-based systems, and that it scales well as the number of nodes increases.
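To make the idea of hot-page identification more concrete, the following is a minimal sketch of one common way such a mechanism can work: count remote accesses per page and flag a page once it crosses a threshold. This is not Rcmp's actual algorithm, which the summary above does not describe. The class name `HotPageTracker`, its methods, the threshold, and the decay factor are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class HotPageTracker:
    """Illustrative only: a generic access-counting scheme, not Rcmp's policy."""

    def __init__(self, hot_threshold=4, decay=0.5):
        # Both parameters are assumed values, not taken from the article.
        self.hot_threshold = hot_threshold
        self.decay = decay
        self.counts = defaultdict(float)  # page_id -> weighted remote-access count

    def record_remote_access(self, page_id):
        """Record one remote (e.g. RDMA) access; return True if the page is now hot."""
        self.counts[page_id] += 1
        return self.counts[page_id] >= self.hot_threshold

    def age(self):
        """Scale every counter down so that older accesses weigh less."""
        for page_id in list(self.counts):
            self.counts[page_id] *= self.decay

    def mark_swapped(self, page_id):
        """Forget a page once it has been swapped closer to the accessor."""
        self.counts.pop(page_id, None)


tracker = HotPageTracker(hot_threshold=3)
hits = [tracker.record_remote_access(7) for _ in range(3)]
print(hits)
```

In a scheme like this, a page flagged as hot would become a candidate for swapping into memory that the accessing node can reach without RDMA, so that later accesses avoid the network round trip. Periodic aging keeps pages that were popular only briefly from staying hot indefinitely.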