Rcmp: Reconstructing RDMA-Based Memory Disaggregation via CXL

January 2024 | ZHONGHUA WANG, YIXING GUO, KAI LU, and JIGUANG WAN, HuaZhong University of Science and Technology, China; DAOHUI WANG, TING YAO, and HUATAO WU, Huawei Cloud, China
Memory disaggregation is a promising architecture for modern datacenters. It separates compute and memory resources into independent pools connected by ultra-fast networks, which improves memory utilization, reduces costs, and enables elastic scaling. However, existing RDMA-based solutions suffer from high latency and additional overheads. Emerging cache-coherent interconnects such as CXL offer an opportunity to reconstruct high-performance memory disaggregation, but existing CXL-based approaches are limited by physical distance.

Rcmp is a low-latency, highly scalable memory disaggregation system that combines RDMA and CXL so that each offsets the other's drawbacks: CXL improves the performance of RDMA-based disaggregation, and RDMA overcomes CXL's distance limitation. The architecture is hybrid, with a small CXL-based memory pool inside each rack and RDMA connecting the racks. Combining the two interconnects raises mismatches in granularity, communication, and performance. Rcmp addresses them through global memory management and allocation, efficient communication, hot-page identification and swapping, and a high-performance RDMA-optimized RPC framework. It reduces remote-rack accesses by identifying hot pages and swapping them, and by caching remote-rack data in CXL memory.

Rcmp is a user-level architecture implemented in 6,483 lines of C++ code. It offers simple APIs for memory pool services and integrates with FUSE to support in-memory file systems. The prototype is evaluated with micro-benchmarks and YCSB benchmarks, showing 5.2× lower latency and 3.8× higher throughput than RDMA-based systems. Across workloads, performance is high and stable, with latency reduced by 3-8× and throughput improved by 2-4× compared to RDMA-based systems, and Rcmp scales well as the number of nodes increases. Rcmp's open source code and experimental datasets are available at https://github.com/PDS-Lab/Rcmp.
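To make the hot-page identification idea more concrete, here is a minimal C++ sketch. This summary does not describe Rcmp's actual policy, so the sketch assumes the simplest possible scheme: count accesses to each remote-rack page and report a page as hot once its count reaches a fixed threshold. The names `HotPageTracker`, `record_remote_access`, `on_swapped_in`, and the `threshold` parameter are hypothetical and are not taken from the Rcmp code base.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical sketch: none of these names come from the Rcmp code base.
using PageId = std::uint64_t;

class HotPageTracker {
public:
    // `threshold` is an assumed tuning knob: the number of remote accesses
    // after which a page is treated as hot.
    explicit HotPageTracker(std::uint32_t threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Record one access to a page that lives in a remote rack.
    // Returns true exactly once per page: on the access at which its
    // counter reaches the threshold, signalling a swap candidate.
    bool record_remote_access(PageId page) {
        std::uint32_t& count = counts_[page];
        if (count >= threshold_) {
            return false;  // already reported as hot
        }
        ++count;
        return count == threshold_;
    }

    // Forget a page once it has been swapped into the local CXL pool,
    // since later accesses to it are no longer remote.
    void on_swapped_in(PageId page) { counts_.erase(page); }

    // Current access count for a page (0 if the page was never seen).
    std::uint32_t count(PageId page) const {
        auto it = counts_.find(page);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::uint32_t threshold_;
    std::unordered_map<PageId, std::uint32_t> counts_;
};
```

A caller would invoke `record_remote_access` on each access served over RDMA and, when it returns true, schedule that page for a swap into the local rack's CXL memory, then call `on_swapped_in`. A real system would likely also age or decay the counters so that pages that were hot long ago do not stay candidates forever; that is omitted here for brevity.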