China Telecom Cloud Computing Research Institute

China Telecom Cloud Computing Research Institute Publishes Two Papers at ICPP 2025, Driving Key Technological Advances in Intelligent Cloud Infrastructure

2025-06-13

Recently, the China Telecom Cloud Computing Research Institute has made research advances in cloud storage and data center resource management. Two of its papers have been accepted by ICPP 2025, one of the longest-running and most prestigious international conferences on parallel processing: "Leave No One Behind: Fair and Efficient Tiered Memory Management for Multi-Applications," by Tang Wenda, Wang Yiduo, Wang Yanwen, and Wu Jie, and "Origami: Efficient ML-Driven Metadata Load Balancing for Distributed File Systems," by Wang Yiduo, Tang Wenda, Meng Linghang, Li Liang, and Wu Jie. The two studies address challenges in tiered memory management and metadata management, respectively, offering new solutions for optimizing cloud infrastructure and demonstrating China Telecom’s technical expertise in the core technologies of cloud computing.

To optimize data center memory resources, Tang Wenda et al. proposed a workload-aware tiered memory management framework that addresses memory contention among applications in multi-tenant environments. The framework introduces user-space memory page migration together with a mechanism that allocates fast-memory capacity fairly according to workload characteristics, with contributions spanning memory resource management, page migration policy design, page table structure optimization, and page migration mechanism optimization. It resolves the “cold page dilemma” of existing solutions: because those solutions ignore differences between applications, critical pages of key workloads in multi-tenant co-location scenarios, whose access frequency is low relative to that of co-located workloads, are misclassified as “cold” and migrated to slow memory, degrading the performance of critical services.
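To make the “cold page dilemma” concrete, the following Python sketch compares an application-agnostic policy that ranks all pages by raw access count against a per-application policy with fixed quotas. The access counts, tenant names, and the 2 + 2 quota split are hypothetical, and the quota policy is a simplified stand-in for illustration, not the framework described in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical sampled access counts (page id -> accesses per interval)
# for two co-located tenants; the numbers are invented for illustration.
APPS: Dict[str, Dict[str, int]] = {
    "batch": {"b0": 900, "b1": 800, "b2": 700, "b3": 600},
    "latency_critical": {"c0": 50, "c1": 40},
}


def global_hot_pages(apps: Dict[str, Dict[str, int]], fast_slots: int) -> List[Tuple[str, str]]:
    """Application-agnostic policy: rank every page by raw access count and
    keep the top `fast_slots` pages in fast memory."""
    pages = [(count, app, page) for app, counts in apps.items() for page, count in counts.items()]
    pages.sort(reverse=True)
    return [(app, page) for _, app, page in pages[:fast_slots]]


def per_app_hot_pages(apps: Dict[str, Dict[str, int]], quotas: Dict[str, int]) -> List[Tuple[str, str]]:
    """Workload-aware policy (hypothetical): rank pages within each application
    and keep the top quotas[app] pages of that application in fast memory."""
    hot: List[Tuple[str, str]] = []
    for app, counts in apps.items():
        ranked = sorted(counts, key=counts.get, reverse=True)
        hot.extend((app, page) for page in ranked[: quotas.get(app, 0)])
    return hot


if __name__ == "__main__":
    # With 4 fast-memory slots, the global ranking gives all of them to "batch".
    print(global_hot_pages(APPS, 4))
    # A hypothetical 2 + 2 split keeps the critical tenant's hottest pages fast.
    print(per_app_hot_pages(APPS, {"batch": 2, "latency_critical": 2}))
```

Under the global ranking all four fast-memory slots go to the batch tenant, so both pages of the latency-critical tenant are demoted even though they are its hottest pages; ranking within each application keeps them in fast memory.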

Figure 1: The Overall Architecture of the Proposed Tiered Memory Management for Multi-Tenant Workload Co-location.

The framework uses PEBS (Precise Event-Based Sampling) to collect and analyze the memory page access characteristics of workloads in real time and, combined with eBPF, flexibly adjusts page migration policies to match the memory access pattern of each workload. For QoS assurance in particular, it uses the Fast Tier Hit Ratio to measure the effectiveness of tiering and dynamically adjusts each application's fast and slow memory capacity in real time, ensuring both efficient memory access for high-priority applications and fair resource allocation. This work offers a new approach to resource isolation and performance optimization in cloud computing and big data scenarios and is expected to be deployed at scale in cloud platforms, emerging computing power networks, and other settings.
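The Fast Tier Hit Ratio can be read as the fraction of sampled memory accesses served from fast memory. The sketch below assumes that definition and pairs it with one hypothetical rebalancing step: fast-tier pages move from the application with the smallest priority-weighted miss ratio to the one with the largest, subject to a per-application floor. The `AppStats` fields, the weighting, `step`, and `min_pages` are illustrative assumptions, not the paper's actual allocation policy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AppStats:
    name: str
    fast_hits: int  # sampled accesses served from the fast tier (e.g., local DRAM)
    slow_hits: int  # sampled accesses served from the slow tier
    weight: float   # hypothetical priority weight; higher = more important


def fast_tier_hit_ratio(s: AppStats) -> float:
    """Assumed definition: fraction of sampled accesses that hit fast memory."""
    total = s.fast_hits + s.slow_hits
    return s.fast_hits / total if total else 1.0


def rebalance(stats: List[AppStats], alloc: Dict[str, int], step: int, min_pages: int) -> Dict[str, int]:
    """One hypothetical rebalancing round: move up to `step` fast-tier pages
    from the app with the smallest weighted miss ratio to the app with the
    largest, never shrinking the donor below `min_pages` (a fairness floor).
    Total fast-tier capacity is unchanged."""
    deficit = {s.name: s.weight * (1.0 - fast_tier_hit_ratio(s)) for s in stats}
    donor = min(deficit, key=deficit.get)
    recipient = max(deficit, key=deficit.get)
    new_alloc = dict(alloc)
    if donor == recipient:  # single app, or all deficits tied
        return new_alloc
    moved = max(0, min(step, new_alloc[donor] - min_pages))
    new_alloc[donor] -= moved
    new_alloc[recipient] += moved
    return new_alloc
```

For example, with `AppStats("web", 90, 10, 2.0)` and `AppStats("analytics", 30, 70, 1.0)`, 1,000 pages each, `step=100`, and `min_pages=950`, one round moves 50 pages from web to analytics: the floor caps the transfer, which is how this sketch keeps an application with a high hit ratio from being starved.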

Wang Yiduo et al. proposed Origami, an ML-driven metadata load balancing framework, to address the efficiency bottleneck of managing massive volumes of metadata in cloud-based distributed storage systems. Unlike traditional methods, which focus solely on partitioning metadata evenly, Origami takes minimizing user job completion time as its key objective and accounts for both metadata locality and the hierarchical namespace structure when balancing load. The framework comprises real-time data collection, near-optimal decision computation, efficient model training, and model validation, and strikes a favorable trade-off between the benefits of load balancing and the overhead it introduces.
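The trade-off described above can be illustrated with a simple cost-benefit check. The sketch below is not Origami's ML model; it is a hypothetical hand-written heuristic that moves only whole directory subtrees (to preserve metadata locality) and migrates one only when the reduction in peak metadata-server load outweighs an estimated migration cost. Using peak load as a proxy for job completion time, and the function name, load units, and cost values, are all assumptions.

```python
from typing import Dict, Optional, Tuple


def pick_subtree_migration(
    server_load: Dict[str, float],
    subtree_load: Dict[str, Dict[str, float]],
    migration_cost: Dict[str, float],
) -> Optional[Tuple[str, str, str]]:
    """Hypothetical heuristic: consider moving one whole subtree from the busiest
    metadata server to the idlest one, and return (subtree, src, dst) for the
    move whose peak-load reduction minus its estimated cost is largest and
    positive; return None if no move pays for itself."""
    src = max(server_load, key=server_load.get)
    dst = min(server_load, key=server_load.get)
    if src == dst:
        return None
    # Loads of servers not involved in the move still bound the new peak.
    others = [load for name, load in server_load.items() if name not in (src, dst)]
    best, best_gain = None, 0.0
    for subtree, load in subtree_load.get(src, {}).items():
        new_peak = max([server_load[src] - load, server_load[dst] + load] + others)
        gain = server_load[src] - new_peak - migration_cost.get(subtree, 0.0)
        if gain > best_gain:
            best, best_gain = (subtree, src, dst), gain
    return best
```

With server loads of 100, 20, and 40 and subtrees `/a`, `/b`, and `/c` on the busiest server carrying 50, 30, and 20 units of load at migration costs of 5, 2, and 1, the function prefers the mid-sized subtree `/b`; if every move costs more than it saves, it returns `None` and nothing is migrated.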

Figure 2: The Overall Architecture of the Origami Model Training Framework for Metadata Load Balancing.

Experimental results show that Origami effectively mitigates the access hotspots that hierarchical namespaces and dynamic loads create in distributed file systems, significantly reducing users' end-to-end job completion time compared with traditional solutions. This work deeply integrates intelligent techniques into cloud storage systems and can be widely applied in cloud storage, data centers, and ubiquitous storage scenarios, providing key technical support for building low-latency, high-concurrency storage architectures.

ACM ICPP (International Conference on Parallel Processing) is a top international conference in parallel and distributed computing (CCF-recommended Category B), and its accepted papers undergo rigorous review by internationally recognized experts. The simultaneous acceptance of two papers from the China Telecom Cloud Computing Research Institute demonstrates its research depth in system architecture and resource management. ICPP 2025 will be held from September 8 to 11 in San Diego, USA, where our research team will join scholars from around the world to discuss the technological frontiers of intelligent cloud infrastructure.