Shaofeng Wu 伍少枫

Ph.D. Candidate at CUHK. Data Center Networking & Programmable Dataplane.


Education
  • The Chinese University of Hong Kong
    Department of Computer Science and Engineering
    Ph.D. Candidate
    July 2024 - present
    Ph.D. Student
    Aug. 2022 - July 2024
  • University of Science and Technology of China
    Second Bachelor of Engineering in Computer Science and Technology
    July 2020 - June 2022
    Bachelor of Natural Science in Material Physics
    July 2016 - June 2020
Work Experience
  • Microsoft Research Asia, Beijing
    Research Intern
    June 2024 - present
  • Microsoft Research Asia, Shanghai
    Research Intern
    Nov. 2021 - June 2022
Honors & Awards
  • Provost's Strategic Allocation of Centrally-Funded RPg Places
    2022-2026
  • Outstanding Graduates Award in USTC
    2020
  • Outstanding Graduation Thesis in USTC
    2020
  • Outstanding Undergraduate Scholarship in USTC
    2017, 2018, 2019
Selected Publications
Performance Prediction of On-NIC Network Functions with Multi-Resource Contention and Traffic Awareness

Shaofeng Wu*, Qiang Su*, Zhixiong Niu, Hong Xu (* equal contribution)

ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2025 (Conference, Network)

Network function (NF) offloading on SmartNICs has been widely used in modern data centers, offering benefits in host resource saving and programmability. Co-running NFs on the same SmartNICs can cause performance interference due to contention of onboard resources. To meet performance SLAs while ensuring efficient resource management, operators need mechanisms to predict NF performance under such contention. However, existing solutions lack SmartNIC-specific knowledge and exhibit limited traffic awareness, leading to poor accuracy for on-NIC NFs.
This paper proposes Yala, a novel performance predictive system for on-NIC NFs. Yala builds upon the key observation that co-located NFs contend for multiple resources, including onboard accelerators and the memory subsystem. It also facilitates traffic awareness according to the behaviors of individual resources to maintain accuracy as the external traffic attributes vary. Evaluation using BlueField SmartNICs shows that Yala improves the prediction accuracy by 78.8% and reduces SLA violations by 92.2% compared to state-of-the-art approaches, and enables new practical use cases.

Meili: Enabling SmartNIC as a Service in the Cloud

Qiang Su, Shaofeng Wu, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Zaoxing Liu, Hong Xu

2024 (Preprint, Network, System)

SmartNICs are touted as an attractive substrate for network application offloading, offering benefits in programmability, host resource saving, and energy efficiency. The current usage restricts offloading to local hosts and confines SmartNIC ownership to individual application teams, resulting in poor resource efficiency and scalability. This paper presents Meili, a novel system that realizes SmartNIC as a service to address these issues. Meili organizes heterogeneous SmartNIC resources as a pool and offers a unified one-NIC abstraction to application developers. This allows developers to focus solely on the application logic while dynamically optimizing their performance needs. Our evaluation on NVIDIA BlueField series and AMD Pensando SmartNICs demonstrates that Meili achieves scalable single-flow throughput with a maximum 8 μs latency overhead and enhances resource efficiency by 3.07× compared to standalone deployments and 1.44× compared to state-of-the-art microservice deployments.
