Publications
* denotes equal contribution.
2025
- [SIGCOMM] CEIO: A Cache-Efficient Network I/O Architecture for NIC-CPU Data Paths. Bowen Liu*, Xinyang Huang*, Qijing Li, Zhuobin Huang, Yijun Sun, Wenxue Li, Junxue Zhang, Ping Yin, and Kai Chen. In 39th ACM Special Interest Group on Data Communication (SIGCOMM 2025), 2025.
An efficient Input/Output (I/O) data path between NICs and CPUs/DRAMs is critical for supporting datacenter applications that require high-performance network transmission, especially as link speeds scale to 100 Gbps and beyond. Traditional I/O acceleration strategies, such as Data Direct I/O (DDIO) and Remote Direct Memory Access (RDMA), perform suboptimally due to inefficient utilization of the Last-Level Cache (LLC). This paper presents CEIO, a novel cache-efficient network I/O architecture that employs proactive rate control and elastic buffering to achieve zero LLC misses in the I/O data path while ensuring the effectiveness of DDIO and RDMA under various network conditions. We have implemented CEIO on commodity SmartNICs and incorporated it into the widely used DPDK and RDMA libraries. Experiments with a well-optimized RPC framework and a distributed file system under realistic workloads demonstrate that CEIO achieves up to 2.9x higher throughput and 1.9x lower P99.9 latency than prior work.
- [SIGCOMM] Revisiting RDMA Reliability for Lossy Fabrics. Wenxue Li, Xiangzhou Liu, Yunxuan Zhang, Zihao Wang, Wei Gu, Gaoxiong Zeng, Shoushou Ren, Xinyang Huang, Zhenghang Ren, Bowen Liu, Junxue Zhang, and Kai Chen. In 39th ACM Special Interest Group on Data Communication (SIGCOMM 2025), 2025.
- [ATC] FLB: Fine-grained Load Balancing for Lossless Datacenter Networks. Jinbin Hu, Wenxue Li, Xiangzhou Liu, Junfeng Wang, Bowen Liu, Ping Yin, Jianxin Wang, Jiawei Huang, and Kai Chen. In 2025 USENIX Annual Technical Conference (ATC 2025), 2025.
Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) cooperating with Priority Flow Control (PFC) has been widely deployed in production datacenters to enable low-latency, lossless transmission. At the same time, modern datacenters typically offer parallel transmission paths between any pair of end-hosts, underscoring the importance of load balancing. However, the well-studied load balancing mechanisms designed for lossy datacenter networks (DCNs) are ill-suited for such lossless environments. Through extensive experiments, we are among the first to comprehensively inspect the interactions between PFC and load balancing, and we uncover that existing fine-grained rerouting schemes can be counterproductive: they spread congested flows across more paths, further aggravating PFC’s head-of-line (HoL) blocking. Motivated by this, we present FLB, a Fine-grained Load Balancing scheme for lossless DCNs. At its core, FLB employs threshold-free rerouting to effectively balance traffic load and improve link utilization under normal conditions, and leverages timely congested-flow isolation to eliminate HoL blocking on non-congested flows when congestion occurs. We have fully implemented an FLB prototype, and our evaluation results show that FLB reduces the PFC PAUSE rate by up to 96% and avoids HoL blocking, translating to up to 45% improvement in goodput over CONGA+DCQCN and 40%, 36%, 29%, and 18% reduction in average flow completion time (FCT) over LetFlow+Swift, MP-RDMA, Proteus+DCQCN, and LetFlow+PCN, respectively.
- [APNet] Cache-Aware I/O Rate Control for RDMA. Qijing Li, Xinyang Huang, Bowen Liu, Pengbo Li, Junxue Zhang, and Kai Chen. In 9th Asia-Pacific Workshop on Networking (APNet 2025), 2025.
Remote Direct Memory Access (RDMA) has become a cornerstone technology in modern datacenter networks due to its high throughput and extremely low latency. However, recent works have revealed that congestion arises in the "last mile" of the RDMA I/O path, between DRAM and CPU registers, due to inefficiencies in the memory hierarchy, where severe cache misses and memory bandwidth contention degrade performance. We identify the root cause of this I/O congestion as the speed mismatch between network ingress and CPU processing, which leads to data accumulation and, eventually, Last-Level Cache (LLC) overflow. To address this issue, we propose RhyR, a credit-based rate control mechanism that dynamically aligns network ingress speed with CPU processing speed. Our preliminary evaluation on eRPC over RDMA, a widely used RPC framework, demonstrates that RhyR effectively mitigates I/O congestion, reducing tail latency by up to 1.40x and improving throughput by up to 1.35x compared to prior work.
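To make the credit-based idea above concrete, here is a minimal sketch of generic credit-based rate control in Python. It is not RhyR's implementation; the `CreditReceiver` class, the `LLC_BUDGET` value, and the grant/consume API are hypothetical and only illustrate the general principle that the sender may push only as much data as the CPU has already drained, keeping un-consumed data within a cache-resident budget.

```python
# Toy model of credit-based I/O rate control (illustrative only; not RhyR's code).
# The NIC/sender may only transmit bytes it has been granted credit for, and credit
# is replenished only after the CPU consumes data, so the amount of un-consumed
# data resident in the cache never exceeds LLC_BUDGET bytes.

LLC_BUDGET = 2 * 1024 * 1024  # assumed cache-resident budget in bytes (hypothetical value)


class CreditReceiver:
    def __init__(self, budget: int = LLC_BUDGET):
        self.budget = budget
        self.outstanding = 0  # bytes granted to the sender but not yet consumed by the CPU

    def grant(self, requested: int) -> int:
        """Grant up to `requested` bytes of credit, capped by the remaining budget."""
        credit = min(requested, self.budget - self.outstanding)
        self.outstanding += credit
        return credit

    def consume(self, nbytes: int) -> None:
        """The CPU has processed `nbytes`; return that much credit to the pool."""
        self.outstanding = max(0, self.outstanding - nbytes)


# Usage: the sender asks for credit before each burst; if the CPU stalls, grants
# shrink toward zero and network ingress is paced to match CPU processing speed.
rx = CreditReceiver()
sent = rx.grant(64 * 1024)   # sender may transmit `sent` bytes now
rx.consume(sent)             # CPU drains the data, freeing credit for the next burst
```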
- [OSDI] Enabling Efficient GPU Communication over Multiple NICs with FuseLink. Zhenghang Ren, Yuxuan Li, Zilong Wang, Xinyang Huang, Wenxue Li, Kaiqiang Xu, Xudong Liao, Yijun Sun, Bowen Liu, Han Tian, Junxue Zhang, Mingfei Wang, Zhizhen Zhong, Guyue Liu, Ying Zhang, and Kai Chen. In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2025), 2025.
Machine learning (ML) clusters stack multiple network interface cards (NICs) within each server to improve GPU communication bandwidth. However, existing systems fall short of fully utilizing these NICs because they statically bind GPU traffic to NICs and are constrained by PCIe bottlenecks. This leads to suboptimal performance under imbalanced traffic, such as when GPUs serve different LLM requests or train models with varying communication patterns. We propose FuseLink to enable efficient GPU communication over multiple NICs. FuseLink extends the inter-server network by integrating high-speed intra-server connections and leverages GPUs to efficiently relay traffic to idle NICs. We implement FuseLink and integrate it into NCCL, so that ML applications can use FuseLink seamlessly without code modifications. Compared to NCCL with PXN, FuseLink achieves 212 GBps bandwidth between two inter-server GPUs and speeds up first-token generation in LLM serving by 1.06-2.89x, mixture-of-experts (MoE) training by up to 1.3x, and recommendation model training by up to 1.2x.
- [S&P] Edge Unlearning is Not "on Edge"! An Adaptive Exact Unlearning System on Resource-Constrained Devices. Xiaoyu Xia, Ziqi Wang, Ruoxi Sun, Bowen Liu, Ibrahim Khalil, and Minhui Xue. In 46th IEEE Symposium on Security and Privacy (S&P 2025), 2025.
The right to be forgotten mandates that machine learning models enable the erasure of a data owner’s data and information from a trained model. Removing data from the dataset alone is inadequate, as machine learning models can memorize information from the training data, increasing the potential privacy risk to users. To address this, multiple machine unlearning techniques have been developed and deployed. Among them, approximate unlearning is a popular solution, but recent studies report that its unlearning effectiveness is not fully guaranteed. Another approach, exact unlearning, tackles this issue by discarding the data and retraining the model from scratch, but at the cost of considerable computational and memory resources. However, not all devices have the capability to perform such retraining. In numerous machine learning applications, such as edge devices, Internet-of-Things (IoT) devices, mobile devices, and satellites, resources are constrained, posing challenges for deploying existing exact unlearning methods. In this study, we propose a Constraint-aware Adaptive Exact Unlearning System at the network Edge (CAUSE), an approach to enabling exact unlearning on resource-constrained devices. Aiming to minimize retraining overhead by storing sub-models on the resource-constrained device, CAUSE innovatively applies a Fibonacci-based replacement strategy and adaptively updates the number of shards in the user-based data partition process. To further improve memory efficiency, CAUSE leverages model pruning to save memory via compression with minimal accuracy sacrifice. The experimental results demonstrate that CAUSE significantly outperforms other representative systems in realizing exact unlearning on resource-constrained devices by 9.23%-80.86% in unlearning speed, 66.21%-83.46% in energy consumption, and 5.26%-194.13% in accuracy.
- [AAAI] PFedCS: A Personalized Federated Learning Method for Enhancing Collaboration among Similar Classifiers. Siyuan Wu, Yongzhe Jia, Bowen Liu, Haolong Xiang, Xiaolong Xu, and Wanchun Dou. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2025), 2025.
Personalized federated learning (PFL) has recently gained significant attention for its ability to address traditional federated learning’s (FL) poor convergence on highly heterogeneous data and its lack of personalized solutions. Existing mainstream approaches either perform personalized aggregation based on a specific model architecture to leverage global knowledge or achieve personalization by exploiting client similarities. However, the former overlooks the discrepancies in client data distributions by indiscriminately aggregating all clients, while the latter lacks fine-grained collaboration among classifiers relevant to local tasks. In view of these challenges, we propose PFedCS, a Personalized Federated learning method for Enhancing Collaboration among Similar Classifiers, which aims to improve each client’s accuracy on its local tasks. Concretely, it leverages awareness of client classifier similarities to address the above problems. By iteratively measuring the distance between clients’ classifier parameters and clustering with each client as a cluster center, the central server adaptively identifies collaborating clients with similar data distributions. In addition, a distance-constrained aggregation method is designed to generate customized collaborative classifiers that guide local training. Extensive experimental evaluations on three datasets demonstrate that our method achieves state-of-the-art performance.
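As a rough illustration of the classifier-similarity idea described above (not PFedCS's actual algorithm), the sketch below clusters clients by the pairwise distance between their flattened classifier parameters, treating each client as its own cluster center, and builds a collaborative classifier per client as a distance-weighted average of its similar neighbours. The function name, the distance threshold `tau`, and the inverse-distance weighting are all assumptions made for illustration.

```python
# Hypothetical sketch of similarity-aware classifier aggregation (not PFedCS's code).
import numpy as np

def collaborative_classifiers(client_params, tau=1.0):
    """client_params: list of 1-D numpy arrays (flattened classifier weights per client).
    Returns one aggregated classifier per client, averaging only over "similar" clients
    (pairwise L2 distance <= tau), weighted inversely by distance."""
    P = np.stack(client_params)                                      # (n, d)
    dists = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)   # (n, n) pairwise L2
    out = []
    for i in range(len(client_params)):
        similar = np.where(dists[i] <= tau)[0]   # client i acts as its own cluster center
        w = 1.0 / (1.0 + dists[i, similar])      # closer clients receive larger weights
        w = w / w.sum()
        out.append((w[:, None] * P[similar]).sum(axis=0))
    return out

# Usage with three toy clients whose classifiers have 4 parameters each:
clients = [np.zeros(4), np.ones(4) * 0.1, np.ones(4) * 5.0]
personalized = collaborative_classifiers(clients, tau=1.0)  # the distant client stays mostly on its own
```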
2024
- [TMC] EdgeShield: Enabling collaborative DDoS mitigation at the edge. Xiaoyu Xia, Feifei Chen, Qiang He, Ruikun Luo, Bowen Liu, Caslon Chua, Rajkumar Buyya, and Yun Yang. In IEEE Transactions on Mobile Computing (TMC 2024), 2024.
Edge computing (EC) enables low-latency services by pushing computing resources to the network edge. Due to the geographic distribution and limited capacities of edge servers, EC systems face the challenge of edge distributed denial-of-service (DDoS) attacks. Existing systems designed to fight cloud DDoS attacks cannot mitigate edge DDoS attacks effectively due to new attack characteristics. In addition, those systems are typically activated only after an attack has been detected, which is not always realistic in EC systems. DDoS mitigation needs to be cohesively integrated with workload migration at the edge to ensure timely responses to edge DDoS attacks. In this paper, we present EdgeShield, a novel DDoS mitigation system that leverages edge servers’ computing resources collectively to defend against edge DDoS attacks without the need for attack detection. Aiming to maximize system throughput over time without causing significant service delays, EdgeShield monitors service delays and migrates workloads across an EC system with adaptive mitigation strategies. The experimental results show that EdgeShield significantly outperforms state-of-the-art solutions in both system throughput and service delay.
2022
- [UIC] An Intelligent Resource Scheduling Method With Edge Channel Deployment for BPM. Bowen Liu, Wanchun Dou, Xiaokang Zhou, Xuyun Zhang, Lianyong Qi, Fei Dai, and Chaochao Chen. In 19th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC 2022, Outstanding Paper Award), 2022.
Edge computing is a novel computing paradigm that offers various kinds of resources at the network edge. In edge computing, terminal users connect to edge servers via wireless networks, and each wireless link contains multiple channels. These wireless channels are a limited resource, and different channels have different costs and service abilities. The dynamic changes in users’ status make it hard to find an appropriate channel deployment method that satisfies BPM requirements. With this observation, it is a tricky challenge to trade off system cost (rental price) against service ability (number of served users). In view of this challenge, an intelligent resource scheduling method, named EdgeIRS, is proposed in this paper. EdgeIRS accommodates as many users as possible at the edge while minimizing the cost of deploying channel resources in an online manner. Its performance is analyzed theoretically, and experiments verify the superiority of the method.