Research | Efficient and Robust Distributed Machine Learning Laboratory

My major research interests include

Distributed/decentralized machine learning
Federated learning
Complex distributed system security — attack defense and fault-tolerance
Blockchains
Safety-assured and correct-by-construction cyber-physical system design

Federated learning

Federated learning with theoretical convergence behavior characterization:

Showed the convergence of FedAvg and FedProx (two widely adopted FL algorithms) to their statistical optimizers.
Characterized the benefit to a worker of joining federated learning training under both covariate heterogeneity and model heterogeneity, introducing a novel notion of federation gain.
Two important messages are: (1) The unachievability of the empirical risk minimizers does not prevent the convergence of these two algorithms to the underlying truth. (2) In the presence of heterogeneity (both covariate and model), even if only a global model is trained, workers can still benefit from joining FL, in that the prediction error of the FL global model can be much smaller than that of a model trained on a limited amount of local data. The advantage of an FL global model over a local model with respect to a given worker is characterized by the federation gain, a new notion introduced in our work.

Paper: [U3] Lili Su, Jiaming Xu, and Pengkun Yang, Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points, June 2021.
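As a rough illustration of the FedAvg template discussed above (a few local gradient steps on each worker, followed by a sample-weighted average at the coordinator), here is a minimal pure-Python sketch. It is not the analysis or the setup of [U3]: the scalar linear model, the noiseless toy data, and the names `local_update` and `fedavg` are all assumptions made for illustration.

```python
def local_update(w, data, lr, steps):
    """Run a few gradient steps on one worker's loss 0.5 * mean((w*x - y)^2)."""
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg(worker_data, rounds=50, lr=0.1, local_steps=3, w0=0.0):
    """Hypothetical FedAvg loop for a scalar model y ~ w * x."""
    total = sum(len(d) for d in worker_data)
    w = w0
    for _ in range(rounds):
        # Every worker starts from the current global model and trains locally.
        local_models = [local_update(w, d, lr, local_steps) for d in worker_data]
        # The coordinator averages local models, weighted by local sample count.
        w = sum(len(d) * wi for d, wi in zip(worker_data, local_models)) / total
    return w


# Toy data (assumed): both workers share the true model y = 2x, but their
# inputs cover different ranges, a simple form of covariate heterogeneity.
workers = [
    [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)],
    [(x, 2.0 * x) for x in (2.0, 3.0)],
]
w_global = fedavg(workers)
```

In this noiseless toy setting `w_global` should approach the shared true parameter 2.0 even though the workers see different input ranges; with noisy or model-heterogeneous data the behavior is the subject of the paper, not of this sketch.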

Personalization: Study how to quickly and efficiently adapt the FL global model to a worker’s local data via few-shot local updates, in order to handle heterogeneous data with both covariate (input feature) heterogeneity and model heterogeneity.

Ongoing work.

Fully distributed federated learning: Aim to relax the requirement of a centralized coordinator, which the traditional FL architecture relies on, and study the significantly more challenging fully distributed systems. The concrete focuses are (1) heterogeneity, (2) network/computation asynchrony, and (3) the potential of applying advanced networking/caching tools to FL.

Ongoing work.

Provable resilience of FL against Byzantine attacks (which include data poisoning as a special case)

A polynomial-time algorithm that tolerates a constant (O(1)) fraction of FL workers suffering Byzantine attacks.

Paper: [C14] Lili Su and Jiaming Xu, Securing Distributed Gradient Descent in High Dimensional Statistical Learning, ACM SIGMETRICS 2019, June 2019.

One of the first papers to study Byzantine resilience in FL systems.

Paper: [J3] Yudong Chen, Lili Su, Jiaming Xu, Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent, ACM SIGMETRICS 2017, June 2017.
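To make the idea of Byzantine-resilient gradient aggregation concrete, the following sketch replaces the plain average of worker gradients with a geometric median of batch means, computed by Weiszfeld iterations. It is a simplified illustration under stated assumptions, not the exact algorithms or guarantees of [C14] or [J3]; the function names, the batching scheme, and the toy gradients are hypothetical.

```python
import math


def geometric_median(points, iterations=200, eps=1e-12):
    """Approximate the geometric median of a list of vectors (Weiszfeld iterations)."""
    dim = len(points[0])
    est = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iterations):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            dist = math.sqrt(sum((p[k] - est[k]) ** 2 for k in range(dim)))
            weight = 1.0 / max(dist, eps)  # eps guards against division by zero
            den += weight
            for k in range(dim):
                num[k] += weight * p[k]
        est = [v / den for v in num]
    return est


def robust_aggregate(gradients, num_batches):
    """Average gradients within each batch, then take the geometric median of the batch means."""
    dim = len(gradients[0])
    batches = [gradients[b::num_batches] for b in range(num_batches)]
    means = [[sum(g[k] for g in batch) / len(batch) for k in range(dim)]
             for batch in batches]
    return geometric_median(means)


# Toy input (assumed): 8 honest workers report gradients near (1, -1),
# while 2 Byzantine workers report arbitrarily large values.
honest = [[1.0 + 0.1 * s, -1.0 + 0.1 * t]
          for s, t in [(0, 0), (1, 1), (-1, -1), (1, -1), (-1, 1), (0, 1), (0, -1), (1, 0)]]
byzantine = [[1e6, 1e6], [1e6, 1e6]]
reports = honest + byzantine

robust = robust_aggregate(reports, num_batches=len(reports))
naive = [sum(g[k] for g in reports) / len(reports) for k in range(2)]
```

Here `naive` is dragged far away by the two corrupted reports, whereas `robust` stays close to the honest gradients. Using fewer batches than workers trades robustness for lower variance, which is a design choice this sketch does not tune.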

Secure multi-agent networks: Adversarial Attack Resilience and Fault Tolerance

We would like to design and build correct-by-construction, scalable, and efficient multi-agent networks that are resilient and robust to highly determined adversarial attacks and system failures, including but not limited to (i) packet-dropping links, (ii) network asynchrony, (iii) agent crash attacks, and (iv) Byzantine agent attacks (which could be caused by an informationally and computationally unbounded system adversary). The following concrete problems are addressed:

Designed the first computationally efficient, attack-resilient algorithm for multi-agent state estimation that assumes neither network-wide information fusion nor full local observability.

Paper: [J8] Lili Su and Shahin Shahrampour, Finite-Time Guarantees for Byzantine-Resilient Distributed State Estimation With Noisy Measurements, IEEE TAC, Sept. 2020.

Adversary-resilient multi-agent optimization: We are among the first to study Byzantine resilience in distributed machine learning.

Paper: [J9] Lili Su and Nitin H. Vaidya, Byzantine-Resilient Multi-Agent Optimization, IEEE TAC, May 2021.

Paper: [C7] Lili Su and Nitin H. Vaidya, Robust Multi-agent Optimization: Coping with Byzantine Agents with Input Redundancy, SSS 2016, Nov. 2016.

Social learning problem/distributed hypothesis testing: We are the first to study crash and Byzantine attacks and network asynchrony in this setting.

Paper: [C8] Lili Su and Nitin H. Vaidya, Asynchronous Non-Bayesian Learning in the Presence of Crash Failures, SSS 2016, Nov. 2016.

Paper: [C6] Lili Su and Nitin H. Vaidya, Non-Bayesian Learning in the Presence of Byzantine Agents, DISC 2016, Sept. 2016.

Proposed an efficient distributed machine learning algorithm (in the framework of optimization) that works even if the network communication links may drop packets in transit and no acknowledgement feedback mechanism is available.

Paper: [C11] Lili Su, On the Convergence Rate of Average Consensus and Distributed Optimization over Unreliable Networks, 52nd Asilomar Conference on Signals, Systems, and Computers, Oct. 2018.
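One known way to reach average consensus over links that silently drop packets, without acknowledgements, is to have each sender transmit running sums of everything it has ever pushed, so that a receiver can recover mass lost in earlier drops from the next packet that does arrive. The sketch below simulates this idea on a small directed ring. It is an illustrative assumption about the general technique, not necessarily the algorithm analyzed in [C11]; the function name, the i.i.d. drop model, and the toy topology are hypothetical.

```python
import random


def robust_push_sum(values, out_neighbors, drop_prob, iterations, seed=0):
    """Simulate ratio consensus with running sums over lossy directed links."""
    rng = random.Random(seed)
    n = len(values)
    y = [float(v) for v in values]  # value mass held at each node
    z = [1.0] * n                   # weight mass held at each node
    sent_y = [0.0] * n              # cumulative value mass pushed per out-link
    sent_z = [0.0] * n              # cumulative weight mass pushed per out-link
    links = [(j, i) for j in range(n) for i in out_neighbors[j]]
    recv_y = {link: 0.0 for link in links}  # last running sum seen on each link
    recv_z = {link: 0.0 for link in links}
    for _ in range(iterations):
        # Each node splits its mass equally among itself and its out-neighbors.
        for i in range(n):
            share = 1.0 / (len(out_neighbors[i]) + 1)
            sent_y[i] += y[i] * share
            sent_z[i] += z[i] * share
            y[i] *= share
            z[i] *= share
        # A delivered packet carries the sender's running sums; the receiver
        # adds only the part it has not seen, which recovers earlier drops.
        for (j, i) in links:
            if rng.random() >= drop_prob:
                y[i] += sent_y[j] - recv_y[(j, i)]
                z[i] += sent_z[j] - recv_z[(j, i)]
                recv_y[(j, i)] = sent_y[j]
                recv_z[(j, i)] = sent_z[j]
    return [y[i] / z[i] for i in range(n)]


# Toy network (assumed): a directed ring of 5 nodes, each link dropping 30% of packets.
initial = [1.0, 2.0, 3.0, 4.0, 10.0]
ring = [[1], [2], [3], [4], [0]]
estimates = robust_push_sum(initial, ring, drop_prob=0.3, iterations=2000)
```

Every entry of `estimates` should end up close to the true average of `initial`, which is 4.0. The running sums grow over time, so a practical implementation would need to bound or reset them; this sketch ignores that.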

Safety-assured and correct-by-construction cyber-physical system design

Hybrid traffic, which consists of both autonomous and human-driven vehicles, will be the norm for decades. The presence of human drivers poses significant safety threats and impairs the efficiency and comfort of the autonomous-vehicle passenger experience. Our goal is to significantly increase the intelligence and adaptability of autonomous vehicles so that they can cope with highly uncertain and heterogeneous human driving behaviors.

Abnormality detection: We focus on the practical yet extremely challenging contexts in which there is no prior information on the statistical patterns of human-driven vehicle behaviors. In particular, we allow human-driven vehicles to behave normally at first and then suddenly (with no statistical structure) switch to abnormal driving behaviors that threaten neighboring vehicles’ safety.

Paper: [U2] Jiangwei Wang, Lili Su, Songyang Han, Dongjin Song, and Fei Miao, Towards Safe Autonomy in Hybrid Traffic: The Power of Information Sharing in Detecting Abnormal Human Drivers Behaviors, Submitted, 2021

Real-time context awareness via resilient and privacy-preserving vehicle-to-vehicle coordination.

Ongoing work.

Blockchains

Nakamoto consensus is the key blockchain technique underlying the distributed ledger maintenance of Bitcoin. Theoretical understanding of this technique is still limited. We dispute two common beliefs about Nakamoto consensus. We found that the choice of symmetry-breaking strategy can be crucial to system correctness, and that easy puzzles by themselves, with the help of simple symmetry breaking, do not lead to heavy forking.

Paper: [C16] Lili Su, Quanquan C. Liu, and Neha Narula, The Power of Random Symmetry-Breaking in Nakamoto Consensus, DISC 2021.
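For readers unfamiliar with the term, symmetry breaking here refers to how a node chooses among several equally long candidate chains. The sketch below contrasts two simple rules, first-seen and uniformly random. It only illustrates the notion and is not the model or protocol analyzed in [C16]; the function name, the `tie_break` parameter, and the list-of-block-ids chain representation are hypothetical.

```python
import random


def select_chain(chains, tie_break="random", rng=None):
    """Pick a longest chain; ties are resolved by the chosen symmetry-breaking rule.

    `chains` is a list of chains in the order they were received, each chain
    being a list of block identifiers.
    """
    rng = rng or random.Random()
    best_length = max(len(chain) for chain in chains)
    longest = [chain for chain in chains if len(chain) == best_length]
    if tie_break == "first_seen":
        return longest[0]           # deterministic: keep the earliest received
    if tie_break == "random":
        return rng.choice(longest)  # uniform over all longest chains
    raise ValueError(f"unknown tie-breaking rule: {tie_break}")


# Two forks of equal length compete; a shorter chain is never selected.
candidates = [["genesis", "a1"], ["genesis", "b1"], ["genesis"]]
first = select_chain(candidates, tie_break="first_seen")
picked = select_chain(candidates, tie_break="random", rng=random.Random(7))
```

With the first-seen rule, different nodes may stay on different forks depending on message arrival order, whereas the random rule makes each node's choice independent of arrival order.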