One-shot Distributed Clustering
Speaker: Yudong Wang (王昱栋), University of Delaware
Time: 2026-03-12 14:00-15:00
Venue: Siyuan Hall (四元厅)
Abstract: Probabilistic clustering based on mixture models is widely used in biomedical studies. With the increasing availability of multi-site data, joint analyses across multiple sites offer opportunities to improve the efficiency and generalizability of clustering methods. However, data-sharing restrictions and between-site heterogeneity present significant challenges for multi-site probabilistic clustering. To address between-site heterogeneity, we propose a heterogeneous mixture model in which different sites share the same mixture components but have site-specific mixing proportions. This structure accounts for site-level heterogeneity while leveraging shared information across sites. Because of data-sharing restrictions, distributed inference for this heterogeneous mixture model is challenging. One potential approach is to use distributed Expectation-Maximization (EM) algorithms for federated maximum likelihood estimation (MLE). However, the existing distributed EM algorithm requires cross-site communication in every iteration, leading to considerable communication overhead in terms of personnel and time. In this talk, we will introduce two novel one-shot distributed learning algorithms, the one-shot distributed EM algorithm and the one-shot distributed majorization-minimization (MM) algorithm, for communication-efficient inference of the heterogeneous mixture model. We provide theoretical guarantees that our approaches achieve full-sample efficiency with only a single round of cross-site communication. The effectiveness of the approaches is demonstrated through an application to multi-site electronic health records (EHR) data, illustrating how they enable efficient probabilistic clustering and facilitate targeted interventions.
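For readers unfamiliar with the setting, the following is a minimal sketch of the heterogeneous mixture model and of the conventional distributed EM baseline that the abstract describes as communication-heavy. It is not the speaker's one-shot EM or MM algorithm. It assumes one-dimensional Gaussian components, and the names site_e_step and distributed_em are hypothetical, chosen only for this illustration. Each site sends aggregate statistics rather than raw data, but it must do so once per iteration, which is the overhead the one-shot methods aim to avoid.

```python
import numpy as np

def site_e_step(x, pi_k, mu, sigma2):
    """Local E-step at one site; returns aggregate statistics only (no raw data)."""
    # Log-density of each observation under each shared Gaussian component.
    log_pdf = -0.5 * (np.log(2 * np.pi * sigma2) + (x[:, None] - mu) ** 2 / sigma2)
    # Add the site-specific log mixing proportions (clipped to avoid log(0)).
    log_w = np.log(np.clip(pi_k, 1e-12, None)) + log_pdf
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilise before exponentiating
    resp = np.exp(log_w)
    resp /= resp.sum(axis=1, keepdims=True)    # posterior component memberships
    return resp.sum(axis=0), resp.T @ x, resp.T @ x**2

def distributed_em(sites, mu_init, n_iter=100):
    """Baseline distributed EM: one communication round per iteration."""
    mu = np.asarray(mu_init, dtype=float)
    n_comp = mu.size
    sigma2 = np.ones(n_comp)
    # Site-specific mixing proportions, one row per site.
    pi = np.full((len(sites), n_comp), 1.0 / n_comp)
    for _ in range(n_iter):
        # Communication round: every site reports its summary statistics.
        stats = [site_e_step(x, pi[k], mu, sigma2) for k, x in enumerate(sites)]
        n = np.array([s[0] for s in stats])    # shape (sites, components)
        s1 = sum(s[1] for s in stats)
        s2 = sum(s[2] for s in stats)
        # M-step: mixing proportions are updated per site ...
        pi = n / n.sum(axis=1, keepdims=True)
        # ... while component parameters are shared and use pooled statistics.
        n_tot = n.sum(axis=0)
        mu = s1 / n_tot
        sigma2 = np.maximum(s2 / n_tot - mu**2, 1e-6)
    return pi, mu, sigma2
```

In this sketch the number of communication rounds equals n_iter, whereas the algorithms announced in the talk are stated to need a single round.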
Bio: Yudong Wang received his PhD in 2023 from the National University of Singapore under the supervision of Professor Zhisheng Ye. He was a postdoctoral research fellow in the Department of Biostatistics, Epidemiology, and Informatics at the University of Pennsylvania from 2023 to 2025, where he worked with Professor Yong Chen. He will join the University of Delaware as a postdoctoral researcher in 2026. His research interests include distributed statistical inference, transfer learning, and semiparametric methods.
