With the increasing availability of electronic health record (EHR) data, it is important to effectively integrate evidence from multiple data sources to enable reproducible scientific discovery. However, practical challenges in data integration remain, such as protecting data privacy, the high dimensionality of features, and heterogeneity across datasets.
To facilitate efficient multi-institutional data analysis without sharing individual patient-level data (IPD), we developed a toolbox of Privacy-preserving Distributed Algorithms (PDA; see the figure on the left) that conduct distributed learning and inference for various models, such as logistic regression, the Cox model, and the Poisson model. Our algorithms do not require iterative communication across sites and can account for heterogeneity across hospitals. In addition, PDA outperforms meta-analysis methods in many settings, such as pharmacovigilance applications. The validity and efficiency of PDA have also been demonstrated with real-world use cases in the Penn Medicine Biobank, Observational Health Data Sciences and Informatics (OHDSI), a Pediatric Learning Health System (PEDSnet), and OneFlorida.