GAN-based Synthetic Heart ECG Data

University of Colorado
AI Health World Summit 2025

Introduction

Machine learning models have shown remarkable capabilities, in some tasks outperforming medical experts. However, to reach this level of performance, they typically require large, high-quality datasets. Obtaining such datasets can be challenging due to privacy concerns, regulatory restrictions, and the time-consuming process of expert annotation. This is where synthetic data comes into play. By simulating realistic and diverse cases, synthetic data helps fill gaps in underrepresented conditions and demographics, enhancing the robustness and generalization of models while protecting patient privacy.

Motivation

  • Privacy Laws Restrict Collaboration: GDPR, HIPAA, and other regulations block cross-institutional medical data sharing, creating fragmented, siloed datasets.
  • Re-identification Risks: Even anonymized data can be reverse-engineered, exposing patient identities and violating compliance.
  • Biased, Non-Generalizable Models: Models trained on localized data might fail for underrepresented demographics and conditions (e.g., ethnic minorities, rare arrhythmias).
  • High Costs of Compliance: Legal and technical safeguards for sharing real data strain healthcare budgets and slow innovation.
Synthetic data is intended to address these issues.

Method

For this project, a conditional GAN (cGAN) model was used. It is a pragmatic choice for generating a labelled dataset, because the generator is conditioned on the class label of each sample it produces.
Figure: cGAN architecture
To prevent deadlock between the Generator and the Discriminator, the discriminator training loop is lengthened after the 1000th epoch. The challenge is balancing the Generator and the Discriminator. If the Discriminator is too strong or too weak, it cannot provide useful feedback to the Generator.
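The schedule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual training code: it assumes that "lengthening the discriminator loop" means running more discriminator updates per generator update, that the label is fed to the generator by concatenating a one-hot vector to the noise, and that epochs are counted from 1. The names discriminator_steps, generator_input, and train, and the step counts 1 and 2, are hypothetical.

```python
import random


def discriminator_steps(epoch, switch_epoch=1000, base_steps=1, boosted_steps=2):
    """Discriminator updates per generator update (step counts are assumed values)."""
    return boosted_steps if epoch > switch_epoch else base_steps


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(label, num_classes, noise_dim, rng):
    """Assumed cGAN conditioning: Gaussian noise concatenated with the one-hot label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, num_classes)


def train(num_epochs, d_step, g_step, switch_epoch=1000):
    """Skeleton loop; d_step and g_step are caller-supplied update functions."""
    for epoch in range(1, num_epochs + 1):
        # After switch_epoch the discriminator is updated more often.
        for _ in range(discriminator_steps(epoch, switch_epoch)):
            d_step(epoch)
        g_step(epoch)


if __name__ == "__main__":
    counts = {"d": 0, "g": 0}

    def count_d(epoch):
        counts["d"] += 1

    def count_g(epoch):
        counts["g"] += 1

    train(4, count_d, count_g, switch_epoch=2)
    print(counts)
    print(len(generator_input(2, 5, 8, random.Random(0))))
```

Keeping the schedule in its own function makes it easy to try other ratios, or to lower the ratio again if the discriminator starts to dominate.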

Conclusion

While the results show promise, further refinements are needed. More training is likely needed; it is also possible that the discriminator became too strong halfway through training. The Savitzky-Golay filter could aid in denoising, but preprocessing challenges limit its effectiveness. Preprocessing the training data to sinus rhythm might be beneficial.
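For readers unfamiliar with the Savitzky-Golay filter, the sketch below shows the idea in its simplest form: each sample is replaced by the value of a least-squares quadratic fitted to a 5-sample window, which reduces to a fixed weighted average with weights (-3, 12, 17, 12, -3)/35. The window length, the polynomial order, the untouched edge samples, and the function name savgol5 are assumptions made for illustration; the filter settings used in the project are not stated. In practice a library routine such as scipy.signal.savgol_filter would normally be used instead.

```python
# Convolution weights for a 5-point, quadratic Savitzky-Golay smoother.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def savgol5(signal):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The first and last two samples are returned unchanged (an assumed,
    simple edge policy).
    """
    smoothed = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        smoothed[i] = sum(c * x for c, x in zip(SG_COEFFS, window)) / SG_NORM
    return smoothed


if __name__ == "__main__":
    # An isolated spike is attenuated and spread over its neighbours.
    spike = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    print(savgol5(spike))
```

Because the fit is quadratic, slowly varying parts of the waveform pass through nearly unchanged while sample-to-sample noise is reduced. Sharp features such as narrow peaks are also attenuated by a window this simple, so the window length and order would need tuning on real ECG traces.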

I couldn't explain how this improved model performance, so it was omitted.

Poster
