Applications of Machine Learning in Medical Research
22 Mar 2026, 15:00-15:30
3F Banquet Hall
Speaker: Dee Pei (Taiwan)
The roles of Machine Learning in Medical Research
Artificial intelligence (AI) is fundamentally transforming clinical research by democratizing access to advanced analytical tools and establishing new standards for scientific publication. This presentation outlines a comprehensive framework for integrating machine learning (ML) methodologies into clinical studies, from raw data preparation to manuscript submission, while addressing critical challenges in model development, validation, and interpretation. We emphasize that financial barriers to sophisticated analysis have largely dissolved with the advent of open-source AI platforms, enabling researchers to move beyond legacy statistical software toward reproducible, transparent ML pipelines.
The workflow begins with strategic data preparation, including appropriate imputation techniques (k-NN, MissForest, MICE) and feature standardization. For binary classification tasks, which are common in clinical prediction, we advocate a rigorous protocol encompassing stratified cross-validation, hyperparameter tuning via nested cross-validation (CV), and explicit overfitting controls (regularization, and limiting the number of candidate features so that at least 10 events are available per variable). Model selection should prioritize algorithms matching the clinical question: logistic regression for interpretability, random forests for robust baselines, and gradient boosting for maximal performance, while acknowledging trade-offs in complexity and calibration risk.
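As a rough illustration of two of the controls named above, the sketch below computes the feature budget implied by a 10-events-per-variable rule and builds stratified folds that preserve the event rate. It uses only the Python standard library; the function names (`max_features_by_epv`, `stratified_folds`) and the 200-patient cohort are hypothetical and are not taken from the presentation. In practice a library routine such as scikit-learn's `StratifiedKFold` would typically do the splitting.

```python
import random
from collections import defaultdict


def max_features_by_epv(labels, epv=10):
    """Upper bound on candidate predictors under an events-per-variable rule."""
    events = sum(labels)
    non_events = len(labels) - events
    # The rarer outcome class is the limiting count.
    return min(events, non_events) // epv


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of row indices, each with roughly the overall event rate."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share of the class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical cohort: 200 patients, 40 of whom experienced the outcome.
labels = [1] * 40 + [0] * 160
feature_budget = max_features_by_epv(labels)
folds = stratified_folds(labels, k=5)
events_per_fold = [sum(labels[i] for i in fold) for fold in folds]

print(feature_budget)    # 4
print(events_per_fold)   # [8, 8, 8, 8, 8]
```

With 40 events the rule allows at most 4 candidate predictors, and each of the 5 folds receives 8 events and 32 non-events, matching the 20% event rate of the full cohort.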
Evaluation must extend beyond conventional ROC-AUC to include precision-recall curves (especially for imbalanced data), calibration assessment (slope near 1, intercept near 0), Brier score, and decision curve analysis for clinical utility. SHAP values provide essential interpretability for "black-box" models, translating complex predictions into clinically actionable insights. Crucially, we stress that accuracy alone is misleading in medical contexts: a missed case (false negative) often carries greater clinical consequence than a small loss in overall accuracy.
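The point about accuracy can be made concrete with a small worked example. The sketch below compares two hypothetical sets of predictions on an imbalanced toy cohort (5 events in 100 patients) using accuracy, sensitivity, the Brier score, and the net benefit used in decision curve analysis, computed as TP/n - FP/n x pt/(1 - pt) at threshold probability pt. All data and function names here are illustrative assumptions, not results from the presentation.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) when predicting positive for p >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return (tp + tn) / len(y_true)


def sensitivity(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, as in decision curve analysis."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical imbalanced cohort: 5 events among 100 patients.
y_true = [1] * 5 + [0] * 95
# Model A gives everyone a low risk, so it never flags a patient.
model_a = [0.05] * 100
# Model B flags 4 of the 5 events at the cost of 5 false alarms.
model_b = [0.8] * 4 + [0.05] + [0.55] * 5 + [0.05] * 90

for name, probs in (("A", model_a), ("B", model_b)):
    print(
        name,
        round(accuracy(y_true, probs), 3),
        round(sensitivity(y_true, probs), 3),
        round(brier_score(y_true, probs), 4),
        round(net_benefit(y_true, probs, 0.1), 4),
    )
```

In this toy setting model A has the higher accuracy even though it misses every event, while model B has better sensitivity, a lower Brier score, and a positive net benefit at a 10% threshold. The 10% threshold is an arbitrary choice for illustration; a decision curve would plot net benefit over a range of clinically plausible thresholds.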
Reproducibility demands fixed random seeds, complete pipeline documentation, and packaging preprocessing steps with final models for deployment. As journals increasingly expect ML-enhanced analyses, studies relying solely on traditional statistics face diminished publication prospects. We conclude that AI integration is no longer optional but essential for contemporary clinical research seeking impact, rigor, and real-world applicability in an era where algorithmic insight complements, rather than replaces, clinical expertise.
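One way to read "packaging preprocessing steps with final models" is that the standardization statistics learned on the training data travel in the same artifact as the model parameters and the seed. The sketch below shows that idea with the standard library only. The `PackagedModel` class, the simulated two-column table, and the coefficients are all hypothetical placeholders (nothing is fitted here); with scikit-learn, the usual equivalent would be a `Pipeline` saved with `joblib.dump`.

```python
import json
import math
import random
import statistics
from dataclasses import asdict, dataclass

SEED = 2026  # hypothetical fixed seed, stored alongside the model


def simulate_rows(seed, n=50):
    """Toy stand-in for a training table of (age, biomarker) pairs."""
    rng = random.Random(seed)
    return [[rng.gauss(60, 10), rng.gauss(1.2, 0.3)] for _ in range(n)]


@dataclass
class PackagedModel:
    means: list       # training means used for standardization
    stds: list        # training standard deviations
    coefs: list       # logistic regression coefficients
    intercept: float
    seed: int

    def predict_proba(self, row):
        # Standardize with the stored training statistics, then apply the model.
        z = self.intercept + sum(
            c * (x - m) / s
            for c, x, m, s in zip(self.coefs, row, self.means, self.stds)
        )
        return 1.0 / (1.0 + math.exp(-z))


rows = simulate_rows(SEED)
columns = list(zip(*rows))
model = PackagedModel(
    means=[statistics.fmean(col) for col in columns],
    stds=[statistics.pstdev(col) for col in columns],
    coefs=[0.8, 0.5],   # hypothetical values, not fitted in this sketch
    intercept=-1.0,     # hypothetical value
    seed=SEED,
)

# Round-trip through JSON, as one would when writing the artifact to disk.
restored = PackagedModel(**json.loads(json.dumps(asdict(model))))

patient = [72.0, 1.5]  # hypothetical new patient
same_data = simulate_rows(SEED) == rows
same_prediction = restored.predict_proba(patient) == model.predict_proba(patient)
print(same_data, same_prediction)
```

Because the scaler statistics are stored with the coefficients, a deployed copy cannot silently standardize new patients with different means or variances than those used in training, and the recorded seed lets the toy training table be regenerated exactly.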