PharmaDS 2026: Short Courses

Short Courses

Short Course 1. Information Extraction and Document Preparation in Clinical Trial Development Using RAG-based LLMs

March 23, 2026 (Full day, 9AM-4:30PM)

Instructors: Zhaohua Lu, Daiichi Sankyo; Arlina Shen, Biomedical Data Science Master's student at Stanford University School of Medicine

Abstract:

This short course presents a modular workflow that leverages Retrieval-Augmented Generation (RAG) and local large language models (LLMs) to streamline the drafting of Statistical Analysis Plan (SAP) sections from clinical trial protocols. We will introduce a hybrid retrieval framework and a hierarchy-aware document parser that enhance contextual accuracy, traceability, and reproducibility. A local LLM is then used to generate grounded text constrained to retrieved evidence, minimizing hallucinations and preserving traceability to source documents. This approach is particularly valuable for SAP development in Phase I–IV studies and during protocol amendments.
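To make the idea of hybrid retrieval over a hierarchy-aware parse concrete, here is a minimal sketch, not the instructors' implementation. It assumes protocol text has already been split into chunks that keep their section path, and it blends a lexical term-overlap score with a character-trigram cosine similarity. The trigram score is only a standard-library stand-in for the embedding similarity a real system would likely use; the chunk contents, the `alpha` weight, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical protocol chunks; each keeps its section path for traceability.
CHUNKS = [
    {"path": "9 Statistics > 9.1 Sample Size",
     "text": "A total of 300 participants provides 90% power for the primary endpoint."},
    {"path": "9 Statistics > 9.3 Primary Analysis",
     "text": "The primary endpoint is analyzed with a stratified log-rank test."},
    {"path": "6 Study Treatment > 6.2 Dosing",
     "text": "Participants receive study drug once daily for 24 weeks."},
]


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def trigrams(text):
    """Character-trigram counts over the normalized text (embedding stand-in)."""
    s = " ".join(tokens(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexical_score(query, text):
    """Fraction of distinct query tokens that appear in the text."""
    q = set(tokens(query))
    return len(q & set(tokens(text))) / len(q) if q else 0.0


def retrieve(query, chunks, alpha=0.5, top_k=2):
    """Rank chunks by a weighted blend of the lexical and trigram scores."""
    scored = []
    for chunk in chunks:
        # Searching the section path as well lets headings contribute to the match.
        searchable = chunk["path"] + " " + chunk["text"]
        score = (alpha * lexical_score(query, searchable)
                 + (1 - alpha) * cosine(trigrams(query), trigrams(searchable)))
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Calling `retrieve("sample size and power", CHUNKS)` ranks the Sample Size chunk first. Because each result carries its section path, text drafted from the retrieved evidence can cite the protocol section it came from.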

Target Audience

This course is intended for biostatisticians, statistical programmers, and medical writers involved in clinical trial document development. It is also suitable for clinical data scientists and AI/ML practitioners interested in applying LLMs to regulatory and clinical documentation workflows.

Prerequisite Knowledge

Participants should have a basic understanding of natural language processing (NLP) concepts and experience with tools such as R or Python. Prior exposure to large language models or retrieval-based AI systems, or familiarity with document preparation or statistical principles in clinical trials, is beneficial but not required.

Short Bios:

Zhaohua Lu is an Associate Director of Biostatistics at Daiichi Sankyo and a Ph.D.-trained statistician with over ten years of experience in statistical modeling, data science, and clinical trial design. Before joining industry, he served for five years as a Faculty Member in the Department of Biostatistics at St. Jude Children’s Research Hospital, where he worked extensively with large-scale neuroimaging, genomics, and natural language datasets. He has authored more than 40 peer-reviewed clinical papers and 30 methodological publications, advancing innovative statistical and AI/ML methods and translating them into practical biomedical and clinical applications.

Arlina Shen is a second-year Master’s student in Biomedical Data Science at Stanford University, where she specializes in statistical methodology for clinical research. Her work centers on designing and analyzing clinical trials with innovative approaches, including adaptive and rank-based methods for composite endpoints in progressive diseases such as ALS. Arlina is particularly interested in improving trial efficiency and fairness by integrating real-world evidence with randomized controlled trials, bridging the gap between traditional study designs and practical clinical applications. Her research aims to advance drug evaluation strategies and contribute to more equitable and effective therapeutic development.

Short Course 2. Trust by Design: Applied AI Governance for Pharma with Hands-On Application Exercises

March 23, 2026 (Morning, 9AM-12PM)

Instructors: Rebecca D. Jones-Taha, WaterworksAI

Abstract:

Organizations face growing expectations to demonstrate transparency, accountability, and regulatory alignment. This short course provides a practical roadmap for implementing AI governance in GxP and regulated environments. Attendees will learn how to translate governance principles such as fairness, traceability, and robustness into actionable oversight practices that satisfy regulatory and quality standards. Through guided individual exercises, participants will assess AI risk within workflows, develop governance logs and validation checklists, and outline lightweight documentation that supports audit readiness. By the end of the session, participants will have completed an AI Governance Starter Kit, including customizable templates and a personalized mini governance plan.

Short Bios:

Rebecca Taha, PhD, MBA, CEO & Founder of waterworksAI, is a strategic leader in the life sciences industry. With a deep understanding of scientific research and business operations, Dr. Taha guides waterworksAI in delivering innovative, technology-based solutions that address critical challenges in drug development. With over 20 years of experience in the industry, she has a proven track record of delivering impactful solutions.

Dr. Taha and her team develop and evaluate innovative GenAI applications in the pharmaceutical industry, focusing on ensuring that AI-powered tools deliver reliable and safe outcomes while improving the efficiency, effectiveness, and cost of drug development. Prior to founding waterworksAI, Dr. Taha served small, medium, and large pharma and biotech organizations in the strategic development and implementation of their clinical development programs. Dr. Taha received her MS and PhD from the University of Kentucky in Statistics and Gerontology, respectively, and her MBA from the Kelley School of Business.

Short Course 3. Personally Identifiable Information Redaction Agent: LLM as a Judge, REGEX

March 23, 2026 (Afternoon, 1:30PM-4:30PM)

Instructors: Andrew Semmes, Moderna

Abstract:

As artificial intelligence systems increasingly process sensitive biomedical and clinical data, ensuring the effective redaction of Personally Identifiable Information (PII) is a prerequisite for compliance and patient trust. This short course introduces a hybrid redaction framework that combines deterministic Regular Expressions (REGEX) with Large Language Models (LLMs) acting as evaluative “judges.” Participants will learn how LLMs can assess and improve REGEX-based redaction outputs through adjudication loops that quantify coverage, accuracy, and false-positive rates. Attendees will leave with practical design patterns and validation strategies for implementing AI-assisted PII redaction pipelines in regulated environments.
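As a rough illustration of the adjudication idea, the sketch below pairs a deterministic regex pass with a scoring step. It is not the instructor's framework: the three patterns are deliberately simplistic, all function names are hypothetical, and the list of judge-approved spans is supplied by hand where a real pipeline would obtain it from an LLM judge or human review.

```python
import re

# Illustrative patterns only; real PII coverage needs far more than these.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def find_pii(text):
    """Return a sorted list of (start, end, label) spans matched by any pattern."""
    spans = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def redact(text, spans):
    """Replace each span with a [LABEL] placeholder, working right to left
    so earlier offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def score(predicted, judged_true):
    """Compare regex spans with the (start, end) spans a judge marked as real PII.

    judged_true stands in for the verdicts of an LLM judge (an assumption here).
    """
    pred = {(start, end) for start, end, _ in predicted}
    truth = set(judged_true)
    true_positives = len(pred & truth)
    return {
        "coverage": true_positives / len(truth) if truth else 1.0,
        "precision": true_positives / len(pred) if pred else 1.0,
        "false_positives": len(pred - truth),
    }
```

For a note such as "Call 617-555-0123 or jane.doe@example.org about lot 2024-01-15.", a judge that accepts the phone number and email but rejects the lot date would give full coverage with one false positive. That result signals that the date pattern needs more context before it redacts.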

Short Bios:

Andrew Semmes is the Associate Director of Pharmacovigilance Artificial Intelligence and PV Transformation at Moderna, where he leads AI adoption and digital transformation initiatives within Clinical Safety & Pharmacovigilance (CSPV). His work focuses on leveraging AI to enhance pharmacovigilance processes, automate workflows, and improve operational efficiency while ensuring GxP compliance. He spearheads enterprise-wide initiatives to enable Moderna’s AI infrastructure to scale effectively in highly regulated environments. Prior to joining Moderna, Andrew was a strategy and analytics consultant at Deloitte, where he helped pharmaceutical and biotech companies integrate AI into pharmacovigilance, automate adverse event triage, and optimize regulatory workflows. He led efforts to develop safety and compliance systems, streamline business processes within digital transformations, and drive data strategy initiatives that generated global cost savings. Andrew holds a Master’s in Information Science with a focus on Data Science and a Bachelor’s in Information Science with a concentration in User Experience, both from Cornell University.

Short Course 4. CGM-AI: Causal Generalist Medical AI

March 23, 2026 (Full day, 9AM-4:30PM)

Instructors: Hongtu Zhu, UNC; Qiao Liu, Yale

Abstract:

The rapid evolution of flexible and reusable artificial intelligence (AI) models is transforming medical science. This short course introduces Causal Generalist Medical AI (Causal GMAI)—a paradigm that integrates causal inference with generalist AI models to enhance interpretability, robustness, and generalizability in medical decision-making. Causal GMAI employs self-supervised, semi-supervised, and supervised learning on diverse multimodal datasets—including imaging, electronic health records, clinical trials, laboratory results, genomics, knowledge graphs, and medical text—to perform a wide range of tasks with minimal task-specific supervision. By embedding causal reasoning, these models go beyond prediction to infer underlying causal relationships, improving diagnostic accuracy, treatment recommendations, and personalized medicine. The course covers key technical components such as causal discovery, counterfactual reasoning, and domain adaptation, alongside real-world applications. We will also explore challenges in regulation, validation, and dataset curation to ensure clinical reliability and ethical deployment. Designed for researchers, clinicians, data scientists, and AI practitioners, this course provides a foundation for advancing the next generation of trustworthy and interpretable medical AI.

Short Bios:

Dr. Hongtu Zhu is the Kenan Distinguished Professor of Biostatistics, Statistics, Radiology, Computer Science and Genetics at the University of North Carolina at Chapel Hill. He was a DiDi Fellow and Chief Scientist of Statistics at DiDi Chuxing between 2018 and 2020 and held the Endowed Bao-Shan Jing Professorship in Diagnostic Imaging at MD Anderson Cancer Center between 2016 and 2018. He is an internationally recognized expert in statistical learning, medical image analysis, precision medicine, biostatistics, artificial intelligence, and big data analytics. He received an established investigator award from the Cancer Prevention Research Institute of Texas in 2016, the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice in 2019, the ICSA 2025 Distinguished Achievement Award, the IMS 2027 Medallion Award and Lecture, and the COPSS 2025 Snedecor Award. He has published more than 360 papers in top journals, including Nature, Science, Cell, Nature Genetics, Nature Communications, PNAS, AOS, JASA, Biometrika, and JRSSB, and has presented 58+ papers at top conferences, including NeurIPS, ICLR, ICML, AAAI, and KDD. He is the coordinating editor of JASA and the editor of JASA ACS.

Dr. Qiao Liu is an Assistant Professor in the Department of Biostatistics at Yale University. He is also a core faculty member of the Yale Computational Biology and Biomedical Informatics Program. His research lies at the intersection of statistics, artificial intelligence, and computational biology, where he develops practical statistical and AI-driven tools with both theoretical and applied significance. His work leverages advances in generative AI to address high-dimensional statistical challenges, including Bayesian computation and causal inference, with broad applications in single-cell genomics, multi-omics data integration, pharmacogenomics, and genomic large language models. Dr. Liu has authored over 40 publications in leading international journals and conferences. His contributions have been recognized with prestigious honors, including the NIH Pathway to Independence Award.

Short Course 5. Python for Clinical Development with AI Applications

March 23, 2026 (Morning, 9AM-12PM)

Instructors: Yilong Zhang, Meta; Pingye (Eric) Zhang, Eikon Therapeutics; Yuting Xu, Merck

Abstract:

Open source programming languages are rapidly transforming drug discovery, research, and development by offering powerful capabilities for study design, data analysis, visualization, and clinical reporting. The emergence of AI tools is also creating new opportunities. This workshop introduces practical strategies for using Python to prepare tables, listings, and figures (TLFs) in clinical study reports (CSRs), with a focus on how AI can accelerate the development lifecycle.

This workshop is designed for clinical programmers, statisticians, and data scientists interested in exploring Python as an alternative approach for clinical trial analysis and reporting. Participants will gain hands-on experience with reproducible workflows and AI in the loop. By the end of the session, attendees will have a clear roadmap for starting a Python project with AI for clinical trial analysis and reporting.

The workshop is based on the open source book Python for Clinical Study Reports and Submission and is organized into three modules:

Python Environment Setup: Learn to use uv to create and manage reproducible Python projects, develop and collaborate in GitHub Codespaces, and get an overview of data processing in Python (e.g., polars and plotnine).
Clinical Reporting: Take a guided tour of using Python for TLF creation commonly used in clinical trials and CSR project management.
AI Applications: Explore how AI tools can be applied to clinical data analysis and trial design from a statistician’s perspective.

Short Bios:

Yilong Zhang, PhD, is the Manager of the Health Quantitative Science team at Meta. He previously worked as a statistician at Merck & Co., Inc. on late-stage clinical trial development. His interests include developing R and Python packages to improve clinical trial analysis, reporting, and regulatory submission. Yilong has published over 25 peer-reviewed articles and holds a PhD in Biostatistics from New York University.

Eric Zhang is the Director of Biostatistics at Eikon Therapeutics, where he supports oncology programs spanning early to late-stage clinical development. He earned his Ph.D. in Biostatistics from the University of Southern California and began his career at Merck, where he served as the lead statistician for late-stage oncology trials and played a pivotal role in the regulatory filing and approval of pembrolizumab (KEYTRUDA) for RCC and HN cancer. Following his time at Merck, Eric worked at BeiGene and Gilead Sciences. In addition to his project work, Eric is an active participant in scientific discussions across industry and regulatory bodies, and has published extensively in both statistical and medical journals.

Yuting Xu is a statistician at Merck & Co., where she supports preclinical development programs with innovative methodologies. She received her M.S.E. in Computer Science and Ph.D. in Biostatistics from Johns Hopkins University. Her professional interests include designing quantitative methods and implementing practical solutions to tackle complex challenges across the pharmaceutical R&D landscape.