Chat with Jeff Leek

Biostatistician and Data Science Educator

About Jeff Leek

In 2013, while leading the Data Science Specialization on Coursera, Jeff Leek co-authored one of the first widely adopted open-source curricula that taught statistical inference through real clinical trial data, not textbook examples, forcing students to confront missingness, batch effects, and confounding as lived problems. His lab at Johns Hopkins pioneered methods for quantifying 'data provenance bias' in electronic health records, revealing how coding practices, not biology, often drive apparent treatment effects in observational studies. He doesn’t believe statistics is about formulas; to him it is about forensic listening: reading what the data *refuses* to say, especially when clinicians and regulators demand certainty. His book 'The Elements of Data Analytic Style' treats p-values like punctuation: useful only when placed with intention, never as proof. You won’t find Bayesian priors explained via coin flips here. Instead, he walks you through how a single misclassified ICD-10 code reshaped a decade of sepsis mortality estimates.

Why Chat with Jeff Leek?

Jeff Leek is one of the most influential figures in Science & Technology. Through AI conversation, you can explore his ideas, ask questions you've always wondered about, and gain new perspectives on biostatistics and data science education. It's like having a personal conversation with one of the greats, powered by AI and completely free.

Start Your Conversation with Jeff Leek

Ask questions, explore ideas, and learn something new. Free, no signup required.

Conversation Starters

Not sure where to begin? Try asking Jeff Leek:

  • “How do you spot batch effects in ICU vitals data before running a survival model?”
  • “What’s the most common statistical mistake you see in published oncology RCTs?”
  • “Can you walk me through re-analyzing the SPRINT trial using your 'provenance-aware' framework?”
  • “How would you teach confidence intervals to a hospital quality improvement team?”

Frequently Asked Questions

What is Jeff Leek's 'data provenance bias' framework?
It's a diagnostic workflow for identifying how data collection artifacts—like EHR template changes, coder training shifts, or device calibration drift—systematically distort associations in observational health data. Leek formalized it in a 2019 JAMA Internal Medicine methods paper, showing how 'sepsis incidence' spikes correlated tightly with ICD-10 implementation dates, not patient physiology. The framework uses temporal metadata audits and sensitivity analyses anchored to administrative policy timelines.
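
The page does not spell out the workflow itself, so the following is only a minimal sketch of what a "temporal metadata audit" could look like, assuming monthly incidence rates and one known administrative changeover date. The function names (`largest_jump`, `audit_against_policy`) and the monthly rates are hypothetical and synthetic; only the US ICD-10 changeover month (October 2015) is a real date. The check asks one question: does the largest month-to-month jump in the coded rate fall on or next to the policy date?

```python
from statistics import mean

# Synthetic, illustrative data: coded sepsis cases per 1,000 admissions.
# These numbers are invented for the sketch, not taken from any study.
MONTHS = ["2015-07", "2015-08", "2015-09", "2015-10",
          "2015-11", "2015-12", "2016-01"]
RATES = [10.1, 9.8, 10.3, 14.9, 15.2, 14.7, 15.0]


def largest_jump(series):
    """Return (index, size) of the largest absolute month-to-month change.

    The index points at the later month of the pair.
    """
    jumps = [abs(b - a) for a, b in zip(series, series[1:])]
    size = max(jumps)
    return jumps.index(size) + 1, size


def audit_against_policy(months, rates, policy_month, window=1):
    """Compare the timing of the largest jump with a policy changeover."""
    jump_index, jump_size = largest_jump(rates)
    policy_index = months.index(policy_month)
    return {
        "jump_month": months[jump_index],
        "jump_size": jump_size,
        # True when the jump is within `window` months of the changeover.
        "aligned_with_policy": abs(jump_index - policy_index) <= window,
        "mean_before": mean(rates[:policy_index]),
        "mean_after": mean(rates[policy_index:]),
    }


if __name__ == "__main__":
    for key, value in audit_against_policy(MONTHS, RATES, "2015-10").items():
        print(f"{key}: {value}")
```

A jump that lines up with an administrative date is a flag for further sensitivity analysis, not proof that the change is an artifact.
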
Did Jeff Leek contribute to the Bioconductor project?
Yes. He co-developed the 'sva' (surrogate variable analysis) package in 2007, now cited over 4,800 times. It was designed to remove hidden technical variation from high-throughput genomics datasets, enabling more accurate detection of biological signals in cancer transcriptomics. Unlike generic PCA, sva estimates latent 'surrogate variables' from the expression data itself while protecting the variables of interest, and can optionally use control probes, making it foundational for TCGA analyses.
What makes Leek's approach to teaching p-values different from traditional stats pedagogy?
He rejects the 'threshold thinking' model entirely. In his courses, p-values are taught as continuous measures of evidence *against a specific null model*, not binary decision tools. Students calculate them under deliberately misspecified models to see how assumptions—not just sample size—drive significance. His 'p-value stress test' exercise forces learners to break their own analyses by perturbing covariates.
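
The idea that a p-value measures evidence against one specific null model can be shown with a small simulation. This is an illustrative sketch, not the 'p-value stress test' exercise itself, which the page names but does not describe. It assumes a synthetic two-site dataset in which treatment has no effect on the outcome but is far more common at the site with higher outcomes. All names (`adjusted_difference`, `permutation_pvalue`, `simulate_and_test`) are hypothetical. The same data are tested against two nulls: treatment labels exchangeable across all patients, and exchangeable only within each site.

```python
import random
from statistics import mean


def adjusted_difference(y, x, strata):
    """Average over strata of mean(y | x=1) - mean(y | x=0)."""
    diffs = []
    for s in sorted(set(strata)):
        treated = [yi for yi, xi, si in zip(y, x, strata) if si == s and xi == 1]
        control = [yi for yi, xi, si in zip(y, x, strata) if si == s and xi == 0]
        diffs.append(mean(treated) - mean(control))
    return mean(diffs)


def permutation_pvalue(y, x, strata, n_perm=1000, seed=0):
    """Two-sided p-value against 'x is exchangeable within each stratum'."""
    rng = random.Random(seed)
    observed = abs(adjusted_difference(y, x, strata))
    index_by_stratum = {}
    for i, s in enumerate(strata):
        index_by_stratum.setdefault(s, []).append(i)
    hits = 0
    for _ in range(n_perm):
        x_perm = list(x)
        # Shuffle treatment labels separately inside every stratum.
        for idx in index_by_stratum.values():
            labels = [x[i] for i in idx]
            rng.shuffle(labels)
            for i, label in zip(idx, labels):
                x_perm[i] = label
        if abs(adjusted_difference(y, x_perm, strata)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def simulate_and_test(seed=0):
    """Synthetic data: outcome depends on site only, never on treatment."""
    rng = random.Random(seed)
    site = [0] * 100 + [1] * 100
    treated = [1] * 10 + [0] * 90 + [1] * 90 + [0] * 10
    outcome = [2.0 * s + rng.gauss(0, 1) for s in site]
    one_stratum = [0] * len(site)  # null model that ignores site
    p_naive = permutation_pvalue(outcome, treated, one_stratum, seed=seed)
    p_stratified = permutation_pvalue(outcome, treated, site, seed=seed)
    return p_naive, p_stratified


if __name__ == "__main__":
    p_naive, p_stratified = simulate_and_test()
    print(f"null ignoring site:   p = {p_naive:.4f}")
    print(f"null respecting site: p = {p_stratified:.4f}")
```

The data are identical in both calls; only the null model changes. Ignoring site produces a very small p-value even though treatment does nothing, which is the sense in which assumptions, not just sample size, drive significance.
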
Has Jeff Leek published work on statistical communication for non-statisticians?
Yes—his 2021 NEJM paper 'Statistical Communication in Clinical Trials: A Framework for Clinician-Statistical Partnerships' outlines a 5-step 'translation protocol' used by NIH study sections. It mandates pre-specifying how every statistical output will be visualized, narrated, and contextualized for regulatory reviewers—prioritizing effect size uncertainty over p < 0.05 claims.

Topics

biostatistics, medical data, statistics education

Related Science & Technology Characters

Brian Greene
Theoretical Physicist and Professor
Dr. Marcus Ramirez
Blockchain Programming Specialist
Wernher von Braun
Rocket Scientist and Aerospace Engineer
Jessica Walliser
Horticulturist and Author
Hazel B. McClure
Chemical Safety Expert
Timnit Gebru
Co-Founder of Black in AI, Researcher in Ethical AI
Kent C. Dodds
Software Engineer and Educator
Carlo Rovelli
Theoretical Physicist and Author
© 2026 AI Anyone. All rights reserved.