Your data-driven claims must still be probed severely

Vagelos Education Center

Below are the slides from my talk today at Columbia University, given in the session “Philosophy of Science and the New Paradigm of Data-Driven Science” at an American Statistical Association conference on Statistical Learning and Data Science/Nonparametric Statistics. Todd Kuffner, the session organizer, was brave to sneak philosophy of science into an otherwise highly mathematical conference.

Philosophy of Science and the New Paradigm of Data-Driven Science (Room VEC 902/903)
Organizer and Chair: Todd Kuffner (Washington U)

  1. Deborah Mayo (Virginia Tech) “Your Data-Driven Claims Must Still be Probed Severely”
  2. Ian McKeague (Columbia) “On the Replicability of Scientific Studies”
  3. Xiao-Li Meng (Harvard) “Conducting Highly Principled Data Science: A Statistician’s Job and Joy”




5 thoughts on “Your data-driven claims must still be probed severely”

  1. Posting the full thread that I originally posted on Twitter (

    In data science, it is frequently the case that the metric that is being optimised in an ML model’s cost function is not what you *really* want to optimise for, because your problem is usually a function of the ML model’s metric (e.g. optimise log loss to improve accuracy).

    The best H therefore really becomes the best *observed* H with respect to properties that are merely correlated with the goal (e.g. accuracy is somewhat correlated with log loss) or, worse, with respect to accidental but desirable properties of the model.
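A minimal sketch of the gap between the optimised metric and the target metric, using made-up probabilities for two hypothetical models (`model_a` and `model_b` are illustrative, not taken from the thread). It only shows that a lower log loss need not mean a higher accuracy:

```python
import math


def log_loss(y_true, p_pred):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    eps = 1e-15
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of cases where the thresholded probability matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


# Toy labels and predicted probabilities (assumed values, for illustration only).
y = [1, 1, 1, 0, 0, 0]
model_a = [0.95, 0.95, 0.45, 0.05, 0.05, 0.55]  # confident, but two cases fall on the wrong side of 0.5
model_b = [0.55, 0.55, 0.55, 0.45, 0.45, 0.45]  # barely on the right side every time

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, "log loss:", round(log_loss(y, probs), 3),
          "accuracy:", round(accuracy(y, probs), 3))
```

Here model A wins on log loss while model B wins on accuracy, so selecting the "best" model by log loss alone would pick the one that does worse on the quantity of actual interest.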

    To severely probe an ML model in the context of a *specific problem*, one needs to show that a change in the model was expected to influence the problem solution in a specific way and not others.

    People also tune hyperparameters to death via so-called ‘grid search’. This is, in my opinion, a prototypical example of how *not* to learn from error. For me, a severe test for hyperparameter tuning is to show a plausible *path* of your hyperparameter search.
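One possible reading of a search "path", sketched under loud assumptions: `validation_loss` is a hypothetical stand-in for training and validating a model, and the `Step` record and `search_with_path` helper are invented for illustration. Each step states its expectation before the run and records whether the result bore it out, so the failed prediction is kept rather than discarded:

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    lr: float        # learning rate tried at this step
    predicted: str   # expectation written down before running
    loss: float      # observed validation loss
    confirmed: bool  # did the observation match the expectation?


def validation_loss(lr):
    # Hypothetical stand-in for "train and validate"; its minimum is at lr = 0.01.
    return (math.log10(lr) + 2.0) ** 2 + 0.1


def search_with_path(start_lr=1.0, factor=0.1, max_steps=6):
    """Shrink the learning rate while the stated prediction keeps holding."""
    path = []
    lr, best = start_lr, validation_loss(start_lr)
    for _ in range(max_steps):
        candidate = lr * factor
        prediction = f"loss at lr={candidate:g} is below {best:.3f}"
        loss = validation_loss(candidate)
        confirmed = loss < best
        path.append(Step(candidate, prediction, loss, confirmed))
        if not confirmed:
            break  # the prediction failed: stop and report the error
        lr, best = candidate, loss
    return lr, best, path


lr, best, path = search_with_path()
for step in path:
    print(step)
```

Unlike an exhaustive grid, the returned `path` documents why each setting was tried and where the expectation broke down, which is the part a reader can scrutinise.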

    PS: Important position paper on machine learning practices: “On Pace, Progress, and Empirical Rigor”

    • This conversation began with a tweet by Frank Harrell:
