Below are the slides from my talk today at Columbia University at a session, Philosophy of Science and the New Paradigm of Data-Driven Science, at an American Statistical Association Conference on Statistical Learning and Data Science/Nonparametric Statistics. Todd was brave to sneak philosophy of science into an otherwise highly mathematical conference.
Philosophy of Science and the New Paradigm of Data-Driven Science: (Room VEC 902/903)
Organizer and Chair: Todd Kuffner (Washington U)
- Deborah Mayo (Virginia Tech) “Your Data-Driven Claims Must Still be Probed Severely”
- Ian McKeague (Columbia) “On the Replicability of Scientific Studies”
- Xiao-Li Meng (Harvard) “Conducting Highly Principled Data Science: A Statistician’s Job and Joy”
Posting the full thread that I originally posted on Twitter (https://twitter.com/orestistsinalis/status/1004097202887757824):
In data science, the metric optimised in an ML model’s cost function is frequently not what you *really* want to optimise for, because the problem you care about is usually only a function of the model’s metric (e.g. you optimise log loss in order to improve accuracy).
Therefore the “best” hypothesis H is really only the best *observed* H with respect to properties that are merely correlated with the optimised metric (e.g. accuracy is somewhat correlated with log loss) or, worse, accidental but desirable properties of the model.
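The gap between the optimised metric and the metric you care about can be made concrete with a toy example: the model selected by log loss need not be the model selected by accuracy. The numbers below are invented purely for illustration (a minimal sketch, not from the original thread):

```python
import math

def log_loss(y, p):
    """Average negative log-likelihood of the true binary labels."""
    return -sum(math.log(pi if yi == 1 else 1 - pi)
                for yi, pi in zip(y, p)) / len(y)

def accuracy(y, p, threshold=0.5):
    """Fraction of labels matched after thresholding the probabilities."""
    return sum((pi >= threshold) == (yi == 1)
               for yi, pi in zip(y, p)) / len(y)

# Hypothetical validation set and two models' predicted probabilities.
y = [1, 1, 0, 0]
p_a = [0.90, 0.45, 0.10, 0.10]   # mostly confident, but flips one label
p_b = [0.51, 0.51, 0.49, 0.49]   # barely confident, but all correct

print(f"A: log loss {log_loss(y, p_a):.3f}, accuracy {accuracy(y, p_a):.2f}")
print(f"B: log loss {log_loss(y, p_b):.3f}, accuracy {accuracy(y, p_b):.2f}")
# Selecting on log loss picks A; selecting on accuracy picks B.
```

Model A wins on the optimised metric (lower log loss) while Model B wins on the metric the problem actually cares about (accuracy), so the “best” model depends on which correlated property you happen to measure.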
To severely probe an ML model in the context of a *specific problem*, one needs to show that a change in the model was expected to influence the problem’s solution in a specific way and not in others.
People also tune hyperparameters to death via so-called ‘grid search’. In my opinion, this is a prototypical example of how *not* to learn from error. For me, a severe test of hyperparameter tuning is to show a plausible *path* of your hyperparameter search.
PS: Important position paper on machine learning practices: “On Pace, Progress, and Empirical Rigor” https://t.co/A2qMyMx204 https://t.co/M1XeC4ncbr
This conversation began with a tweet by Frank Harrell: