Panel Discussion Questions from my Neyman Lecture: “Severity as a basic concept in philosophy of statistics”

 

Giordano, Snow, Yu, Stark, Recht

My Neyman Seminar in the Statistics Department at Berkeley was followed by a lively panel discussion with four Berkeley faculty, orchestrated by Ryan Giordano (Dept. of Statistics):

  • Xueyin Snow Zhang (Dept. of Philosophy)
  • Bin Yu (Depts. of Statistics, Electrical Engineering and Computer Sciences)
  • Philip Stark (Dept. of Statistics)
  • Ben Recht (Dept. of Electrical Engineering and Computer Sciences)

I want to share the many fascinating (and difficult) questions the panelists put forward earlier on the day of my talk. Of course, once the live discussion began it took on a life of its own, and we rarely looked back at this list. I'm only now returning to many of the questions we didn't cover, and I'm interested in reader comments. I'm extremely grateful to the organizers, panelists, and audience for creating such a uniquely enriching and provocative exchange of ideas, one that is sure to be continued!

Philip Stark:

  • When and how did Statistics lose its way and become (largely) a mechanical way to bless results rather than a serious attempt to avoid fooling ourselves and others?
  • To what extent have statisticians been complicit in the corruption of Statistics?
  • Are there any clear turning points where things got noticeably worse?
  • Is this a problem of statistics instruction (teaching methodology rather than teaching how to answer scientific questions, deemphasizing assumptions, encouraging mechanical calculations and ignoring the interpretation of those calculations), of disciplinary myopia (to publish in the literature of particular disciplines, you are required to use inappropriate methods), of moral hazard (statisticians are often funded on scientific projects and have a strong incentive to do whatever it takes to bless “discoveries”), or something else?
  • What can academic statisticians do to help get the train back on the tracks? Can you point to good examples?

Snow Zhang

  • How does severity testing inform practical deliberation (e.g. policy-making, data-gathering)? More generally, what do you take to be the relationship between inference and decision-making?
  • Do you think we can be pluralists about “evidence”/”confirmation”/”warrant”? (E.g. Can it depend on the stakeholder (experimenter vs. policymaker vs. informant) and practical context, e.g. courtroom vs. academia vs. industry?)
  • In philosophy, Bayesianism is often taken to be a normative ideal (in two somewhat conflicting senses: 1. The closer we are to approximating Bayesian inference, the more rational we are; 2. Idealized agents with no cognitive bounds should practice Bayesian inference, though the same is not necessarily true of bounded agents like us.) How do you think this version of Bayesianism relates to Bayesianism in statistical inference, and do you think it is subject to criticisms similar to those you raised in your book?

Bin Yu

  • What does probability mean in severity testing? Could it go beyond stochastic generative modeling? What does severity testing say specifically about model checking?
  • How does severity testing relate to the other sources of uncertainty in a data science life cycle such as those from data cleaning choices and modeling choices made by analysts?
  • Is severity testing necessary in most physical science data problems, say climate modeling?
  • What would be an example you can share where severity testing is largely not needed?
  • How do you see severity testing evolving in the AI age?

Ben Recht

  • What are your favorite examples of statistical methods being employed to definitively prove or disprove the effectiveness of an intervention? What was it about those applications that elevated them above the common misapplications we all harp on?
  • Even if the epistemological value of statistical tests is highly questionable, is it reasonable to use statistical tests as benchmarks for regulatory approval (say for drugs or policy)?

I’ve had a few ‘blogologues’ with Recht recently on his blog and mine, e.g., here, here, and here.

Slides from my presentation are in my previous post. Please share your thoughts, and proposed replies, in the comments.

 



4 thoughts on “Panel Discussion Questions from my Neyman Lecture: ‘Severity as a basic concept in philosophy of statistics’”

  1. rkenett

    Philip Stark's comments are right on the dot and deserve in-depth consideration.

    Two comments:

    1. Box, Hunter and Hunter, in the preface of their book Statistics for Experimenters, write: “Even more important than learning about statistical techniques is the development of what might be called a capability for statistical thinking.” Implicitly, this advice goes against the mechanization of statistics. My take on this is that one should focus on statistical thinking per se, with methods and examples. Several of my earlier inputs to this blog series are derived from that motivation.
    2. The evolution of statistics has been wonderfully depicted in the book by Efron and Hastie. It provides some answers to Stark’s comments. See this very short video which mentions this: https://user-images.githubusercontent.com/8720575/180794703-c6f05f40-eefd-4e1a-93f9-42cb78e6a6b4.mp4
  2. Edward Ionides

    This is a response to the Neyman Seminar, and specifically Philip Stark’s comments, but also related to thoughts I’ve had as a bystander watching other Mayo and Stark contributions over the past few years.

    I have spent much time developing theory, methodology and software for fitting general classes of partially observed stochastic dynamic models to data. However, we know that powerful methods for complex models are double-edged in practice. So, my collaborators and I wrote two case study papers to develop and demonstrate good practices:

    Wheeler, J., Rosengart, A., Jiang, Z., Tan, K., Treutle, N., & Ionides, E. L. (2024). Informing policy via dynamic models: Cholera in Haiti. PLOS Computational Biology, 20(4), e1012032. https://doi.org/10.1371/journal.pcbi.1012032

    Li, J., Ionides, E. L., King, A. A., Pascual, M., & Ning, N. (2024). Inference on spatiotemporal dynamics for coupled biological populations. Journal of the Royal Society Interface, 21(216), 20240217. https://royalsocietypublishing.org/doi/abs/10.1098/rsif.2024.0217

    I am wondering to what extent these address your concerns, at least the ones you and your coauthors raised in Saltelli et al. (2020), “Five ways to ensure that models serve society: a manifesto”. Hopefully these papers are at least a step in a good direction.

    I understand the limitations of fitting mechanistic models to observational data – the model may seem causal, but ultimately the data analysis is an observational study and so causal interpretation of estimated parameters cannot be guaranteed. “Association is not causation” applies to sophisticated models as much as to simple linear models. Li et al. (2024) originally made this point explicitly, but a referee asked for the comment to be removed because it was not the main focus of the paper.

    An important issue with inference for complex models is the empirical rule (according to my observation) that the more time people spend on the computing involved in a data analysis, the less mental effort they spend criticizing the numbers that come out. Most reasonably careful and well-trained researchers can reason about the causal plausibility of a correlation coefficient. The result of a computation that took months to develop and days to run tends to receive less scrutiny.

    We address computation-induced myopia by comparing fit against simple statistical benchmarks, and via “plug-and-play” algorithms (a specific term which, loosely, means simulation-based) for likelihood-based inference on the mechanistic models. Benchmarks require scientists to keep improving their models, from the perspective of statistical fit, until the fit is tolerable. Plug-and-play inference algorithms give them the tools to do this, by letting them consider a flexible class of model variations. Likelihood-based inference maintains statistical efficiency and simplifies comparison against benchmarks provided by simple statistical models. Working to find a mechanistic model that statistically fits the data is, in my experience, more worthwhile than worrying about what it means to fit a model that does not statistically fit the data. Many published models for complex systems fit poorly as statistical models, if you care to look. Even for a model that does fit statistically, one still has to be careful about interpretation. The issues are analogous to familiar tasks with linear models: we investigate residuals and model variations, and we pay careful attention to the interpretation of collinearity in parameter estimates (or multimodal likelihood for nonlinear models) and potential confounding.

    There is a damned-if-you-do, damned-if-you-don't aspect to inferring the mechanisms of complex systems from observational data. Our obligation as statisticians is to develop improved methods as well as to highlight the limitations of existing methods. A good start is to take advantage of lessons learned from the task of analyzing simpler observational studies.

    This is all only indirectly related to the topic of severity, though the classical data analysis approach we propose seems more compatible than Bayesian methods with the severity perspective.

    • Edward:

      Thank you so much for your comment and links! I will post each of the panelists’ comments in separate blogs, beginning with Stark. I’ll be extremely interested in his response to you.

  3. Edward:
    Thank you so much for your comment and references! I plan to post each of the panelists' questions in separate blog posts, beginning with Stark, and we'll definitely want to focus on your recommendations in detail.

