Announcement

2023 Syllabus for Philosophy of Inductive-Statistical Inference

PHIL 6014 (crn: 20919): Spring 2023 

Philosophy of Inductive-Statistical Inference
(This is an IN-PERSON class*)
Wed 4:00-6:30 pm, McBryde 223
(Office hours: Tuesdays 3-4; Wednesdays 1:30-2:30)

Syllabus: Second Installment (PDF)

D. Mayo (2018), Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST), CUP (electronic and paper copies provided to those taking the class; proofs are at errorstatistics.com, see below).
Supplemental text: Hacking, I. (2001). An introduction to probability and inductive logic. Cambridge University Press.
Articles from the Captain's Bibliography (links to new articles will be provided). Other useful information can be found in the SIST Abstracts & Keywords and this post with SIST Excerpts & Mementos.

Date Themes/readings
1. 1/18       Introduction to the Course:
How to tell what’s true about statistical inference

(1/18/23 SLIDES here)

Reading: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST): Preface, Excursion 1 Tour I 1.1-1.3, 9-29

MISC: Souvenir A; SIST Abstracts & Keywords for all excursions and tours
2. 1/25
Q #2
Error Probing Tools vs Comparative Evidence: Likelihood & Probability
What counts as cheating?
Intro to Logic: arguments validity & soundness

(1/25/23 SLIDES here)

Reading: SIST: Excursion 1 Tour II 1.4-1.5, 30-55
Session #2 Questions: (PDF)

MISC: NOTES on Excursion 1, SIST: Souvenirs B, C & D, Logic Primer (PDF)
3. 2/1
   Q #3
UPDATED
Induction and Confirmation: PhilStat & Formal Epistemology
The Traditional Problem of Induction
Is Probability a Good Measure of Confirmation? Tacking Paradox

(2/1/23 SLIDES here)

Reading: SIST: Excursion 2, Tour I: 2.1-2.2, 59-74
Hacking “The Basic Rules of Probability” Hand Out (PDF)
UPDATED: Session #3 Questions: (PDF)

MISC: Excursion 2 Tour I Blurb & notes
4. 2/8 &
5. 2/15
Assign 1 2/15 
Falsification, Science vs Pseudoscience, Induction
Statistical Crises of Replication in Psychology & other sciences
Popper, severity and novelty, array of problems and models
Fallacies of rejection, Duhem’s problem; solving induction now

(2/8/23 SLIDES here)

Reading for 2/8: Popper, Ch. 1 from Conjectures and Refutations up to p. 59 (PDF).
This class overlaps with the next, so if you have time read Excursion 2, Tour II (pp. 75-82); Exhibit (vi) (p. 82); and p. 108

Session #4 Questions: (PDF)
MISC (2/8): Self-quiz on Popper for Fun! (PDF); Cartoon Guide to Statistics (link to VT Library here)
———————-
Reading for 2/15: SIST: Excursion 2, Tour II: read sections that interest you from those not covered last week. You can choose the example in 2.6 (or one from your field) or the discussion of solving induction in 2.7. Optional for 2/15: Gelman & Loken (2014)

(2/15/23 SLIDES here)

ASSIGNMENT 1 (due 2/15) (PDF)
MISC (2/15): SIST Souvenirs (E), (F), (G), (H); Excursion 2 Tour II Blurb & notes
  Fisher's birthday: February 17; celebration on 2/22
6. 2/22
 Q #6
&
7. 3/1

Ingenious and Severe Tests: Fisher, Neyman-Pearson, Cox: Concepts of Tests


Reading for 2/22 from SIST: Excursion 3 Tour I: 3.1-3.3: read the sections that interest you, choosing to focus on the statistical tests, the history and philosophy of Fisher, Neyman and Pearson, or the example of GTR. Choose 2 from the Triad (they're very short): Fisher (1955), Pearson (1955), Neyman (1956)

(2/22/23 SLIDES here)

Session #6 Questions: (PDF)

Optional: The pathological Fisher (fiducial) and Neyman (performance) battle: SIST 388-391

——————————————-

Reading for 3/1: Sections of SIST Excursion 3 Tour I skipped last week. (If time, look at the discussion of trade-offs, pp. 328-330.) If interested in fiducial frequencies, see Neyman's Performance and Fisher's Fiducial Probability, Section 5.8
Optional: Excursion 3 Tour II: It's the methods, stupid!

(3/1/23 SLIDES here)


MISC: Excursion 3 Tour I Blurb & notes; Souvenirs (I), (J), (K)
Morey app, including Examples & Instructions (here); (Morey app) (SEV Apps)

SPRING BREAK Statistical Exercises While Sunning (March 4-12)

Sessions #11-14 are tentative;  please have a look at what’s in them so we can decide which to skip 
8. 3/15
Assign 2
Deeper Concepts (2 parts): Statistics in the Higgs discovery, and confidence intervals and their duality with tests

Reading (for first part): Excursion 3 Tour III, 3.8 Higgs Discovery. (See the ASA 6 principles on P-values: Note 4, p. 216; Live Exhibit (ix), p. 200; and Souvenir N, p. 201.)
Reading (for second part): Excursion 3 Tour III, 3.7: pp. 189-195

Assignment 2 (PDF), due 3/17/23

(3/15/23 (revised) SLIDES here)

Misc. Excursion 3 Tour III blurb & notes
9. 3/22

Testing Assumptions of Statistical Models (Guest Speaker: Aris Spanos on misspecification testing in statistics)

Reading: Excursion 4 Tour IV 4.8

(3/22/23 A. Spanos’ SLIDES here)

Misc. Excursion 4 Tour IV blurb & notes

10. 3/29

Who's Exaggerating What? Bayes Factors and Bayes/Fisher Disagreement, Jeffreys-Lindley Paradox (Guest Speaker: Richard Morey on Bayes Factors)

Reading: Excursion 4 Tour II
(We will spend two weeks on this.)

Misc. Excursion 4 Tour II blurb & notes

11. 4/5

Mini essay

More on: Bayes factors and Bayes/Fisher Disagreement, Jeffreys-Lindley Paradox

Optional for those interested in objectivity in statistics:
Excursion 4 Tour I: 4.1, 4.2; 221-238
Peek Ahead: 6.7 Farewell Keepsake: 436-444 
 
Mini-essay (PDF)
12. 4/12

Biasing Selection Effects and Randomization

Reading: Excursion 4 Tour III  

(optional 5.7 Statistical Theatre: “Les Miserables Citations”: 371-381)
13. 4/19

Assign 3

Power: Pre-data and Post-data

Reading: Excursion 5 Tour I
14. 4/26 Positive Predictive Value and Probabilistic Instantiation

Controversies about inferring probabilities from frequencies (in law and epistemology)

Reading: Selection from Section 5.6 (Excursion 5 Tour II); C. Howson (1997)
15. 5/3 Current Reforms and Stat Activism: Practicing our skills on some well-known papers
   Final Paper
Categories: Announcement, new course | 2 Comments

I’m teaching a New Intro to PhilStat Course Starting Wednesday:

Ship StatInfasst (Statistical Inference as Severe Testing: SIST) will set sail on Wednesday, January 18, when I begin a weekly seminar on the Philosophy of Inductive-Statistical Inference. I'm planning to write a new edition and/or companion to SIST (Mayo 2018, CUP), so it will be good to retrace the journey. I'm not requiring a statistics or philosophy background. All materials will be on this blog, and around halfway through there may be an opportunity to join by Zoom, if there's interest. Continue reading

Categories: Announcement, new course | 2 Comments

Final Session: The Statistics Wars and Their Casualties: 8 December, Session 4

Thursday, December 8 will be the Final Session (Session 4) of my workshop, The Statistics Wars and Their Casualties. There will be 4 new speakers. It’s not too late to register:

registration form

At the end of this post is "A recap of recaps," the short video we showed at the beginning of Session 3 last week, which summarizes the presentations from Sessions 1 & 2 back on September 22-23. Continue reading

Categories: Announcement, Statistics Wars and Their Casualties Workshop | Leave a comment

SCHEDULE: The Statistics Wars and Their Casualties: 1 Dec & 8 Dec: Sessions 3 & 4

It's not too late to register for Sessions #3 and #4 of our online Workshop. There will be 7 new (live) speakers and, for the first time ever, the (short) movie "The Recap of Recaps" will be shown at the start of Session #3. registration form

Categories: Announcement, Statistics Wars and Their Casualties Workshop | Leave a comment

Final Sessions: The Statistics Wars and Their Casualties: 1 December and 8 December

The Statistics Wars

and Their Casualties

1 December and 8 December 2022
Sessions #3 and #4

15:00-18:15 London Time / 10:00 am-1:15 pm EST
ONLINE
(London School of Economics, CPNSS)
registration form

For slides and videos of Sessions #1 and #2: see the workshop page

1 December

Session 3 (Moderator: Daniël Lakens, Eindhoven University of Technology)

OPENING 

  • “What Happened So Far”: A medley (20 min) of recaps from Sessions 1 & 2: Deborah Mayo (Virginia Tech), Richard Morey (Cardiff), Stephen Senn (Edinburgh), Daniël Lakens (Eindhoven), Christian Hennig (Bologna) & Yoav Benjamini (Tel Aviv).

SPEAKERS

  • Daniele Fanelli (London School of Economics and Political Science) The neglected importance of complexity in statistics and Metascience  (Abstract)
  • Stephan Guttinger (University of Exeter) What are questionable research practices? (Abstract)
  • David J. Hand (Imperial College, London) What’s the question? (Abstract)

DISCUSSIONS:

  • Closing Panel: “Where Should Stat Activists Go From Here (Part i)?”: Yoav Benjamini, Daniele Fanelli, Stephan Guttinger, David Hand, Christian Hennig, Daniël Lakens, Deborah Mayo, Richard Morey, Stephen Senn

8 December

Session 4 (Moderator: Deborah Mayo, Virginia Tech)

SPEAKERS

  • Jon Williamson (University of Kent) Causal inference is not statistical inference (Abstract)
  • Margherita Harris (London School of Economics and Political Science) On Severity, the Weight of Evidence, and the Relationship Between the Two (Abstract)
  • Aris Spanos (Virginia Tech) Revisiting the Two Cultures in Statistical Modeling and Inference as they relate to the Statistics Wars and Their Potential Casualties (Abstract)
  • Uri Simonsohn (Esade Ramon Llull University) Mathematically Elegant Answers to Research Questions No One is Asking (meta-analysis, random effects models, and Bayes factors) (Abstract)

DISCUSSIONS:

  • Closing Panel: “Where Should Stat Activists Go From Here (Part ii)?”: Workshop Participants: Yoav Benjamini, Alexander Bird, Mark Burgman, Daniele Fanelli, Stephan Guttinger, David Hand, Margherita Harris, Christian Hennig, Daniël Lakens, Deborah Mayo, Richard Morey, Stephen Senn, Uri Simonsohn, Aris Spanos, Jon Williamson

**********************************************************************

  • DESCRIPTION: While the field of statistics has a long history of passionate foundational controversy, the last decade has, in many ways, been the most dramatic. Misuses of statistics, biasing selection effects, and high-powered methods of big-data analysis have helped to make it easy to find impressive-looking but spurious results that fail to replicate. As the crisis of replication has spread beyond psychology and social sciences to biomedicine, genomics, machine learning and other fields, the need for critical appraisal of proposed reforms is growing. Many are welcome (transparency about data, eschewing mechanical uses of statistics); some are quite radical. The experts do not agree on the best ways to promote trustworthy results, and these disagreements often reflect philosophical battles, old and new, about the nature of inductive-statistical inference and the roles of probability in statistical inference and modeling. Intermingled in the controversies about evidence are competing social, political, and economic values. If statistical consumers are unaware of assumptions behind rival evidence-policy reforms, they cannot scrutinize the consequences that affect them. What is at stake is a critical standpoint that we may increasingly be in danger of losing. Critically reflecting on proposed reforms and changing standards requires insights from statisticians, philosophers of science, psychologists, journal editors, economists and practitioners from across the natural and social sciences. This workshop will bring together these interdisciplinary insights, from speakers as well as attendees.

Sponsors/Affiliations:

  • The Foundation for the Study of Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (E.R.R.O.R.S.); Centre for Philosophy of Natural and Social Science (CPNSS), London School of Economics; Virginia Tech Department of Philosophy
  • Organizers: D. Mayo, R. Frigg and M. Harris
  • Logistician (chief logistics and contact person): Jean Miller
  • Executive Planning Committee: Y. Benjamini, D. Hand, D. Lakens, S. Senn
Categories: Announcement, Statistics Wars and Their Casualties Workshop | Leave a comment

Multiplicity, Data-Dredging, and Error Control Symposium at PSA 2022: Mayo, Thornton, Glymour, Mayo-Wilson, Berger

Some claim that no one attends Sunday morning (9am) sessions at the Philosophy of Science Association. But if you’re attending the PSA (in Pittsburgh), we hope you’ll falsify this supposition and come to hear us (Mayo, Thornton, Glymour, Mayo-Wilson, Berger) wrestle with some rival views on the trenchant problems of multiplicity, data-dredging, and error control. Coffee and donuts to all who show up.

Multiplicity, Data-Dredging, and Error Control
November 13, 9:00 – 11:45 AM
(link to symposium on PSA website)

Speakers: Continue reading

Categories: Announcement, PSA | Leave a comment

Upcoming Workshop: The Statistics Wars and Their Casualties

The Statistics Wars
and Their Casualties

22-23 September 2022
15:00-18:00 London Time*
ONLINE
(London School of Economics, CPNSS)

To register for the  workshop,
please fill out the registration form here.

For schedules and updated details, please see the workshop webpage: phil-stat-wars.com.

*These will be sessions 1 & 2; there will be two more
online sessions (3 & 4) on December 1 & 8.

Continue reading

Categories: Announcement, stat wars and their casualties | 1 Comment

Free access to "Statistical Inference as Severe Testing: How to Get Beyond the Stat Wars" (CUP, 2018) for 1 more week

Thanks to CUP, the electronic version of my book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018), is available for free for one more week (through August 31) at this link:  https://www.cambridge.org/core/books/statistical-inference-as-severe-testing/D9DF409EF568090F3F60407FF2B973B2  Blurbs of the 16 tours in the book may be found here: blurbs of the 16 tours.

Categories: Announcement, SIST | Leave a comment

Read It Free: “Stat Inference as Severe Testing: How to Get Beyond the Stat Wars” during August

CUP will make the electronic version of my book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018), available to access for free from August 1-31 at this link:  https://www.cambridge.org/core/books/statistical-inference-as-severe-testing/D9DF409EF568090F3F60407FF2B973B2 However, they will confirm the link closer to August, so check this blog on Aug 1 for any update, if you’re interested. (July 31, the link works!) (August 5, the link is working. Let me know if you have problems getting in.) Blurbs of the 16 tours in the book may be found here: blurbs of the 16 tours.

Here’s a CUP interview from when the book first came out.

Categories: Announcement, SIST | Leave a comment

The Statistics Wars and Their Casualties Workshop: Now Online

The Statistics Wars
and Their Casualties 

22-23 September 2022
15:00-18:00 London Time*

ONLINE 

To register for the workshop, please fill out the registration form here.

*These will be sessions 1 & 2; two more online sessions (3 & 4) will follow at 15:00-18:00 London Time on December 1 & 8.

Yoav Benjamini (Tel Aviv University), Alexander Bird (University of Cambridge), Mark Burgman (Imperial College London), Daniele Fanelli (London School of Economics and Political Science), Roman Frigg (London School of Economics and Political Science), Stephan Guttinger (University of Exeter), David Hand (Imperial College London), Margherita Harris (London School of Economics and Political Science), Christian Hennig (University of Bologna), Daniël Lakens (Eindhoven University of Technology), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland), Jon Williamson (University of Kent) Continue reading

Categories: Announcement, Error Statistics | Leave a comment

Philosophy of socially aware data science conference

I’ll be speaking at this conference in Philly tomorrow. My slides are also below.

 

PDF of my slides: Statistical “Reforms”: Fixing Science or Threats to Replication and Falsification. Continue reading

Categories: Announcement, Philosophy of Statistics, socially aware data science | Leave a comment

Philosophy of Science Association (PSA) 22 Call for Contributed Papers

PSA2022: Call for Contributed Papers

https://psa2022.dryfta.com/

Twenty-Eighth Biennial Meeting of the Philosophy of Science Association
November 10 – November 13, 2022
Pittsburgh, Pennsylvania

 

Submissions open on March 9, 2022 for contributed papers to be presented at the PSA2022 meeting in Pittsburgh, Pennsylvania, on November 10-13, 2022. The deadline for submitting a paper is 11:59 PM Pacific Standard Time on April 6, 2022. 

Contributed papers may be on any topic in the philosophy of science. The PSA2022 Program Committee is committed to assembling a program with high-quality papers on a variety of topics and diverse presenters that reflects the full range of current work in the philosophy of science. Continue reading

Categories: Announcement

“Should Science Abandon Statistical Significance?” Session at AAAS Annual Meeting, Feb 18

Karen Kafadar, Yoav Benjamini, and Donald Macnaughton will be in a session:

Should Science Abandon Statistical Significance?

Friday, Feb 18 from 2-2:45 PM (EST) at the AAAS 2022 annual meeting.

The general program is here. To register*, go to this page.

Synopsis

The concept of statistical significance is central in scientific research. However, the concept is often poorly understood and thus is often unfairly criticized. This presentation includes three independent but overlapping arguments about the usefulness of the concept of statistical significance to reliably detect “effects” in frontline scientific research data. We illustrate the arguments with examples of scientific importance from genomics, physics, and medicine. We explain how the concept of statistical significance provides a cost-efficient objective way to empower scientific research with evidence.

Papers Continue reading

Categories: AAAS, Announcement, statistical significance

ENBIS Webinar: Statistical Significance and p-values

Yesterday’s event video recording is available at:
https://www.youtube.com/watch?v=2mWYbcVflyE&t=10s

European Network for Business and Industrial Statistics (ENBIS) Webinar:
Statistical Significance and p-values
Europe/Amsterdam (CET); 08:00-09:30 am (EST)

ENBIS will dedicate this webinar to the memory of Sir David Cox, who sadly passed away in January 2022.

Continue reading

Categories: Announcement, significance tests, Sir David Cox

January 11: Phil Stat Forum (remote): Statistical Significance Test Anxiety

Special Session of the (remote)
Phil Stat Forum:

11 January 2022

“Statistical Significance Test Anxiety”

TIME: 15:00-17:00 (London, GMT); 10:00-12:00 (EST)

Presenters: Deborah Mayo (Virginia Tech) &
Yoav Benjamini (Tel Aviv University)

Moderator: David Hand (Imperial College London)

Deborah Mayo       Yoav Benjamini        David Hand

Continue reading

Categories: Announcement, David Hand, Phil Stat Forum, significance tests, Yoav Benjamini


Our session is now remote: Philo of Sci Association (PSA): Philosophy IN Science (PinS): Can Philosophers of Science Contribute to Science?

Philosophy in Science: Can Philosophers of Science Contribute to Science?
     on November 13, 2-4 pm

 

OUR SESSION HAS BECOME REMOTE: PLEASE JOIN US ON ZOOM! This session revolves around the intriguing question: Can Philosophers of Science Contribute to Science? They're calling it philosophy "in" science, when philosophical ministrations actually intervene in a science itself. This is the session I'll be speaking in, and I hope you will join us on Zoom. I'd also like to hear what you think about this question, in the comments to this post. Continue reading

Categories: Announcement, PSA 2021

CUNY zoom talk on Wednesday: Evidence as Passing a Severe Test

If interested, write to me for the zoom link (error@vt.edu).

Categories: Announcement

Why hasn’t the ASA Board revealed the recommendations of its new task force on statistical significance and replicability?

something’s not revealed

A little over a year ago, the board of the American Statistical Association (ASA) appointed a new Task Force on Statistical Significance and Replicability (under then-president Karen Kafadar) to provide it with recommendations. [Its members are here (i).] You might remember my blogpost at the time, "Les Stats C'est Moi". The Task Force worked quickly, despite the pandemic, giving its recommendations to the ASA Board early, in time for the Joint Statistical Meetings at the end of July 2020. But the ASA hasn't revealed the Task Force's recommendations, and I just learned yesterday that it has no plans to do so*. A panel session I was in at the JSM (P-values and 'Statistical Significance': Deconstructing the Arguments) grew out of this episode, and papers from the proceedings are now out. The introduction to my contribution gives you the background to my question, while revealing one of the recommendations (I only know of 2). Continue reading

Categories: 2016 ASA Statement on P-values, JSM 2020, replication crisis, statistical significance tests, straw person fallacy

The Statistics Debate (NISS) in Transcript Form

I constructed, together with Jean Miller, a transcript of the October 15 Statistics Debate (with me, J. Berger, and D. Trafimow, moderated by D. Jeske), sponsored by NISS. It's so much easier to access the material this way than by listening to the video. Using this link, you can see the words and hear the video at the same time, as well as pause and jump around. Below, I've pasted our responses to Question #1. Have fun and please share your comments.

Dan Jeske: [QUESTION 1] Given the issues surrounding the misuses and abuse of p values, do you think they should continue to be used or not? Why or why not?

Deborah Mayo  03:46

Thank you so much. And thank you for inviting me, I'm very pleased to be here. Yes, I say we should continue to use p values and statistical significance tests. Uses of p values are really just a piece in a rich set of tools intended to assess and control the probabilities of misleading interpretations of data, i.e., error probabilities. They're the first line of defense against "being fooled by randomness," as Yoav Benjamini puts it. If even larger or more extreme effects than you observed are frequently brought about by chance variability alone, i.e., the p value is not small, clearly you don't have evidence of incompatibility with the mere chance hypothesis. It's very straightforward reasoning. Even those who criticize p values, you'll notice, will employ them, at least if they care to check the assumptions of their models. And this includes well-known Bayesians such as George Box, Andrew Gelman, and Jim Berger. Critics of p values often allege it's too easy to obtain small p values. But notice the whole replication crisis is about how difficult it is to get small p values with preregistered hypotheses. This shows the problem isn't p values, but those selection effects and data dredging. However, the same data-dredged hypothesis can occur in other methods (likelihood ratios, Bayes factors, Bayesian updating), except that now we lose the direct grounds to criticize inferences for flouting error statistical control. The introduction of prior probabilities, which may also be data dependent, offers further researcher flexibility. Those who reject p values are saying we should reject the method because it can be used badly. And that's a bad argument. We should reject misuses of p values. But there's a danger of blindly substituting alternative tools that throw out the error control baby with the bad statistics bathwater.
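
[Note: the selection-effects point can be made concrete with a small simulation. This is my own illustrative sketch, not anything computed in the debate; it assumes 20 independent two-sided z-tests on pure noise and, like a data dredger, reports only the smallest p value.]

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_tests, n = 10_000, 20, 30

hits = 0
for _ in range(n_sims):
    # 20 samples of pure noise: every null hypothesis is true
    zs = rng.standard_normal((n_tests, n)).mean(axis=1) * np.sqrt(n)
    pvals = 2 * stats.norm.sf(np.abs(zs))  # two-sided p values
    if pvals.min() < 0.05:                 # report only the "best" test
        hits += 1

print(f"P(smallest p < .05 | all nulls true) is about {hits / n_sims:.2f}")
# Analytically 1 - 0.95**20, about 0.64: the dredging, not the p value, is the culprit.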

Dan Jeske  05:58

Thank you, Deborah, Jim, would you like to comment on Deborah’s remarks and offer your own?

Jim Berger  06:06

Okay, yes. Well, I certainly agree with much of what Deborah said; after all, a p value is simply a statistic. And it's an interesting statistic that does have many legitimate uses, when properly calibrated. And Deborah mentioned one such case, model checking, where Bayesians freely use some version of p values. On the other hand, if one interprets this question as, should they continue to be used in the same way that they're used today, then my answer would be somewhat different. I think p values are commonly misinterpreted today, especially when they're used to test a sharp null hypothesis. For instance, a p value of .05 is commonly interpreted by many as indicating the evidence is 20 to one in favor of the alternative hypothesis. And that just isn't true. You can show, for instance, that if I'm testing a normal mean of zero versus nonzero, the odds of the alternative hypothesis to the null hypothesis can at most be seven to one. And that's just a probabilistic fact; it doesn't involve priors or anything. It's an answer covering all possibilities. And so if a p value of .05 is interpreted as 20 to one, it's just being interpreted wrongly, and the wrong conclusions are being reached. I'm reminded of an interesting paper, published some time ago now, reporting on a survey that was designed to determine whether clinical practitioners understood what a p value was. The results of the survey were published and were not surprising: most clinical practitioners interpreted a p value of .05 as something like 20 to one odds against the null hypothesis, which again is incorrect. The fascinating aspect of the paper is that the authors also got it wrong. Deborah pointed out that the p value is the probability, under the null hypothesis, of the data or something more extreme. The authors stated that the correct answer was that the p value is the probability of the data under the null hypothesis; they forgot the "more extreme." So, I love this article, because the scientists who set out to show that their colleagues did not understand the meaning of p values themselves did not understand the meaning of p values.
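
[Note: the "at most seven to one" figure is easy to check. For a two-sided test of a normal mean, the likelihood ratio in favor of an alternative is largest when the alternative is placed at the observed value, giving exp(z^2/2), the classic bound discussed by Berger and Sellke (1987). A quick sketch of my own, not from the debate:]

import math
from scipy import stats

p = 0.05
z = stats.norm.isf(p / 2)      # two-sided critical value for p = .05, about 1.96
max_odds = math.exp(z**2 / 2)  # likelihood ratio maximized at the observed mean
print(f"z = {z:.3f}, maximum odds favoring the alternative = {max_odds:.2f}")
# About 6.83, i.e. at most roughly seven to one, whatever prior you choose.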

Dan Jeske  08:42

David?

David Trafimow  08:44

Okay. Yeah, um, like Deborah and Jim, I'm delighted to be here. Thanks for the invitation. And I partly agree with what both Deborah and Jim said; it's certainly true that people misuse p values. So, I agree with that. However, I think p values are more problematic than the other speakers have mentioned. And here's the problem for me. We keep talking about p values relative to hypotheses, but that's not really true. P values are relative to hypotheses plus additional assumptions. So, if we use the term model to describe the null hypothesis plus additional assumptions, then p values are based on models, not on hypotheses, or only partly on hypotheses. Now, here's the thing. What are these other assumptions? An example would be random selection from the population, an assumption that is not true in any one of the thousands of papers I've read in psychology. And there are other assumptions: a lack of systematic error, linearity, and we can go on and on; people have even published taxonomies of the assumptions because there are so many of them. See, it's tantamount to impossible that the model is correct, which means that the model is wrong. And so, what you're in essence doing is using the p value to index evidence against a model that is already known to be wrong. And even the part about indexing evidence is questionable, but I'll go with it for the moment. The point is the model is wrong, so there's no point in indexing evidence against it. So given that, I don't really see that there's any use for them. P values don't tell you how close the model is to being right. P values don't tell you how valuable the model is. P values pretty much don't tell you anything that researchers might want to know, unless you misuse them. Anytime you draw a conclusion from a p value, you are guilty of misuse. So, I think the misuse problem is much more subtle than is perhaps obvious at first. So, that's really all I have to say at the moment.

Dan Jeske  11:28

Thank you. Jim, would you like to follow up?

Jim Berger  11:32

Yes, so, I certainly agree that assumptions are often made that are wrong. I won't say that that's always the case. I mean, I know many scientific disciplines where I think they do a pretty good job. I work with high energy physicists, and they do a pretty good job of checking their assumptions, an excellent job. And they use p values. It's something to watch out for. But any statistical analysis can run into this problem. If the assumptions are wrong, it's going to be wrong.

Dan Jeske  12:09

Deborah…

Deborah Mayo  12:11

Okay. Well, Jim thinks that we should evaluate the p value by looking at the Bayes factor, and when he does, he finds that p values exaggerate the evidence. But we really shouldn't expect agreement on numbers from methods that are evaluating different things. This is like supposing that if we switch from a height to a weight standard, then where we used six feet for height, we should now require six stone, to use an example from Stephen Senn. On David's point, I think he's wrong about the worry over assumptions in using the p value, since p values require the fewest assumptions of any method, which is why even Bayesians will say we need to apply them when we need to test our assumptions. And it's something that we can do, especially with randomized controlled trials, to get the assumptions to work. The idea that we have to misinterpret p values to have them be relevant only rests on supposing that we need something other than what the p value provides.

Dan Jeske  13:19

David, would you like to give some final thoughts on this question?

David Trafimow  13:23

Sure. As far as Jim's point and Deborah's point that we can do things to make the assumptions less wrong: the problem is that the model is either wrong or it isn't. Now if the model is close, that doesn't justify the p value, because the p value doesn't give the closeness of the model. And that's the problem. We're not using, for example, a sample mean to estimate a population mean, in which case, yeah, you wouldn't expect the sample mean to be exactly right; if it's close, it's still useful. The problem is that p values aren't being used to estimate anything. So, if you're not estimating anything, then you're stuck with either correct or incorrect, and the answer is always incorrect. This is especially true in psychology, but I suspect it might even be true in physics. I'm not the physicist that Jim is, so I can't say that for sure.

Dan Jeske  14:35

Jim, would you like to offer Final Thoughts?

Jim Berger  14:37

Let me comment on Deborah's remark that Bayes factors are just a different scale of measurement. My point was that people seem invariably to think of p values as something like odds or the probability of the null hypothesis; if that's the way they're thinking, because that's the way their minds reason, I believe we should provide them with odds. And so, I try to convert p values into odds or Bayes factors, because I think that's much more readily understandable by people.
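
[Note: one such conversion Berger and his co-authors have advocated elsewhere (Sellke, Bayarri and Berger 2001) bounds the Bayes factor in favor of the null by -e*p*ln(p) for p < 1/e. A short sketch of my own, not from the debate:]

import math

def bf_bound(p: float) -> float:
    # Sellke-Bayarri-Berger lower bound on the Bayes factor for the null
    assert 0 < p < 1 / math.e
    return -math.e * p * math.log(p)

for p in (0.05, 0.01, 0.005):
    b = bf_bound(p)
    print(f"p = {p}: BF for the null >= {b:.3f}; odds against it at most {1 / b:.1f} to 1")

# p = .05 gives odds of at most about 2.5 to 1 against the null: nowhere near 20 to 1.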

Dan Jeske  15:11

Deborah, you have the final word on this question.

Deborah Mayo  15:13

I do think that we need a proper philosophy of statistics to interpret p values. But I also think that what's missing in the reject-p-values movement is this: a major reason for calling in statistics in science is to give us tools to inquire whether an observed phenomenon can be a real effect, or just noise in the data, and p values have intrinsic properties for this task, if used properly; other methods don't, and to reject them is to jeopardize this important role. As Fisher emphasized, we need randomized controlled trials precisely to ensure the validity of statistical significance tests. To reject p values because they don't give us posterior probabilities is illicit. In fact, I think that those who claim we want such posteriors need to show, for any way we can actually get them, why.

You can watch the debate at the NISS website or in this blog post.

You can find the complete audio transcript at this LINK: https://otter.ai/u/hFILxCOjz4QnaGLdzYFdIGxzdsg
[There is a play button at the bottom of the page that allows you to start and stop the recording. You can move about in the transcript/recording by using the pause button and moving the cursor to another place in the dialog and then clicking the play button to hear the recording from that point. (The recording is synced to the cursor.)]

Categories: D. Jeske, D. Trafimow, J. Berger, NISS, statistics debate
