Slides from my March 17 presentation on “Severe Testing: The Key to Error Correction” given at the Boston Colloquium for Philosophy of Science Alfred I.Taub forum on “Understanding Reproducibility and Error Correction in Science.”

57th Annual Program

Download the 57th Annual Program

**The Alfred I. Taub forum:**

**UNDERSTANDING REPRODUCIBILITY & ERROR CORRECTION IN SCIENCE**

Cosponsored by GMS and BU’s BEST at Boston University.

Friday, March 17, 2017

1:00 p.m. – 5:00 p.m.

The Terrace Lounge, George Sherman Union

775 Commonwealth Avenue

- Reputation, Variation, & Control: Historical Perspectives
  **Jutta Schickore**, History and Philosophy of Science & Medicine, Indiana University, Bloomington
- Crisis in Science: Time for Reform?
  **Arturo Casadevall**, Molecular Microbiology & Immunology, Johns Hopkins
- Severe Testing: The Key to Error Correction
  **Deborah Mayo**, Philosophy, Virginia Tech
- Replicate That…. Maintaining a Healthy Failure Rate in Science
  **Stuart Firestein**, Biological Sciences, Columbia

**I. The myth of objectivity.** Whenever you come up against blanket slogans such as “no methods are objective” or “all methods are equally objective and subjective,” it is a good guess that the problem is being trivialized into oblivion. Yes, there are judgments, disagreements, and values in any human activity, but that observation alone is too trivial to distinguish among the very different ways that threats of bias and unwarranted inference may be controlled. Is the objectivity-subjectivity distinction really toothless, as many would have you believe? I say no.

Cavalier attitudes toward objectivity are in tension with widely endorsed movements to promote replication and reproducibility, and to come clean on a number of sources behind illicit results: multiple testing, cherry picking, failed assumptions, researcher latitude, publication bias and so on. The moves to take back science–if they are not mere lip-service–are rooted in the supposition that we can more objectively scrutinize results, even if it’s only to point out those that are poorly tested. The fact that the term “objectivity” is used equivocally should not be taken as grounds to oust it, but rather to engage in the difficult work of identifying what there is in “objectivity” that we won’t give up, and shouldn’t. Continue reading

Allan Birnbaum died 40 years ago today. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to arrive at what he termed “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I have heartily endorsed. While known for a result that the (strong) Likelihood Principle followed from sufficiency and conditionality principles (a result that Jimmy Savage deemed one of the greatest breakthroughs in statistics), a few years after publishing it, he turned away from it, perhaps discovering gaps in his argument. A post linking to a 2014 *Statistical Science* issue discussing Birnbaum’s result is here. Reference [5] links to the *Synthese* 1977 volume dedicated to his memory. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. Ample weekend reading! Continue reading

Categories: Birnbaum, Likelihood Principle, phil/history of stat, Statistics
Tags: Birnbaum
62 Comments

*Today is Allan Birnbaum’s birthday. In honor of his birthday this year, I’m posting the articles in the *Synthese* volume that was dedicated to his memory in 1977. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. I paste a few snippets from the articles by Giere and Birnbaum. If you’re interested in statistical foundations, and are unfamiliar with Birnbaum, here’s a chance to catch up. (Even if you are, you may be unaware of some of these key papers.)*

**HAPPY BIRTHDAY ALLAN!**

*Synthese* Volume 36, No. 1 Sept 1977: *Foundations of Probability and Statistics*, Part I

**Editorial Introduction:**

This special issue of *Synthese* on the foundations of probability and statistics is dedicated to the memory of Professor Allan Birnbaum. Professor Birnbaum’s essay ‘The Neyman-Pearson Theory as Decision Theory; and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory’ was received by the editors of *Synthese* in October, 1975, and a decision was made to publish a special symposium consisting of this paper together with several invited comments and related papers. The sad news about Professor Birnbaum’s death reached us in the summer of 1976, but the editorial project could nevertheless be completed according to the original plan. By publishing this special issue we wish to pay homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics. We are grateful to Professor Ronald Giere who wrote an introductory essay on Professor Birnbaum’s concept of statistical evidence and who compiled a list of Professor Birnbaum’s publications.

THE EDITORS

**Today is Allan Birnbaum’s Birthday.** Birnbaum’s (1962) classic “On the Foundations of Statistical Inference,” in *Breakthroughs in Statistics* (volume I, 1993), concerns a principle that remains at the heart of today’s controversies in statistics–even if it isn’t obvious at first: the Likelihood Principle (LP), also called the strong Likelihood Principle (SLP) to distinguish it from the weak LP [1]. According to the LP/SLP, given the statistical model, the information from the data is fully contained in the likelihood ratio. Thus, *properties of the sampling distribution of the test statistic vanish* (as I put it in my slides from my last post)! But error probabilities are all properties of the sampling distribution. Thus, embracing the LP (SLP) blocks our error statistician’s direct ways of taking into account “biasing selection effects” (slide #10).
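The point can be made concrete with the textbook binomial versus negative binomial case (a sketch of my own, with made-up numbers, not an example from the slides): two experiments produce proportional likelihoods, so the LP/SLP deems them evidentially equivalent, yet their sampling distributions yield different p-values.

```python
from math import comb

# Same data: 3 successes in 12 Bernoulli trials; bias theta unknown.
# E1 (binomial): the number of trials, n = 12, was fixed in advance.
# E2 (negative binomial): tossing continued until r = 3 successes.

def lik_binom(theta):
    return comb(12, 3) * theta**3 * (1 - theta)**9

def lik_negbinom(theta):
    # the 12th trial is the 3rd success; place the other 2 among the first 11
    return comb(11, 2) * theta**3 * (1 - theta)**9

# The two likelihoods are proportional for every theta (constant ratio 4),
# so the LP/SLP says the experiments carry identical evidence about theta.
for t in (0.1, 0.3, 0.5, 0.7):
    assert abs(lik_binom(t) / lik_negbinom(t) - 4.0) < 1e-9

# Yet the p-values for H0: theta = 0.5 against theta < 0.5 differ, because
# each depends on the sampling distribution of its own experiment.
p_binom = sum(comb(12, k) for k in range(4)) / 2**12     # P(X <= 3) ~ 0.0730
p_negbinom = sum(comb(11, k) for k in range(3)) / 2**11  # P(N >= 12) ~ 0.0327
```

On the LP/SLP the two reports must agree; for the error statistician, the difference between the two p-values (one crossing the conventional 0.05 line, one not) is exactly the information that “vanishes.”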

*Intentions is a New Code Word:* Where, then,

Memory Lane: 3 years ago. Oxford Jail (also called Oxford Castle) is an entirely fitting place to be on (and around) Halloween! Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba! (It is now a boutique hotel, though many of the rooms are still too jail-like for me.) My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should, I think, be shelved, not that names should matter.

In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended. But for (most) Bayesian critics of error statistics the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).

Criticisms then follow readily, taking the form of one or both:

- Error probabilities do not supply posterior probabilities in hypotheses; interpreted as if they do (and some say we just can’t help it), they lead to inconsistencies.
- Methods with good long-run error rates can give rise to counterintuitive inferences in particular cases.

I have proposed an alternative philosophy that replaces these tenets with different ones:

- The role of probability in inference is to quantify how reliably or severely claims (or discrepancies from claims) have been tested.
- The severity goal directs us to the relevant error probabilities, avoiding the oft-repeated statistical fallacies due to tests that are overly sensitive, as well as those insufficiently sensitive to particular errors.
- Control of long-run error probabilities, while necessary, is not sufficient for good tests or warranted inferences.

**Abbreviated Table of Contents:**

Here are some items for your Saturday-Sunday reading.

**Link to complete discussion: **

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). *Statistical Science* 29 (2014), no. 2, 227-266.

**Links to individual papers:**

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle. *Statistical Science* 29 (2014), no. 2, 227-239.

Dawid, A. P. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. *Statistical Science* 29 (2014), no. 2, 240-241.

Evans, Michael. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. *Statistical Science* 29 (2014), no. 2, 242-246.

Martin, Ryan; Liu, Chuanhai. Discussion: Foundations of Statistical Inference, Revisited. *Statistical Science* 29 (2014), no. 2, 247-251.

Fraser, D. A. S. Discussion: On Arguments Concerning Statistical Principles. *Statistical Science* 29 (2014), no. 2, 252-253.

Hannig, Jan. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. *Statistical Science* 29 (2014), no. 2, 254-258.

Bjørnstad, Jan F. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. *Statistical Science* 29 (2014), no. 2, 259-260.

Mayo, Deborah G. Rejoinder: “On the Birnbaum Argument for the Strong Likelihood Principle”. *Statistical Science* 29 (2014), no. 2, 261-266.

**Abstract:** An essential component of inference based on familiar frequentist notions, such as p-values, significance and confidence levels, is the relevant sampling distribution. This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes x∗ and y∗ from experiments E1 and E2 (both with unknown parameter θ) have different probability models f1(·), f2(·), then even though f1(x∗; θ) = c·f2(y∗; θ) for all θ, outcomes x∗ and y∗ may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox [Ann. Math. Statist. 29 (1958) 357–372] proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of Ei.

**Key words:** Birnbaumization, likelihood principle (weak and strong), sampling theory, sufficiency, weak conditionality
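Cox’s two-instrument example behind the WCP can be sketched numerically (toy numbers of my own, not from the paper): a fair coin determines whether θ is measured by a precise instrument (σ = 1) or a noisy one (σ = 10), and the WCP directs us to assess the result by the instrument actually used, not by the 50-50 mixture.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

# Observed: x = 2.0, and the coin happened to select the precise instrument.
x, sigma_used, sigma_unused = 2.0, 1.0, 10.0

# Conditional assessment (what the WCP directs): refer x to the sampling
# distribution of the instrument actually used.
p_cond = 1 - Phi(x / sigma_used)

# Unconditional assessment: average over the 50-50 instrument mixture,
# counting hypothetical repetitions on the instrument that was NOT used.
p_mix = 0.5 * (1 - Phi(x / sigma_used)) + 0.5 * (1 - Phi(x / sigma_unused))

print(round(p_cond, 4), round(p_mix, 4))  # 0.0228 0.2217
```

Birnbaum’s argument purports to show that accepting the WCP (together with sufficiency) forces the SLP; the paper argues that this inference fails.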

Regular readers of this blog know that the topic of the “Strong Likelihood Principle (SLP)” has come up quite frequently. Numerous informal discussions of earlier attempts to clarify where Birnbaum’s argument for the SLP goes wrong may be found on this blog. [SEE PARTIAL LIST BELOW.[i]] These mostly stem from my initial paper Mayo (2010) [ii]. I’m grateful for the feedback.

In the months since this paper has been accepted for publication, I’ve been asked, from time to time, to reflect informally on the overall journey: (1) Why was/is the Birnbaum argument so convincing for so long? (Are there points being overlooked, even now?) (2) What would Birnbaum have thought? (3) What is the likely upshot for the future of statistical foundations (if any)?

I’ll try to share some responses over the next week. (Naturally, additional questions are welcome.)

[i] A quick take on the argument may be found in the appendix to: “A Statistical Scientist Meets a Philosopher of Science: A conversation between David Cox and Deborah Mayo (as recorded, June 2011)”

- Midnight with Birnbaum (Happy New Year).
- New Version: On the Birnbaum argument for the SLP: Slides for my JSM talk.
- Don’t Birnbaumize that experiment my friend*–updated reblog.
- Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976 .
- LSE seminar
- A. Birnbaum: Statistical Methods in Scientific Inference
- ReBlogging the Likelihood Principle #2: Solitary Fishing: SLP Violations
- Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle.

- Forthcoming paper on the strong likelihood principle.

**UPhils and responses**

- U-PHIL: Gandenberger & Hennig : Blogging Birnbaum’s Proof
- U-Phil: Mayo’s response to Hennig and Gandenberger
- Mark Chang (now) gets it right about circularity
- U-Phil: Ton o’ Bricks
- Blogging (flogging?) the SLP: Response to Reply- Xi’an Robert
- U-Phil: J. A. Miller: Blogging the SLP

[ii] Mayo, D. G. (2010). “An Error in the Argument from Conditionality and Sufficiency to the Likelihood Principle” in *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science* (D. Mayo and A. Spanos eds.), Cambridge: Cambridge University Press: 305-14.

**1. An Assumed Law of Statistical Evidence (law of likelihood)**

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data **x** are better evidence for hypothesis H1 than for H0 if **x** is more probable under H1 than under H0.

Ian Hacking (1965) called this the *logic of support*: **x** supports H1 more than H0 whenever

Pr(**x**; H1) > Pr(**x**; H0).

[With likelihoods, the data **x** are fixed, the hypotheses vary.]

Or, **x** is evidence for H1 over H0 to the extent that the likelihood ratio LR = Pr(**x**; H1)/Pr(**x**; H0) exceeds 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support, others leave it qualitative.)

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)
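As a minimal sketch of the support version (my own made-up numbers, in an assumed coin-tossing model), the comparison the law prescribes is a single likelihood ratio:

```python
from math import comb

# Data: x = 14 heads in n = 20 tosses of a coin with unknown bias theta.
n, x = 20, 14

def likelihood(theta):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Law of likelihood: x supports H1 over H0 iff Pr(x; H1) > Pr(x; H0).
lr = likelihood(0.7) / likelihood(0.5)  # H1: theta = 0.7 vs H0: theta = 0.5
print(lr > 1)  # True: on this account, x is better evidence for theta = 0.7
```

Note that the binomial coefficient cancels in the ratio; only the hypotheses’ likelihoods for the fixed data matter, which is precisely what the error statistician will go on to question.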

**2. Barnard (British Journal for the Philosophy of Science)**

But this “law” will immediately be seen to fail on our minimal *severity requirement*. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis H1 much better “supported” than H0, even when H0 is true.

**3. Breaking the law (of likelihood) by going to the “second,” error statistical level:**

How does it fail our severity requirement? First look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that *purports* to provide evidence for a claim. She goes to a higher level or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic *d*(**X**). To put it informally, she asks:

What’s the probability the method would yield an LR disfavoring H0 compared to some alternative H1, even if H0 is true?
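That probability can be computed exactly in a toy setup (my construction, not from the post): take n = 20 tosses with H0: θ = 0.5, and let the rival H1 be the maximally likely hypothesis θ̂ = x/n chosen after the data are in.

```python
from math import comb

n, theta0 = 20, 0.5

def lr_hunted_rival(x):
    # LR of the data-dependent rival theta_hat = x/n over theta0
    th = x / n
    return (th**x * (1 - th)**(n - x)) / (theta0**x * (1 - theta0)**(n - x))

# Probability, computed under H0, that the method outputs an LR favoring
# some rival over H0:
p_favor_rival = sum(
    comb(n, x) * theta0**x * (1 - theta0)**(n - x)  # binomial pmf under H0
    for x in range(n + 1)
    if lr_hunted_rival(x) > 1
)
print(round(p_favor_rival, 3))  # 0.824
```

Since θ̂ maximizes the likelihood, the LR favors the hunted rival for every outcome except x = 10: the method declares a “better supported” rival roughly 82% of the time even though H0 is true. Good fit is all but guaranteed, which is why severity, not support alone, must carry the evidential weight.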

Today is **Egon Pearson’s birthday: 11 August 1895-12 June, 1980.**

E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter to Birnbaum (1974), E. Pearson talks about N-P theory admitting of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.)

When Erich Lehmann, in his review of my “Error and the Growth of Experimental Knowledge” (EGEK 1996), called Pearson “the hero of Mayo’s story,” it was because I found in E.S.P.’s work, if only in brief discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of N-P statistics. Granted, these “evidential” attitudes and practices have never been explicitly codified to guide the interpretation of N-P tests. If they had been, I would not be on about providing an inferential philosophy all these years.[i] Nevertheless, “Pearson and Pearson” statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect. One of the few sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It begins like this: Continue reading

Four ~~score~~ years ago (!) we held the conference “Statistical Science and Philosophy of Science: Where Do (Should) They Meet?” at the London School of Economics, Centre for the Philosophy of Natural and Social Science (CPNSS), where I’m a visiting professor.[1] Many of the discussions on this blog grew out of contributions from the conference, and conversations initiated soon after. The conference site is here; my paper on the general question is here.[2]

*My main contribution was “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” (SS & POS 2). It begins like this:*

**1. Comedy Hour at the Bayesian Retreat[3]**

Overheard at the comedy hour at the Bayesian retreat: Did you hear the one about the frequentist… Continue reading

Today is Allan Birnbaum’s Birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference” is in *Breakthroughs in Statistics* (volume I, 1993). I’ve a hunch that Birnbaum would have liked my rejoinder to discussants of my forthcoming paper (*Statistical Science*): **Bjornstad, Dawid, Evans, Fraser, Hannig,** and **Martin and Liu.** I hadn’t realized until recently that all of this is up under “future papers” here [1]. You can find the rejoinder: **STS1404-004RA0-2**. That takes away some of the surprise of having it all come out at once (and in final form). For those unfamiliar with the argument, at the end of this entry are slides from a recent, entirely informal, talk that I never posted, as well as some links from this blog. *Happy Birthday Birnbaum!* Continue reading

Categories: Birnbaum, Birnbaum Brakes, Likelihood Principle, Statistics
Leave a comment

At the start of our seminar, I said that “on weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post some of my ‘deconstructions‘ of articles”. I began with Andrew Gelman‘s note “Ethics and the statistical use of prior information”[i], but never posted my deconstruction of it. So since it’s Saturday night, and the seminar is just ending, here it is, along with related links to Stat and ESP research (including me, Jack Good, Persi Diaconis and Pat Suppes). Please share comments especially in relation to current day ESP research. Continue reading

Categories: Background knowledge, Gelman, Phil6334, Statistics
35 Comments

Central Identification Laboratory

JPAC

*Guest, March 27, Phil 6334*

“Statistical Considerations of the Histomorphometric Test Protocol for Determination of Human Origin of Skeletal Remains”

By:

John E. Byrd, Ph.D. D-ABFA

Maria-Teresa Tersigni-Tarrant, Ph.D.

Central Identification Laboratory

JPAC

Categories: Phil6334, Philosophy of Statistics, Statistics
1 Comment


“Philosophy majors rule” according to this recent article. We philosophers should be getting the word out. Admittedly, the type of people inclined to do well in philosophy are already likely to succeed in analytic areas. Coupled with the chutzpah of taking up an “outmoded and impractical” major like philosophy in the first place, innovative tendencies are not surprising. But can the study of philosophy also promote these capacities? I think it can and does; yet it could be far more effective than it is, if it were less hermetic and more engaged with problem-solving across the landscape of science, statistics, law, medicine, and evidence-based policy. Here’s the article: Continue reading

Categories: philosophy of science, Philosophy of Statistics, Statistics
1 Comment

Slides (2 sets) from Phil 6334 2/27/14 class (Day #6).

D. Mayo:

“Frequentist Statistics as a Theory of Inductive Inference”

A. Spanos:

“Probability/Statistics Lecture Notes 4: Hypothesis Testing”

Categories: P-values, Phil 6334 class material, Philosophy of Statistics, Statistics
Tags: David Cox
Leave a comment

PHIL 6334 – “Probability/Statistics Lecture Notes 3 for 2/20/14: Estimation (Point and Interval)”: (Prof. Spanos)*

*This is Day #5 on the Syllabus, as Day #4 had to be made up (Feb 24, 2014) due to snow. Slides for Day #4 will go up Feb. 26, 2014. (See the revised Syllabus Second Installment.)

Categories: Phil6334, Philosophy of Statistics, Spanos
5 Comments


**Class, Part 2: A. Spanos:**

Probability/Statistics Lecture Notes 1: Introduction to Probability and Statistical Inference

Day #1 slides are here.

Categories: Phil 6334 class material, Philosophy of Statistics, Statistics
8 Comments

54th Annual Program

Download the 54th Annual Program

**REVISITING THE FOUNDATIONS OF STATISTICS IN THE ERA OF BIG DATA: SCALING UP TO MEET THE CHALLENGE**

Cosponsored by the Department of Mathematics & Statistics at Boston University.

Friday, February 21, 2014

10 a.m. – 5:30 p.m.

Photonics Center, 9th Floor Colloquium Room (Rm 906)

8 St. Mary’s Street

10 a.m.–noon

- Computational Challenges in Genomic Medicine
  **Jill Mesirov**, Computational Biology and Bioinformatics, Broad Institute
- Selection, Significance, and Signification: Issues in High Energy Physics
  **Kent Staley**, Philosophy, Saint Louis University

1:30–5:30 p.m.

- Multi-Resolution Inference: An Engineering (Engineered?) Foundation of Statistical Inference
  **Xiao-Li Meng**, Statistics, Harvard University
- Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
  **Deborah Mayo**, Philosophy, Virginia Tech
- Targeted Learning from Big Data
  **Mark van der Laan**, Biostatistics and Statistics, UC Berkeley

Panel Discussion

On weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post relevant “comedy hours”, invites to analyze short papers or blogs (“U-Phils”, as in “U-philosophize”), and some of my “deconstructions” of articles. To begin with a “U-Phil”, consider a note by Andrew Gelman: “Ethics and the statistical use of prior information,”[i].

**I invite you to send (to error@vt.edu) informal analyses (“U-Phil”, ~500-750 words) by February 10 [iv]. Indicate if you want your remarks considered for possible posting on this blog.**

Writing philosophy differs from other types of writing; some links to earlier U-Phils are here. Also relevant is this note: “So you want to do a philosophical analysis?”

*U-Phil (2/10/14):* In section 3 Gelman comments on some of David Cox’s remarks in a (highly informal and non-scripted) conversation we recorded:

“A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo,” published in *Rationality, Markets and Morals* [iii]. (Section 2 has some remarks on Larry Wasserman, by the way.)

Here’s the relevant portion of the conversation:

COX: Deborah, in some fields foundations do not seem very important, but we both think foundations of statistical inference are important; why do you think that is?

MAYO: I think because they ask about fundamental questions of evidence, inference, and probability. I don’t think that foundations of different fields are all alike; because in statistics we’re so intimately connected to the scientific interest in learning about the world, we invariably cross into philosophical questions about empirical knowledge and inductive inference.

COX: One aspect of it is that it forces us to say what it is that we really want to know when we analyze a situation statistically. Do we want to put in a lot of information external to the data, or as little as possible? It forces us to think about questions of that sort.

MAYO: But key questions, I think, are not so much a matter of putting in a lot or a little information. … What matters is the kind of information, and how to use it to learn. This gets to the question of how we manage to be so successful in learning about the world, despite knowledge gaps, uncertainties and errors. To me that’s one of the deepest questions and it’s the main one I care about. I don’t think a (deductive) Bayesian computation can adequately answer it. …

COX: There’s a lot of talk about what used to be called inverse probability and is now called Bayesian theory. That represents at least two extremely different approaches. How do you see the two? Do you see them as part of a single whole? Or as very different? Continue reading

Categories: Background knowledge, Philosophy of Statistics, U-Phil
Tags: D. G. Mayo, Sir David Cox
2 Comments