Er, about those other approaches, hold off until a balanced appraisal is in

Posted on April 1, 2017 by Mayo

I could have told them that the degree of accordance enabling the ASA’s “6 principles” on p-values was unlikely to be replicated when it came to most of the “other approaches” with which some would supplement or replace significance tests, notably Bayesian updating, Bayes factors, or likelihood ratios (confidence intervals are dual to hypothesis tests). [My commentary is here.] So now they may be advising a “hold off” or “go slow” approach until some consilience is achieved. Is that it? I don’t know. I was tweeted an article about the background chatter taking place behind the scenes; I wasn’t one of the people interviewed for this. Here are some excerpts; I may add more later after it has had time to sink in. (check back later)

“Reaching for Best Practices in Statistics: Proceed with Caution Until a Balanced Critique Is In”

J. Hossiason

“[A]ll of the other approaches*, as well as most statistical tools, may suffer from many of the same problems as the p-values do. What level of likelihood ratio in favor of the research hypothesis will be acceptable to the journal? Should scientific discoveries be based on whether posterior odds pass a specific threshold (P3)? Does either measure the size of an effect (P5)?…How can we decide about the sample size needed for a clinical trial—however analyzed—if we do not set a specific bright-line decision rule? 95% confidence intervals or credence intervals…offer no protection against selection when only those that do not cover 0, are selected into the abstract (P4).” (Benjamini, ASA commentary, pp. 3-4)

What’s sauce for the goose is sauce for the gander, right? Many statisticians seconded George Cobb who urged “the board to set aside time at least once every year to consider the potential value of similar statements” to the recent ASA p-value report. Disappointingly, a preliminary survey of leaders in statistics, many from the original p-value group, aired striking disagreements on best and worst practices with respect to these other approaches. The Executive Board is contemplating a variety of recommendations, minimally, that practitioners move with caution until they can put forward at least a few agreed-upon principles for interpreting and applying Bayesian inference methods. The words we heard ranged from “go slow” to “moratorium” [emphasis mine]. Having been privy to some of the results of this survey, we at Stat Report Watch decided to contact some of the individuals involved. Continue reading →

Categories: P-values, reforming the reformers, Statistics | 6 Comments

Slides from the Boston Colloquium for Philosophy of Science: “Severe Testing: The Key to Error Correction”

Posted on March 26, 2017 by Mayo

Slides from my March 17 presentation on “Severe Testing: The Key to Error Correction” given at the Boston Colloquium for Philosophy of Science Alfred I. Taub forum on “Understanding Reproducibility and Error Correction in Science.”

Categories: fallacy of rejection, Fisher, fraud, frequentist/Bayesian, Likelihood Principle, reforming the reformers | 16 Comments

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Understanding Reproducibility & Error Correction in Science

Posted on March 13, 2017 by Mayo

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2016–2017
57th Annual Program

The Alfred I. Taub forum:

UNDERSTANDING REPRODUCIBILITY & ERROR CORRECTION IN SCIENCE

Cosponsored by GMS and BU’s BEST at Boston University.
Friday, March 17, 2017
1:00 p.m. – 5:00 p.m.
The Terrace Lounge, George Sherman Union
775 Commonwealth Avenue

Reputation, Variation, & Control: Historical Perspectives
Jutta Schickore, History and Philosophy of Science & Medicine, Indiana University, Bloomington
Crisis in Science: Time for Reform?
Arturo Casadevall, Molecular Microbiology & Immunology, Johns Hopkins
Severe Testing: The Key to Error Correction
Deborah Mayo, Philosophy, Virginia Tech
Replicate That…. Maintaining a Healthy Failure Rate in Science
Stuart Firestein, Biological Sciences, Columbia

Categories: Announcement, Statistical fraudbusting, Statistics | Leave a comment

The ASA Document on P-Values: One Year On

Posted on March 8, 2017 by Mayo

I’m surprised it’s a year already since posting my published comments on the ASA Document on P-Values. Since then, there have been a slew of papers rehearsing the well-worn fallacies of tests (a tad bit more than the usual rate). Doubtless, the P-value Pow Wow raised people’s consciousnesses. I’m interested in hearing reader reactions/experiences in connection with the P-Value project (positive and negative) over the past year. (Use the comments, share links to papers; and/or send me something slightly longer for a possible guest post.)
Some people sent me a diagram from a talk by Stephen Senn (on “P-values and the art of herding cats”). He presents an array of different cat commentators, and for some reason Mayo cat is in the middle but way over on the left side, near the wall. I never got the key to interpretation. My contribution is below:

Chart by S.Senn

“Don’t Throw Out The Error Control Baby With the Bad Statistics Bathwater”

D. Mayo*[1]

The American Statistical Association is to be credited with opening up a discussion into p-values; now an examination of the foundations of other key statistical concepts is needed. Continue reading →

Categories: Bayesian/frequentist, P-values, science communication, Statistics, Stephen Senn | 14 Comments

3 YEARS AGO (FEBRUARY 2014): MEMORY LANE

Posted on February 28, 2017 by Mayo

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: February 2014. I normally mark in red three posts from each month that seem most apt for general background on key issues in this blog, but I decided just to list these as they are (some are from a seminar I taught with Aris Spanos 3 years ago; several on Fisher were recently reblogged). I hope you find something of interest!

February 2014

(2/1) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs B-boosts)
(2/3) PhilStock: Bad news is bad news on Wall St. (rejected post)
(2/5) “Probabilism as an Obstacle to Statistical Fraud-Busting” (draft iii)
(2/9) Phil6334: Day #3: Feb 6, 2014
(2/10) Is it true that all epistemic principles can only be defended circularly? A Popperian puzzle
(2/12) Phil6334: Popper self-test
(2/13) Phil 6334 Statistical Snow Sculpture
(2/14) January Blog Table of Contents
(2/15) Fisher and Neyman after anger management?
(2/17) R. A. Fisher: how an outsider revolutionized statistics
(2/18) Aris Spanos: The Enduring Legacy of R. A. Fisher
(2/20) R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
(2/21) STEPHEN SENN: Fisher’s alternative to the alternative
(2/22) Sir Harold Jeffreys’ (tail-area) one-liner: Sat night comedy [draft ii]
(2/24) Phil6334: February 20, 2014 (Spanos): Day #5
(2/26) Winner of the February 2014 palindrome contest (rejected post)
(2/26) Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)

Categories: 3-year memory lane, Statistics | 2 Comments

R.A. Fisher: “It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based”

Posted on February 26, 2017 by Mayo

A final entry in a week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962). Fisher is among the very few thinkers I have come across to recognize this crucial difference between induction and deduction:

In deductive reasoning all knowledge obtainable is already latent in the postulates. Rigour is needed to prevent the successive inferences growing less and less accurate as we proceed. The conclusions are never more accurate than the data. In inductive reasoning we are performing part of the process by which new knowledge is created. The conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based. Statistical data are always erroneous, in greater or less degree. The study of inductive reasoning is the study of the embryology of knowledge, of the processes by means of which truth is extracted from its native ore in which it is infused with much error. (Fisher, “The Logic of Inductive Inference,” 1935, p 54).

Reading/rereading this paper is very worthwhile for interested readers. Some of the fascinating historical/statistical background may be found in a guest post by Aris Spanos: “R.A.Fisher: How an Outsider Revolutionized Statistics”

Categories: Fisher, phil/history of stat | 30 Comments

Guest Blog: STEPHEN SENN: ‘Fisher’s alternative to the alternative’

Posted on February 22, 2017 by Mayo

“You May Believe You Are a Bayesian But You Are Probably Wrong”

As part of the week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962), I reblog a guest post by Stephen Senn from 2012. (I will comment in the comments.)

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests. Continue reading →

Categories: Fisher, S. Senn, Statistics | 13 Comments

R.A. Fisher: “Statistical methods and Scientific Induction”

Posted on February 20, 2017 by Mayo

I continue a week of Fisherian posts in honor of his birthday (Feb 17). This is his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. They are each very short.

17 February 1890 — 29 July 1962

“Statistical Methods and Scientific Induction”

by Sir Ronald Fisher (1955)

SUMMARY

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

“Repeated sampling from the same population”,
Errors of the “second kind”,
“Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

The most noteworthy feature is Fisher’s position on Fiducial inference, typically downplayed. I’m placing a summary and link to Neyman’s response below–it’s that interesting. Continue reading →

Categories: fiducial probability, Fisher, Neyman, phil/history of stat | 6 Comments

Guest Blog: ARIS SPANOS: The Enduring Legacy of R. A. Fisher

Posted on February 19, 2017 by Mayo

By Aris Spanos

One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

M_θ(x) = {f(x;θ), θ∈Θ}, x∈R^n, Θ⊂R^m, m < n,   (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to the description of the distributional features of the data in hand using the histogram and the first few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x₀:=(x₁,x₂,…,x_n). As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Fisher was able to recast statistical inference by turning Karl Pearson’s approach, proceeding from data x₀ in search of a frequency curve f(x;ϑ) to describe its histogram, on its head. He proposed to begin with a prespecified M_θ(x) (a ‘hypothetical infinite population’), and view x₀ as a ‘typical’ realization thereof; see Spanos (1999). Continue reading →

Categories: Fisher, Spanos, Statistics | Tags: E S Pearson, Frequentist inference, induction, Jerzy Neyman, Models/Modelling, Ronald Fisher, statistical model | Leave a comment

R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

Posted on February 17, 2017 by Mayo

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I’ll post some different Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!

“Two New Properties of Mathematical Likelihood“

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H₀ is more powerful than any other equivalent test, with regard to an alternative hypothesis H₁, when it rejects H₀ in a set of samples having an assigned aggregate frequency ε when H₀ is true, and the greatest possible aggregate frequency when H₁ is true. Continue reading →

Categories: Fisher, phil/history of stat, Statistics | Tags: Bayesianism, induction, Ronald Fisher, significance tests | 2 Comments

Winner of the January 2017 Palindrome contest: Cristiano Sabiu

Posted on February 16, 2017 by Mayo

Winner of January 2017 Palindrome Contest: (a dozen book choices)

Cristiano Sabiu: Postdoctoral researcher in Cosmology and Astrophysics

Palindrome: El truth supremo nor tsar is able, Elba Sir Astronomer push turtle.

The requirement: A palindrome using “astronomy” (or “astronomer”/“astronomical”) and Elba, of course.

Book choice: Error and the Growth of Experimental Knowledge (D. Mayo 1996, Chicago)

Bio: Cristiano Sabiu is a postdoctoral researcher in Cosmology and Astrophysics, working on Dark Energy and testing Einstein’s theory of General Relativity. He was born in Scotland with Italian roots and currently resides in Daejeon, South Korea.

Statement: This was my first palindrome! I was never very interested in writing when I was younger (I almost failed English at school!). However, as my years progress I feel that writing/poetry may be the easiest way for us non-artists to express that which cannot easily be captured by our theorems and logical frameworks. Constrained writing seems to open some of those internal mental doors, I think I am hooked now. Thanks for organising this!

Mayo Comment: Thanks for entering, Cristiano, you just made the “time extension” for this month. That means we won’t have a second month of “astronomy” and the judges will have to come up with a new word. I’m glad you’re hooked. Good choice of book! I especially like the “truth supremo/push turtle”. I’m also very interested in experimental testing of GTR–we’ll have to communicate on this.

Mayo’s January attempts (selected):

Elba rap star comedy: Mr. Astronomy. Testset tests etymon or tsar, my democrats’ parable.
Parable for astronomy gym, on or tsar of Elba rap.

Categories: Palindrome | Leave a comment

Cox’s (1958) weighing machine example

Posted on February 12, 2017 by Mayo

A famous chestnut given by Cox (1958) recently came up in conversation. The example “is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992, p. 582). When I describe it, you’ll find it hard to believe many regard it as causing an earthquake in statistical foundations, unless you’re already steeped in these matters. If half the time I reported my weight from a scale that’s always right, and half the time used a scale that gets it right with probability .5, would you say I’m right with probability ¾? Well, maybe. But suppose you knew that this measurement was made with the scale that’s right with probability .5? The overall error probability is scarcely relevant for giving the warrant of the particular measurement, knowing which scale was used. Continue reading →
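
The arithmetic behind the ¾ can be spelled out in a few lines. This is only a sketch of the toy version described above: the function name and its parameters are made up for illustration, and the defaults are the numbers from the paragraph (a fair choice between a scale that is always right and one that is right with probability .5).

```python
from fractions import Fraction


def weighing_machine_probabilities(
    p_choose_precise=Fraction(1, 2),   # chance the always-right scale is used
    p_right_precise=Fraction(1),       # that scale is right every time
    p_right_imprecise=Fraction(1, 2),  # the other scale is right half the time
):
    """Return the unconditional and the conditional probabilities of being right.

    Hypothetical helper for illustration; the numbers are the ones in the text.
    """
    # Averaging over which scale gets used (the unconditional assessment).
    unconditional = (
        p_choose_precise * p_right_precise
        + (1 - p_choose_precise) * p_right_imprecise
    )
    return {
        "unconditional": unconditional,
        "given_precise": p_right_precise,
        "given_imprecise": p_right_imprecise,
    }


probs = weighing_machine_probabilities()
print(probs["unconditional"])    # 3/4
print(probs["given_imprecise"])  # 1/2
```

The ¾ is a correct long-run average over both scales, but once it is known that the imprecise scale was used, the relevant figure for this measurement is ½.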

Categories: Error Statistics, Sir David Cox, Statistics, strong likelihood principle | 1 Comment

Hocus pocus! Adopt a magician’s stance, if you want to reveal statistical sleights of hand

Posted on February 7, 2017 by Mayo

Here’s the follow-up post to the one I reblogged on Feb 3 (please read that one first). When they sought to subject Uri Geller to the scrutiny of scientists, magicians had to be brought in because only they were sufficiently trained to spot the subtle sleight of hand shifts by which the magician tricks by misdirection. We, too, have to be magicians to discern the subtle misdirections and shifts of meaning in the discussions of statistical significance tests (and other methods)—even by the same statistical guide. We needn’t suppose anything deliberately devious is going on at all! Often, the statistical guidebook reflects shifts of meaning that grow out of one or another critical argument. These days, they trickle down quickly to statistical guidebooks, thanks to popular articles on the “statistics crisis in science”. The danger is that their own guidebooks contain inconsistencies. To adopt the magician’s stance is to be on the lookout for standard sleights of hand. There aren’t that many.[0]

I don’t know Jim Frost, but he gives statistical guidance at the minitab blog. The purpose of my previous post is to point out that Frost uses the probability of a Type I error in two incompatible ways in his posts on significance tests. I assumed he’d want to clear this up, but so far he has not. His response to a comment I made on his blog is this: Continue reading →

Categories: frequentist/Bayesian, P-values, reforming the reformers, S. Senn, Statistics | 39 Comments

High error rates in discussions of error rates: no end in sight

Posted on February 3, 2017 by Mayo

waiting for the other shoe to drop…

“Guides for the Perplexed” in statistics become “Guides to Become Perplexed” when “error probabilities” (in relation to statistical hypotheses tests) are confused with posterior probabilities of hypotheses. Moreover, these posteriors are neither frequentist, subjectivist, nor default. Since this doublespeak is becoming more common in some circles, it seems apt to reblog a post from one year ago (you may wish to check the comments).

Do you ever find yourself holding your breath when reading an exposition of significance tests that’s going swimmingly so far? If you’re a frequentist in exile, you know what I mean. I’m sure others feel this way too. When I came across Jim Frost’s posts on The Minitab Blog, I thought I might actually have located a success story. He does a good job explaining P-values (with charts), the duality between P-values and confidence levels, and even rebuts the latest “test ban” (the “Don’t Ask, Don’t Tell” policy). Mere descriptive reports of observed differences that the editors recommend, Frost shows, are uninterpretable without a corresponding P-value or the equivalent. So far, so good. I have only small quibbles, such as the use of “likelihood” when meaning probability, and various and sundry nitpicky things. But watch how in some places significance levels are defined as the usual error probabilities —indeed in the glossary for the site—while in others it is denied they provide error probabilities. In those other places, error probabilities and error rates shift their meaning to posterior probabilities, based on priors representing the “prevalence” of true null hypotheses.
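
To see how far the two notions can come apart, here is a minimal sketch of the “prevalence” computation. The function name and the numbers (α = 0.05, power 0.8, prevalences of true nulls of 0.5 and 0.9) are my own illustrative assumptions, not figures taken from Frost’s posts; the formula is just Bayes’ theorem applied to an imagined population of hypotheses.

```python
def prob_null_given_rejection(alpha, power, prevalence_true_nulls):
    """Posterior probability that the null is true, given that it was rejected.

    Treats the tested null as a random draw from a population of hypotheses in
    which a fraction `prevalence_true_nulls` are true (an assumed setup).
    """
    false_rejections = alpha * prevalence_true_nulls       # true nulls rejected
    true_rejections = power * (1 - prevalence_true_nulls)  # false nulls rejected
    return false_rejections / (false_rejections + true_rejections)


alpha = 0.05  # Type I error probability: P(reject H0; H0 true)
for prevalence in (0.5, 0.9):
    posterior = prob_null_given_rejection(alpha, power=0.8,
                                          prevalence_true_nulls=prevalence)
    print(prevalence, round(posterior, 2))
# 0.5 -> about 0.06, 0.9 -> about 0.36
```

The significance level stays at 0.05 in both runs; only the assumed prevalence changes, and with it the posterior. Calling both numbers “the error rate” is the shift of meaning at issue.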

Begin with one of his kosher posts “Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics” (blue is Frost): Continue reading →

Categories: highly probable vs highly probed, J. Berger, reforming the reformers, Statistics | 1 Comment

3 YEARS AGO (JANUARY 2014): MEMORY LANE

Posted on January 28, 2017 by Mayo

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: January 2014. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others I’d recommend[2]. Posts that are part of a “unit” or a group count as one. This month, I’m grouping the 3 posts from my seminar with A. Spanos, counting them as 1.

January 2014

(1/2) Winner of the December 2013 Palindrome Book Contest (Rejected Post)
(1/3) Error Statistics Philosophy: 2013
(1/4) Your 2014 wishing well. …
(1/7) “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)
(1/11) Two Severities? (PhilSci and PhilStat)
(1/14) Statistical Science meets Philosophy of Science: blog beginnings
(1/16) Objective/subjective, dirty hands and all that: Gelman/Wasserman blogolog (ii)
(1/18) Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]
(1/22) Phil6334: “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos (Virginia Tech) UPDATE: JAN 21
(1/24) Phil 6334: Slides from Day #1: Four Waves in Philosophy of Statistics
(1/25) U-Phil (Phil 6334) How should “prior information” enter in statistical inference?
(1/27) Winner of the January 2014 palindrome contest (rejected post)
(1/29) BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics

(1/31) Phil 6334: Day #2 Slides

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016-very convenient.

Categories: 3-year memory lane, Bayesian/frequentist, Statistics | 1 Comment

The “P-values overstate the evidence against the null” fallacy

Posted on January 19, 2017 by Mayo

The allegation that P-values overstate the evidence against the null hypothesis continues to be taken as gospel in discussions of significance tests. All such discussions, however, assume a notion of “evidence” that’s at odds with significance tests–generally Bayesian probabilities of the sort used in the Jeffreys-Lindley disagreement (default or “I’m selecting from an urn of nulls” variety). Szucs and Ioannidis (in a draft of a 2016 paper) claim “it can be shown formally that the definition of the p value does exaggerate the evidence against H0” (p. 15) and they reference the paper I discuss below: Berger and Sellke (1987). It’s not that a single small P-value provides good evidence of a discrepancy (even assuming the model, and no biasing selection effects); Fisher and others warned against over-interpreting an “isolated” small P-value long ago. But the formulation of the “P-values overstate the evidence” meme introduces brand new misinterpretations into an already confused literature! The following are snippets from some earlier posts–mostly this one–and also include some additions from my new book (forthcoming).
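
For readers who have not seen the kind of computation on which the allegation rests, here is a hedged sketch in the spirit of the Jeffreys-Lindley setup. The assumptions are mine and are chosen for simplicity: a two-sided Normal test of H0: θ = θ0 with known σ, a spiked prior giving H0 probability 0.5, and a N(θ0, σ²) prior on θ under the alternative. Under those assumptions the Bayes factor in favor of H0 is √(1+n)·exp(−(z²/2)·n/(1+n)), where z is the usual standardized test statistic. The function name is hypothetical.

```python
import math


def posterior_prob_null(z, n, prior_null=0.5):
    """Posterior probability of a point null under an assumed spiked prior.

    Assumes X_i ~ N(theta, sigma^2) with sigma known, H0: theta = theta0, and
    theta ~ N(theta0, sigma^2) under H1; z = sqrt(n) * (xbar - theta0) / sigma.
    """
    # Bayes factor of H0 to H1 under the assumptions stated above.
    bayes_factor_01 = math.sqrt(1 + n) * math.exp(-0.5 * z**2 * n / (1 + n))
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * bayes_factor_01
    return posterior_odds / (1 + posterior_odds)


# z = 1.96 corresponds to a two-sided P-value of about 0.05.
for n in (50, 1000):
    print(n, round(posterior_prob_null(1.96, n), 2))
# n = 50 -> about 0.52, n = 1000 -> about 0.82
```

So a result that is just significant at the 0.05 level can go along with a posterior for H0 above one half, and the posterior climbs as n grows. Whether that shows the P-value “overstates” anything depends entirely on accepting the spiked prior and the posterior as the measure of evidence, which is the point in dispute.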

1. What you should ask…

When you hear the familiar refrain, “We all know that P-values overstate the evidence against the null hypothesis”, what you should ask is:

“What do you mean by overstating the evidence against a hypothesis?”

One honest answer is: Continue reading →

Categories: Bayesian/frequentist, fallacy of rejection, highly probable vs highly probed, P-values, Statistics | 47 Comments

Winners of December Palindrome: Kyle Griffiths & Eileen Flanagan

Posted on January 14, 2017 by Mayo

Winners of the December 2016 Palindrome contest

Since both November and December had the contest word verifies/reverifies, the judges decided to give two prizes this month. Thank you both for participating!

Kyle Griffiths

Palindrome: Sleep, raw Elba, ere verified ire; Sir, rise, ride! If I revere able war peels.

The requirement: A palindrome using “verifies” (reverifies) or “verified” (reverified) and Elba, of course.

Statement: Here’s my December submission, hope you like it, it has a kind of revolutionary war theme. I have no particular history of palindrome-writing or contest-entering. Instead, I found Mayo’s work via the recommendation of Jeremy Fox of Dynamic Ecology. I am interested in her take on modern statistical practices in ecology, and generally in understanding what makes scientific methods robust and reliable. I’m an outsider to philosophy and stats (I have an MS in Biology), so I appreciate the less-formal tone of the blog. I’m really looking forward to Mayo’s next book.

Book choice (out of 12 or more): Principles of Applied Statistics (D. R. Cox and C. A. Donnelly 2011, Cambridge: Cambridge University Press)

Bio: Part-time Biology Instructor, Scientific Aide for California Dept. of Fish & Wildlife. Interested in aquatic ecology, fish population dynamics.

*******************************************************************************************

Eileen Flanagan

Palindrome: Elba man, error reels inanities. I verified art I trade, if I revise it in an isle. Error renamable.

The requirement: A palindrome using “verifies” (reverifies) or “verified” (reverified) and Elba, of course.

Bio: Retired civil servant with a philosophy Ph.D; a bit camera shy so used a stand-in for my photo. 🙂

Statement: I found your blog searching for information on fraud in science a few years ago, and now that I am retired, I am enjoying twisting my mind around palindromes and other word games that I find on-line. 🙂

Book choice (out of 12 or more): For my book, I would like a copy of Error and the Growth of Experimental Knowledge (D. G. Mayo, 1996, Chicago: Chicago University Press).

*******************************************************************************************

Some of Mayo’s attempts, posted through Nov-Dec:

Elba felt busy, reverifies use. I fire very subtle fable.

To I: disabled racecar ties. I verified or erode, if I revise it. Race card: Elba’s idiot.

Elba, I rave to men: “I felt busy!” Reverified, I hide, I fire very subtle fine mote variable.

I deified able deities. I verified a rap parade. If I revise, I tied. Elba deified I.

Categories: Announcement, Palindrome | Leave a comment

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Understanding Reproducibility & Error Correction in Science

Posted on January 7, 2017 by Mayo

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2016–2017
57th Annual Program

The Alfred I. Taub forum:

UNDERSTANDING REPRODUCIBILITY & ERROR CORRECTION IN SCIENCE

Cosponsored by GMS and BU’s BEST at Boston University.
Friday, March 17, 2017
1:00 p.m. – 5:00 p.m.
The Terrace Lounge, George Sherman Union
775 Commonwealth Avenue

Reputation, Variation, & Control: Historical Perspectives
Jutta Schickore, History and Philosophy of Science & Medicine, Indiana University, Bloomington
Crisis in Science: Time for Reform?
Arturo Casadevall, Molecular Microbiology & Immunology, Johns Hopkins
Severe Testing: The Key to Error Correction
Deborah Mayo, Philosophy, Virginia Tech
Replicate That…. Maintaining a Healthy Failure Rate in Science
Stuart Firestein, Biological Sciences, Columbia

Categories: Announcement, philosophy of science, Philosophy of Statistics, Statistical fraudbusting, Statistics | Leave a comment

Midnight With Birnbaum (Happy New Year 2016)

Posted on December 31, 2016 by Mayo

Just as in the past 5 years since I’ve been blogging, I revisit that spot in the road at 11p.m., just outside the Elbar Room, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. (The pic on the left is the only blurry image I have of the club I’m taken to.) I wonder if the car will come for me this year, given that my Birnbaum article has been out since 2014… The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics (and cognate methods). Yet as Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. Maybe in 2017? Anyway, it’s 6 hrs later here, so I’m about to leave for that spot in the road… If I’m picked up, I’ll add an update at the end.

You know how in that (not-so) recent Woody Allen movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Woolf? He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve ~~2011~~, ~~2012~~, ~~2013~~, ~~2014~~, ~~2015~~, 2016) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i] There are a couple of brief (12/31/14 & 15) updates at the end.

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics. I happen to be writing on your famous argument about the likelihood principle (LP). (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept. Continue reading →

Categories: Birnbaum Brakes, Statistics, strong likelihood principle | Tags: Birnbaum, fallacious argument, Likelihood Principle, proofs | 21 Comments

Szucs & Ioannidis Revive the Limb-Sawing Fallacy

Posted on December 30, 2016 by Mayo

When logical fallacies of statistics go uncorrected, they are repeated again and again…and again. And so it is with the limb-sawing fallacy I first posted in one of my “Overheard at the Comedy Hour” posts.* It now resides as a comic criticism of significance tests in a paper by Szucs and Ioannidis (posted this week). Here’s their version:

“[P]aradoxically, when we achieve our goal and successfully reject H₀ we will actually be left in complete existential vacuum because during the rejection of H₀ NHST ‘saws off its own limb’ (Jaynes, 2003; p. 524): If we manage to reject H₀ then it follows that pr(data or more extreme data|H₀) is useless because H₀ is not true” (p.15).

Here’s Jaynes (p. 524):

“Suppose we decide that the effect exists; that is, we reject [null hypothesis] H₀. Surely, we must also reject probabilities conditional on H₀, but then what was the logical justification for the decision? Orthodox logic saws off its own limb.”

Ha! Ha! By this reasoning, no hypothetical testing or falsification could ever occur. As soon as H is falsified, the grounds for falsifying disappear! If H: all swans are white, then if I see a black swan, H is falsified. But according to this criticism, we can no longer assume the deduced prediction from H! What? Continue reading →

Categories: Error Statistics, P-values, reforming the reformers, Statistics | 14 Comments


© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.
