Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]

imgres

R.A. Fisher: February 17, 1890 – July 29, 1962

Continuing with posts in recognition of R.A. Fisher’s birthday, I post one from a couple of years ago on a topic that had previously not been discussed on this blog: Fisher’s fiducial probability

[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals (D.R.Cox, 2006, p. 195).

The entire episode of fiducial probability is fraught with minefields. Many say it was Fisher’s biggest blunder; others suggest it still hasn’t been understood. The majority of discussions omit the side trip to the Fiducial Forest altogether, finding the surrounding brambles too thorny to penetrate. Besides, a fascinating narrative about the Fisher-Neyman-Pearson divide has managed to bloom and grow while steering clear of fiducial probability–never mind that it remained a centerpiece of Fisher’s statistical philosophy. I now think that this is a mistake. It was thought, following Lehman (1993) and others, that we could take the fiducial out of Fisher and still understand the core of the Neyman-Pearson vs Fisher (or Neyman vs Fisher) disagreements. We can’t. Quite aside from the intrinsic interest in correcting the “he said/he said” of these statisticians, the issue is intimately bound up with the current (flawed) consensus view of frequentist error statistics.

So what’s fiducial inference? I follow Cox (2006), adapting for the case of the lower limit: Continue reading

Categories: fiducial probability, Fisher, Statistics | Leave a comment

R.A. Fisher: “Statistical methods and Scientific Induction”

I continue a week of Fisherian posts in honor of his birthday (Feb 17). This is his contribution to the “Triad”–an exchange between  Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. They are each very short and bear rereading

17 February 1890 — 29 July 1962

“Statistical Methods and Scientific Induction”

by Sir Ronald Fisher (1955)

SUMMARY

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of  acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating they fallacies they embody, are:

  1. “Repeated sampling from the same population”,
  2. Errors of the “second kind”,
  3. “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

 

Note on an Article by Sir Ronald Fisher

by Jerzy Neyman (1956)

Neyman

Neyman

Summary

(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation.  (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible.  (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values.  The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight.  (4)  The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.

 

E.S. Pearson

“Statistical Concepts in Their Relation to Reality”.

by E.S. Pearson (1955)

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data.  We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done.  If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955 “Scientific Methods and Scientific Induction” ), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”. There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!…

To continue reading, “Statistical Concepts in Their Relation to Reality” click HERE

Categories: E.S. Pearson, fiducial probability, Fisher, Neyman, phil/history of stat | 3 Comments

R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)

A SPANOS

.

In recognition of R.A. Fisher’s birthday on February 17….

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Spanos, Statistics | 3 Comments

Guest Blog: STEPHEN SENN: ‘Fisher’s alternative to the alternative’

“You May Believe You Are a Bayesian But You Are Probably Wrong”

.

As part of the week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962), I reblog a guest post by Stephen Senn from 2012/2017.  The comments from 2017 lead to a troubling issue that I will bring up in the comments today.

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests. Continue reading

Categories: Fisher, S. Senn, Statistics | 1 Comment

Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I’ll post some Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!

Two New Properties of Mathematical Likelihood

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

  The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading

Categories: Fisher, phil/history of stat, Statistics | Tags: , , , | 1 Comment

3 YEARS AGO (FEBRUARY 2015): MEMORY LANE

3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: February 2015 [1]. Here are some items to for your Saturday night reading and rereading. Three are in preparation for Fisher’s birthday next week (Feb 17). One is a Saturday night comedy where Jeffreys appears to substitute for Jay Leno. The 2/25 entry lets you go back 6 years where there’s more on Fisher, a bit of statistical theatre (of the absurd), Misspecification tests, and a guest post (by Schachtman) on that infamous Matrixx court case (wherein the Supreme Court is thought to have weighed in on statistical significance tests). The comments are often the most interesting parts of these old posts.

February 2015

  • 02/05 Stephen Senn: Is Pooling Fooling? (Guest Post)
  • 02/10 What’s wrong with taking (1 – β)/α, as a likelihood ratio comparing H0 and H1?
  • 02/13 Induction, Popper and Pseudoscience
  • 02/16 Continuing the discussion on truncation, Bayesian convergence and testing of priors
  • 02/16 R. A. Fisher: ‘Two New Properties of Mathematical Likelihood’: Just before breaking up (with N-P)
  • 02/17 R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)

  • 02/19 Stephen Senn: Fisher’s Alternative to the Alternative
  • 02/21 Sir Harold Jeffreys’ (tail area) one-liner: Saturday night comedy (b)
  • 02/25 3 YEARS AGO: (FEBRUARY 2012) MEMORY LANE

    .

  • 02/27 Big Data is the New Phrenology?

 

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

 

Save

Save

Save

Save

Save

Save

Save

Save

Save

Save

Categories: 3-year memory lane | Leave a comment

S. Senn: Evidence Based or Person-centred? A Statistical debate (Guest Post)

.

Stephen Senn
Head of  Competence Center
for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Evidence Based or Person-centred? A statistical debate

It was hearing Stephen Mumford and Rani Lill Anjum (RLA) in January 2017 speaking at the Epistemology of Causal Inference in Pharmacology conference in Munich organised by Jürgen Landes, Barbara Osmani and Roland Poellinger, that inspired me to buy their book, Causation A Very Short Introduction[1]. Although I do not agree with all that is said in it and also could not pretend to understand all it says, I can recommend it highly as an interesting introduction to issues in causality, some of which will be familiar to statisticians but some not at all.

Since I have a long-standing interest in researching into ways of delivering personalised medicine, I was interested to see a reference on Twitter to a piece by RLA, Evidence based or person centered? An ontological debate, in which she claims that the choice between evidence based or person-centred medicine is ultimately ontological[2]. I don’t dispute that thinking about health care delivery in ontological terms might be interesting. However, I do dispute that there is any meaningful choice between evidence based medicine (EBM) and person centred healthcare (PCH). To suggest so is to commit a category mistake by suggesting that means are alternatives to ends.

In fact, EBM will be essential to delivering effective PCH, as I shall now explain. Continue reading

Categories: personalized medicine, RCTs, S. Senn | 7 Comments

3 YEARS AGO (JANUARY 2015): MEMORY LANE

3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: January 2015. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green 2-3 others of general relevance to philosophy of statistics (in months where I’ve blogged a lot)[2].  Posts that are part of a “unit” or a group count as one.

 

January 2015

  • 01/02 Blog Contents: Oct.- Dec. 2014
  • 01/03 No headache power (for Deirdre)
  • 01/04 Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)
  • 01/07 “When Bayesian Inference Shatters” Owhadi, Scovel, and Sullivan (reblog)
  • 01/08 On the Brittleness of Bayesian Inference–An Update: Owhadi and Scovel (guest post).
  • 01/12 “Only those samples which fit the model best in cross validation were included” (whistleblower) “I suspect that we likely disagree with what constitutes validation” (Potti and Nevins)
  • 01/16 Winners of the December 2014 Palindrome Contest: TWO!
  • 01/18 Power Analysis and Non-Replicability: If bad statistics is prevalent in your field, does it follow you can’t be guilty of scientific fraud?
  • 01/21 Some statistical dirty laundry.
  • 01/24 What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri
  • 01/26 Trial on Anil Potti’s (clinical) Trial Scandal Postponed Because Lawyers Get the Sniffles (updated)
  • 01/27 3 YEARS AGO: (JANUARY 2012) MEMORY LANE
  • 01/31 Saturday Night Brainstorming and Task Forces: (4th draft)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30,2016, March 30,2017 -a very convenient way to allow data-dependent choices (note why it’s legit in selecting blog posts, on severity grounds).

Save

Save

Save

Save

Save

Save

Save

Save

Save

Save

Categories: 3-year memory lane | Leave a comment

S. Senn: Being a statistician means never having to say you are certain (Guest Post)

.

Stephen Senn
Head of  Competence Center
for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Being a statistician means never having to say you are certain

A recent discussion of randomised controlled trials[1] by Angus Deaton and Nancy Cartwright (D&C) contains much interesting analysis but also, in my opinion, does not escape rehashing some of the invalid criticisms of randomisation with which the literatures seems to be littered. The paper has two major sections. The latter, which deals with generalisation of results, or what is sometime called external validity, I like much more than the former which deals with internal validity. It is the former I propose to discuss.

Continue reading

Categories: Error Statistics, RCTs, Statistics | 26 Comments

3 YEARS AGO (DECEMBER 2014): MEMORY LANE

3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: December 2014. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green 3- 4 others of general relevance to philosophy of statistics (in months where I’ve blogged a lot)[2].  Posts that are part of a “unit” or a group count as one.

December 2014

  • 12/02 My Rutgers Seminar: tomorrow, December 3, on philosophy of statistics
  • 12/04 “Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance” (Dec 3 Seminar slides)
  • 12/06 How power morcellators inadvertently spread uterine cancer
  • 12/11 Msc. Kvetch: What does it mean for a battle to be “lost by the media”?
  • 12/13 S. Stanley Young: Are there mortality co-benefits to the Clean Power Plan? It depends. (Guest Post)
  • 12/17 Announcing Kent Staley’s new book, An Introduction to the Philosophy of Science (CUP)

  • 12/21 Derailment: Faking Science: A true story of academic fraud, by Diederik Stapel (translated into English)
  • 12/23 All I want for Chrismukkah is that critics & “reformers” quit howlers of testing (after 3 yrs of blogging)! So here’s Aris Spanos “Talking Back!”
  • 12/26 3 YEARS AGO: MONTHLY (Dec.) MEMORY LANE
  • 12/29 To raise the power of a test is to lower (not raise) the “hurdle” for rejecting the null (Ziliac and McCloskey 3 years on)
  • 12/31 Midnight With Birnbaum (Happy New Year)

 

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30,2016, March 30,2017 -a very convenient way to allow data-dependent choices (note why it’s legit in selecting blog posts, on severity grounds).

 

Save

Save

Save

Save

Save

Save

Save

Save

Save

Save

Categories: 3-year memory lane | Leave a comment

Midnight With Birnbaum (Happy New Year 2017)

 Just as in the past 6 years since I’ve been blogging, I revisit that spot in the road at 11p.m., just outside the Elbar Room, look to get into a strange-looking taxi, to head to “Midnight With Birnbaum”. (The pic on the left is the only blurry image I have of the club I’m taken to.) I wondered if the car would come for me this year, as I waited out in the cold, given that my Birnbaum article has been out since 2014. The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics (and cognate methods). 2018 will be the 60th birthday of Cox’s “weighing machine” example, which was the start of Birnbaum’s attempted proof. Yet as Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. Maybe in 2018? Anyway, the cab is finally here…the rest is live. Happy New Year! Continue reading

Categories: Birnbaum Brakes, strong likelihood principle | Tags: , , , | 3 Comments

60 yrs of Cox’s (1958) weighing machine, & links to binge-read the Likelihood Principle

IMG_0079

.

2018 will mark 60 years since the famous chestnut from Sir David Cox (1958). The example  “is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992, p. 582). When I describe it, you’ll find it hard to believe many regard it as causing an earthquake in statistical foundations, unless you’re already steeped in these matters. A simple version: If half the time I reported my weight from a scale that’s always right, and half the time use a scale that gets it right with probability .5, would you say I’m right with probability ¾? Well, maybe. But suppose you knew that this measurement was made with the scale that’s right with probability .5? The overall error probability is scarcely relevant for giving the warrant of the particular measurement, knowing which scale was used. So what’s the earthquake? First a bit more on the chestnut. Here’s an excerpt from Cox and Mayo (2010, 295-8): Continue reading

Categories: Sir David Cox, Statistics, strong likelihood principle | 4 Comments

Why significance testers should reject the argument to “redefine statistical significance”, even if they want to lower the p-value*

.

An argument that assumes the very thing that was to have been argued for is guilty of begging the question; signing on to an argument whose conclusion you favor even though you cannot defend its premises is to argue unsoundly, and in bad faith. When a whirlpool of “reforms” subliminally alter  the nature and goals of a method, falling into these sins can be quite inadvertent. Start with a simple point on defining the power of a statistical test.

I. Redefine Power?

Given that power is one of the most confused concepts from Neyman-Pearson (N-P) frequentist testing, it’s troubling that in “Redefine Statistical Significance”, power gets redefined too. “Power,” we’re told, is a Bayes Factor BF “obtained by defining H1 as putting ½ probability on μ = ± m for the value of m that gives 75% power for the test of size α = 0.05. This H1 represents an effect size typical of that which is implicitly assumed by researchers during experimental design.” (material under Figure 1). Continue reading

Categories: Bayesian/frequentist, fallacy of rejection, P-values, reforming the reformers, spurious p values | 15 Comments

How to avoid making mountains out of molehills (using power and severity)

.

In preparation for a new post that takes up some of the recent battles on reforming or replacing p-values, I reblog an older post on power, one of the most misunderstood and abused notions in statistics. (I add a few “notes on howlers”.)  The power of a test T in relation to a discrepancy from a test hypothesis H0 is the probability T will lead to rejecting H0 when that discrepancy is present. Power is sometimes misappropriated to mean something only distantly related to the probability a test leads to rejection; but I’m getting ahead of myself. This post is on a classic fallacy of rejection. Continue reading

Categories: CIs and tests, Error Statistics, power | 4 Comments

The Conversion of Subjective Bayesian, Colin Howson, & the problem of old evidence (i)

.

“The subjective Bayesian theory as developed, for example, by Savage … cannot solve the deceptively simple but actually intractable old evidence problem, whence as a foundation for a logic of confirmation at any rate, it must be accounted a failure.” (Howson, (2017), p. 674)

What? Did the “old evidence” problem cause Colin Howson to recently abdicate his decades long position as a leading subjective Bayesian? It seems to have. I was so surprised to come across this in a recent perusal of Philosophy of Science that I wrote to him to check if it is really true. (It is.) I thought perhaps it was a different Colin Howson, or the son of the one who co-wrote 3 editions of Howson and Urbach: Scientific Reasoning: The Bayesian Approach espousing hard-line subjectivism since 1989.[1] I am not sure which of the several paradigms of non-subjective or default Bayesianism Howson endorses (he’d argued for years, convincingly, against any one of them), nor how he handles various criticisms (Kass and Wasserman 1996), I put that aside. Nor have I worked through his, rather complex, paper to the extent necessary, yet. What about the “old evidence” problem, made famous by Clark Glymour 1980?  What is it? Continue reading

Categories: Bayesian priors, objective Bayesians, Statistics | Tags: | 25 Comments

Erich Lehmann’s 100 Birthday: Neyman Pearson vs Fisher on P-values

Erich Lehmann 20 November 1917 – 12 September 2009

Erich Lehmann was born 100 years ago today! (20 November 1917 – 12 September 2009). Lehmann was Neyman’s first student at Berkeley (Ph.D 1942), and his framing of Neyman-Pearson (NP) methods has had an enormous influence on the way we typically view them.*

.

I got to know Erich in 1997, shortly after publication of EGEK (1996). One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that).  He began by telling me that he was sitting in a very large room at an ASA (American Statistical Association) meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, wood table sat just one book, all alone, shiny red.

He said ” I wonder if it might be of interest to me!”  So he walked up to it….  It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after[0]. (What are the chances?) Some related posts on Lehmann’s letter are here and here.

Continue reading

Categories: Fisher, P-values, phil/history of stat | 3 Comments

3 YEARS AGO (NOVEMBER 2014): MEMORY LANE

3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: November 2014. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green 3- 4 others of general relevance to philosophy of statistics (in months where I’ve blogged a lot)[2].  Posts that are part of a “unit” or a group count as one (11/1/14 & 11/09/14 and 11/15/14 & 11/25/14 are grouped). The comments are worth checking out.

 

November 2014

  • 11/01 Philosophy of Science Assoc. (PSA) symposium on Philosophy of Statistics in the Higgs Experiments “How Many Sigmas to Discovery?”
  • 11/09 “Statistical Flukes, the Higgs Discovery, and 5 Sigma” at the PSA
  • 11/11 The Amazing Randi’s Million Dollar Challenge
  • 11/12 A biased report of the probability of a statistical fluke: Is it cheating?
  • 11/15 Why the Law of Likelihood is bankrupt–as an account of evidence

     

  • 11/18 Lucien Le Cam: “The Bayesians Hold the Magic”
  • 11/20 Erich Lehmann: Statistician and Poet
  • 11/22 Msc Kvetch: “You are a Medical Statistic”, or “How Medical Care Is Being Corrupted”
  • 11/25 How likelihoodists exaggerate evidence from statistical tests
  • 11/30 3 YEARS AGO: MONTHLY (Nov.) MEMORY LANE

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30,2016, March 30,2017 -a very convenient way to allow data-dependent choices (note why it’s legit in selecting blog posts, on severity grounds).

 

Save

Save

Save

Save

Save

Save

Save

Save

Save

Save

Categories: 3-year memory lane | 1 Comment

Yoav Benjamini, “In the world beyond p < .05: When & How to use P < .0499…"

.

These were Yoav Benjamini’s slides,”In the world beyond p<.05: When & How to use P<.0499…” from our session at the ASA 2017 Symposium on Statistical Inference (SSI): A World Beyond p < 0.05. (Mine are in an earlier post.) He begins by asking:

However, it’s mandatory to adjust for selection effects, and Benjamini is one of the leaders in developing ways to carry out the adjustments. Even calling out the avenues for cherry-picking and multiple testing, long known to invalidate p-values, would make replication research more effective (and less open to criticism). Continue reading

Categories: Error Statistics, P-values, replication research, selection effects | 22 Comments

Going round and round again: a roundtable on reproducibility & lowering p-values

.

There will be a roundtable on reproducibility Friday, October 27th (noon Eastern time), hosted by the International Methods Colloquium, on the reproducibility crisis in social sciences motivated by the paper, “Redefine statistical significance.” Recall, that was the paper written by a megateam of researchers as part of the movement to require p ≤ .005, based on appraising significance tests by a Bayes Factor analysis, with prior probabilities on a point null and a given alternative. It seems to me that if you’re prepared to scrutinize your frequentist (error statistical) method on grounds of Bayes Factors, then you must endorse using Bayes Factors (BFs) for inference to begin with. If you don’t endorse BFs–and, in particular, the BF required to get the disagreement with p-values–*, then it doesn’t make sense to appraise your non-Bayesian method on grounds of agreeing or disagreeing with BFs. For suppose you assess the recommended BFs from the perspective of an error statistical account–that is, one that checks how frequently the method would uncover or avoid the relevant mistaken inference.[i] Then, if you reach the stipulated BF level against a null hypothesis, you will find the situation is reversed, and the recommended BF exaggerates the evidence!  (In particular, with high probability, it gives an alternative H’ fairly high posterior probability, or comparatively higher probability, even though H’ is false.) Failing to reach the BF cut-off, by contrast, can find no evidence against, and even finds evidence for, a null hypothesis with high probability, even when non-trivial discrepancies exist. They’re measuring very different things, and it’s illicit to expect an agreement on numbers.[ii] We’ve discussed this quite a lot on this blog (2 are linked below [iii]).

If the given list of panelists is correct, it looks to be 4 against 1, but I’ve no doubt that Lakens can handle it.

Continue reading

Categories: Announcement, P-values, reforming the reformers, selection effects | 5 Comments

Deconstructing “A World Beyond P-values”

.A world beyond p-values?

I was asked to write something explaining the background of my slides (posted here) in relation to the recent ASA “A World Beyond P-values” conference. I took advantage of some long flight delays on my return to jot down some thoughts:

The contrast between the closing session of the conference “A World Beyond P-values,” and the gist of the conference itself, shines a light on a pervasive tension within the “Beyond P-Values” movement. Two very different debates are taking place. First there’s the debate about how to promote better science. This includes welcome reminders of the timeless demands of rigor and integrity required to avoid deceiving ourselves and others–especially crucial in today’s world of high-powered searches and Big Data. That’s what the closing session was about. [1] Continue reading

Categories: P-values, Philosophy of Statistics, reforming the reformers | 8 Comments

Blog at WordPress.com.