Statistics

The Conversion of Subjective Bayesian, Colin Howson, & the problem of old evidence (i)


“The subjective Bayesian theory as developed, for example, by Savage … cannot solve the deceptively simple but actually intractable old evidence problem, whence as a foundation for a logic of confirmation at any rate, it must be accounted a failure.” (Howson, (2017), p. 674)

What? Did the “old evidence” problem cause Colin Howson to recently abdicate his decades-long position as a leading subjective Bayesian? It seems to have. I was so surprised to come across this in a recent perusal of Philosophy of Science that I wrote to him to check if it was really true. (It is.) I thought perhaps it was a different Colin Howson, or the son of the one who co-wrote 3 editions of Howson and Urbach: Scientific Reasoning: The Bayesian Approach, espousing hard-line subjectivism since 1989.[1] I am not sure which of the several paradigms of non-subjective or default Bayesianism Howson endorses (he’d argued for years, convincingly, against every one of them), nor how he handles various criticisms (Kass and Wasserman 1996); I put that aside. Nor have I yet worked through his rather complex paper to the extent necessary. What about the “old evidence” problem, made famous by Clark Glymour (1980)? What is it?

Consider Jay Kadane, a well-known subjective Bayesian statistician. According to Kadane, the probability statement: Pr(d(X) ≥ 1.96) = .025

“is a statement about d(X) before it is observed. After it is observed, the event {d(X) ≥ 1.96} either happened or did not happen and hence has probability either one or zero” (2011, p. 439).
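The .025 here is just the upper tail of a standard normal test statistic. A quick sanity check (a sketch; I’m assuming, for illustration, that d(X) is standard normal under the null):

```python
from math import erfc, sqrt

# Upper-tail probability P(Z >= z) for a standard normal Z,
# computed via the complementary error function.
def upper_tail(z):
    return 0.5 * erfc(z / sqrt(2))

print(round(upper_tail(1.96), 4))  # 0.025
```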

Knowing d0 = 1.96 (the specific value of the test statistic d(X)), Kadane is saying, there’s no more uncertainty about it.* But would he really give it probability 1? If the probability of the data x is 1, Glymour argues, then Pr(x|H) also is 1, but then Pr(H|x) = Pr(H)Pr(x|H)/Pr(x) = Pr(H), so there is no boost in probability for a hypothesis or model arrived at after x. So does that mean known data don’t supply evidence for H? (Known data are sometimes said to violate temporal novelty: data are temporally novel only if the hypothesis or claim of interest came first.) If the data have probability 1, any boost seems to be blocked. That’s the old evidence problem. Subjective Bayesianism is faced with the old evidence problem if known evidence has probability 1, or so the argument goes.
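In miniature, with made-up numbers (the 0.3, 0.9, 0.5 below are purely illustrative), Glymour’s point is just arithmetic:

```python
# Bayes's theorem: Pr(H|x) = Pr(H) * Pr(x|H) / Pr(x).
# If the known data x get probability 1 (and hence Pr(x|H) = 1 too),
# the posterior collapses to the prior -- no boost for H.
def posterior(prior_H, lik_x_given_H, prob_x):
    return prior_H * lik_x_given_H / prob_x

# Before x is known: x can confirm H.
print(round(posterior(0.3, 0.9, 0.5), 2))  # 0.54 -- boosted above the 0.3 prior

# After x is known, a la Kadane: Pr(x) = 1 and Pr(x|H) = 1.
print(posterior(0.3, 1.0, 1.0))  # 0.3 -- posterior equals prior
```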

What’s the accepted subjective Bayesian solution to this? (I’m really asking.) One attempt is to subtract out, or try to, the fact that x is known, and envision being in a context prior to knowing x. That’s not very satisfactory or realistic, in general. Subjective Bayesians in statistics, I assume, just use the likelihoods and don’t worry about this: known data are an instance of a general random variable X, and you just use the likelihood once it’s known that {X = x}. But can you do this and also hold, with Kadane, that it’s an event with probability 1? I’ve always presumed that the problem was mainly for philosophers who want to assign probabilities to statements in a language, rather than focusing on random variables and their distributions, or statistical models (a mistake in my opinion). I also didn’t think subjective Bayesians in statistics were prepared to say, with Kadane, that an event has probability 1 after it’s observed or known. Yet if probability measures your uncertainty in the event, Kadane seems right. So how does the problem of old evidence get solved by subjective Bayesian practitioners? I asked Kadane years ago, but did not get a reply.

Any case where the data are known prior to constructing or selecting a hypothesis to accord with them would, strictly speaking, seem to count as a case of old evidence.** The best-known cases in philosophy allude to a known phenomenon, such as Mercury’s perihelion, as evidence for Einstein’s GTR. (The perihelion was long known as anomalous for Newton, yet GTR’s predicting it, without adjustments, is widely regarded as evidence for GTR.)[2] You can read some attempted treatments by philosophers in Howson’s paper; I discuss Garber’s attempt in Chapter 10, Mayo 1996 [EGEK], 10.2.[3] I’d like to hear from readers, regardless of statistical persuasion, how it’s handled in practice (or why it’s deemed unproblematic).

But wait, are we sure it isn’t also a problem for non-subjective or default Bayesians? In this paradigm (and there are several varieties), the prior probabilities in hypotheses are not taken to express degrees of belief but are given by various formal assignments, so as to have minimal impact on the posterior probability. Although the holy grail of finding “uninformative” default priors has been given up, default priors are at least supposed to ensure that the data dominate in some sense.[4] A true-blue subjective Bayesian like Kadane is unhappy with non-subjective priors. Rather than quantifying prior beliefs, such priors are viewed as primitives, conventions, or references for obtaining posterior probabilities. How are they to be interpreted? It’s not clear, but let’s put this aside to focus on the “old evidence” problem.

OK, so how do subjective Bayesians get around the old evidence problem?

*I thank Jay Kadane for noticing I used the inequality in my original post of 11/27/17. I haven’t digested his general reaction yet; stay tuned.
**There’s a place where Glymour (or Glymour, Scheines, Spirtes, and Kelly 1987) slyly argues that, strictly speaking, the data are always known by the time you appraise some model–or so I seem to recall. But I’d have to research that or ask him.

[1] I’ll have to add a footnote to my new book (Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, CUP, 2018), as I allude to him as a subjectivist Bayesian philosopher throughout.
[2] I argue that the reason it was good evidence for GTR is precisely because it was long known, and yet all attempts to explain it were ad hoc, so that they failed to pass severe tests. The deflection effect, by contrast, was new and no one had discerned it before, let alone tried to explain it. Note that this is at odds with the idea that novel results count more for a theory H when H (temporally) predicts them, than when H accounts for known results. (Here the known perihelion of Mercury is thought to be better evidence for GTR than the novel deflection effect.) But the issue isn’t the novelty of the results, it’s how well-tested H is, or so I argue. (EGEK Chapter 8, p. 288).
[3] I don’t see that the newer attempts avoid the key problem in Garber’s. I’m not sure if Howson is rescinding the remark I quote from him in EGEK, p. 333. Here he was trying to solve it by subtracting the data out from what’s known.
[4] Some may want to use “informative” priors as well, but their meaning/rationale is unclear. Howson mentions Wes Salmon’s style of Bayesianism in this paper, but Salmon was a frequentist.

REFERENCES

-Glymour, C. (1980), Theory and Evidence, Princeton University Press. I’ve added a link to the relevant chapter, “Why I am Not a Bayesian” (from Fitelson resources). The relevant pages are 85-93.
-Howson, C. (2017), “Putting on the Garber Style? Better Not”, Philosophy of Science 84 (October 2017): 659–676.
-Kass, R. and Wasserman, L. (1996), “The Selection of Prior Distributions By Formal Rules”,  JASA 91: 1343-70.

Further References to Solutions (to this or Related problems): 

-Garber, Daniel. 1983. “Old Evidence and Logical Omniscience in Bayesian Confirmation Theory.” In Minnesota Studies in the Philosophy of Science, ed. J. Earman, 99–131. Minneapolis: University of Minnesota Press.
-Hartmann, Stephan, and Branden Fitelson. 2015. “A New Garber-Style Solution to the Problem of Old Evidence.” Philosophy of Science 82 (4): 712–17.
-Seidenfeld, T., Schervish, M., and Kadane, J. 2012. “What Kind of Uncertainty Is That?” Journal of Philosophy: 516–533.

Categories: Bayesian priors, objective Bayesians, Statistics | Tags: | 25 Comments

Statistical skepticism: How to use significance tests effectively: 7 challenges & how to respond to them

Here are my slides from the ASA Symposium on Statistical Inference, “A World Beyond p < .05”, in the session “What are the best uses for P-values?”. (Aside from me, our session included Yoav Benjamini and David Robinson, with chair Nalini Ravishanker.)

7 QUESTIONS

  • Why use a tool that infers from a single (arbitrary) P-value that pertains to a statistical hypothesis H0 to a research claim H*?
  • Why use an incompatible hybrid (of Fisher and N-P)?
  • Why apply a method that uses error probabilities, the sampling distribution, researcher “intentions” and violates the likelihood principle (LP)? You should condition on the data.
  • Why use methods that overstate evidence against a null hypothesis?
  • Why do you use a method that presupposes the underlying statistical model?
  • Why use a measure that doesn’t report effect sizes?
  • Why do you use a method that doesn’t provide posterior probabilities (in hypotheses)?


Categories: P-values, spurious p values, statistical tests, Statistics | Leave a comment

Egon Pearson’s Heresy

E.S. Pearson: 11 Aug 1895-12 June 1980.

Here’s one last entry in honor of Egon Pearson’s birthday: “Statistical Concepts in Their Relation to Reality” (Pearson 1955). I’ve posted it several times over the years (6!), but always find a new gem or two, despite its being so short. E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter E. Pearson wrote to Birnbaum (1974), he talks about N-P theory admitting of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.)

Categories: phil/history of stat, Philosophy of Statistics, Statistics | Tags: , , | Leave a comment

Performance or Probativeness? E.S. Pearson’s Statistical Philosophy


E.S. Pearson (11 Aug, 1895-12 June, 1980)

This is a belated birthday post for E.S. Pearson (11 August 1895-12 June 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ll blog some E. Pearson items this week, including my latest reflection on a historical anecdote regarding Egon and the woman he wanted to marry, and surely would have, were it not for his father Karl!

HAPPY BELATED BIRTHDAY EGON!

Are methods based on error probabilities of use mainly to supply procedures that will not err too frequently in some long run (performance)? Or is it the other way round: that the control of long-run error properties is of crucial importance for probing the causes of the data at hand (probativeness)? I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson.

Categories: highly probable vs highly probed, phil/history of stat, Statistics | Tags: | Leave a comment

Allan Birnbaum: Foundations of Probability and Statistics (27 May 1923 – 1 July 1976)

27 May 1923-1 July 1976


Today is Allan Birnbaum’s birthday. In honor of his birthday, I’m posting the articles in the Synthese volume that was dedicated to his memory in 1977. The editors describe it as their way of  “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. I paste a few snippets from the articles by Giere and Birnbaum. If you’re interested in statistical foundations, and are unfamiliar with Birnbaum, here’s a chance to catch up. (Even if you are, you may be unaware of some of these key papers.)

HAPPY BIRTHDAY ALLAN!

Synthese Volume 36, No. 1 Sept 1977: Foundations of Probability and Statistics, Part I

Editorial Introduction:

This special issue of Synthese on the foundations of probability and statistics is dedicated to the memory of Professor Allan Birnbaum. Professor Birnbaum’s essay ‘The Neyman-Pearson Theory as Decision Theory; and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory’ was received by the editors of Synthese in October, 1975, and a decision was made to publish a special symposium consisting of this paper together with several invited comments and related papers. The sad news about Professor Birnbaum’s death reached us in the summer of 1976, but the editorial project could nevertheless be completed according to the original plan. By publishing this special issue we wish to pay homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics. We are grateful to Professor Ronald Giere who wrote an introductory essay on Professor Birnbaum’s concept of statistical evidence and who compiled a list of Professor Birnbaum’s publications.

THE EDITORS


Categories: Birnbaum, Likelihood Principle, Statistics, strong likelihood principle | Tags: | 1 Comment

3 YEARS AGO (APRIL 2014): MEMORY LANE

3 years ago...


MONTHLY MEMORY LANE: 3 years ago: April 2014. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 4 others I’d recommend[2].  Posts that are part of a “unit” or a group count as one. For this month, I’ll include all the 6334 seminars as “one”.

April 2014

  • (4/1) April Fool’s. Skeptical and enthusiastic Bayesian priors for beliefs about insane asylum renovations at Dept of Homeland Security: I’m skeptical and unenthusiastic
  • (4/3) Self-referential blogpost (conditionally accepted*)
  • (4/5) Who is allowed to cheat? I.J. Good and that after dinner comedy hour. . ..
  • (4/6) Phil6334: Duhem’s Problem, highly probable vs highly probed; Day #9 Slides
  • (4/8) “Out Damned Pseudoscience: Non-significant results are the new ‘Significant’ results!” (update)
  • (4/12) “Murder or Coincidence?” Statistical Error in Court: Richard Gill (TEDx video)
  • (4/14) Phil6334: Notes on Bayesian Inference: Day #11 Slides
  • (4/16) A. Spanos: Jerzy Neyman and his Enduring Legacy
  • (4/17) Duality: Confidence intervals and the severity of tests
  • (4/19) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill)
  • (4/21) Phil 6334: Foundations of statistics and its consequences: Day#12
  • (4/23) Phil 6334 Visitor: S. Stanley Young, “Statistics and Scientific Integrity”
  • (4/26) Reliability and Reproducibility: Fraudulent p-values through multiple testing (and other biases): S. Stanley Young (Phil 6334: Day #13)
  • (4/30) Able Stats Elba: 3 Palindrome nominees for April! (rejected post)


[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016; March 30, 2017 (moved to 4): a very convenient way to allow data-dependent choices.


Categories: 3-year memory lane, Statistics | Leave a comment

Jerzy Neyman and “Les Miserables Citations” (statistical theater in honor of his birthday)


Neyman April 16, 1894 – August 5, 1981

For my final Jerzy Neyman item, here’s the post I wrote for his birthday last year: 

A local acting group is putting on a short theater production based on a screenplay I wrote:  “Les Miserables Citations” (“Those Miserable Quotes”) [1]. The “miserable” citations are those everyone loves to cite, from their early joint 1933 paper:

We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.

But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. (Neyman and Pearson 1933, pp. 290-1).

In this early paper, Neyman and Pearson were still groping toward the basic concepts of tests–for example, “power” had yet to be coined. Taken out of context, these quotes have led to knee-jerk (behavioristic) interpretations which neither Neyman nor Pearson would have accepted. What was the real context of those passages? Well, the paper opens, just five paragraphs earlier, with a discussion of a debate between two French probabilists—Joseph Bertrand, author of “Calculus of Probabilities” (1907), and Emile Borel, author of “Le Hasard” (1914)! According to Neyman, what served “as an inspiration to Egon S. Pearson and myself in our effort to build a frequentist theory of testing hypotheses” (1977, p. 103) initially grew out of remarks of Borel, whose lectures Neyman had attended in Paris. He returns to the Bertrand-Borel debate in four different papers, and circles back to it often in his talks with his biographer, Constance Reid. His student Erich Lehmann (1993), regarded as the authority on Neyman, wrote an entire paper on the topic: “The Bertrand-Borel Debate and the Origins of the Neyman-Pearson Theory”.

Categories: E.S. Pearson, Neyman, Statistics | Leave a comment

Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen


April 16, 1894 – August 5, 1981

I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of statistical hypotheses and a significance test. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and is justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, is, on the other hand, an epistemological goal. What do you think?

Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena
by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I recommend, especially, the example on home ownership. Here are two snippets:

Categories: Error Statistics, Neyman, Statistics | Tags: | 2 Comments

If you’re seeing limb-sawing in P-value logic, you’re sawing off the limbs of reductio arguments

I was just reading a paper by Martin and Liu (2014) in which they allude to the “questionable logic of proving H0 false by using a calculation that assumes it is true” (p. 1704). They say they seek to define a notion of “plausibility” that

“fits the way practitioners use and interpret p-values: a small p-value means H0 is implausible, given the observed data,” but they seek “a probability calculation that does not require one to assume that H0 is true, so one avoids the questionable logic of proving H0 false by using a calculation that assumes it is true“(Martin and Liu 2014, p. 1704).

Questionable? A very standard form of argument is a reductio (ad absurdum) wherein a claim C is inferred (i.e., detached) by falsifying ~C, that is, by showing that assuming ~C entails something in conflict with (if not logically contradicting) known results or known truths [i]. Actual falsification in science is generally a statistical variant of this argument. Supposing H0 in p-value reasoning plays the role of ~C. Yet some aver it thereby “saws off its own limb”!
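For the record, “assuming H0” is just the supposition step of the reductio. A toy calculation (my own made-up example: a fair-coin null and 60 heads in 80 tosses) shows the form:

```python
from math import comb

# The p-value is computed *supposing* H0 (the coin is fair, p = 0.5),
# just as a reductio supposes ~C in order to discredit it: the tail
# probability of a result at least as extreme as the 60 heads observed
# in 80 tosses.
n, k = 80, 60
p_value = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
print(f"{p_value:.1e}")  # a tiny tail probability, far below .05
```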

Categories: P-values, reforming the reformers, Statistics | 13 Comments

3 YEARS AGO (MARCH 2014): MEMORY LANE

3 years ago...


MONTHLY MEMORY LANE: 3 years ago: March 2014. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 4 others I’d recommend[2].  Posts that are part of a “unit” or a group count as one. 3/19 and 3/17 are one, as are  3/19, 3/12 and 3/4, and the 6334 items 3/11, 3/22 and 3/26. So that covers nearly all the posts!

March 2014


  • (3/1) Cosma Shalizi gets tenure (at last!) (metastat announcement)
  • (3/2) Significance tests and frequentist principles of evidence: Phil6334 Day #6
  • (3/3) Capitalizing on Chance (ii)
  • (3/4) Power, power everywhere–(it) may not be what you think! [illustration]
  • (3/8) Msc kvetch: You are fully dressed (even under you clothes)?
  • (3/8) Fallacy of Rejection and the Fallacy of Nouvelle Cuisine
  • (3/11) Phil6334 Day #7: Selection effects, the Higgs and 5 sigma, Power
  • (3/12) Get empowered to detect power howlers
  • (3/15) New SEV calculator (guest app: Durvasula)
  • (3/17) Stephen Senn: “Delta Force: To what extent is clinical relevance relevant?” (Guest Post)

  • (3/19) Power taboos: Statue of Liberty, Senn, Neyman, Carnap, Severity
  • (3/22) Fallacies of statistics & statistics journalism, and how to avoid them: Summary & Slides Day #8 (Phil 6334)
  • (3/25) The Unexpected Way Philosophy Majors Are Changing The World Of Business


  • (3/26) Phil6334:Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)
  • (3/28) Severe osteometric probing of skeletal remains: John Byrd
  • (3/29) Winner of the March 2014 palindrome contest (rejected post)
  • (3/30) Phil6334: March 26, philosophy of misspecification testing (Day #9 slides)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016; March 30, 2017 (moved to 4): a very convenient way to allow data-dependent choices.


Categories: 3-year memory lane, Error Statistics, Statistics | Leave a comment

Er, about those other approaches, hold off until a balanced appraisal is in

I could have told them that the degree of accordance enabling the ASA’s “6 principles” on p-values was unlikely to be replicated when it came to most of the “other approaches” with which some would supplement or replace significance tests–notably Bayesian updating, Bayes factors, or likelihood ratios (confidence intervals are dual to hypothesis tests). [My commentary is here.] So now they may be advising a “hold off” or “go slow” approach until some consilience is achieved. Is that it? I don’t know. I was tweeted an article about the background chatter taking place behind the scenes; I wasn’t one of the people interviewed for this. Here are some excerpts; I may add more later after it has had time to sink in. (Check back later.)

“Reaching for Best Practices in Statistics: Proceed with Caution Until a Balanced Critique Is In”

J. Hossiason

“[A]ll of the other approaches*, as well as most statistical tools, may suffer from many of the same problems as the p-values do. What level of likelihood ratio in favor of the research hypothesis will be acceptable to the journal? Should scientific discoveries be based on whether posterior odds pass a specific threshold (P3)? Does either measure the size of an effect (P5)?…How can we decide about the sample size needed for a clinical trial—however analyzed—if we do not set a specific bright-line decision rule? 95% confidence intervals or credence intervals…offer no protection against selection when only those that do not cover 0, are selected into the abstract (P4).” (Benjamini, ASA commentary, pp. 3-4)

What’s sauce for the goose is sauce for the gander, right? Many statisticians seconded George Cobb, who urged “the board to set aside time at least once every year to consider the potential value of similar statements” to the recent ASA p-value report. Disappointingly, a preliminary survey of leaders in statistics, many from the original p-value group, aired striking disagreements on best and worst practices with respect to these other approaches. The Executive Board is contemplating a variety of recommendations, minimally, that practitioners move with caution until they can put forward at least a few agreed-upon principles for interpreting and applying Bayesian inference methods. The words we heard ranged from “go slow” to “moratorium” [emphasis mine]. Having been privy to some of the results of this survey, we at Stat Report Watch decided to contact some of the individuals involved.

Categories: P-values, reforming the reformers, Statistics | 6 Comments

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Understanding Reproducibility & Error Correction in Science

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2016–2017
57th Annual Program

Download the 57th Annual Program

The Alfred I. Taub forum:

UNDERSTANDING REPRODUCIBILITY & ERROR CORRECTION IN SCIENCE

Cosponsored by GMS and BU’s BEST at Boston University.
Friday, March 17, 2017
1:00 p.m. – 5:00 p.m.
The Terrace Lounge, George Sherman Union
775 Commonwealth Avenue

  • Reputation, Variation, &, Control: Historical Perspectives
    Jutta Schickore History and Philosophy of Science & Medicine, Indiana University, Bloomington.
  • Crisis in Science: Time for Reform?
    Arturo Casadevall Molecular Microbiology & Immunology, Johns Hopkins
  • Severe Testing: The Key to Error Correction
    Deborah Mayo Philosophy, Virginia Tech
  • Replicate That…. Maintaining a Healthy Failure Rate in Science
    Stuart Firestein Biological Sciences, Columbia


Categories: Announcement, Statistical fraudbusting, Statistics | Leave a comment

The ASA Document on P-Values: One Year On


I’m surprised it’s a year already since posting my published comments on the ASA Document on P-Values. Since then, there have been a slew of papers rehearsing the well-worn fallacies of tests (a tad bit more than the usual rate). Doubtless, the P-value Pow Wow raised people’s consciousnesses. I’m interested in hearing reader reactions/experiences in connection with the P-Value project (positive and negative) over the past year. (Use the comments, share links to papers; and/or send me something slightly longer for a possible guest post.)
Some people sent me a diagram from a talk by Stephen Senn (on “P-values and the art of herding cats”). He presents an array of different cat commentators, and for some reason Mayo cat is in the middle but way over on the left side, near the wall. I never got the key to interpretation. My contribution is below:

Chart by S.Senn

“Don’t Throw Out The Error Control Baby With the Bad Statistics Bathwater”

D. Mayo*[1]

The American Statistical Association is to be credited with opening up a discussion into p-values; now an examination of the foundations of other key statistical concepts is needed. Continue reading

Categories: Bayesian/frequentist, P-values, science communication, Statistics, Stephen Senn | 14 Comments

3 YEARS AGO (FEBRUARY 2014): MEMORY LANE

3 years ago...


MONTHLY MEMORY LANE: 3 years ago: February 2014. I normally mark in red three posts from each month that seem most apt for general background on key issues in this blog, but I decided just to list these as they are (some are from a seminar I taught with Aris Spanos 3 years ago; several on Fisher were recently reblogged). I hope you find something of interest!    

February 2014

  • (2/1) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs B-boosts)
  • (2/3) PhilStock: Bad news is bad news on Wall St. (rejected post)
  • (2/5) “Probabilism as an Obstacle to Statistical Fraud-Busting” (draft iii)
  • (2/9) Phil6334: Day #3: Feb 6, 2014
  • (2/10) Is it true that all epistemic principles can only be defended circularly? A Popperian puzzle
  • (2/12) Phil6334: Popper self-test
  • (2/13) Phil 6334 Statistical Snow Sculpture
  • (2/14) January Blog Table of Contents
  • (2/15) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/18) Aris Spanos: The Enduring Legacy of R. A. Fisher
  • (2/20) R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
  • (2/21) STEPHEN SENN: Fisher’s alternative to the alternative
  • (2/22) Sir Harold Jeffreys’ (tail-area) one-liner: Sat night comedy [draft ii]
  • (2/24) Phil6334: February 20, 2014 (Spanos): Day #5
  • (2/26) Winner of the February 2014 palindrome contest (rejected post)
  • (2/26) Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)


Categories: 3-year memory lane, Statistics | 2 Comments

Guest Blog: STEPHEN SENN: ‘Fisher’s alternative to the alternative’

“You May Believe You Are a Bayesian But You Are Probably Wrong”


As part of the week of recognizing R.A. Fisher (February 17, 1890 – July 29, 1962), I reblog a guest post by Stephen Senn from 2012. (I will comment in the comments.)

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre, but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought, including on significance tests.

Categories: Fisher, S. Senn, Statistics | 13 Comments

Guest Blog: ARIS SPANOS: The Enduring Legacy of R. A. Fisher

By Aris Spanos

One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

Mθ(x) = {f(x; θ), θ ∈ Θ}, x ∈ R^n, Θ ⊂ R^m, m < n;   (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.
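In modern terms, (1) says a statistical model is just a family of joint sample densities indexed by the unknown parameter. A minimal sketch (my own example, not Spanos’s: an IID Normal(μ, 1) model):

```python
from math import exp, pi, prod, sqrt

# A statistical model M_theta(x): the family of joint sample densities
# f(x; theta) -- here an IID Normal(mu, 1) model indexed by mu.
def f(x, mu):
    return prod(exp(-(xi - mu) ** 2 / 2) / sqrt(2 * pi) for xi in x)

# The data x0 are viewed as one 'typical' realization of the model.
x0 = [1.1, 0.4, 1.3]
print(f(x0, 1.0) > f(x0, 3.0))  # True: mu = 1 makes x0 far more probable than mu = 3
```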

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to the description of the distributional features of the data in hand using the histogram and the first few sample moments, implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn). As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Fisher was able to recast statistical inference by turning Karl Pearson’s approach, proceeding from data x0 in search of a frequency curve f(x;θ) to describe its histogram, on its head. He proposed to begin with a prespecified Mθ(x) (a ‘hypothetical infinite population’), and view x0 as a ‘typical’ realization thereof; see Spanos (1999). Continue reading
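Fisher’s reversal of direction can be sketched in a few lines (an illustrative simulation of mine, not from the post): prespecify Mθ(x), draw a realization x0 from it, and the descriptive summaries Pearson started from become checks that x0 is indeed ‘typical’ of the model:

```python
import random

random.seed(1)

# Fisher's direction: prespecify the model M_theta(x) -- here IID N(mu, sigma^2)
# with theta = (mu, sigma) = (0, 1) -- and regard the observed data x0 as one
# 'typical' realization drawn from it.
mu, sigma, n = 0.0, 1.0, 100
x0 = [random.gauss(mu, sigma) for _ in range(n)]

# Pearson's descriptive summaries now serve as checks that x0 is typical of
# the prespecified model, rather than as the starting point of inference.
mean = sum(x0) / n
var = sum((xi - mean) ** 2 for xi in x0) / n
print(mean, var)  # both should land near the model values 0 and 1
```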

Categories: Fisher, Spanos, Statistics | Leave a comment

R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I’ll post some different Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency to the Neyman–Pearson lemma in terms of power. It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions; the full article is linked below. Happy Birthday, Fisher!

Two New Properties of Mathematical Likelihood

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. Continue reading
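The sufficiency property Fisher describes can be checked numerically (a sketch of mine, not Fisher’s notation): for IID Normal data with known σ, the likelihood depends on the data only through the sample mean, so two different samples with the same mean yield identical likelihood functions, up to a factor free of μ; the Neyman–Pearson likelihood ratio is then a function of that sufficient statistic alone.

```python
from math import exp

def likelihood(x, mu, sigma=1.0):
    """Likelihood of mu for IID N(mu, sigma^2) data, dropping the factor
    independent of mu: what remains depends on x only through x-bar."""
    n = len(x)
    xbar = sum(x) / n
    return exp(-n * (xbar - mu) ** 2 / (2 * sigma ** 2))

a = [0.0, 1.0, 2.0]    # x-bar = 1
b = [0.9, 1.0, 1.1]    # a very different sample with the same x-bar = 1
for mu in (0.0, 0.5, 1.0):
    # identical likelihood functions of mu, despite different raw data
    assert abs(likelihood(a, mu) - likelihood(b, mu)) < 1e-12
```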

Categories: Fisher, phil/history of stat, Statistics | 2 Comments

Cox’s (1958) weighing machine example


A famous chestnut given by Cox (1958) recently came up in conversation. The example “is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992, p. 582). When I describe it, you’ll find it hard to believe many regard it as causing an earthquake in statistical foundations, unless you’re already steeped in these matters. If half the time I reported my weight from a scale that’s always right, and half the time used a scale that gets it right with probability .5, would you say I’m right with probability ¾? Well, maybe. But suppose you knew that this measurement was made with the scale that’s right with probability .5? The overall error probability is scarcely relevant to the warrant of the particular measurement, knowing which scale was used. Continue reading
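The arithmetic behind the ¾ , and why it misleads once the scale is known, is easy to check by simulation (a sketch, with the two scales idealized exactly as in the example):

```python
import random

random.seed(0)

def measure():
    """One measurement: flip a fair coin to pick the scale.
    Scale A is always right; scale B is right with probability 1/2."""
    scale = random.choice(["A", "B"])
    correct = True if scale == "A" else (random.random() < 0.5)
    return scale, correct

trials = [measure() for _ in range(100_000)]

overall = sum(c for _, c in trials) / len(trials)        # unconditional: ~3/4
given_b = [c for s, c in trials if s == "B"]
conditional = sum(given_b) / len(given_b)                # given scale B: ~1/2
print(round(overall, 2), round(conditional, 2))
```

The unconditional rate averages over both scales; once you know scale B was used, only the conditional figure describes the warrant of that measurement.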

Categories: Error Statistics, Sir David Cox, Statistics, strong likelihood principle | 1 Comment

Hocus pocus! Adopt a magician’s stance, if you want to reveal statistical sleights of hand


Here’s the follow-up post to the one I reblogged on Feb 3 (please read that one first). When they sought to subject Uri Geller to the scrutiny of scientists, magicians had to be brought in, because only they were sufficiently trained to spot the subtle sleights of hand by which the magician misdirects. We, too, have to be magicians to discern the subtle misdirections and shifts of meaning in discussions of statistical significance tests (and other methods), even within the same statistical guide. We needn’t suppose anything deliberately devious is going on at all! Often, the statistical guidebook reflects shifts of meaning that grow out of one or another critical argument. These days, such arguments trickle down quickly to statistical guidebooks, thanks to popular articles on the “statistics crisis in science”. The danger is that the guidebooks themselves contain inconsistencies. To adopt the magician’s stance is to be on the lookout for standard sleights of hand. There aren’t that many.[0]

I don’t know Jim Frost, but he gives statistical guidance at the Minitab blog. The purpose of my previous post was to point out that Frost uses the probability of a Type I error in two incompatible ways in his posts on significance tests. I assumed he’d want to clear this up, but so far he has not. His response to a comment I made on his blog is this: Continue reading

Categories: frequentist/Bayesian, P-values, reforming the reformers, S. Senn, Statistics | 39 Comments

High error rates in discussions of error rates: no end in sight

waiting for the other shoe to drop…

“Guides for the Perplexed” in statistics become “Guides to Become Perplexed” when “error probabilities” (in relation to statistical hypotheses tests) are confused with posterior probabilities of hypotheses. Moreover, these posteriors are neither frequentist, subjectivist, nor default. Since this doublespeak is becoming more common in some circles, it seems apt to reblog a post from one year ago (you may wish to check the comments).

Do you ever find yourself holding your breath when reading an exposition of significance tests that’s going swimmingly so far? If you’re a frequentist in exile, you know what I mean; I’m sure others feel this way too. When I came across Jim Frost’s posts on the Minitab Blog, I thought I might actually have located a success story. He does a good job explaining P-values (with charts), the duality between P-values and confidence levels, and even rebuts the latest “test ban” (the “Don’t Ask, Don’t Tell” policy). Mere descriptive reports of observed differences, which the editors recommend, are, Frost shows, uninterpretable without a corresponding P-value or the equivalent. So far, so good. I have only small quibbles, such as the use of “likelihood” when probability is meant, and sundry nitpicky things. But watch how in some places significance levels are defined as the usual error probabilities (indeed, in the glossary for the site), while in others it is denied that they provide error probabilities. In those other places, “error probabilities” and “error rates” shift their meaning to posterior probabilities, based on priors representing the “prevalence” of true null hypotheses.
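The two readings come apart numerically. A hedged sketch (the function name and numbers are mine, not Frost’s): the Type I error probability α is fixed by the test’s specification, whereas the “proportion of rejections whose null is true” also depends on an assumed prevalence of true nulls and on power:

```python
def share_of_false_rejections(alpha, power, prevalence_true_nulls):
    """Among hypotheses that get rejected, the fraction whose null was true.
    This is NOT the Type I error probability alpha: it also depends on the
    assumed prevalence of true nulls and on the test's power."""
    p0 = prevalence_true_nulls
    false_rej = p0 * alpha            # true nulls rejected at rate alpha
    true_rej = (1 - p0) * power      # false nulls rejected at rate = power
    return false_rej / (false_rej + true_rej)

# Same alpha = 0.05, very different 'error rates' in the posterior sense:
print(share_of_false_rejections(0.05, 0.8, 0.5))   # prevalence 50%
print(share_of_false_rejections(0.05, 0.8, 0.9))   # prevalence 90%
```

At α = .05 and power .8, this posterior-style quantity is about 6% when half the nulls are true but 36% when 90% are, while the test’s Type I error probability is .05 in both cases: conflating the two is exactly the sleight of hand at issue.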

Begin with one of his kosher posts “Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics” (blue is Frost): Continue reading

Categories: highly probable vs highly probed, J. Berger, reforming the reformers, Statistics | 1 Comment
