Error Statistics

Neyman vs the ‘Inferential’ Probabilists continued (a)

.

Today is Jerzy Neyman’s Birthday (April 16, 1894 – August 5, 1981).  I am posting a brief excerpt and a link to a paper of his that I hadn’t posted before: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”.  “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). [iii] I’ll explain in later stages of this post & in comments…(so please check back); I don’t want to miss the start of the birthday party in honor of Neyman, and it’s already 8:30 p.m in Berkeley!

Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program. HAPPY BIRTHDAY NEYMAN!

 

 

Installment (a)4/17. What doesn’t Neyman like about Birnbaum’s advocacy of a Principle of Sufficiency S (p. 25)? He doesn’t like that it is advanced as a normative principle (e.g., about when evidence is or ought to be deemed equivalent) rather than a criterion that does something for you, such as control errors. (Presumably it is relevant to a type of context, say parametric inference within a model.) S is put forward as a kind of principle of rationality, rather than one with a rationale in solving some statistical problem

“The principle of sufficiency (S): If E is specified experiment, with outcomes x; if t = t (x) is any sufficient statistic; and if E’ is the experiment, derived from E, in which any outcome x of E is represented only by the corresponding value t = t (x) of the sufficient statistic; then for each x, Ev (E, x) = Ev (E’, t) where t = t (x)… (S) may be described informally as asserting the ‘irrelevance of observations independent of a sufficient statistic’.”

Ev(E, x) is a metalogical symbol referring to the evidence from experiment E with result x. The very idea that there is such a thing as an evidence function is never explained, but to Birnbaum “inferential theory” required such things. (At least that’s how he started out.) The view is very philosophical and it inherits much from logical positivism and logics of induction.The principle S, and also other principles of Birnbaum, have a normative character: Birnbaum considers them “compellingly appropriate”.

“The principles of Birnbaum appear as a kind of substitutes for known theorems” Neyman says. For example, various authors proved theorems to the general effect that the use of sufficient statistics will minimize the frequency of errors. But if you just start with the rationale (minimizing the frequency of errors, say) you wouldn’t need these”principles” from on high as it were. That’s what Neyman seems to be saying in his criticism of them in this paper. Do you agree? He has the same gripe concerning Cornfield’s conception of a default-type Bayesian account akin to Jeffreys. Why?

[i] I thank @omaclaran for reminding me of this paper on twitter recently.

[ii] Or so I argue in my forthcoming, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, 2018, CUP. (Expected this summer.)

[iii] Do you think Neyman is using “breakthrough” here in reference to Savage’s description of Birnbaum’s “proof” of the (strong) Likelihood Principle? Or is it the other way round? Or neither? Please weigh in.

REFERENCES

Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘, Revue De l’Institut International De Statistique / Review of the International Statistical Institute, 30(1), 11-27.

Categories: Bayesian/frequentist, Error Statistics, Neyman, Statistics | Leave a comment

S. Senn: Being a statistician means never having to say you are certain (Guest Post)

.

Stephen Senn
Head of  Competence Center
for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Being a statistician means never having to say you are certain

A recent discussion of randomised controlled trials[1] by Angus Deaton and Nancy Cartwright (D&C) contains much interesting analysis but also, in my opinion, does not escape rehashing some of the invalid criticisms of randomisation with which the literatures seems to be littered. The paper has two major sections. The latter, which deals with generalisation of results, or what is sometime called external validity, I like much more than the former which deals with internal validity. It is the former I propose to discuss.

Continue reading

Categories: Error Statistics, RCTs, Statistics | 25 Comments

How to avoid making mountains out of molehills (using power and severity)

.

In preparation for a new post that takes up some of the recent battles on reforming or replacing p-values, I reblog an older post on power, one of the most misunderstood and abused notions in statistics. (I add a few “notes on howlers”.)  The power of a test T in relation to a discrepancy from a test hypothesis H0 is the probability T will lead to rejecting H0 when that discrepancy is present. Power is sometimes misappropriated to mean something only distantly related to the probability a test leads to rejection; but I’m getting ahead of myself. This post is on a classic fallacy of rejection. Continue reading

Categories: CIs and tests, Error Statistics, power | 4 Comments

Yoav Benjamini, “In the world beyond p < .05: When & How to use P < .0499…"

.

These were Yoav Benjamini’s slides,”In the world beyond p<.05: When & How to use P<.0499…” from our session at the ASA 2017 Symposium on Statistical Inference (SSI): A World Beyond p < 0.05. (Mine are in an earlier post.) He begins by asking:

However, it’s mandatory to adjust for selection effects, and Benjamini is one of the leaders in developing ways to carry out the adjustments. Even calling out the avenues for cherry-picking and multiple testing, long known to invalidate p-values, would make replication research more effective (and less open to criticism). Continue reading

Categories: Error Statistics, P-values, replication research, selection effects | 22 Comments

Revisiting Popper’s Demarcation of Science 2017

28 July 1902- 17 Sept. 1994

Karl Popper died on September 17 1994. One thing that gets revived in my new book (Statistical Inference as Severe Testing, 2018, CUP) is a Popperian demarcation of science vs pseudoscience Here’s a snippet from what I call a “live exhibit” (where the reader experiments with a subject) toward the end of a chapter on Popper:

Live Exhibit. Revisiting Popper’s Demarcation of Science: Here’s an experiment: Try shifting what Popper says about theories to a related claim about inquiries to find something out. To see what I have in mind, join me in watching a skit over the lunch break:

Physicist: “If mere logical falsifiability suffices for a theory to be scientific, then, we can’t properly oust astrology from the scientific pantheon. Plenty of nutty theories have been falsified, so by definition they’re scientific. Moreover, scientists aren’t always looking to subject well corroborated theories to “grave risk” of falsification.”

Fellow traveler: “I’ve been thinking about this. On your first point, Popper confuses things by making it sound as if he’s asking: When is a theory unscientific? What he is actually asking or should be asking is: When is an inquiry into a theory, or an appraisal of claim H unscientific? We want to distinguish meritorious modes of inquiry from those that are BENT. If the test methods enable ad hoc maneuvering, sneaky face-saving devices, then the inquiry–the handling and use of data–is unscientific. Despite being logically falsifiable, theories can be rendered immune from falsification by means of cavalier methods for their testing. Adhering to a falsified theory no matter what is poor science. On the other hand, some areas have so much noise that you can’t pinpoint what’s to blame for failed predictions. This is another way that inquiries become bad science.”

She continues: Continue reading

Categories: Error Statistics, Popper, pseudoscience, science vs pseudoscience | Tags: | 10 Comments

A. Spanos: Egon Pearson’s Neglected Contributions to Statistics

11 August 1895 – 12 June 1980

Continuing with my Egon Pearson posts in honor of his birthday, I reblog a post by Aris Spanos:  Egon Pearson’s Neglected Contributions to Statistics“. 

    Egon Pearson (11 August 1895 – 12 June 1980), is widely known today for his contribution in recasting of Fisher’s significance testing into the Neyman-Pearson (1933) theory of hypothesis testing. Occasionally, he is also credited with contributions in promoting statistical methods in industry and in the history of modern statistics; see Bartlett (1981). What is rarely mentioned is Egon’s early pioneering work on:

(i) specification: the need to state explicitly the inductive premises of one’s inferences,

(ii) robustness: evaluating the ‘sensitivity’ of inferential procedures to departures from the Normality assumption, as well as

(iii) Mis-Specification (M-S) testing: probing for potential departures from the Normality  assumption.

Arguably, modern frequentist inference began with the development of various finite sample inference procedures, initially by William Gosset (1908) [of the Student’s t fame] and then Fisher (1915, 1921, 1922a-b). These inference procedures revolved around a particular statistical model, known today as the simple Normal model: Continue reading

Categories: E.S. Pearson, phil/history of stat, Spanos, Testing Assumptions | 2 Comments

“A megateam of reproducibility-minded scientists” look to lowering the p-value

.

Having discussed the “p-values overstate the evidence against the null fallacy” many times over the past few years, I leave it to readers to disinter the issues (pro and con), and appraise the assumptions, in the most recent rehearsal of the well-known Bayesian argument. There’s nothing intrinsically wrong with demanding everyone work with a lowered p-value–if you’re so inclined to embrace a single, dichotomous standard without context-dependent interpretations, especially if larger sample sizes are required to compensate the loss of power. But lowering the p-value won’t solve the problems that vex people (biasing selection effects), and is very likely to introduce new ones (see my comment). Kelly Servick, a reporter from Science, gives the ingredients of the main argument given by “a megateam of reproducibility-minded scientists” in an article out today: Continue reading

Categories: Error Statistics, highly probable vs highly probed, P-values, reforming the reformers | 55 Comments

On the current state of play in the crisis of replication in psychology: some heresies

.

The replication crisis has created a “cold war between those who built up modern psychology and those” tearing it down with failed replications–or so I read today [i]. As an outsider (to psychology), the severe tester is free to throw some fuel on the fire on both sides. This is a short update on my post “Some ironies in the replication crisis in social psychology” from 2014.

Following the model from clinical trials, an idea gaining steam is to prespecify a “detailed protocol that includes the study rationale, procedure and a detailed analysis plan” (Nosek et.al. 2017). In this new paper, they’re called registered reports (RRs). An excellent start. I say it makes no sense to favor preregistration and deny the relevance to evidence of optional stopping and outcomes other than the one observed. That your appraisal of the evidence is altered when you actually see the history supplied by the RR is equivalent to worrying about biasing selection effects when they’re not written down; your statistical method should pick up on them (as do p-values, confidence levels and many other error probabilities). There’s a tension between the RR requirements and accounts following the Likelihood Principle (no need to name names [ii]). Continue reading

Categories: Error Statistics, preregistration, reforming the reformers, replication research | 9 Comments

S. Senn: “Automatic for the people? Not quite” (Guest post)

Stephen Senn

Stephen Senn
Head of  Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Automatic for the people? Not quite

What caught my eye was the estimable (in its non-statistical meaning) Richard Lehman tweeting about the equally estimable John Ioannidis. For those who don’t know them, the former is a veteran blogger who keeps a very cool and shrewd eye on the latest medical ‘breakthroughs’ and the latter a serial iconoclast of idols of scientific method. This is what Lehman wrote

Ioannidis hits 8 on the Richter scale: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0173184 … Bayes factors consistently quantify strength of evidence, p is valueless.

Since Ioannidis works at Stanford, which is located in the San Francisco Bay Area, he has every right to be interested in earthquakes but on looking up the paper in question, a faint tremor is the best that I can afford it. I shall now try and explain why, but before I do, it is only fair that I acknowledge the very generous, prompt and extensive help I have been given to understand the paper[1] in question by its two authors Don van Ravenzwaaij and Ioannidis himself. Continue reading

Categories: Bayesian/frequentist, Error Statistics, S. Senn | 18 Comments

Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen

neyman

April 16, 1894 – August 5, 1981

I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of  statistical hypotheses and significance tests. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and are justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, on the other hand, are epistemological goals. What do you think?

Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena
by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I recommend, especially, the example on home ownership. Here are two snippets: Continue reading

Categories: Error Statistics, Neyman, Statistics | Tags: | 2 Comments

3 YEARS AGO (MARCH 2014): MEMORY LANE

3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: March 2014. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 4 others I’d recommend[2].  Posts that are part of a “unit” or a group count as one. 3/19 and 3/17 are one, as are  3/19, 3/12 and 3/4, and the 6334 items 3/11, 3/22 and 3/26. So that covers nearly all the posts!

March 2014

 

  • (3/1) Cosma Shalizi gets tenure (at last!) (metastat announcement)
  • (3/2) Significance tests and frequentist principles of evidence: Phil6334 Day #6
  • (3/3) Capitalizing on Chance (ii)
  • (3/4) Power, power everywhere–(it) may not be what you think! [illustration]
  • (3/8) Msc kvetch: You are fully dressed (even under you clothes)?
  • (3/8) Fallacy of Rejection and the Fallacy of Nouvelle Cuisine
  • (3/11) Phil6334 Day #7: Selection effects, the Higgs and 5 sigma, Power
  • (3/12) Get empowered to detect power howlers
  • (3/15) New SEV calculator (guest app: Durvasula)
  • (3/17) Stephen Senn: “Delta Force: To what extent is clinical relevance relevant?” (Guest Post)

     

     

  • (3/19) Power taboos: Statue of Liberty, Senn, Neyman, Carnap, Severity
  • (3/22) Fallacies of statistics & statistics journalism, and how to avoid them: Summary & Slides Day #8 (Phil 6334)
  • (3/25) The Unexpected Way Philosophy Majors Are Changing The World Of Business

     

  • (3/26) Phil6334:Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)
  • (3/28) Severe osteometric probing of skeletal remains: John Byrd
  • (3/29) Winner of the March 2014 palindrome contest (rejected post)
  • (3/30) Phil6334: March 26, philosophy of misspecification testing (Day #9 slides)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30,2016, March 30,2017 (moved to 4) -very convenient way to allow data-dependent choices.

Save

Save

Save

Save

Save

Categories: 3-year memory lane, Error Statistics, Statistics | Leave a comment

Cox’s (1958) weighing machine example

IMG_0079

.

A famous chestnut given by Cox (1958) recently came up in conversation. The example  “is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992, p. 582). When I describe it, you’ll find it hard to believe many regard it as causing an earthquake in statistical foundations, unless you’re already steeped in these matters. If half the time I reported my weight from a scale that’s always right, and half the time use a scale that gets it right with probability .5, would you say I’m right with probability ¾? Well, maybe. But suppose you knew that this measurement was made with the scale that’s right with probability .5? The overall error probability is scarcely relevant for giving the warrant of the particular measurement,knowing which scale was used. Continue reading

Categories: Error Statistics, Sir David Cox, Statistics, strong likelihood principle | 1 Comment

Szucs & Ioannidis Revive the Limb-Sawing Fallacy

 

images-2

.

When logical fallacies of statistics go uncorrected, they are repeated again and again…and again. And so it is with the limb-sawing fallacy I first posted in one of my “Overheard at the Comedy Hour” posts.* It now resides as a comic criticism of significance tests in a paper by Szucs and Ioannidis (posted this week),  Here’s their version:

“[P]aradoxically, when we achieve our goal and successfully reject Hwe will actually be left in complete existential vacuum because during the rejection of HNHST ‘saws off its own limb’ (Jaynes, 2003; p. 524): If we manage to reject H0then it follows that pr(data or more extreme data|H0) is useless because H0 is not true” (p.15).

Here’s Jaynes (p. 524):

“Suppose we decide that the effect exists; that is, we reject [null hypothesis] H0. Surely, we must also reject probabilities conditional on H0, but then what was the logical justification for the decision? Orthodox logic saws off its own limb.’ 

Ha! Ha! By this reasoning, no hypothetical testing or falsification could ever occur. As soon as H is falsified, the grounds for falsifying disappear! If H: all swans are white, then if I see a black swan, H is falsified. But according to this criticism, we can no longer assume the deduced prediction from H! What? Continue reading

Categories: Error Statistics, P-values, reforming the reformers, Statistics | 14 Comments

3 YEARS AGO (DECEMBER 2013): MEMORY LANE

3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: December 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others I’d recommend[2].  Posts that are part of a “unit” or a group count as one. In this post, that makes 12/27-12/28 count as one.

December 2013

  • (12/3) Stephen Senn: Dawid’s Selection Paradox (guest post)
  • (12/7) FDA’s New Pharmacovigilance
  • (12/9) Why ecologists might want to read more philosophy of science (UPDATED)
  • (12/11) Blog Contents for Oct and Nov 2013
  • (12/14) The error statistician has a complex, messy, subtle, ingenious piece-meal approach
  • (12/15) Surprising Facts about Surprising Facts
  • (12/19) A. Spanos lecture on “Frequentist Hypothesis Testing
  • (12/24) U-Phil: Deconstructions [of J. Berger]: Irony & Bad Faith 3
  • (12/25) “Bad Arguments” (a book by Ali Almossawi)
  • (12/26) Mascots of Bayesneon statistics (rejected post)
  • (12/27) Deconstructing Larry Wasserman
  • (12/28) More on deconstructing Larry Wasserman (Aris Spanos)
  • (12/28) Wasserman on Wasserman: Update! December 28, 2013
  • (12/31) Midnight With Birnbaum (Happy New Year)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016-very convenient.

 

Save

Save

Save

Save

Save

Categories: 3-year memory lane, Bayesian/frequentist, Error Statistics, Statistics | 1 Comment

“Tests of Statistical Significance Made Sound”: excerpts from B. Haig

images-34

.

I came across a paper, “Tests of Statistical Significance Made Sound,” by Brian Haig, a psychology professor at the University of Canterbury, New Zealand. It hits most of the high notes regarding statistical significance tests, their history & philosophy and, refreshingly, is in the error statistical spirit! I’m pasting excerpts from his discussion of “The Error-Statistical Perspective”starting on p.7.[1]

The Error-Statistical Perspective

An important part of scientific research involves processes of detecting, correcting, and controlling for error, and mathematical statistics is one branch of methodology that helps scientists do this. In recognition of this fact, the philosopher of statistics and science, Deborah Mayo (e.g., Mayo, 1996), in collaboration with the econometrician, Aris Spanos (e.g., Mayo & Spanos, 2010, 2011), has systematically developed, and argued in favor of, an error-statistical philosophy for understanding experimental reasoning in science. Importantly, this philosophy permits, indeed encourages, the local use of ToSS, among other methods, to manage error. Continue reading

Categories: Bayesian/frequentist, Error Statistics, fallacy of rejection, P-values, Statistics | 12 Comments

3 YEARS AGO (NOVEMBER 2013): MEMORY LANE

3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: November 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others I’d recommend[2].  Posts that are part of a “unit” or a group count as one. Here I’m counting 11/9, 11/13, and 11/16 as one

November 2013

  • (11/2) Oxford Gaol: Statistical Bogeymen
  • (11/4) Forthcoming paper on the strong likelihood principle
  • (11/9) Null Effects and Replication (cartoon pic)
  • (11/9) Beware of questionable front page articles warning you to beware of questionable front page articles (iii)
  • (11/13) T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)
  • (11/16) PhilStock: No-pain bull
  • (11/16) S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)
  • (11/18) Lucien Le Cam: “The Bayesians hold the Magic”
  • (11/20) Erich Lehmann: Statistician and Poet
  • (11/23) Probability that it is a statistical fluke [i]
  • (11/27)The probability that it be a statistical fluke” [iia]
  • (11/30) Saturday night comedy at the “Bayesian Boy” diary (rejected post*)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016-very convenient.

 

Save

Save

Save

Save

Save

Categories: 3-year memory lane, Error Statistics, Statistics | Leave a comment

3 YEARS AGO (OCTOBER 2013): MEMORY LANE

3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: October 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others I’d recommend[2].  Posts that are part of a “unit” or a pair count as one.

October 2013

  • (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
     
  • (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
  • (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
  • (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?”(10/5 and 10/12 are a pair)
     
  • (10/19) Blog Contents: September 2013
  • (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
  • (10/25) Bayesian confirmation theory: example from last post…(10/19 and 10/25 are a pair)
  • (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)
  • (10/31) WHIPPING BOYS AND WITCH HUNTERS (interesting to see how things have changed and stayed the same over the past few years, share comments)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016-very convenient.

 

Save

Save

Save

Save

Save

Categories: 3-year memory lane, Error Statistics, Statistics | 22 Comments

For Statistical Transparency: Reveal Multiplicity and/or Just Falsify the Test (Remark on Gelman and Colleagues)

images-31

.

Gelman and Loken (2014) recognize that even without explicit cherry picking there is often enough leeway in the “forking paths” between data and inference so that by artful choices you may be led to one inference, even though it also could have gone another way. In good sciences, measurement procedures should interlink with well-corroborated theories and offer a triangulation of checks– often missing in the types of experiments Gelman and Loken are on about. Stating a hypothesis in advance, far from protecting from the verification biases, can be the engine that enables data to be “constructed”to reach the desired end [1].

[E]ven in settings where a single analysis has been carried out on the given data, the issue of multiple comparisons emerges because different choices about combining variables, inclusion and exclusion of cases…..and many other steps in the analysis could well have occurred with different data (Gelman and Loken 2014, p. 464).

An idea growing out of this recognition is to imagine the results of applying the same statistical procedure, but with different choices at key discretionary junctures–giving rise to a multiverse analysis, rather than a single data set (Steegen, Tuerlinckx, Gelman, and Vanpaemel 2016). One lists the different choices thought to be plausible at each stage of data processing. The multiverse displays “which constellation of choices corresponds to which statistical results” (p. 797). The result of this exercise can, at times, mimic the delineation of possibilities in multiple testing and multiple modeling strategies. Continue reading

Categories: Bayesian/frequentist, Error Statistics, Gelman, P-values, preregistration, reproducibility, Statistics | 9 Comments

Peircean Induction and the Error-Correcting Thesis (Part I)

C. S. Peirce: 10 Sept, 1839-19 April, 1914

C. S. Peirce: 10 Sept, 1839-19 April, 1914

Today is C.S. Peirce’s birthday. He’s one of my all time heroes. You should read him: he’s a treasure chest on essentially any topic, and he anticipated several major ideas in statistics (e.g., randomization, confidence intervals) as well as in logic. I’ll reblog the first portion of a (2005) paper of mine. Links to Parts 2 and 3 are at the end. It’s written for a very general philosophical audience; the statistical parts are pretty informal. Happy birthday Peirce.

Peircean Induction and the Error-Correcting Thesis
Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319

Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:

Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)

Continue reading

Categories: Bayesian/frequentist, C.S. Peirce, Error Statistics, Statistics | 18 Comments

If you think it’s a scandal to be without statistical falsification, you will need statistical tests (ii)

Screen Shot 2016-08-09 at 2.55.33 PM

.

1. PhilSci and StatSci. I’m always glad to come across statistical practitioners who wax philosophical, particularly when Karl Popper is cited. Best of all is when they get the philosophy somewhere close to correct. So, I came across an article by Burnham and Anderson (2014) in Ecology:

While the exact definition of the so-called ‘scientific method’ might be controversial, nearly everyone agrees that the concept of ‘falsifiability’ is a central tenant [sic] of empirical science (Popper 1959). It is critical to understand that historical statistical approaches (i.e., P values) leave no way to ‘test’ the alternative hypothesis. The alternative hypothesis is never tested, hence cannot be rejected or falsified!… Surely this fact alone makes the use of significance tests and P values bogus. Lacking a valid methodology to reject/falsify the alternative science hypotheses seems almost a scandal.” (Burnham and Anderson p. 629)

Well I am (almost) scandalized by this easily falsifiable allegation! I can’t think of a single “alternative”, whether in a “pure” Fisherian or a Neyman-Pearson hypothesis test (whether explicit or implicit) that’s not falsifiable; nor do the authors provide any. I grant that understanding testability and falsifiability is far more complex than the kind of popularized accounts we hear about; granted as well, theirs is just a short paper.[1] But then why make bold declarations on the topic of the “scientific method and statistical science,” on falsifiability and testability? Continue reading

Categories: P-values, Severity, statistical tests, Statistics, StatSci meets PhilSci | 22 Comments

Blog at WordPress.com.