Mementos for Excursion 2 Tour II: Falsification, Pseudoscience, Induction (2.3-2.7)


Excursion 2 Tour II: Falsification, Pseudoscience, Induction*

Outline of Tour. Tour II visits Popper, falsification, corroboration, Duhem’s problem (what to blame in the case of anomalies) and the demarcation of science and pseudoscience (2.3). While Popper comes up short on each, the reader is led to improve on Popper’s notions (live exhibit (v)). Central ingredients for our journey are put in place via souvenirs: a framework of models and problems, and a post-Popperian language to speak about inductive inference. Defining a severe test, for Popperians, is linked to when data supply novel evidence for a hypothesis: family feuds about defining novelty are discussed (2.4). We move into Fisherian significance tests and the crucial requirements he set (often overlooked): isolated significant results are poor evidence of a genuine effect, and statistical significance doesn’t warrant substantive, e.g., causal inference (2.5). Applying our new demarcation criterion to a plausible effect (males are more likely than females to feel threatened by their partner’s success), we argue that a real revolution in psychology will need to be more revolutionary than at present. Whole inquiries might have to be falsified, their measurement schemes questioned (2.6). The Tour’s pieces are synthesized in (2.7), where a guest lecturer explains how to solve the problem of induction now, having redefined induction as severe testing.

Mementos from 2.3

There are four key, interrelated themes from Popper:

(1) Science and Pseudoscience. For a theory to be scientific it must be testable and falsifiable.

(2) Conjecture and Refutation. We learn not by enumerative induction but by trial and error: conjecture and refutation.

(3) Observations Are Not Given. If they are at the “foundation,” it is only because there are apt methods for testing their validity. We dub claims observable because or to the extent that they are open to stringent checks.

(4) Corroboration Not Confirmation, Severity Not Probabilism. Rejecting probabilism, Popper denies scientists are interested in highly probable hypotheses (in any sense). They seek bold, informative, interesting conjectures and ingenious and severe attempts to refute them.

These themes are in the spirit of the error statistician. Considerable spade-work is required to see what to keep and what to revise, so bring along your archeological shovels.

The Severe Tester Revises Popper’s Demarcation of Science (Live Exhibit (vi)): What he should be asking is not whether a theory is unscientific, but When is an inquiry into a theory, or an appraisal of claim H, unscientic? We want to distinguish meritorious modes of inquiry from those that are BENT. If the test methods enable ad hoc maneuvering, sneaky face- saving devices, then the inquiry – the handling and use of data – is unscientific. Despite being logically falsifiable, theories can be rendered immune from falsification by means of questionable methods for their testing.

Greater Content, Greater Severity. The severe tester accepts Popper’s central intuition in (4): if we wanted highly probable claims, scientists would stick to low-level observables and not seek generalizations, much less theories with high explanatory content.A highly explanatory, high-content theory, with interconnected tentacles, has a higher probability of having flaws discerned than low-content theories that do not rule out as much. Thus, when the bolder, higher content, theory stands up to testing, it may earn higher overall severity than the one with measly content. It is the fuller, unifying, theory developed in the course of solving interconnected problems that enables severe tests.

Methodological Probability. Probability in learning attaches to a method of conjecture and refutation, that is to testing: it is methodological probability. An error probability is a special case of a methodological probability. We want methods with a high probability of teaching us (and machines) how to distinguish approximately correct and incorrect interpretations of data. That a theory is plausible is of little interest, in and of itself; what matters is that it is implausible for it to have passed these tests were it false or incapable of adequately solving its set of problems.

Methodological falsification. We appeal to methodological rules for when to regard a claim as falsified.

  • Inductive-statistical falsification proceeds by methods that allow ~H to be inferred with severity. A first step is often to infer an anomaly is real, by falsifying a “due to chance” hypothesis.
  • Going further, we may corroborate (i.e., infer with severity) effects that count as falsifying hypotheses. A falsifying hypothesis is a hypothesis inferred in order to falsify some other claim. Example: the pathological proteins (prions) in mad cow disease infect without nucleic acid. This falsifies: all infectious agents involve nucleic acid.

Despite giving lip service to testing and falsification, many popular accounts of statistical inference do not embody falsification – even of a statistical sort.

However, the falsifying hypotheses that are integral for Popper also necessitate an evidence-transcending (inductive) statistical inference.

The Popperian (Methodological) Falsificationist Is an Error Statistician

When is a statistical hypothesis to count as falsified? Although extremely rare events may occur, Popper notes:

such occurrences would not be physical effects, because, on account of their immense improbability, they are not reproducible at will … If, however, we find reproducible deviations from a macro effect .. . deduced from a probability estimate … then we must assume that the probability estimate is falsified. (Popper 1959, p. 203)

In the same vein, we heard Fisher deny that an “isolated record” of statistically significant results suffices to warrant a reproducible or genuine effect (Fisher 1935a, p. 14).

In a sense, the severe tester ‘breaks’ from Popper by solving his key problem: Popper’s account rests on severe tests, tests that would probably falsify claims if false, but he cannot warrant saying a method is probative or severe, because that would mean it was reliable, which makes Popperians squeamish. It would appear to concede to his critics that Popper has a “whiff of induction” after all. But it’s not inductive enumeration. Error statistical methods (whether from statistics or informal) can supply the severe tests Popper sought.

A scientific inquiry (a procedure for finding something out) for a severe tester:

  • blocks inferences that fail the minimal requirement for severity:
  • must be able to embark on a reliable probe to pinpoint blame for anomalies (and use the results to replace falsied claims and build a repertoire of errors).

The parenthetical remark isn’t absolutely required, but is a feature that greatly strengthens scientific credentials.

The reliability requirement is: infer claims just to the extent that they pass severe tests. There’s no sharp line for demarcation, but when these requirements are absent, an inquiry veers into the realm of questionable science or pseudoscience.

To see mementos of 2.4-2.7, I’ve placed them here.**

All of 2.3 is here.

Please use the comments for your questions, corrections, suggested additions.

*All items refer to my new book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP 2018)

**I’m bound to revise and add to these during a seminar next semester.

Categories: Popper, Statistical Inference as Severe Testing, Statistics

Tour Guide Mementos and QUIZ 2.1 (Excursion 2 Tour I: Induction and Confirmation)


Excursion 2 Tour I: Induction and Confirmation (Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars)

Tour Blurb. The roots of rival statistical accounts go back to the logical Problem of Induction. (2.1) The logical problem of induction is a matter of finding an argument to justify a type of argument (enumerative induction), so it is important to be clear on arguments, their soundness versus their validity. These are key concepts of fundamental importance to our journey. Given that any attempt to solve the logical problem of induction leads to circularity, philosophers turned instead to building logics that seemed to capture our intuitions about induction. This led to confirmation theory and some projects in today’s formal epistemology. There’s an analogy between contrasting views in philosophy and statistics: Carnapian confirmation is to Bayesian statistics, as Popperian falsification is to frequentist error statistics. Logics of confirmation take the form of probabilisms, either in the form of raising the probability of a hypothesis, or arriving at a posterior probability. (2.2) The contrast between these types of probabilisms, and the problems each is found to have in confirmation theory are directly relevant to the types of probabilisms in statistics. Notably, Harold Jeffreys’ non-subjective Bayesianism, and current spin-offs, share features with Carnapian inductive logics. We examine the problem of irrelevant conjunctions: that if x confirms H, it confirms (H & J) for any J. This also leads to what’s called the tacking paradox.

Quiz on 2.1 Soundness vs Validity in Deductive Logic. Let ~C be the denial of claim C. For each of the following argument, indicate whether it is valid and sound, valid but unsound, invalid. Continue reading

Categories: induction, SIST, Statistical Inference as Severe Testing, Statistics | 6 Comments

Excursion 1 Tour I: Beyond Probabilism and Performance: Severity Requirement (1.1)

The cruise begins…

I’m talking about a specific, extra type of integrity that is [beyond] not lying, but bending over backwards to show how you’re maybe wrong, that you ought to have when acting as a scientist. (Feynman 1974/1985, p. 387)

It is easy to lie with statistics. Or so the cliché goes. It is also very difficult to uncover these lies without statistical methods – at least of the right kind. Self- correcting statistical methods are needed, and, with minimal technical fanfare, that’s what I aim to illuminate. Since Darrell Huff wrote How to Lie with Statistics in 1954, ways of lying with statistics are so well worn as to have emerged in reverberating slogans:

  • Association is not causation.
  • Statistical significance is not substantive significamce
  • No evidence of risk is not evidence of no risk.
  • If you torture the data enough, they will confess.

Exposés of fallacies and foibles ranging from professional manuals and task forces to more popularized debunking treatises are legion. New evidence has piled up showing lack of replication and all manner of selection and publication biases. Even expanded “evidence-based” practices, whose very rationale is to emulate experimental controls, are not immune from allegations of illicit cherry picking, significance seeking, P-hacking, and assorted modes of extra- ordinary rendition of data. Attempts to restore credibility have gone far beyond the cottage industries of just a few years ago, to entirely new research programs: statistical fraud-busting, statistical forensics, technical activism, and widespread reproducibility studies. There are proposed methodological reforms – many are generally welcome (preregistration of experiments, transparency about data collection, discouraging mechanical uses of statistics), some are quite radical. If we are to appraise these evidence policy reforms, a much better grasp of some central statistical problems is needed.

Continue reading

Categories: Statistical Inference as Severe Testing, Statistics | 8 Comments

A. Spanos: Egon Pearson’s Neglected Contributions to Statistics

Continuing with the discussion of E.S. Pearson in honor of his birthday:

Egon Pearson’s Neglected Contributions to Statistics

by Aris Spanos

    Egon Pearson (11 August 1895 – 12 June 1980), is widely known today for his contribution in recasting of Fisher’s significance testing into the Neyman-Pearson (1933) theory of hypothesis testing. Occasionally, he is also credited with contributions in promoting statistical methods in industry and in the history of modern statistics; see Bartlett (1981). What is rarely mentioned is Egon’s early pioneering work on:

(i) specification: the need to state explicitly the inductive premises of one’s inferences,

(ii) robustness: evaluating the ‘sensitivity’ of inferential procedures to departures from the Normality assumption, as well as

(iii) Mis-Specification (M-S) testing: probing for potential departures from the Normality  assumption.

Arguably, modern frequentist inference began with the development of various finite sample inference procedures, initially by William Gosset (1908) [of the Student’s t fame] and then Fisher (1915, 1921, 1922a-b). These inference procedures revolved around a particular statistical model, known today as the simple Normal model: Continue reading

Categories: E.S. Pearson, Egon Pearson, Statistics | 1 Comment

Egon Pearson’s Heresy

E.S. Pearson: 11 Aug 1895-12 June 1980.

Today is Egon Pearson’s birthday. In honor of his birthday, I am posting “Statistical Concepts in Their Relation to Reality” (Pearson 1955). I’ve posted it several times over the years, but always find a new gem or two, despite its being so short. E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter E. Pearson wrote to Birnbaum (1974), he talks about N-P theory admitting of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

Continue reading

Categories: phil/history of stat, Philosophy of Statistics, Statistics | Tags: , , | 2 Comments

“Intentions (in your head)” is the code word for “error probabilities (of a procedure)”: Allan Birnbaum’s Birthday

27 May 1923-1 July 1976

27 May 1923-1 July 1976

Today is Allan Birnbaum’s Birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference,” in Breakthroughs in Statistics (volume I 1993), concerns a principle that remains at the heart of today’s controversies in statistics–even if it isn’t obvious at first: the Likelihood Principle (LP) (also called the strong likelihood Principle SLP, to distinguish it from the weak LP [1]). According to the LP/SLP, given the statistical model, the information from the data are fully contained in the likelihood ratio. Thus, properties of the sampling distribution of the test statistic vanish (as I put it in my slides from this post)! But error probabilities are all properties of the sampling distribution. Thus, embracing the LP (SLP) blocks our error statistician’s direct ways of taking into account “biasing selection effects” (slide #10). [Posted earlier here.] Interesting, as seen in a 2018 post on Neyman, Neyman did discuss this paper, but had an odd reaction that I’m not sure I understand. (Check it out.) Continue reading

Categories: Birnbaum, Birnbaum Brakes, frequentist/Bayesian, Likelihood Principle, phil/history of stat, Statistics | 7 Comments

Neyman vs the ‘Inferential’ Probabilists continued (a)


Today is Jerzy Neyman’s Birthday (April 16, 1894 – August 5, 1981).  I am posting a brief excerpt and a link to a paper of his that I hadn’t posted before: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”.  “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). [iii] I’ll explain in later stages of this post & in comments…(so please check back); I don’t want to miss the start of the birthday party in honor of Neyman, and it’s already 8:30 p.m in Berkeley!

Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program. HAPPY BIRTHDAY NEYMAN! Continue reading

Categories: Bayesian/frequentist, Error Statistics, Neyman, Statistics | Leave a comment

Deconstructing the Fisher-Neyman conflict wearing fiducial glasses (continued)


Fisher/ Neyman

This continues my previous post: “Can’t take the fiducial out of Fisher…” in recognition of Fisher’s birthday, February 17. I supply a few more intriguing articles you may find enlightening to read and/or reread on a Saturday night

Move up 20 years to the famous 1955/56 exchange between Fisher and Neyman. Fisher clearly connects Neyman’s adoption of a behavioristic-performance formulation to his denying the soundness of fiducial inference. When “Neyman denies the existence of inductive reasoning, he is merely expressing a verbal preference. For him ‘reasoning’ means what ‘deductive reasoning’ means to others.” (Fisher 1955, p. 74). Continue reading

Categories: fiducial probability, Fisher, Neyman, Statistics | 4 Comments

Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]


R.A. Fisher: February 17, 1890 – July 29, 1962

Continuing with posts in recognition of R.A. Fisher’s birthday, I post one from a couple of years ago on a topic that had previously not been discussed on this blog: Fisher’s fiducial probability

[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals (D.R.Cox, 2006, p. 195).

The entire episode of fiducial probability is fraught with minefields. Many say it was Fisher’s biggest blunder; others suggest it still hasn’t been understood. The majority of discussions omit the side trip to the Fiducial Forest altogether, finding the surrounding brambles too thorny to penetrate. Besides, a fascinating narrative about the Fisher-Neyman-Pearson divide has managed to bloom and grow while steering clear of fiducial probability–never mind that it remained a centerpiece of Fisher’s statistical philosophy. I now think that this is a mistake. It was thought, following Lehman (1993) and others, that we could take the fiducial out of Fisher and still understand the core of the Neyman-Pearson vs Fisher (or Neyman vs Fisher) disagreements. We can’t. Quite aside from the intrinsic interest in correcting the “he said/he said” of these statisticians, the issue is intimately bound up with the current (flawed) consensus view of frequentist error statistics.

So what’s fiducial inference? I follow Cox (2006), adapting for the case of the lower limit: Continue reading

Categories: fiducial probability, Fisher, Statistics | Leave a comment

R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)



In recognition of R.A. Fisher’s birthday on February 17….

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Spanos, Statistics | 3 Comments

Guest Blog: STEPHEN SENN: ‘Fisher’s alternative to the alternative’

“You May Believe You Are a Bayesian But You Are Probably Wrong”


As part of the week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962), I reblog a guest post by Stephen Senn from 2012/2017.  The comments from 2017 lead to a troubling issue that I will bring up in the comments today.

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests. Continue reading

Categories: Fisher, S. Senn, Statistics | 1 Comment

Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I’ll post some Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!

Two New Properties of Mathematical Likelihood

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

  The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading

Categories: Fisher, phil/history of stat, Statistics | Tags: , , , | 1 Comment

S. Senn: Being a statistician means never having to say you are certain (Guest Post)


Stephen Senn
Head of  Competence Center
for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Being a statistician means never having to say you are certain

A recent discussion of randomised controlled trials[1] by Angus Deaton and Nancy Cartwright (D&C) contains much interesting analysis but also, in my opinion, does not escape rehashing some of the invalid criticisms of randomisation with which the literatures seems to be littered. The paper has two major sections. The latter, which deals with generalisation of results, or what is sometime called external validity, I like much more than the former which deals with internal validity. It is the former I propose to discuss.

Continue reading

Categories: Error Statistics, RCTs, Statistics | 26 Comments

60 yrs of Cox’s (1958) weighing machine, & links to binge-read the Likelihood Principle



2018 will mark 60 years since the famous chestnut from Sir David Cox (1958). The example  “is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992, p. 582). When I describe it, you’ll find it hard to believe many regard it as causing an earthquake in statistical foundations, unless you’re already steeped in these matters. A simple version: If half the time I reported my weight from a scale that’s always right, and half the time use a scale that gets it right with probability .5, would you say I’m right with probability ¾? Well, maybe. But suppose you knew that this measurement was made with the scale that’s right with probability .5? The overall error probability is scarcely relevant for giving the warrant of the particular measurement, knowing which scale was used. So what’s the earthquake? First a bit more on the chestnut. Here’s an excerpt from Cox and Mayo (2010, 295-8): Continue reading

Categories: Sir David Cox, Statistics, strong likelihood principle | 4 Comments

The Conversion of Subjective Bayesian, Colin Howson, & the problem of old evidence (i)


“The subjective Bayesian theory as developed, for example, by Savage … cannot solve the deceptively simple but actually intractable old evidence problem, whence as a foundation for a logic of confirmation at any rate, it must be accounted a failure.” (Howson, (2017), p. 674)

What? Did the “old evidence” problem cause Colin Howson to recently abdicate his decades long position as a leading subjective Bayesian? It seems to have. I was so surprised to come across this in a recent perusal of Philosophy of Science that I wrote to him to check if it is really true. (It is.) I thought perhaps it was a different Colin Howson, or the son of the one who co-wrote 3 editions of Howson and Urbach: Scientific Reasoning: The Bayesian Approach espousing hard-line subjectivism since 1989.[1] I am not sure which of the several paradigms of non-subjective or default Bayesianism Howson endorses (he’d argued for years, convincingly, against any one of them), nor how he handles various criticisms (Kass and Wasserman 1996), I put that aside. Nor have I worked through his, rather complex, paper to the extent necessary, yet. What about the “old evidence” problem, made famous by Clark Glymour 1980?  What is it? Continue reading

Categories: Bayesian priors, objective Bayesians, Statistics | Tags: | 25 Comments

Statistical skepticism: How to use significance tests effectively: 7 challenges & how to respond to them

Here are my slides from the ASA Symposium on Statistical Inference : “A World Beyond p < .05”  in the session, “What are the best uses for P-values?”. (Aside from me,our session included Yoav Benjamini and David Robinson, with chair: Nalini Ravishanker.)


  • Why use a tool that infers from a single (arbitrary) P-value that pertains to a statistical hypothesis H0 to a research claim H*?
  • Why use an incompatible hybrid (of Fisher and N-P)?
  • Why apply a method that uses error probabilities, the sampling distribution, researcher “intentions” and violates the likelihood principle (LP)? You should condition on the data.
  • Why use methods that overstate evidence against a null hypothesis?
  • Why do you use a method that presupposes the underlying statistical model?
  • Why use a measure that doesn’t report effect sizes?
  • Why do you use a method that doesn’t provide posterior probabilities (in hypotheses)?


Categories: P-values, spurious p values, statistical tests, Statistics | Leave a comment

Egon Pearson’s Heresy

E.S. Pearson: 11 Aug 1895-12 June 1980.

Here’s one last entry in honor of Egon Pearson’s birthday: “Statistical Concepts in Their Relation to Reality” (Pearson 1955). I’ve posted it several times over the years (6!), but always find a new gem or two, despite its being so short. E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter E. Pearson wrote to Birnbaum (1974), he talks about N-P theory admitting of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.) Continue reading

Categories: phil/history of stat, Philosophy of Statistics, Statistics | Tags: , , | Leave a comment

Performance or Probativeness? E.S. Pearson’s Statistical Philosophy

egon pearson

E.S. Pearson (11 Aug, 1895-12 June, 1980)

This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ll blog some E. Pearson items this week, including, my latest reflection on a historical anecdote regarding Egon and the woman he wanted marry, and surely would have, were it not for his father Karl!


Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long run error properties are of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson.  Continue reading

Categories: highly probable vs highly probed, phil/history of stat, Statistics | Tags: | Leave a comment

Allan Birnbaum: Foundations of Probability and Statistics (27 May 1923 – 1 July 1976)

27 May 1923-1 July 1976

27 May 1923-1 July 1976

Today is Allan Birnbaum’s birthday. In honor of his birthday, I’m posting the articles in the Synthese volume that was dedicated to his memory in 1977. The editors describe it as their way of  “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. I paste a few snippets from the articles by Giere and Birnbaum. If you’re interested in statistical foundations, and are unfamiliar with Birnbaum, here’s a chance to catch up. (Even if you are, you may be unaware of some of these key papers.)


Synthese Volume 36, No. 1 Sept 1977: Foundations of Probability and Statistics, Part I

Editorial Introduction:

This special issue of Synthese on the foundations of probability and statistics is dedicated to the memory of Professor Allan Birnbaum. Professor Birnbaum’s essay ‘The Neyman-Pearson Theory as Decision Theory; and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory’ was received by the editors of Synthese in October, 1975, and a decision was made to publish a special symposium consisting of this paper together with several invited comments and related papers. The sad news about Professor Birnbaum’s death reached us in the summer of 1976, but the editorial project could nevertheless be completed according to the original plan. By publishing this special issue we wish to pay homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics. We are grateful to Professor Ronald Giere who wrote an introductory essay on Professor Birnbaum’s concept of statistical evidence and who compiled a list of Professor Birnbaum’s publications.


Continue reading

Categories: Birnbaum, Likelihood Principle, Statistics, strong likelihood principle | Tags: | 1 Comment


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: April 2014. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 4 others I’d recommend[2].  Posts that are part of a “unit” or a group count as one. For this month, I’ll include all the 6334 seminars as “one”.

April 2014

  • (4/1) April Fool’s. Skeptical and enthusiastic Bayesian priors for beliefs about insane asylum renovations at Dept of Homeland Security: I’m skeptical and unenthusiastic
  • (4/3) Self-referential blogpost (conditionally accepted*)
  • (4/5) Who is allowed to cheat? I.J. Good and that after dinner comedy hour. . ..
  • (4/6) Phil6334: Duhem’s Problem, highly probable vs highly probed; Day #9 Slides
  • (4/8) “Out Damned Pseudoscience: Non-significant results are the new ‘Significant’ results!” (update)
  • (4/12) “Murder or Coincidence?” Statistical Error in Court: Richard Gill (TEDx video)
  • (4/14) Phil6334: Notes on Bayesian Inference: Day #11 Slides
  • (4/16) A. Spanos: Jerzy Neyman and his Enduring Legacy
  • (4/17) Duality: Confidence intervals and the severity of tests
  • (4/19) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill)
  • (4/21) Phil 6334: Foundations of statistics and its consequences: Day#12
  • (4/23) Phil 6334 Visitor: S. Stanley Young, “Statistics and Scientific Integrity”
  • (4/26) Reliability and Reproducibility: Fraudulent p-values through multiple testing (and other biases): S. Stanley Young (Phil 6334: Day #13)
  • (4/30) Able Stats Elba: 3 Palindrome nominees for April! (rejected post)


[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30,2016, March 30,2017 (moved to 4) -very convenient way to allow data-dependent choices.






Categories: 3-year memory lane, Statistics | Leave a comment

Blog at