Evidence can only strengthen a prior belief in low data veracity, N. Liberman & M. Denzler: “Response”



I thought the criticisms of social psychologist Jens Förster were already quite damning (despite some attempts to explain them as mere QRPs), but there’s recently been some pushback from two of his co-authors, Liberman and Denzler. Their objections are directed at the application of a distinct method, touted as “Bayesian forensics”, to their joint work with Förster. I discussed it very briefly in a recent “rejected post”. Perhaps the earlier method of criticism was inapplicable to these additional papers, and there’s an interest in seeing those papers retracted as well as the one that was. I don’t claim to know. A distinct “policy” issue is whether there should be uniform standards for retraction calls. At the very least, one would think new methods should be well-vetted before subjecting authors to their indictment (particularly methods which are incapable of issuing in exculpatory evidence, like this one). Here’s a portion of their response. I don’t claim to be up on this case, but I’d be very glad to have reader feedback.

Nira Liberman, School of Psychological Sciences, Tel Aviv University, Israel

Markus Denzler, Federal University of Applied Administrative Sciences, Germany

June 7, 2015

Response to a Report Published by the University of Amsterdam

The University of Amsterdam (UvA) has recently announced the completion of a report that summarizes an examination of all the empirical articles by Jens Förster (JF) during the years of his affiliation with UvA, including those co-authored by us. The report is available online. The report relies solely on statistical evaluation, using the method originally employed in the anonymous complaint against JF, as well as a new version of a method for detecting “low scientific veracity” of data, developed by Prof. Klaassen (2015). The report concludes that some of the examined publications show “strong statistical evidence for low scientific veracity”, some show “inconclusive evidence for low scientific veracity”, and some show “no evidence for low veracity”. UvA announced that on the basis of that report, it would send letters to the Journals, asking them to retract articles from the first category, and to consider retraction of articles in the second category.

After examining the report, we have reached the conclusion that it is misleading, biased, and based on erroneous statistical procedures. In view of that, we surmise that it does not present reliable evidence for “low scientific veracity”.

We ask you to consider our criticism of the methods used in UvA’s report and the procedures leading to their recommendations in your decision.

Let us emphasize that we never fabricated or manipulated data, nor have we ever witnessed such behavior on the part of Jens Förster or other co-authors.

Here are our major points of criticism. Please note that, due to time considerations, our examination and criticism focus on papers co-authored by us. Below, we provide some background information and then elaborate on these points.

Categories: junk science, reproducibility | Tags: | 9 Comments

Fraudulent until proved innocent: Is this really the new “Bayesian Forensics”? (rejected post)





Categories: evidence-based policy, frequentist/Bayesian, junk science, Rejected Posts | 2 Comments

What Would Replication Research Under an Error Statistical Philosophy Be?

Around a year ago on this blog I wrote:

“There are some ironic twists in the way psychology is dealing with its replication crisis that may well threaten even the most sincere efforts to put the field on firmer scientific footing”

That’s philosopher’s talk for “I see a rich source of problems that cry out for ministrations of philosophers of science and of statistics”. Yesterday, I began my talk at the Society for Philosophy and Psychology workshop on “Replication in the Sciences” with examples of two main philosophical tasks: to clarify concepts, and to reveal inconsistencies, tensions and ironies surrounding methodological “discomforts” in scientific practice.

Example of a conceptual clarification 

Editors of a journal, Basic and Applied Social Psychology, announced they are banning statistical hypothesis testing because it is “invalid” (A puzzle about the latest “test ban”)

It’s invalid because it does not supply “the probability of the null hypothesis, given the finding” (the posterior probability of H0) (Trafimow and Marks 2015)

  • Since the methodology of testing explicitly rejects the mode of inference it is faulted for not supplying, failing to supply it cannot show the methods to be invalid.
  • Simple conceptual job that philosophers are good at

(I don’t know if the group of eminent statisticians assigned to react to the “test ban” will bring up this point. I don’t think it includes any philosophers.)



Example of revealing inconsistencies and tensions 

Critic: It’s too easy to satisfy standard significance thresholds

You: Why do replicationists find it so hard to achieve significance thresholds?

Critic: Obviously the initial studies were guilty of p-hacking, cherry-picking, significance seeking, QRPs

You: So, the replication researchers want methods that pick up on and block these biasing selection effects.

Critic: Actually the “reforms” recommend methods where selection effects and data dredging make no difference.


Whether this can be resolved or not is separate.

  • We are constantly hearing of how the “reward structure” leads to taking advantage of researcher flexibility
  • As philosophers, we can at least show how to hold their feet to the fire, and warn of the perils of accounts that bury the finagling

The philosopher is the curmudgeon (takes chutzpah!)

I also think it’s crucial for philosophers of science and statistics to show how to improve on and solve problems of methodology in scientific practice.

My slides are below; share comments.

Categories: Error Statistics, reproducibility, Statistics | 18 Comments

3 YEARS AGO (MAY 2012): Saturday Night Memory Lane


MONTHLY MEMORY LANE: 3 years ago: May 2012. Lots of worthy reading and rereading for your Saturday night memory lane; it was hard to choose just 3. 

I mark in red three posts that seem most apt for general background on key issues in this blog* (Posts that are part of a “unit” or a group of “U-Phils” count as one.) This new feature, appearing at the end of each month, began at the blog’s 3-year anniversary in Sept, 2014.

*excluding any that have been recently reblogged.


May 2012

Categories: 3-year memory lane | Leave a comment

“Intentions” is the new code word for “error probabilities”: Allan Birnbaum’s Birthday

27 May 1923-1 July 1976


Today is Allan Birnbaum’s Birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference,” reprinted in Breakthroughs in Statistics (volume I, 1993), concerns a principle that remains at the heart of today’s controversies in statistics, even if it isn’t obvious at first: the Likelihood Principle (LP) (also called the strong Likelihood Principle, SLP, to distinguish it from the weak LP [1]). According to the LP/SLP, given the statistical model, the information from the data is fully contained in the likelihood ratio. Thus, properties of the sampling distribution of the test statistic vanish (as I put it in my slides from my last post)! But error probabilities are all properties of the sampling distribution. Thus, embracing the LP (SLP) blocks our error statistician’s direct ways of taking into account “biasing selection effects” (slide #10).

Intentions is a New Code Word: Where, then, is all the information regarding your trying and trying again, stopping when the data look good, cherry picking, barn hunting and data dredging? For likelihoodists and other probabilists who hold the LP/SLP, it is ephemeral information locked in your head reflecting your “intentions”!  “Intentions” is a code word for “error probabilities” in foundational discussions, as in “who would want to take intentions into account?” (Replace “intentions” (or “the researcher’s intentions”) with “error probabilities” (or “the method’s error probabilities”) and you get a more accurate picture.) Keep this deciphering tool firmly in mind as you read criticisms of methods that take error probabilities into account.[2] For error statisticians, this information reflects real and crucial properties of your inference procedure.
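The difference these “intentions” make is easy to exhibit in a toy simulation (my own illustrative sketch, not part of the original post): a z-test of a zero mean with known σ = 1, where the researcher peeks every 10 observations and stops as soon as nominal 0.05 significance is reached. The likelihood at the stopping point is blind to the peeking, but the actual type I error probability is inflated well beyond 0.05.

```python
import math
import random

def reject_with_peeking(data, peeks, z_crit=1.96):
    """Reject H0: mu = 0 (sigma = 1 known) if |z| exceeds z_crit at ANY interim look."""
    for n in peeks:
        z = sum(data[:n]) / math.sqrt(n)
        if abs(z) > z_crit:
            return True
    return False

def type1_rate(n_sims=2000, peeks=range(10, 101, 10), seed=1):
    """Simulate the actual type I error rate under H0 for a given peeking schedule."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Data generated under H0: standard normal, mean 0
        data = [rng.gauss(0.0, 1.0) for _ in range(max(peeks))]
        if reject_with_peeking(data, peeks):
            rejections += 1
    return rejections / n_sims

rate = type1_rate()  # ten looks, nominal 0.05 at each
```

With ten looks the simulated rate lands well above the nominal 0.05 that a single fixed-n test (`type1_rate(peeks=[100])`) delivers, even though the stopping rule leaves the likelihood function untouched.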


Categories: Birnbaum, Birnbaum Brakes, frequentist/Bayesian, Likelihood Principle, phil/history of stat, Statistics | 48 Comments

From our “Philosophy of Statistics” session: APS 2015 convention



“The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference,” at the 2015 American Psychological Society (APS) Annual Convention in NYC, May 23, 2015:


D. Mayo: “Error Statistical Control: Forfeit at your Peril” 


S. Senn: “‘Repligate’: reproducibility in statistical studies. What does it mean and in what sense does it matter?”


A. Gelman: “The statistical crisis in science” (this is not his exact presentation, but he focussed on some of these slides)


For more details see this post.

Categories: Bayesian/frequentist, Error Statistics, P-values, reforming the reformers, reproducibility, S. Senn, Statistics | 10 Comments

Workshop on Replication in the Sciences: Society for Philosophy and Psychology: (2nd part of double header)

2nd part of the double header:

Society for Philosophy and Psychology (SPP): 41st Annual meeting

SPP 2015 Program

Wednesday, June 3rd
1:30-6:30: Preconference Workshop on Replication in the Sciences, organized by Edouard Machery

1:30-2:15: Edouard Machery (Pitt)
2:15-3:15: Andrew Gelman (Columbia, Statistics, via video link)
3:15-4:15: Deborah Mayo (Virginia Tech, Philosophy)
4:15-4:30: Break
4:30-5:30: Uri Simonsohn (Penn, Psychology)
5:30-6:30: Tal Yarkoni (University of Texas, Neuroscience)

 SPP meeting: 4-6 June 2015 at Duke University in Durham, North Carolina


First part of the double header:

The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference, 2015 APS Annual Convention, Saturday, May 23, 2:00 PM-3:50 PM in Wilder (Marriott Marquis 1535 B’way)

Andrew Gelman
Stephen Senn
Deborah Mayo
Richard Morey, Session Chair & Discussant


 See earlier post for Frank Sinatra and more details
Categories: Announcement, reproducibility | Leave a comment

“Error statistical modeling and inference: Where methodology meets ontology” A. Spanos and D. Mayo



A new joint paper….

“Error statistical modeling and inference: Where methodology meets ontology”

Aris Spanos · Deborah G. Mayo

Abstract: In empirical modeling, an important desideratum for deeming theoretical entities and processes real is that they be reproducible in a statistical sense. Current-day crises regarding replicability in science intertwine with the question of how statistical methods link data to statistical and substantive theories and models. Different answers to this question have important methodological consequences for inference, which are intertwined with a contrast between the ontological commitments of the two types of models. The key to untangling them is the realization that behind every substantive model there is a statistical model that pertains exclusively to the probabilistic assumptions imposed on the data. It is not that the methodology determines whether to be a realist about entities and processes in a substantive field. It is rather that the substantive and statistical models refer to different entities and processes, and therefore call for different criteria of adequacy.

Keywords: Error statistics · Statistical vs. substantive models · Statistical ontology · Misspecification testing · Replicability of inference · Statistical adequacy

To read the full paper: “Error statistical modeling and inference: Where methodology meets ontology.”

The related conference.

Mayo & Spanos spotlight

Reference: Spanos, A. & Mayo, D. G. (2015). “Error statistical modeling and inference: Where methodology meets ontology.” Synthese (online May 13, 2015), pp. 1-23.

Categories: Error Statistics, misspecification testing, O & M conference, reproducibility, Severity, Spanos | 2 Comments

Stephen Senn: Double Jeopardy?: Judge Jeffreys Upholds the Law (sequel to the pathetic P-value)

S. Senn

S. Senn

Stephen Senn
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

Double Jeopardy?: Judge Jeffreys Upholds the Law

“But this could be dealt with in a rough empirical way by taking twice the standard error as a criterion for possible genuineness and three times the standard error for definite acceptance.” Harold Jeffreys (1), p. 386

This is the second of two posts on P-values. In the first, The Pathetic P-Value, I considered the relation of P-values to Laplace’s Bayesian formulation of induction, pointing out that P-values, whilst they had a very different interpretation, were numerically very similar to a type of Bayesian posterior probability. In this one, I consider their relation, or lack of it, to Harold Jeffreys’s radically different approach to significance testing. (An excellent account of the development of Jeffreys’s thought is given by Howie (2), which I recommend highly.)
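For orientation, Jeffreys’s rough rule corresponds to the familiar two-sided normal tail areas: about 0.046 at two standard errors and about 0.0027 at three. A quick sanity check of those thresholds (my own, added here for concreteness):

```python
import math

def two_sided_p(z):
    """Two-sided tail area of a standard normal beyond +/- z standard errors."""
    return math.erfc(z / math.sqrt(2.0))

p_two_se = two_sided_p(2.0)    # "possible genuineness": about 0.046
p_three_se = two_sided_p(3.0)  # "definite acceptance": about 0.0027
```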

The story starts with Cambridge philosopher CD Broad (1887-1971), who in 1918 pointed to a difficulty with Laplace’s Law of Succession. Broad considers the problem of drawing counters from an urn containing n counters and supposes that all m drawn had been observed to be white. He now considers two very different questions, which have two very different probabilities and writes:

[C.D. Broad quote, given as an image in the original post.] Note that in the case that only one counter remains we have n = m + 1 and the two probabilities are the same. However, if n > m + 1 they are not the same, and in particular if m is large but n is much larger, the first probability can approach 1 whilst the second remains small.
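Under Laplace’s uniform prior, Broad’s two probabilities have simple closed forms: the probability that the next counter drawn is white is (m+1)/(m+2), while the probability that all n counters are white is (m+1)/(n+1). A small sketch in exact arithmetic (my own illustration, not Senn’s) makes the contrast vivid:

```python
from fractions import Fraction

def p_next_white(m):
    """Laplace's rule of succession: P(next draw white) after m white draws."""
    return Fraction(m + 1, m + 2)

def p_all_white(m, n):
    """P(all n counters white) after m white draws from an urn of n counters."""
    return Fraction(m + 1, n + 1)

next_after_1000 = p_next_white(1000)                   # 1001/1002: near certainty
all_white_1000_of_million = p_all_white(1000, 10**6)   # about 1/1000: still tiny
```

With m = 1,000 draws from an urn of n = 1,000,000, the next-draw probability exceeds 0.999 while the all-white probability stays near 1/1000; and when only one counter remains (n = m + 1) the two formulas coincide, as Broad noted.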

The practical implication is that just because Bayesian induction implies that a long sequence of successes (and no failures) supports the belief that the next trial will be a success, it does not follow that one should believe that all future trials will be successes. This distinction is often misunderstood. Here is The Economist getting it wrong in September 2000:

The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child’s degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.

See Dicing with Death(3) (pp76-78).
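The Economist’s marble scheme is just Laplace’s rule of succession in disguise: after s sunrises the bag holds s + 1 white marbles out of s + 2, so the reported probability is (s+1)/(s+2). A sketch of the bookkeeping (mine, for illustration):

```python
def marble_probability(sunrises):
    """The Economist's newborn: start with 1 white and 1 black marble,
    add a white marble after each observed sunrise, and report the
    probability of drawing a white marble (belief in the next sunrise)."""
    white, black = 1, 1
    for _ in range(sunrises):
        white += 1
    return white / (white + black)  # equals (sunrises + 1) / (sunrises + 2)
```

The bag probability climbs toward 1, which is exactly Broad’s point: this near-certainty concerns only the next sunrise, not the proposition that the sun will always rise.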

The practical relevance of this is that scientific laws cannot be established by Laplacian induction. Jeffreys (1891-1989) puts it thus:

Thus I may have seen 1 in 1000 of the ‘animals with feathers’ in England; on Laplace’s theory the probability of the proposition, ‘all animals with feathers have beaks’, would be about 1/1000. This does not correspond to my state of belief or anybody else’s. (P128)


Categories: Jeffreys, P-values, reforming the reformers, Statistics, Stephen Senn | 41 Comments

What really defies common sense (Msc kvetch on rejected posts)

Msc Kvetch on my Rejected Posts blog.

Categories: frequentist/Bayesian, msc kvetch, rejected post | Leave a comment

Spurious Correlations: Death by getting tangled in bedsheets and the consumption of cheese! (Aris Spanos)



These days, there are so many dubious assertions about alleged correlations between two variables that an entire website, Spurious Correlations (Tyler Vigen), is devoted to exposing (and creating*) them! A classic problem is that the means of variables X and Y may both be trending in the order the data are observed, invalidating the assumption that their means are constant. In my initial study with Aris Spanos on misspecification testing, the X and Y means were trending in much the same way I imagine a lot of the examples on this site are, like the one on the number of people who die by becoming tangled in their bedsheets and the per capita consumption of cheese in the U.S.

The annual data for 2000-2009 are:

xt: per capita consumption of cheese (U.S.): x = (29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8)

yt: number of people who died by becoming tangled in their bedsheets: y = (327, 456, 509, 497, 596, 573, 661, 741, 809, 717)
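The trending-mean diagnosis can be checked directly (a quick computation of my own, not from Spanos’s slides): the raw correlation between the two series is enormous, but once each series is detrended by removing a linear trend in the observation index, most of it evaporates.

```python
import math

cheese = [29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8]
deaths = [327, 456, 509, 497, 596, 573, 661, 741, 809, 717]

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def detrend(series):
    """Residuals from an OLS line fitted in the observation index t."""
    n = len(series)
    t = list(range(n))
    mt, ms = sum(t) / n, sum(series) / n
    slope = (sum((a - mt) * (b - ms) for a, b in zip(t, series))
             / sum((a - mt) ** 2 for a in t))
    return [b - (ms + slope * (a - mt)) for a, b in zip(t, series)]

r_raw = pearson_r(cheese, deaths)                          # about 0.95
r_detrended = pearson_r(detrend(cheese), detrend(deaths))  # far smaller
```

For these data the raw correlation is about 0.95, while the detrended correlation drops to roughly 0.4, and with only ten observations even that residual figure is statistically unimpressive.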

I asked Aris Spanos to have a look, and it took him no time to identify the main problem. He was good enough to write up a short note which I’ve pasted as slides.


Aris Spanos

Wilson E. Schmidt Professor of Economics
Department of Economics, Virginia Tech



*The site says that the server attempts to generate a new correlation every 60 seconds.

Categories: misspecification testing, Spanos, Statistics, Testing Assumptions | 14 Comments

96% Error in “Expert” Testimony Based on Probability of Hair Matches: It’s all Junk!

Imagine. The New York Times reported a few days ago that the FBI erroneously identified criminals 96% of the time based on probability assessments using forensic hair samples (up until 2000). Sometimes the hair wasn’t even human; it might have come from a dog, a cat or a fur coat! I posted on the unreliability of hair forensics a few years ago. The forensics of bite marks aren’t much better.[i] John Byrd, forensic analyst and reader of this blog, had commented at the time that: “At the root of it is the tradition of hiring non-scientists into the technical positions in the labs. They tended to be agents. That explains a lot about misinterpretation of the weight of evidence and the inability to explain the import of lab findings in court.” DNA is supposed to cure all that. So does it? I don’t know, but apparently the FBI “has agreed to provide free DNA testing where there is either a court order or a request for testing by the prosecution.”[ii] See the FBI report.

Here’s the op-ed from the New York Times from April 27, 2015:

“Junk Science at the F.B.I.”

The odds were 10-million-to-one, the prosecution said, against hair strands found at the scene of a 1978 murder of a Washington, D.C., taxi driver belonging to anyone but Santae Tribble. Based largely on this compelling statistic, drawn from the testimony of an analyst with the Federal Bureau of Investigation, Mr. Tribble, 17 at the time, was convicted of the crime and sentenced to 20 years to life.

But the hair did not belong to Mr. Tribble. Some of it wasn’t even human. In 2012, a judge vacated Mr. Tribble’s conviction and dismissed the charges against him when DNA testing showed there was no match between the hair samples, and that one strand had come from a dog.

Mr. Tribble’s case — along with the exoneration of two other men who served decades in prison based on faulty hair-sample analysis — spurred the F.B.I. to conduct a sweeping post-conviction review of 2,500 cases in which its hair-sample lab reported a match.

The preliminary results of that review, which Spencer Hsu of The Washington Post reported last week, are breathtaking: out of 268 criminal cases nationwide between 1985 and 1999, the bureau’s “elite” forensic hair-sample analysts testified wrongly in favor of the prosecution, in 257, or 96 percent of the time. Thirty-two defendants in those cases were sentenced to death; 14 have since been executed or died in prison.

The agency is continuing to review the rest of the cases from the pre-DNA era. The Justice Department is working with the Innocence Project and the National Association of Criminal Defense Lawyers to notify the defendants in those cases that they may have grounds for an appeal. It cannot, however, address the thousands of additional cases where potentially flawed testimony came from one of the 500 to 1,000 state or local analysts trained by the F.B.I. Peter Neufeld, co-founder of the Innocence Project, rightly called this a “complete disaster.”

Law enforcement agencies have long known of the dubious value of hair-sample analysis. A 2009 report by the National Research Council found “no scientific support” and “no uniform standards” for the method’s use in positively identifying a suspect. At best, hair-sample analysis can rule out a suspect, or identify a wide class of people with similar characteristics.

Yet until DNA testing became commonplace in the late 1990s, forensic analysts testified confidently to the near-certainty of matches between hair found at crime scenes and samples taken from defendants. The F.B.I. did not even have written standards on how analysts should testify about their findings until 2012.


Categories: evidence-based policy, junk science, PhilStat Law, Statistics | 3 Comments


3 YEARS AGO (APRIL 2012)

MONTHLY MEMORY LANE: 3 years ago: April 2012. I mark in red three posts that seem most apt for general background on key issues in this blog* (Posts that are part of a “unit” or a group of “U-Phils” count as one.) This new feature, appearing the last week of each month, began at the blog’s 3-year anniversary in Sept, 2014.

*excluding those recently reblogged.

April 2012

Contributions from readers in relation to published papers

Two book reviews of Error and the Growth of Experimental Knowledge (EGEK 1996), counted as 1 unit

Categories: 3-year memory lane, Statistics | Tags: | Leave a comment

“Statistical Concepts in Their Relation to Reality” by E.S. Pearson

To complete the last post, here’s Pearson’s portion of the “triad” 


E.S.Pearson on Gate (sketch by D. Mayo)

“Statistical Concepts in Their Relation to Reality”

by E.S. PEARSON (1955)

SUMMARY: This paper contains a reply to some criticisms made by Sir Ronald Fisher in his recent article on “Scientific Methods and Scientific Induction”.

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data.  We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done.  If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”.  There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!


Categories: E.S. Pearson, phil/history of stat, Statistics | Tags: , , | Leave a comment

NEYMAN: “Note on an Article by Sir Ronald Fisher” (3 uses for power, Fisher’s fiducial argument)

Note on an Article by Sir Ronald Fisher

By Jerzy Neyman (1956)


(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation.  (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible.  (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values.  The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight.  (4)  The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.

1. Introduction

In a recent article (Fisher, 1955), Sir Ronald Fisher delivered an attack on a substantial part of the research workers in mathematical statistics. My name is mentioned more frequently than any other and is accompanied by the more expressive invectives. Of the scientific questions raised by Fisher many were sufficiently discussed before (Neyman and Pearson, 1933; Neyman, 1937; Neyman, 1952). In the present note only the following points will be considered: (i) Fisher’s attack on the concept of errors of the second kind; (ii) Fisher’s reference to my objections to fiducial probability; (iii) Fisher’s reference to the origin of the concept of loss function and, before all, (iv) Fisher’s attack on Abraham Wald.


Categories: Fisher, Neyman, phil/history of stat, Statistics | Tags: , , | 2 Comments

Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen


Neyman, drawn by ?

“Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena” by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I hadn’t posted this paper of Neyman’s before, so here’s something for your weekend reading:  “Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena.”  I recommend, especially, the example on home ownership. Here are two snippets:


The title of the present session involves an element that appears mysterious to me. This element is the apparent distinction between tests of statistical hypotheses, on the one hand, and tests of significance, on the other. If this is not a lapse of someone’s pen, then I hope to learn the conceptual distinction.

Categories: Error Statistics, Neyman, Statistics | Tags: | 18 Comments

A. Spanos: Jerzy Neyman and his Enduring Legacy


A Statistical Model as a Chance Mechanism
Aris Spanos 

Today is the birthday of Jerzy Neyman (April 16, 1894 – August 5, 1981). Neyman was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)


Neyman: 16 April 1894 – 5 Aug 1981

One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for  non-random samples. Fisher’s original parametric statistical model Mθ(x) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data x0:=(x1,x2,…,xn) can be viewed as a ‘truly representative sample’ from that ‘population’:


Fisher and Neyman

“The postulate of randomness thus resolves itself into the question, ‘Of what population is this a random sample?’” (ibid., p. 313), underscoring that “the adequacy of our choice may be tested a posteriori.” (p. 314)

In cases where data x0 come from sample surveys or can be viewed as a typical realization of a random sample X:=(X1,X2,…,Xn), i.e. Independent and Identically Distributed (IID) random variables, the ‘population’ metaphor can be helpful in adding some intuitive appeal to the inductive dimension of statistical inference, because one can imagine using a subset of a population (the sample) to draw inferences pertaining to the whole population.

Categories: Neyman, phil/history of stat, Spanos, Statistics | Tags: , | Leave a comment

Philosophy of Statistics Comes to the Big Apple! APS 2015 Annual Convention — NYC

Start Spreading the News…..



 The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference,
2015 APS Annual Convention
Saturday, May 23  
2:00 PM- 3:50 PM in Wilder

(Marriott Marquis 1535 B’way)





Andrew Gelman

Professor of Statistics & Political Science
Columbia University



Stephen Senn

Head of Competence Center
for Methodology and Statistics (CCMS)

Luxembourg Institute of Health




D.G. Mayo, Philosopher



Richard Morey, Session Chair & Discussant

Senior Lecturer
School of Psychology
Cardiff University
Categories: Announcement, Bayesian/frequentist, Statistics | 8 Comments

Heads I win, tails you lose? Meehl and many Popperians get this wrong (about severe tests)!


bending of starlight.

“[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation—in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where] it was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories.” (Popper, CR, p. 36)


Popper lauds Einstein’s General Theory of Relativity (GTR) as sticking its neck out, bravely being ready to admit its falsity were the deflection effect not found. The truth is that even if no deflection effect had been found in the 1919 experiments, it would have been blamed on the sheer difficulty in discerning so small an effect (the results that were found were quite imprecise). This would have been entirely correct! Yet many Popperians, perhaps Popper himself, get this wrong.[i] Listen to Popperian Paul Meehl (with whom I generally agree):

“The stipulation beforehand that one will be pleased about substantive theory T when the numerical results come out as forecast, but will not necessarily abandon it when they do not, seems on the face of it to be about as blatant a violation of the Popperian commandment as you could commit. For the investigator, in a way, is doing…what astrologers and Marxists and psychoanalysts allegedly do, playing heads I win, tails you lose.” (Meehl 1978, 821)

No, there is a confusion of logic. A successful result may rightly be taken as evidence for a real effect H, even though failing to find the effect need not be taken to refute it, or even as evidence against H. This makes perfect sense if one keeps in mind that a test might have had little chance to detect the effect, even if it existed. The point really reflects the asymmetry of falsification and corroboration. Popperian Alan Chalmers, once I made my case, wrote an appendix to a chapter of his book What is this Thing Called Science? (1999), which at first had criticized severity for this.[i]
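The “little chance to detect the effect” point is just a statement about power, and a few numbers make it concrete (my own illustration, a one-sided z-test with σ = 1; the effect sizes and sample sizes are hypothetical):

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def power_one_sided_z(effect, n, alpha_z=1.645):
    """Power of a one-sided z-test at level 0.05 (sigma = 1)
    when the true mean equals `effect`."""
    return normal_sf(alpha_z - effect * math.sqrt(n))
```

With effect 0.2 and n = 25 the power is only about 0.26, so a nonsignificant result is weak grounds against H; with n = 400 the power exceeds 0.95, and there nonsignificance does count against the effect. The asymmetry Meehl worries about is thus tracked, not hidden, by the test’s error probabilities.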

Categories: fallacy of non-significance, philosophy of science, Popper, Severity, Statistics | Tags: | 2 Comments

Joan Clarke, Turing, I.J. Good, and “that after-dinner comedy hour…”

I finally saw The Imitation Game about Alan Turing and code-breaking at Bletchley Park during WWII. This short clip of Joan Clarke, who was engaged to Turing, includes my late colleague I.J. Good at the end (he’s not second as the clip lists him). Good used to talk a great deal about Bletchley Park and his code-breaking feats while asleep there (see note[a]), but I never imagined Turing’s code-breaking machine (which, by the way, was called the Bombe and not Christopher as in the movie) was so clunky. The movie itself has two tiny scenes including Good. Below I reblog: “Who is Allowed to Cheat?”—one of the topics he and I debated over the years. Links to the full “Savage Forum” (1962) may be found at the end (creaky, but better than nothing.)

[a]”Some sensitive or important Enigma messages were enciphered twice, once in a special variation cipher and again in the normal cipher. …Good dreamed one night that the process had been reversed: normal cipher first, special cipher second. When he woke up he tried his theory on an unbroken message – and promptly broke it.” This, and further examples may be found in this obituary

[b] Pictures comparing the movie cast and the real people may be found here.

Categories: Bayesian/frequentist, optional stopping, Statistics, strong likelihood principle | 6 Comments
