Monthly Archives: April 2015

96% Error in “Expert” Testimony Based on Probability of Hair Matches: It’s all Junk!

Objectivity 1: Will the Real Junk Science Please Stand Up?

Imagine. The New York Times reported a few days ago that the FBI's forensic hair analysts gave erroneous testimony 96% of the time in cases based on probability assessments of hair matches (up until 2000). Sometimes the hair wasn't even human; it might have come from a dog, a cat, or a fur coat! I posted on the unreliability of hair forensics a few years ago. The forensics of bite marks aren't much better.[i] John Byrd, forensic analyst and reader of this blog, had commented at the time: "At the root of it is the tradition of hiring non-scientists into the technical positions in the labs. They tended to be agents. That explains a lot about misinterpretation of the weight of evidence and the inability to explain the import of lab findings in court." DNA is supposed to cure all that. So does it? I don't know, but apparently the FBI "has agreed to provide free DNA testing where there is either a court order or a request for testing by the prosecution."[ii] See the FBI report.

Here’s the op-ed from the New York Times from April 27, 2015:

“Junk Science at the F.B.I.”

The odds were 10-million-to-one, the prosecution said, against hair strands found at the scene of a 1978 murder of a Washington, D.C., taxi driver belonging to anyone but Santae Tribble. Based largely on this compelling statistic, drawn from the testimony of an analyst with the Federal Bureau of Investigation, Mr. Tribble, 17 at the time, was convicted of the crime and sentenced to 20 years to life.

But the hair did not belong to Mr. Tribble. Some of it wasn’t even human. In 2012, a judge vacated Mr. Tribble’s conviction and dismissed the charges against him when DNA testing showed there was no match between the hair samples, and that one strand had come from a dog.

Mr. Tribble’s case — along with the exoneration of two other men who served decades in prison based on faulty hair-sample analysis — spurred the F.B.I. to conduct a sweeping post-conviction review of 2,500 cases in which its hair-sample lab reported a match.

The preliminary results of that review, which Spencer Hsu of The Washington Post reported last week, are breathtaking: out of 268 criminal cases nationwide between 1985 and 1999, the bureau’s “elite” forensic hair-sample analysts testified wrongly in favor of the prosecution in 257, or 96 percent of the time. Thirty-two defendants in those cases were sentenced to death; 14 have since been executed or died in prison.

The agency is continuing to review the rest of the cases from the pre-DNA era. The Justice Department is working with the Innocence Project and the National Association of Criminal Defense Lawyers to notify the defendants in those cases that they may have grounds for an appeal. It cannot, however, address the thousands of additional cases where potentially flawed testimony came from one of the 500 to 1,000 state or local analysts trained by the F.B.I. Peter Neufeld, co-founder of the Innocence Project, rightly called this a “complete disaster.”

Law enforcement agencies have long known of the dubious value of hair-sample analysis. A 2009 report by the National Research Council found “no scientific support” and “no uniform standards” for the method’s use in positively identifying a suspect. At best, hair-sample analysis can rule out a suspect, or identify a wide class of people with similar characteristics.

Yet until DNA testing became commonplace in the late 1990s, forensic analysts testified confidently to the near-certainty of matches between hair found at crime scenes and samples taken from defendants. The F.B.I. did not even have written standards on how analysts should testify about their findings until 2012.


Categories: evidence-based policy, junk science, PhilStat Law, Statistics | 3 Comments


3 years ago…

MONTHLY MEMORY LANE: 3 years ago: April 2012. I mark in red three posts that seem most apt for general background on key issues in this blog.* (Posts that are part of a “unit” or a group of “U-Phils” count as one.) This new feature, appearing the last week of each month, began at the blog’s 3-year anniversary in Sept. 2014.

*excluding those recently reblogged.

April 2012

Contributions from readers in relation to published papers

Two book reviews of Error and the Growth of Experimental Knowledge (EGEK 1996) (counted as 1 unit)

Categories: 3-year memory lane, Statistics | Tags: | Leave a comment

“Statistical Concepts in Their Relation to Reality” by E.S. Pearson

To complete the last post, here’s Pearson’s portion of the “triad”:


E.S. Pearson on Gate (sketch by D. Mayo)

“Statistical Concepts in Their Relation to Reality”

by E.S. PEARSON (1955)

SUMMARY: This paper contains a reply to some criticisms made by Sir Ronald Fisher in his recent article on “Scientific Methods and Scientific Induction”.

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data.  We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done.  If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”.  There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!


Categories: E.S. Pearson, phil/history of stat, Statistics | Tags: , , | Leave a comment

NEYMAN: “Note on an Article by Sir Ronald Fisher” (3 uses for power, Fisher’s fiducial argument)

Note on an Article by Sir Ronald Fisher

By Jerzy Neyman (1956)


(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation.  (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible.  (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values.  The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight.  (4)  The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.

1. Introduction

In a recent article (Fisher, 1955), Sir Ronald Fisher delivered an attack on a substantial part of the research workers in mathematical statistics. My name is mentioned more frequently than any other and is accompanied by the more expressive invectives. Of the scientific questions raised by Fisher many were sufficiently discussed before (Neyman and Pearson, 1933; Neyman, 1937; Neyman, 1952). In the present note only the following points will be considered: (i) Fisher’s attack on the concept of errors of the second kind; (ii) Fisher’s reference to my objections to fiducial probability; (iii) Fisher’s reference to the origin of the concept of loss function and, before all, (iv) Fisher’s attack on Abraham Wald.
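Neyman’s point (2) in the summary above, that a purely probabilistic theory of tests requires an alternative hypothesis and the probabilities of both kinds of error, can be sketched numerically. This is my illustration, not part of Neyman’s note; the one-sided z-test, effect size, and sample size are hypothetical choices:

```python
# Illustration (not from Neyman's paper): the two error probabilities for a
# one-sided z-test of H0: mu = 0 vs. H1: mu = mu1 > 0, with a normal sample
# of size n and known sigma.
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu1, n, sigma=1.0, alpha=0.05):
    """Return (alpha, beta) for the one-sided z-test at level alpha.

    alpha = P(reject H0 | H0 true)   -- error of the first kind
    beta  = P(accept H0 | mu = mu1)  -- error of the second kind
    """
    z = NormalDist()
    cutoff = z.inv_cdf(1 - alpha)        # reject when the z-statistic exceeds this
    shift = mu1 * sqrt(n) / sigma        # mean of the z-statistic under mu = mu1
    beta = z.cdf(cutoff - shift)         # P(z-statistic <= cutoff | mu = mu1)
    return alpha, beta

a, b = error_probabilities(mu1=0.5, n=25)
print(f"Type I error: {a:.3f}, Type II error: {b:.3f}, power: {1 - b:.3f}")
```

Note that beta (and hence power) is undefined without specifying an alternative value mu1, which is exactly why Neyman insists alternatives cannot be dispensed with.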


Categories: Fisher, Neyman, phil/history of stat, Statistics | Tags: , , | 2 Comments

Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen


Neyman, drawn by ?

“Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena” by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I hadn’t posted this paper of Neyman’s before, so here’s something for your weekend reading:  “Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena.”  I recommend, especially, the example on home ownership. Here are two snippets:


The title of the present session involves an element that appears mysterious to me. This element is the apparent distinction between tests of statistical hypotheses, on the one hand, and tests of significance, on the other. If this is not a lapse of someone’s pen, then I hope to learn the conceptual distinction.

Categories: Error Statistics, Neyman, Statistics | Tags: | 18 Comments

A. Spanos: Jerzy Neyman and his Enduring Legacy


A Statistical Model as a Chance Mechanism
Aris Spanos 

Today is the birthday of Jerzy Neyman (April 16, 1894 – August 5, 1981). Neyman was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)


Neyman: 16 April 1894 – 5 Aug 1981

One of Neyman’s most remarkable, but least recognized, achievements was his adaptation of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Fisher’s original parametric statistical model Mθ(x) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data x0:=(x1,x2,…,xn) can be viewed as a ‘truly representative sample’ from that ‘population’:


Fisher and Neyman

“The postulate of randomness thus resolves itself into the question, ‘Of what population is this a random sample?’” (ibid., p. 313), underscoring that “the adequacy of our choice may be tested a posteriori.” (p. 314)

In cases where data x0 come from sample surveys, or can be viewed as a typical realization of a random sample X:=(X1,X2,…,Xn), i.e. Independent and Identically Distributed (IID) random variables, the ‘population’ metaphor can be helpful in adding some intuitive appeal to the inductive dimension of statistical inference, because one can imagine using a subset of a population (the sample) to draw inferences pertaining to the whole population.
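The ‘population’ metaphor can be made concrete with a toy simulation. This is my sketch, not part of Spanos’s post; the population size, sample size, and distribution are arbitrary choices for illustration:

```python
# Toy version of the 'population' metaphor: draw an IID sample from a finite
# population and use the sample mean to draw an inference about the
# (here, actually computable) population mean.
import random

random.seed(1)
# A large finite 'population' of measurements.
population = [random.gauss(10.0, 2.0) for _ in range(100_000)]
pop_mean = sum(population) / len(population)

# The observed data x0: a random sample of n = 500 from that population.
sample = random.sample(population, 500)
sample_mean = sum(sample) / len(sample)

print(f"population mean {pop_mean:.2f}, sample mean {sample_mean:.2f}")
```

The sample mean lands close to the population mean (its standard error here is about 2/√500 ≈ 0.09), which is the intuition the metaphor trades on; Neyman’s contribution was precisely to free the statistical model from having to posit such a pre-existing population.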

Categories: Neyman, phil/history of stat, Spanos, Statistics | Tags: , | Leave a comment

Philosophy of Statistics Comes to the Big Apple! APS 2015 Annual Convention — NYC

Start Spreading the News…



The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference
2015 APS Annual Convention
Saturday, May 23
2:00 PM–3:50 PM in Wilder

(Marriott Marquis 1535 B’way)





Andrew Gelman

Professor of Statistics & Political Science
Columbia University



Stephen Senn

Head of Competence Center
for Methodology and Statistics (CCMS)

Luxembourg Institute of Health




D.G. Mayo, Philosopher



Richard Morey, Session Chair & Discussant

Senior Lecturer
School of Psychology
Cardiff University
Categories: Announcement, Bayesian/frequentist, Statistics | 8 Comments

Heads I win, tails you lose? Meehl and many Popperians get this wrong (about severe tests)!


Bending of starlight.

“[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation—in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where it] was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories.” (Popper, CR, p. 36)


Popper lauds Einstein’s General Theory of Relativity (GTR) as sticking its neck out, bravely being ready to admit its falsity were the deflection effect not found. The truth is that even if no deflection effect had been found in the 1919 experiments, it would have been blamed on the sheer difficulty in discerning so small an effect (the results that were found were quite imprecise). This would have been entirely correct! Yet many Popperians, perhaps Popper himself, get this wrong.[i] Listen to Popperian Paul Meehl (with whom I generally agree).

“The stipulation beforehand that one will be pleased about substantive theory T when the numerical results come out as forecast, but will not necessarily abandon it when they do not, seems on the face of it to be about as blatant a violation of the Popperian commandment as you could commit. For the investigator, in a way, is doing…what astrologers and Marxists and psychoanalysts allegedly do, playing heads I win, tails you lose.” (Meehl 1978, 821)

No, there is a confusion of logic. A successful result may rightly be taken as evidence for a real effect H, even though failing to find the effect need not be taken to refute the effect, or even as evidence against H. This makes perfect sense if one keeps in mind that a test might have had little chance to detect the effect, even if it existed. The point really reflects the asymmetry of falsification and corroboration. Popperian Alan Chalmers wrote an appendix to a chapter of his book, What is this Thing Called Science? (1999) (which at first had criticized severity for this), once I made my case.[i]
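The claim that a test “might have had little chance to detect the effect, even if it existed” can be checked with a quick simulation. This is my sketch, not from the post; the effect size, sample size, and test are hypothetical:

```python
# Simulation: a real effect exists (mu > 0), but the test has little chance
# to detect it, so a non-significant result is weak evidence against the effect.
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)
z95 = NormalDist().inv_cdf(0.95)              # one-sided cutoff at alpha = 0.05
mu, sigma, n, trials = 0.2, 1.0, 10, 10_000   # small effect, tiny sample

rejections = 0
for _ in range(trials):
    xbar = random.gauss(mu, sigma / sqrt(n))  # sample mean under the real effect
    if xbar * sqrt(n) / sigma > z95:          # did the test detect the effect?
        rejections += 1

power = rejections / trials
print(f"power ≈ {power:.2f}")
```

With these numbers the power comes out well under 0.2: the test “fails to find” the genuinely real effect most of the time, which is exactly why the non-detection cannot be read as a refutation while a detection can still count in the effect’s favor.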

Categories: fallacy of non-significance, philosophy of science, Popper, Severity, Statistics | Tags: | 2 Comments

Joan Clarke, Turing, I.J. Good, and “that after-dinner comedy hour…”

I finally saw The Imitation Game about Alan Turing and code-breaking at Bletchley Park during WWII. This short clip of Joan Clarke, who was engaged to Turing, includes my late colleague I.J. Good at the end (he’s not second as the clip lists him). Good used to talk a great deal about Bletchley Park and his code-breaking feats while asleep there (see note [a]), but I never imagined Turing’s code-breaking machine (which, by the way, was called the Bombe and not Christopher as in the movie) was so clunky. The movie itself has two tiny scenes including Good. Below I reblog: “Who is Allowed to Cheat?”—one of the topics he and I debated over the years. Links to the full “Savage Forum” (1962) may be found at the end (creaky, but better than nothing).

[a] “Some sensitive or important Enigma messages were enciphered twice, once in a special variation cipher and again in the normal cipher. …Good dreamed one night that the process had been reversed: normal cipher first, special cipher second. When he woke up he tried his theory on an unbroken message – and promptly broke it.” This, and further examples, may be found in this obituary.

[b] Pictures comparing the movie cast and the real people may be found here.

Categories: Bayesian/frequentist, optional stopping, Statistics, strong likelihood principle | 6 Comments

Are scientists really ready for ‘retraction offsets’ to advance ‘aggregate reproducibility’? (let alone ‘precautionary withdrawals’)



Given recent evidence of the irreproducibility of a surprising number of published scientific findings, the White House’s Office of Science and Technology Policy (OSTP) sought ideas for “leveraging its role as a significant funder of scientific research to most effectively address the problem”, and announced funding for projects to “reset the self-corrective process of scientific inquiry”. (first noted in this post.)

I was sent some information this morning with a rather long description of the project that received the top government award thus far (and it’s in the millions). I haven’t had time to read the proposal*, which I’ll link to shortly, but for a clear and quick description, you can read the excerpt of an interview of the OSTP representative by the editor of the Newsletter for Innovation in Science Journals (Working Group), Jim Stein, who took the lead in writing the author checklist for Nature.

Stein’s queries are in burgundy; OSTP’s replies are in blue. Occasional comments from me are in black, which I’ll update once I study the fine print of the proposal itself.

Categories: junk science, reproducibility, science communication, Statistics | 9 Comments
