Memory Lane: One Year Ago on errorstatistics.com
A quick perusal of the “Manual” on Nathan Schachtman’s legal blog shows it to be chock-full of revealing points of contemporary legal statistical philosophy. The following are some excerpts; read the full blog here. I make two comments at the end.
July 8th, 2012
How does the new Reference Manual on Scientific Evidence (RMSE3d 2011) treat statistical significance? Inconsistently and at times incoherently.
Professor Berger’s Introduction
In her introductory chapter, the late Professor Margaret A. Berger raises the question of the role statistical significance should play in evaluating a study’s support for causal conclusions:
“What role should statistical significance play in assessing the value of a study? Epidemiological studies that are not conclusive but show some increased risk do not prove a lack of causation. Some courts find that they therefore have some probative value, 62 at least in proving general causation. 63”
Margaret A. Berger, “The Admissibility of Expert Testimony,” in RMSE3d 11, 24 (2011).
This seems rather backwards. Berger’s suggestion that inconclusive studies do not prove lack of causation seems nothing more than a tautology. And how can that tautology support the claim that inconclusive studies “therefore” have some probative value? This is a fairly obvious logically invalid argument, or perhaps a passage badly in need of an editor.
Chapter on Statistics
The RMSE’s chapter on statistics is relatively free of value judgments about significance probability, and, therefore, a great improvement upon Berger’s introduction. The authors carefully describe significance probability and p-values, and explain:
“Small p-values argue against the null hypothesis. Statistical significance is determined by reference to the p-value; significance testing (also called hypothesis testing) is the technique for computing p-values and determining statistical significance.”
David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in RMSE3d 211, 241 (3d ed. 2011). Although the chapter confuses and conflates Fisher’s interpretation of p-values with Neyman’s conceptualization of hypothesis testing as a dichotomous decision procedure, this treatment is unfortunately fairly standard in introductory textbooks.
Kaye and Freedman, however, do offer some important qualifications concerning the untoward consequences of treating significance testing as a dichotomous outcome:
“Artifacts from multiple testing are commonplace. Because research that fails to uncover significance often is not published, reviews of the literature may produce an unduly large number of studies finding statistical significance.111 Even a single researcher may examine so many different relationships that a few will achieve statistical significance by mere happenstance. Almost any large data set—even pages from a table of random digits—will contain some unusual pattern that can be uncovered by diligent search. Having detected the pattern, the analyst can perform a statistical test for it, blandly ignoring the search effort. Statistical significance is bound to follow.
There are statistical methods for dealing with multiple looks at the data, which permit the calculation of meaningful p-values in certain cases.112 However, no general solution is available, and the existing methods would be of little help in the typical case where analysts have tested and rejected a variety of models before arriving at the one considered the most satisfactory (see infra Section V on regression models). In these situations, courts should not be overly impressed with claims that estimates are significant. Instead, they should be asking how analysts developed their models.113”
Id. at 256-57. This qualification is omitted from the overlapping discussion in the chapter on epidemiology, where it is very much needed.
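To put a rough number on Kaye and Freedman's point that a few relationships "will achieve statistical significance by mere happenstance," here is a minimal sketch. It is an illustration, not anything taken from the Manual: it assumes every null hypothesis is true and that the tests are independent, and the function names are hypothetical. The Bonferroni cutoff shown is one familiar adjustment for multiple looks, swapped in here as an example; the quoted passage does not name a particular method.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one p-value below alpha, assuming all nulls
    are true and the tests are independent (illustrative assumptions)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test cutoff that keeps the family-wise rate at or below alpha."""
    return alpha / num_tests


# Tabulate how quickly "significance by happenstance" becomes likely.
for m in (1, 5, 20, 100):
    print(m, round(familywise_error_rate(m), 3), bonferroni_threshold(m))
```

Under these assumptions, twenty independent looks at the conventional 0.05 level already give roughly a 64 percent chance of at least one nominally significant finding. An adjustment of this kind only works when the number of tests is known, which is exactly why it offers little help in the model-search situation the authors describe.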