Given some slight recuperation delays, interested readers might wish to poke around the multiple layers of goodies on the left-hand side of this web page, wherein all manner of foundational/statistical controversies are considered. In a recent attempt by Aris Spanos and me to address the age-old criticisms from the perspective of the "error statistical philosophy," we delineate 13 criticisms. Here they are:
- (#1) Error statistical tools forbid using any background knowledge.
- (#2) All statistically significant results are treated the same.
- (#3) The p-value does not tell us how large a discrepancy is found.
- (#4) With large enough sample size even a trivially small discrepancy from the null can be detected.
- (#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.
- (#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
- (#7) Error probabilities are invariably misinterpreted as posterior probabilities.
- (#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
- (#9) Specifying statistical tests is too arbitrary.
- (#10) We should be doing confidence interval estimation rather than significance tests.
- (#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
- (#12) All models are false anyway.
- (#13) Testing assumptions involves illicit data-mining.
HAVE WE LEFT ANY OUT?
Mayo & Spanos “Error Statistics” 2011
(for problems accessing links, please write to: firstname.lastname@example.org)
This is a nice collection of issues and arguments that many readers interested in statistical inference would find valuable. It does, though, presuppose a good familiarity with the finer conditions on tests and confidence intervals, which perhaps only a minority of statisticians actually encounter in their education.
Additionally, those who do encounter them are likely to see them simply as further tests of their ability to notice and rigorously address very subtle points in proofs and theorems – the similarity of tests, the nesting requirement for confidence intervals, and the "demand" in both to avoid relevant subsets. Most accept anything with a definable type one error rate (e.g. even one defined via a supremum over nuisance parameters) as a test, and anything with a definable coverage rate as a confidence interval. (This is why a paper purporting to show how to get a confidence interval from a single observation, assuming unknown location and scale, got published in a statistics journal.) So a background paper that covers this, providing adequate coverage, clarity, and agreement on these finer points, might move things along better.
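The "supremum over nuisance parameters" definition of size mentioned above can be made concrete with a small numerical sketch. The following Python snippet (the choice of test, sample sizes, and grid are my own illustrative assumptions, not from the text) takes the pooled two-proportion z-test, whose null H0: p1 = p2 = p leaves the common proportion p as a nuisance parameter, and defines the size as the supremum of the exact rejection probability over a grid of p values.

```python
from math import comb, sqrt

def z_stat(x1, n1, x2, n2):
    """Pooled two-proportion z statistic; None when undefined (all 0s or all 1s)."""
    p_hat = (x1 + x2) / (n1 + n2)
    if p_hat in (0.0, 1.0):
        return None
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def rejection_prob(p, n1, n2, z_crit=1.96):
    """Exact P(reject) under H0: p1 = p2 = p, by enumerating all outcomes."""
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            z = z_stat(x1, n1, x2, n2)
            if z is not None and abs(z) > z_crit:
                total += (comb(n1, x1) * p**x1 * (1 - p)**(n1 - x1)
                          * comb(n2, x2) * p**x2 * (1 - p)**(n2 - x2))
    return total

# Size = supremum of the rejection probability over the nuisance parameter p
n1 = n2 = 10
grid = [i / 100 for i in range(1, 100)]
size = max(rejection_prob(p, n1, n2) for p in grid)
print(round(size, 3))
```

The point of the sketch is only that this supremum is *definable*, which on the permissive view described above is enough to call the procedure a test, whatever the rejection probability does at particular values of p.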
On the other hand, I must strongly object to the impression that nuisance parameters have been adequately "escaped from". There has been some limited success on the low-hanging fruit, but the fruit higher up has so far resisted a common solution and seems to require considerable piecemeal analytical work. For instance, a simple 2×2 table of proportions has yet to be fully satisfactorily addressed, and as far as I know there is nothing yet for hierarchical models, which are coming into common use. One needs to be careful not to generalize too much here. For instance, although there are some good approximate solutions for the 2×2 table of proportions using conditioning or higher-order asymptotic techniques, they work only for the odds ratio. If you are interested in the relative risk, you are stuck with dependence on the nuisance parameter. Some way to extend type one error rates to being functions of nuisance parameters seems unavoidable and is currently lacking. The Bayesian approach has an automatic solution here, but it suffers (for the same reason) from not being robust to the prior specifications (implied) for the nuisance parameters. It is, though, very convenient.
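The relative-risk point can be seen numerically. A hedged sketch (the Wald test, sample sizes, and grid values are my own illustrative assumptions): under H0: p1 = p2 = p, the exact type I error of the Wald test for log relative risk is computed at several values of the nuisance parameter p, and it visibly changes with p – precisely the dependence on the nuisance parameter described above.

```python
from math import comb, log, sqrt

def reject(x1, n1, x2, n2, z_crit=1.96):
    """Wald test for H0: log(RR) = 0; boundary counts are treated as non-rejections."""
    if x1 == 0 or x2 == 0:
        return False
    se2 = 1 / x1 - 1 / n1 + 1 / x2 - 1 / n2  # delta-method variance of log(RR)
    if se2 <= 0:
        return False
    z = log((x1 / n1) / (x2 / n2)) / sqrt(se2)
    return abs(z) > z_crit

def type1(p, n1=15, n2=15):
    """Exact rejection probability under H0: p1 = p2 = p, by full enumeration."""
    return sum(comb(n1, x1) * p**x1 * (1 - p)**(n1 - x1)
               * comb(n2, x2) * p**x2 * (1 - p)**(n2 - x2)
               for x1 in range(n1 + 1) for x2 in range(n2 + 1)
               if reject(x1, n1, x2, n2))

# The type I error rate is a function of the nuisance parameter p, not a single number
rates = {p: round(type1(p), 4) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(rates)
```

Nothing in this sketch fixes the problem; it only exhibits it. Treating the type I error rate as a function of p, rather than a single number, is exactly the extension suggested above as unavoidable and currently lacking.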