## Comedy Hour at the Bayesian (Epistemology) Retreat: Highly Probable vs Highly Probed

Bayesian philosophers (among others) have analogous versions of the criticism in my April 28 blog post: error probabilities (associated with inferences to hypotheses) may conflict with chosen posterior probabilities in hypotheses. Since it’s Saturday night, let’s listen in to one of the comedy hours at the Bayesian retreat (note the sedate philosopher’s comedy club backdrop):

*Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?*

*The problem was, the epistemic probability in H was so low that H couldn’t be believed! Instead we believe its denial H’! So, she will infer hypotheses that are simply unbelievable!*

So clearly the error statistical testing account fails to serve as an account of knowledge or inference (i.e., an epistemic account). However severely I might wish to say that a hypothesis *H* has passed a test, the Bayesian critic assigns a sufficiently low prior probability to *H* so as to yield a low posterior probability in *H*.[i] But this gives no argument for why this counts in favor of, rather than against, their Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis *H*.

To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.” This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected: for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true. This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Moreover, there are just two outcomes, say *s* and ~*s*, and no degrees of discrepancy from *H*.
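
The kind of computation the critic has in mind can be sketched as follows. This is a minimal illustration with hypothetical numbers: the proportion of true nulls in the pool, the significance level, and the power are assumptions chosen for the example, and the function name is invented here. Note that it treats the particular null selected as a random draw from the pool, which is exactly the shift of probability to the particular case questioned above; the same arithmetic applies to the “college ready” version with a base rate in place of the pool proportion.

```python
def prob_null_given_rejection(prior_true_null: float, alpha: float, power: float) -> float:
    """Posterior that a selected null is true, given that its test rejects.

    Hypothetical setup: the null was drawn at random from a pool in which a
    fraction `prior_true_null` of the nulls are true, and every test in the
    pool has significance level `alpha` and the same `power`.
    """
    reject_and_true = alpha * prior_true_null           # P(reject and null true)
    reject_and_false = power * (1.0 - prior_true_null)  # P(reject and null false)
    return reject_and_true / (reject_and_true + reject_and_false)


if __name__ == "__main__":
    # Hypothetical pool: 90% true nulls, tests at level 0.05 with power 0.8.
    posterior = prob_null_given_rejection(prior_true_null=0.9, alpha=0.05, power=0.8)
    print(f"P(null true | rejection) = {posterior:.2f}")  # prints 0.36
```

With a 0.05 type I error rate, the computed “posterior” that the rejected null is true is still 0.36. Whether that number is the appropriate assessment of warrant for the particular hypothesis is precisely what is in dispute.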

## That Promissory Note From Lehmann’s Letter; Schmidt to Speak

Monday, April 16, is Jerzy Neyman’s birthday, but this post is not about Neyman (that comes later, I hope). Still, in thinking of Neyman, I’m reminded of Erich Lehmann, Neyman’s first student, and a promissory note I gave in a post on September 15, 2011. I wrote:

“One day (in 1997), I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that). …. I remember it contained two especially noteworthy pieces of information, one intriguing, the other quite surprising. The intriguing one (I’ll come back to the surprising one another time, if reminded) was this: He told me he was sitting in a very large room at an ASA meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, dark table sat just one book, all alone, shiny red. He said he wondered if it might be of interest to him! So he walked up to it…. It turned out to be my *Error and the Growth of Experimental Knowledge* (1996, Chicago), which he reviewed soon after.”

But what about the “surprising one” that I was to come back to “if reminded”? (Yes, one person did remind me last month.) The surprising one is that Lehmann’s letter (his first letter to me) asked me to please read a paper by Frank Schmidt, to appear in his wife Juliet Shaffer’s new (at the time) journal, *Psychological Methods*, as he wondered whether I had any ideas about what might be done to answer such criticisms of frequentist tests! But, clearly, few people could have been in a better position than Lehmann to “do something about” these arguments… hence my surprise. But I think he was reluctant….

## Fallacy of Rejection and the Fallacy of Nouvelle Cuisine

In February, in London, criminologist Katrin H. and I went to see Jackie Mason do his shtick, a one-man show billed as his swan song to England. It was like a repertoire of his “Greatest Hits” without a new or updated joke in the mix. Still, hearing his rants for the nth time was often quite hilarious.

A sample: If you want to eat nothing, eat nouvelle cuisine. Do you know what it means? No food. The smaller the portion the more impressed people are, so long as the food’s got a fancy French name, haute cuisine. An empty plate with sauce!

As one critic wrote, Mason’s jokes “offer a window to a different era,” one whose caricatures and biases one can only hope we’ve moved beyond: http://www.guardian.co.uk/stage/2012/feb/21/jackie-mason-live-review

But it’s one thing for Jackie Mason to scowl at a seat in the front row and yell to the shocked audience member in his imagination, “These are jokes! They are just jokes!” and another to reprise statistical howlers, which, to me, are not jokes. This blog found its reason for being partly as a place to expose, understand, and avoid them. Recall the September 26, 2011 post “Whipping Boys and Witch Hunters”: https://errorstatistics.com/2011/09/26/whipping-boys-and-witch-hunters-comments-are-now-open/ [i]

Fortunately, philosophers of statistics would surely not reprise decades-old howlers and fallacies. After all, it is the philosopher’s job to clarify and expose the conceptual and logical foibles of others; and even if we do not agree, we would never merely disregard and fail to address the criticisms in published work by other philosophers. Oh wait… one of the leading texts repeats the fallacy in its third edition.

## Philosophy of Statistics: Retraction Watch, Vol. 1, No. 1

This morning I received a paper I have been asked to review (anonymously as is typical). It is to head up a forthcoming issue of a new journal called *Philosophy of Statistics: Retraction Watch*. This is the first I’ve heard of the journal, and I plan to recommend they publish the piece, conditional on revisions. I thought I would post the abstract here. It’s that interesting.

“Some Slightly More Realistic Self-Criticism in Recent Work in Philosophy of Statistics,” *Philosophy of Statistics: Retraction Watch*, Vol. 1, No. 1 (2012), pp. 1-19.

In this paper we delineate some serious blunders that we and others have made in published work on frequentist statistical methods. First, although we have claimed repeatedly that a core thesis of the frequentist testing approach is that a hypothesis may be rejected with increasing confidence as the power of the test increases, we now see that this is completely backwards, and we regret that we have never addressed, or even fully read, the corrections found in Deborah Mayo’s work since at least 1983, and likely even before that.

Second, we have been wrong to claim that Neyman-Pearson (N-P) confidence intervals are inconsistent because in special cases it is possible for a specific 95% confidence interval to be known to be correct. Not only are the examples required to show this absurdly artificial, but the frequentist could simply interpret this “vacuous interval” “as a statement that all parameter values are consistent with the data at a particular level,” which, as Cox and Hinkley note, is an informative statement about the limitations in the data (Cox and Hinkley 1974, 226).
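
Why the first claim in the abstract is backwards can be checked numerically. A minimal sketch, assuming a one-sided Normal (z) test of H0: μ ≤ 0 against μ > 0 with known σ = 1 and α = 0.025, and a hypothetical discrepancy of interest μ1 = 0.2. For a result that is just statistically significant, the severity with which “μ > μ1” passes is P(X̄ ≤ x̄; μ = μ1), which here equals one minus the test’s power against μ1. The function names are invented for the illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, sd 1
ALPHA = 0.025              # assumed one-sided significance level
SIGMA = 1.0                # assumed known population standard deviation


def cutoff(n: int) -> float:
    """Smallest sample mean that rejects H0: mu <= 0 at level ALPHA (one-sided z-test)."""
    return STD_NORMAL.inv_cdf(1 - ALPHA) * SIGMA / sqrt(n)


def power(mu1: float, n: int) -> float:
    """Probability that the test rejects H0 when the true mean is mu1."""
    return 1 - STD_NORMAL.cdf((cutoff(n) - mu1) / (SIGMA / sqrt(n)))


def severity_mu_greater(mu1: float, xbar: float, n: int) -> float:
    """SEV(mu > mu1) after a rejection: P(sample mean <= observed xbar; mu = mu1)."""
    return STD_NORMAL.cdf((xbar - mu1) / (SIGMA / sqrt(n)))


if __name__ == "__main__":
    mu1 = 0.2  # hypothetical discrepancy of interest
    for n in (25, 100, 400):
        xbar = cutoff(n)  # a result that is just barely statistically significant
        print(f"n={n:3d}  power against {mu1}: {power(mu1, n):.3f}  "
              f"SEV(mu > {mu1}): {severity_mu_greater(mu1, xbar, n):.3f}")
```

As n grows, power against μ1 = 0.2 rises (roughly 0.17, 0.52, 0.98) while the severity for μ > 0.2 falls (roughly 0.83, 0.48, 0.02): the more powerful the test, the weaker the indication of that discrepancy given by a just-significant rejection.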

## Oxford Gaol: Statistical Bogeymen

Oxford Jail is an entirely fitting place to be on Halloween!

Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba! My goal (while in this gaol, as the English sometimes spell it) is to try to free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should, I think, be shelved, not that names should matter.

In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended. But for Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).

Criticisms then follow readily, taking the form of one or both of the following:

- Error probabilities do not supply posterior probabilities in hypotheses; interpreted as if they do (and some say we just can’t help it), they lead to inconsistencies.
- Methods with good long-run error rates can give rise to counterintuitive inferences in particular cases.

I have proposed an alternative philosophy that replaces these tenets with different ones:

- The role of probability in inference is to quantify how reliably or severely claims (or discrepancies from claims) have been tested.
- The severity goal directs us to the relevant error probabilities, avoiding the oft-repeated statistical fallacies due to tests that are overly sensitive, as well as those insufficiently sensitive to particular errors.
- Control of long-run error probabilities, while necessary, is not sufficient for good tests or warranted inferences.
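
The point about tests insufficiently sensitive to particular errors can be illustrated in the simplest case. A minimal sketch, assuming a one-sided Normal test of H0: μ ≤ 0 with known σ = 1 and α = 0.025, and a hypothetical observed mean of 0.05 that is not statistically significant at any of the sample sizes shown. After a non-significant result, the severity with which “μ ≤ μ1” passes is P(X̄ > x̄; μ = μ1): the probability the test would have produced a larger mean were μ as large as μ1. The names below are invented for the illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, sd 1
ALPHA = 0.025              # assumed one-sided significance level
SIGMA = 1.0                # assumed known population standard deviation


def is_significant(xbar: float, n: int) -> bool:
    """One-sided z-test of H0: mu <= 0 against mu > 0 at level ALPHA."""
    return xbar / (SIGMA / sqrt(n)) > STD_NORMAL.inv_cdf(1 - ALPHA)


def severity_mu_at_most(mu1: float, xbar: float, n: int) -> float:
    """SEV(mu <= mu1) after a non-significant result: P(sample mean > observed xbar; mu = mu1)."""
    return 1 - STD_NORMAL.cdf((xbar - mu1) / (SIGMA / sqrt(n)))


if __name__ == "__main__":
    xbar, mu1 = 0.05, 0.2  # hypothetical observed mean and discrepancy of interest
    for n in (10, 100, 400):
        assert not is_significant(xbar, n)  # the same non-significant outcome each time
        print(f"n={n:3d}  SEV(mu <= {mu1}): {severity_mu_at_most(mu1, xbar, n):.3f}")
```

The same non-significant mean warrants “μ ≤ 0.2” only weakly with n = 10 (about 0.68) but quite severely with n = 400 (about 0.999), so a non-significant result from an insensitive test is poor evidence that such a discrepancy is absent.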

## LUCKY 13 (Criticisms)

Given some slight recuperation delays, interested readers might wish to poke around the multiple layers of goodies on the left-hand side of this web page, wherein all manner of foundational/statistical controversies are considered. In a recent attempt by Aris Spanos and me to address the age-old criticisms from the perspective of the “error statistical philosophy,” we delineate 13 criticisms. Here they are:

- (#1) Error statistical tools forbid using any background knowledge.
- (#2) All statistically significant results are treated the same.
- (#3) The p-value does not tell us how large a discrepancy is found.
- (#4) With large enough sample size even a trivially small discrepancy from the null can be detected.
- (#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.
- (#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
- (#7) Error probabilities are invariably misinterpreted as posterior probabilities.
- (#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
- (#9) Specifying statistical tests is too arbitrary.
- (#10) We should be doing confidence interval estimation rather than significance tests.
- (#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
- (#12) All models are false anyway.
- (#13) Testing assumptions involves illicit data-mining.

HAVE WE LEFT ANY OUT?

Mayo & Spanos, “Error Statistics” (2011)

…more soon.

## In Exile, Clinging to Old Ideas?

To take up the first criticism, we can consider J. Kadane’s new book, *Principles of Uncertainty* (2011, CRC Press*). Kadane, to his credit, does not beat around the bush as regards his subjective Bayesian perspective; his is a leading Bayesian voice in the tradition of Savage. He takes up central criticisms of frequentist methods in Chapter 12, “Exploration of Old Ideas.” So now I am not only in foundational exile; I am clinging to ideas that are in need of Juvederm!

## Overheard at the comedy hour at the Bayesian retreat:

“**Did you hear the one about the frequentist . . .**

- “who claimed that observing ‘heads’ on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”

or

- “who defended the reliability of his radiation reading, despite using a broken radiometer, on the grounds that most of the time he uses one that works, so on average he’s pretty reliable?”

Such jests may work for an after-dinner laugh, but if it turns out that, despite being retreads of “straw-man” fallacies, they form the basis of why some reject frequentist methods, then they are not such a laughing matter. But surely the drubbing of frequentist methods could not be based on a collection of howlers, could it? I invite the curious reader to stay and find out.
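
The trouble with the first jest can be made concrete: the coin rule declares an “improvement” with probability .05 whatever the treatment does, so its chance of rejecting is the same whether or not the treatment works. A minimal sketch comparing it with a one-sided z-test; the trial of n = 50 patients with known σ = 1 is a hypothetical stand-in for a genuine test, and the function names are invented for the illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, sd 1
ALPHA = 0.05               # the coin's probability of heads, read as a "significance level"


def coin_rule_rejection_prob(effect: float) -> float:
    """P('significant improvement') under the howler's rule: reject iff the coin lands heads.

    The coin never looks at the patients, so `effect` has no influence on the result.
    """
    return ALPHA


def z_test_rejection_prob(effect: float, n: int = 50, sigma: float = 1.0) -> float:
    """P(reject 'no improvement') for a one-sided level-ALPHA z-test on n patients
    when the true mean improvement is `effect` (hypothetical known sigma)."""
    z_crit = STD_NORMAL.inv_cdf(1 - ALPHA)
    return 1 - STD_NORMAL.cdf(z_crit - effect * sqrt(n) / sigma)


if __name__ == "__main__":
    for effect in (0.0, 0.2, 0.5):
        print(f"effect={effect:.1f}  coin rule: {coin_rule_rejection_prob(effect):.3f}  "
              f"z-test: {z_test_rejection_prob(effect):.3f}")
```

Both rules reject with probability .05 when there is no improvement, but only the z-test’s rejection probability climbs with the true effect (roughly 0.41 at 0.2 and 0.97 at 0.5 in this setup). The coin rule has no capacity to discriminate an improvement from its absence, so a low probability of “heads” under the null does nothing to make heads evidence of one.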