Today is the birthday of Lucien Le Cam (Nov. 18, 1924 – April 25, 2000). Please see my updated 2013 post on him.


Categories: Bayesian/frequentist, Statistics
Leave a comment

Memory Lane: 3 years ago. Oxford Jail (also called Oxford Castle) is an entirely fitting place to be on (and around) Halloween! Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba! (It is now a boutique hotel, though many of the rooms are still too jail-like for me.) My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should, I think, be shelved, not that names should matter.

In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, if criticisms are to avoid begging the question, the standpoint from which the appraisal is launched must itself be independently defended. But for (most) Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).

Criticisms then follow readily, taking the form of one or both of the following:

- Error probabilities do not supply posterior probabilities in hypotheses; if they are interpreted as if they do (and some say we just can’t help it), they lead to inconsistencies.
- Methods with good long-run error rates can give rise to counterintuitive inferences in particular cases.

I have proposed an alternative philosophy that replaces these tenets with different ones:

- The role of probability in inference is to quantify how reliably or severely claims (or discrepancies from claims) have been tested.
- The severity goal directs us to the relevant error probabilities, avoiding the oft-repeated statistical fallacies due to tests that are overly sensitive, as well as those insufficiently sensitive to particular errors.
- Control of long-run error probabilities, while necessary, is not sufficient for good tests or warranted inferences.

**From Gelman’s blog:**

“In one of life’s horrible ironies, I wrote a paper ‘Why we (usually) don’t have to worry about multiple comparisons,’ but now I spend lots of time worrying about multiple comparisons.” (Posted 14 October 2014, 11:13 am)

Exhibit B: The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time, in press. (Andrew Gelman and Eric Loken) (Shortened version is here.)

The “forking paths” paper, in my reading, **basically argues that mere hypothetical possibilities about what you would or might have done had the data been different (in order to secure a desired interpretation) suffice to alter the characteristics of the analysis you actually did.** That’s an error statistical argument, maybe even stronger than what some error statisticians would say. What’s really being condemned are overly flexible ways to move from statistical results to substantive claims.
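Gelman and Loken’s point is easy to verify in a quick simulation (the numbers below are my illustrative stand-ins, not theirs): even when each single comparison controls its nominal error rate, letting the data pick which of several comparisons gets reported inflates the actual error probability well beyond the nominal level.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

def forked_analysis(n_subjects=100, n_outcomes=5, alpha=0.05, n_sims=10_000):
    """Fraction of simulated studies reporting a 'significant' effect when
    every outcome is pure noise, but only the best-looking of n_outcomes
    comparisons gets written up."""
    hits = 0
    for _ in range(n_sims):
        data = rng.standard_normal((n_subjects, n_outcomes))
        # z-statistic for each outcome's mean against zero
        z = data.mean(axis=0) / (data.std(axis=0, ddof=1) / math.sqrt(n_subjects))
        # two-sided normal p-values; the forking path: report only the smallest
        p_min = min(math.erfc(abs(v) / math.sqrt(2)) for v in z)
        if p_min < alpha:
            hits += 1
    return hits / n_sims

print(forked_analysis())  # roughly 1 - 0.95**5 ≈ 0.23, far above 0.05
```

Note that no explicit “fishing” occurs in any single simulated study; the inflation comes entirely from the data-dependent choice of which comparison to report.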

Categories: Error Statistics, Gelman
15 Comments

Congratulations to Faye Flam for finally getting her article published in the Science Times section of the *New York Times*, “The odds, continually updated,” after months of reworking and editing, interviewing and reinterviewing. I’m grateful, too, that one remark from me remained. Seriously, I am. A few comments: the Monty Hall example is simple probability, not statistics, and finding that fisherman who floated on his boots at best used likelihoods. I might note, too, that critiquing that ultra-silly example about ovulation and voting–a study so bad they actually had to pull it at CNN due to reader complaints[i]–scarcely required more than noticing the researchers didn’t even know the women were ovulating[ii]. Experimental design is an old area of statistics developed by frequentists; on the other hand, these ovulation researchers really believe their theory, so the posterior checks out.

The article says, Bayesian methods can “crosscheck work done with the more traditional or ‘classical’ approach.” Yes, but on traditional frequentist grounds. What many would like to know is how to cross check Bayesian methods—how do I test your beliefs? Anyway, I should stop kvetching and thank Faye and the NYT for doing the article at all[iii]. Here are some excerpts:

Statistics may not sound like the most heroic of pursuits. But if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer.

Categories: Bayesian/frequentist, Statistics
47 Comments

Since the comments to my previous post are getting too long, I’m reblogging it here to make more room. I say that the issue raised by J. Berger and Sellke (1987) and Casella and R. Berger (1987) concerns evaluating the evidence in relation to a given hypothesis (using error probabilities). Given the information that *this* hypothesis H* was randomly selected from an urn with 99% true hypotheses, we wouldn’t say this gives a great deal of evidence for the truth of H*, nor suppose that H* had thereby been well-tested. (H* might concern the existence of a standard model-like Higgs.) I think the issues about “science-wise error rates” and long-run performance in dichotomous, diagnostic screening should be taken up separately, but commentators can continue on this, if they wish (perhaps see this related post). Continue reading
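For readers who want the dichotomous screening arithmetic spelled out, here is a minimal sketch (the prevalence, alpha, and power figures are illustrative stand-ins, not anyone’s estimates). The “science-wise” positive predictive value turns almost entirely on the base rate of true hypotheses in the urn, which is exactly why it says so little about how well any particular H* was tested.

```python
def ppv(prior_true_alt, alpha=0.05, power=0.8):
    """Positive predictive value in the dichotomous screening model:
    the fraction of 'significant' results whose alternative is true."""
    true_pos = prior_true_alt * power
    false_pos = (1 - prior_true_alt) * alpha
    return true_pos / (true_pos + false_pos)

# An urn where 99% of hypotheses are true yields a high hit rate no
# matter how weakly each individual hypothesis was probed:
print(round(ppv(0.99), 3))   # ≈ 0.999
# With only 10% true hypotheses the very same test fares far worse:
print(round(ppv(0.10), 3))   # ≈ 0.64
```

The point of the contrast: the screening rate tracks the urn’s composition, not the severity with which this particular H* was probed.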

**0. July 20, 2014:** Some of the comments to this post reveal that using the word “fallacy” in my original title might have encouraged running together the current issue with the fallacy of transposing the conditional. Please see a newly added Section 7.

**1. What you should ask…**

Discussions of P-values in the Higgs discovery invariably recapitulate many of the familiar criticisms of P-values (some “howlers”, some not). When you hear the familiar refrain, “We all know that P-values overstate the evidence against the null hypothesis”, thereby denying that the P-value aptly measures evidence, what you should ask is:

“What do you mean by overstating the evidence against a hypothesis?”
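One way to make the question precise is the comparison Berger and Sellke press: set the tail-area p-value beside a lower bound on the posterior probability of the null. A sketch, using the familiar exp(-z²/2) lower bound on the Bayes factor for a normal test statistic, with an illustrative 0.5 prior on the null (both choices are the standard textbook ones, not anything forced on the frequentist):

```python
import math

def two_sided_p(z):
    """Two-sided tail area of the standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

def min_posterior_null(z, prior_null=0.5):
    """Lower bound on P(H0 | z) when the Bayes factor in favor of H0 is
    bounded below by exp(-z**2 / 2), i.e. the likelihood ratio against
    the best-supported point alternative for a normal test statistic."""
    bf = math.exp(-z * z / 2)          # minimum Bayes factor for H0
    odds = (prior_null / (1 - prior_null)) * bf
    return odds / (1 + odds)

z = 1.96
print(round(two_sided_p(z), 3))         # ≈ 0.05
print(round(min_posterior_null(z), 3))  # ≈ 0.128
```

The gap between 0.05 and 0.128 is the entire basis of the “overstates the evidence” refrain; whether a tail area was ever supposed to match a posterior probability is precisely what’s at issue.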

July 4, 2014 was the two-year anniversary of the Higgs boson discovery. As the world was celebrating the “5 sigma!” announcement, and we were reading about the statistical aspects of this major accomplishment, I was aghast to be emailed a letter, purportedly instigated by Bayesian Dennis Lindley, through Tony O’Hagan (to the ISBA). Lindley, according to this letter, wanted to know:

“Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?”

Fairly sure it was a joke, I posted it on my “Rejected Posts” blog for a bit until it checked out [1]. (See O’Hagan’s “Digest and Discussion”) Continue reading
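For reference, the “5 sigma!” in the announcement is standardly read as a one-sided tail area of the standard normal; a two-line check:

```python
import math

def one_sided_p(sigma):
    """Upper tail area of the standard normal beyond `sigma` standard
    deviations -- the usual reading of particle physicists' 'n sigma'."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

for s in (3, 5):
    print(f"{s} sigma -> p = {one_sided_p(s):.2e}")
# 3 sigma ≈ 1.35e-03 ('evidence'); 5 sigma ≈ 2.87e-07 ('discovery')
```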

*“Wonderful examples, but let’s not close our eyes,”* is David J. Hand’s apt title for his discussion of the recent special issue (Feb 2014) of Statistical Science called “Big Bayes Stories” (edited by Sharon McGrayne, Kerrie Mengersen and Christian Robert.) For your Saturday night/ weekend reading, here are excerpts from Hand, another discussant (Welsh), scattered remarks of mine, along with links to papers and background. I begin with David Hand:

[The papers in this collection] give examples of problems which are well-suited to being tackled using such methods, but one must not lose sight of the merits of having multiple different strategies and tools in one’s inferential armory. (Hand [1])

…. But I have to ask, is the emphasis on ‘Bayesian’ necessary? That is, do we need further demonstrations aimed at promoting the merits of Bayesian methods? … The examples in this special issue were selected, firstly by the authors, who decided what to write about, and then, secondly, by the editors, in deciding the extent to which the articles conformed to their desiderata of being Bayesian success stories: that they ‘present actual data processing stories where a non-Bayesian solution would have failed or produced sub-optimal results.’ In a way I think this is unfortunate. I am certainly convinced of the power of Bayesian inference for tackling many problems, but the generality and power of the method is not really demonstrated by a collection specifically selected on the grounds that this approach works and others fail. To take just one example, choosing problems which would be difficult to attack using the Neyman-Pearson hypothesis testing strategy would not be a convincing demonstration of a weakness of that approach if those problems lay outside the class that that approach was designed to attack.

Hand goes on to make a philosophical assumption that might well be questioned by Bayesians: Continue reading

Categories: Bayesian/frequentist, Honorary Mention, Statistics
62 Comments

At the start of our seminar, I said that “on weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post some of my ‘deconstructions‘ of articles”. I began with Andrew Gelman‘s note “Ethics and the statistical use of prior information”[i], but never posted my deconstruction of it. So since it’s Saturday night, and the seminar is just ending, here it is, along with related links to Stat and ESP research (including me, Jack Good, Persi Diaconis and Pat Suppes). Please share comments especially in relation to current day ESP research. Continue reading

Categories: Background knowledge, Gelman, Phil6334, Statistics
35 Comments

“What ever happened to Bayesian foundations?” was one of the final topics of our seminar (Mayo/Spanos Phil 6334). In the past 15 years or so, not only have (some? most?) Bayesians come to accept violations of the Likelihood Principle, they have also tended to disown Dutch Book arguments, and the very idea of inductive inference as updating beliefs by Bayesian conditionalization has evanesced. In one of Thursday’s readings, by Bacchus, Kyburg, and Thalos (1990)[1], it is argued that under certain conditions, it is *never* a rational course of action to change belief by Bayesian conditionalization. Here’s a short snippet for your Saturday night reading (the full paper is at http://errorstatistics.files.wordpress.com/2014/05/bacchus_kyburg_thalos-against-conditionalization.pdf): Continue reading

Categories: Bayes' Theorem, Phil 6334 class material, Statistics
Tags: Bayesian updating, Kyburg
29 Comments

We interspersed key issues from the reading for this session (from Howson and Urbach) with portions of my presentation at the Boston Colloquium (Feb, 2014): Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge. (Slides below)*.

*Someone sent us a recording (mp3) of the panel discussion from that Colloquium (there’s a lot on “big data” and its politics), including: Mayo, Xiao-Li Meng (Harvard), Kent Staley (St. Louis), and Mark van der Laan (Berkeley).*

See if this works: mp3

*There’s a prelude here to our visitor on April 24: Professor Stanley Young from the National Institute of Statistical Sciences.

Categories: Bayesian/frequentist, Error Statistics, Phil6334
43 Comments

A. Spanos Probability/Statistics Lecture Notes 7: An Introduction to Bayesian Inference (4/10/14)

Categories: Bayesian/frequentist, Phil 6334 class material, Statistics
11 Comments

April 3, 2014: We interspersed discussion with slides; these cover the main readings of the day (check syllabus four): Duhem’s Problem and the Bayesian Way, and “Highly probable vs Highly Probed”. Slides are below (followers of this blog will be familiar with most of this, e.g., here). We also did further work on misspecification testing.

Monday, April 7, is an optional outing, “a seminar class trip” you might say, here at Thebes, at which time we will analyze the statistical curves of the mountains, pie charts of pizza, and (seriously) study some experiments on the problem of replication in “the Hamlet Effect in social psychology”. If you’re around please bop in!

Mayo’s slides on Duhem’s Problem and more from April 3 (Day#9):

It was from my Virginia Tech colleague I.J. Good (in statistics), who died five years ago (April 5, 2009), at 93, that I learned most of what I call “howlers” on this blog. His favorites were based on the “paradoxes” of stopping rules. (I had posted this last year here.)

“In conversation I have emphasized to other statisticians, starting in 1950, that, in virtue of the ‘law of the iterated logarithm,’ by optional stopping an arbitrarily high sigmage, and therefore an arbitrarily small tail-area probability, can be attained even when the null hypothesis is true. In other words if a Fisherian is prepared to use optional stopping (which usually he is not) he can be sure of rejecting a true null hypothesis provided that he is prepared to go on sampling for a long time.

The way I usually express this ‘paradox’ is that a Fisherian [but not a Bayesian] can cheat by pretending he has a plane to catch like a gambler who leaves the table when he is ahead” (Good 1983, 135)[*]
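Good’s “paradox” is easy to verify by simulation (a sketch with made-up sample sizes): under a true null, testing after every observation with the option to stop at the first nominal rejection drives the actual probability of rejecting far above the nominal 0.05, and by the law of the iterated logarithm it approaches 1 as the sampling horizon grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def optional_stopping(max_n=5000, alpha_z=1.96, n_sims=2000, min_n=10):
    """Under a true null (standard normal data, mean 0), test after every
    observation and stop at the first nominal two-sided rejection."""
    rejections = 0
    for _ in range(n_sims):
        x = rng.standard_normal(max_n)
        n = np.arange(1, max_n + 1)
        z = np.cumsum(x) / np.sqrt(n)   # z-statistic at each sample size
        if np.any(np.abs(z[min_n - 1:]) > alpha_z):
            rejections += 1
    return rejections / n_sims

print(optional_stopping())  # far above the nominal 0.05; tends to 1 as max_n grows
```

This is, of course, exactly why the error statistician insists the stopping rule alters the actual error probabilities of the procedure, even though it leaves the likelihood function untouched.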

Categories: Bayesian/frequentist, Comedy, Statistics
Tags: argument from intentions, frequentist criticisms, Stopping rules
18 Comments

**Day #3: Spanos lecture notes 2, and reading/resources from Feb 6 seminar**

**6334 Day 3 slides: Spanos-lecture-2**

___

Crupi & Tentori (2010). Irrelevant Conjunction: Statement and Solution of a New Paradox, *Phil Sci*, 77, 1–13.

Hawthorne & Fitelson (2004). Re-Solving Irrelevant Conjunction with Probabilistic Independence, *Phil Sci* *71*: 505–514.

Skyrms (1975). *Choice and Chance*, 2nd ed., Chapter V and Carnap (pp. 206-211), Dickenson Pub. Co.

Mayo posts on the tacking paradox: Oct. 25, 2013: “Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*” & Oct 25.

**An update on this issue will appear shortly in a separate blogpost.**


**READING FOR NEXT WEEK**
Selection (pp. 35-59) from: Popper (1962).

Categories: Bayes' Theorem, Phil 6334 class material, Statistics
Leave a comment

Andrew Gelman says that as a philosopher, I should appreciate his blog today in which he records his frustration: “Against aggressive definitions: No, I don’t think it helps to describe Bayes as ‘the analysis of subjective beliefs’…” Gelman writes:

I get frustrated with what might be called “aggressive definitions,” where people use a restrictive definition of something they don’t like. For example, Larry Wasserman writes (as reported by Deborah Mayo):

“I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs.”

I’ll accept Larry’s definition of frequentist inference. But as for his definition of Bayesian inference: No no no no no. The probabilities we use in our Bayesian inference are not subjective, or, they’re no more subjective than the logistic regressions and normal distributions and Poisson distributions and so forth that fill up all the textbooks on frequentist inference.

To quickly record some of my own frustrations: First, I would disagree with Wasserman’s characterization of frequentist inference, but, as is clear from Larry’s comments on my reaction to him, I think he concurs that he was just giving a broad contrast. Please see Note [1] for a remark from my post: Comments on Wasserman’s “what is Bayesian/frequentist inference?” Also relevant is a Gelman post on the Bayesian name: [2].

Second, Gelman’s “no more subjective than…” evokes remarks I’ve made before. For example, in “What should philosophers of science do…” I wrote:

Arguments given for some very popular slogans (mostly by non-philosophers), are too readily taken on faith as canon by others, and are repeated as gospel. Examples are easily found: all models are false, no models are falsifiable, everything is subjective, or equally subjective and objective, and the only properly epistemological use of probability is to supply posterior probabilities for quantifying actual or rational degrees of belief. Then there is the cluster of “howlers” allegedly committed by frequentist error statistical methods repeated verbatim (discussed on this blog).

I’ve written a lot about objectivity on this blog, e.g., here, here and here (and in real life), but what’s the point if people just rehearse the “everything is a mixture…” line, without making deeply important distinctions? I really think that, next to the “all models are false” slogan, the most confusion has been engendered by the “no methods are objective” slogan. However much we may aim at objective constraints, it is often urged, we can never have “clean hands” free of the influence of beliefs and interests, and we invariably sully methods of inquiry by the entry of background beliefs and personal judgments in their specification and interpretation. Continue reading

Categories: Bayesian/frequentist, Error Statistics, Gelman, Objectivity, Statistics
41 Comments

**(see rejected posts)**

Categories: Bayesian/frequentist, Rejected Posts
Leave a comment

**Memory Lane: 2 years ago:**

*My efficient Errorstat Blogpeople[1] have put forward the following 3 reader-contributed interpretive efforts[2] as a result of the “deconstruction” exercise from December 11 (mine, from the earlier blog, is at the end) of what I consider:*

“….an especially intriguing remark by Jim Berger that I think bears upon the current mindset (Jim is aware of my efforts):

Too often I see people pretending to be subjectivists, and then using “weakly informative” priors that the objective Bayesian community knows are terrible and will give ridiculous answers; subjectivism is then being used as a shield to hide ignorance. . . . In my own more provocative moments, I claim that the only true subjectivists are the objective Bayesians, because they refuse to use subjectivism as a shield against criticism of sloppy pseudo-Bayesian practice. (Berger 2006, 463)” (From blogpost, Dec. 11, 2011)

_________________________________________________

Andrew Gelman:

The statistics literature is big enough that I assume there really is some bad stuff out there that Berger is reacting to, but I think that when he’s talking about weakly informative priors, Berger is not referring to the work in this area that I like, as I think of weakly informative priors as specifically being designed to give answers that are _not_ “ridiculous.”

Keeping things unridiculous is what regularization’s all about, and one challenge of regularization (as compared to pure subjective priors) is that the answer to the question, What is a good regularizing prior?, will depend on the likelihood. There’s a lot of interesting theory and practice relating to weakly informative priors for regularization, a lot out there that goes beyond the idea of noninformativity.

To put it another way: We all know that there’s no such thing as a purely noninformative prior: any model conveys some information. But, more and more, I’m coming across applied problems where I wouldn’t want to be noninformative even if I could, problems where some weak prior information regularizes my inferences and keeps them sane and under control. Continue reading

Categories: Gelman, Irony and Bad Faith, J. Berger, Statistics, U-Phil
Tags: Deconstruction, irony, Jim Berger, subjective/objective Bayesian
3 Comments

I attended a lecture by Aris Spanos to his graduate econometrics class here at Va Tech last week[i]. This course, which Spanos teaches every fall, gives a superb illumination of the disparate pieces involved in statistical inference and modeling, and affords clear foundations for how they are linked together. His slides follow the intro section. Some examples with severity assessments are also included.

**Frequentist Hypothesis Testing: A Coherent Approach**

Aris Spanos

1 Inherent difficulties in learning statistical testing

Statistical testing is arguably the most important, but also the most difficult and confusing chapter of statistical inference for several reasons, including the following.

(i) The need to introduce numerous new notions, concepts and procedures before one can paint — even in broad brushes — a coherent picture of hypothesis testing.

(ii) The current textbook discussion of statistical testing is both highly confusing and confused. There are several sources of confusion.

- (a) Testing is conceptually one of the most sophisticated sub-fields of any scientific discipline.
- (b) Inadequate knowledge by textbook writers who often do not have the technical skills to read and understand the original sources, and have to rely on second hand accounts of previous textbook writers that are often misleading or just outright erroneous. In most of these textbooks hypothesis testing is poorly explained as an idiot’s guide to combining off-the-shelf formulae with statistical tables like the Normal, the Student’s t, the chi-square, etc., where the underlying statistical model that gives rise to the testing procedure is hidden in the background.
- (c) The misleading portrayal of Neyman-Pearson testing as essentially decision-theoretic in nature, when in fact the latter has much greater affinity with Bayesian rather than frequentist inference.
- (d) A deliberate attempt to distort and cannibalize frequentist testing by certain Bayesian drumbeaters who revel in (unfairly) maligning frequentist inference in their attempts to motivate their preferred view on statistical inference.

(iii) The discussion of frequentist testing is rather incomplete in so far as it has been beleaguered by serious foundational problems since the 1930s. As a result, different applied fields have generated their own secondary literatures attempting to address these problems, but often making things much worse! Indeed, in some fields like psychology it has reached the stage where one has to correct the ‘corrections’ of those chastising the initial correctors!

In an attempt to alleviate problem (i), the discussion that follows uses a sketchy historical development of frequentist testing. To ameliorate problem (ii), the discussion includes ‘red flag’ pointers (¥) designed to highlight important points that shed light on certain erroneous interpretations or misleading arguments. The discussion will pay special attention to (iii), addressing some of the key foundational problems.

[i] It is based on Ch. 14 of Spanos (1999), *Probability Theory and Statistical Inference*, Cambridge[ii].

[ii] You can win a free copy of this 700+ page text by creating a simple palindrome! http://errorstatistics.com/palindrome/march-contest/

Categories: Bayesian/frequentist, Error Statistics, Severity, significance tests, Statistics
Tags: Spanos
36 Comments

A comment today by Stephen Senn leads me to post the last few sentences of my (2010) paper with David Cox, “Frequentist Statistics as a Theory of Inductive Inference”:

“A fundamental tenet of the conception of inductive learning most at home with the frequentist philosophy is that inductive inference requires building up incisive arguments and inferences by putting together several different piece-meal results; we have set out considerations to guide these pieces[i]. Although the complexity of the issues makes it more difficult to set out neatly, as, for example, one could by imagining that a single algorithm encompasses the whole of inductive inference, the payoff is an account that approaches the kind of arguments that scientists build up in order to obtain reliable knowledge and understanding of a field.” (273)[ii]

A reread for Saturday night?

[i] The pieces hang together by dint of the rationale growing out of a severity criterion (or something akin but using a different term).

[ii] *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science* (D. Mayo and A. Spanos, eds.), Cambridge: Cambridge University Press: 1-27. This paper appeared in *The Second Erich L. Lehmann Symposium: Optimality*, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, pp. 247-275.

Categories: Bayesian/frequentist, Error Statistics
20 Comments