Monthly Archives: December 2013

Midnight With Birnbaum (Happy New Year)

 Just as in the past 2 years since I’ve been blogging, I revisit that spot in the road, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. There are a couple of brief (12/31/13) updates at the end.  

You know how in that (not-so) recent movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Wolf?  He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve 2011 2012, 2013) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i]

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics.  I happen to be writing on your famous argument about the likelihood principle (LP).  (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WLP.[ii]  Sorry,…I know it’s famous…

BIRNBAUM:  Well, I shall happily invite you to take any case that violates the LP and allow me to demonstrate that the frequentist is led to inconsistency, provided she also wishes to adhere to the WLP and sufficiency (although less than S is needed).

ERROR STATISTICIAN: Well I happen to be a frequentist (error statistical) philosopher; I have recently (2006) found a hole in your proof,…well I hope we can discuss it.

BIRNBAUM: Well, well, well: I’ll bet you a bottle of Elba Grease champagne that I can demonstrate it!

ERROR STATISTICAL PHILOSOPHER:  It is a great drink, I must admit that: I love lemons.

BIRNBAUM: OK.  (A waiter brings a bottle, they each pour a glass and resume talking).  Whoever wins this little argument pays for this whole bottle of vintage Ebar or Elbow or whatever it is Grease.

ERROR STATISTICAL PHILOSOPHER:  I really don’t mind paying for the bottle.

BIRNBAUM: Good, you will have to. Take any LP violation. Let  x’ be 2-standard deviation difference from the null (asserting m = 0) in testing a normal mean from the fixed sample size experiment E’, say n = 100; and let x” be a 2-standard deviation difference from an optional stopping experiment E”, which happens to stop at 100.  Do you agree that:

(0) For a frequentist, outcome x’ from E’ (fixed sample size) is NOT evidentially equivalent to x” from E” (optional stopping that stops at n)

ERROR STATISTICAL PHILOSOPHER: Yes, that’s a clear case where we reject the strong LP, and it makes perfect sense to distinguish their corresponding p-values (which we can write as p’ and p”, respectively).  The searching in the optional stopping experiment makes the p-value quite a bit higher than with the fixed sample size.  For n = 100, data x’ yields p’= ~.05; while p”  is ~.3.  Clearly, p’ is not equal to p”, I don’t see how you can make them equal. Continue reading

Categories: Birnbaum Brakes, strong likelihood principle | Tags: , , , | 2 Comments

Wasserman on Wasserman: Update! December 28, 2013

Professor Larry Wasserman

Professor Larry Wasserman

I had invited Larry to give an update, and I’m delighted that he has! The discussion relates to the last post (by Spanos), which follows upon my deconstruction of Wasserman*. So, for your Saturday night reading pleasure, join me** in reviewing this and the past two blogs and the links within.

“Wasserman on Wasserman: Update! December 28, 2013”

My opinions have shifted a bit.

My reference to Franken’s joke suggested that the usual philosophical 
debates about the foundations of statistics were un-important, much 
like the debate about media bias. I was wrong on both counts.

First, I now think Franken was wrong. CNN and network news have a 
strong liberal bias, especially on economic issues. FOX has an 
obvious right wing, and anti-atheist bias. (At least FOX has some 
libertarians on the payroll.) And this does matter. Because people
 believe what they see on TV and what they read in the NY times. Paul
 Krugman’s socialist bullshit parading as economics has brainwashed 
millions of Americans. So media bias is much more than who makes 
better hummus.

Similarly, the Bayes-Frequentist debate still matters. And people —
including many statisticians — are still confused about the 
distinction. I thought the basic Bayes-Frequentist debate was behind 
us. A year and a half of blogging (as well as reading other blogs) 
convinced me I was wrong here too. And this still does matter. Continue reading

Categories: Error Statistics, frequentist/Bayesian, Statistics, Wasserman | 56 Comments

More on deconstructing Larry Wasserman (Aris Spanos)

This follows up on yesterday’s deconstruction:

 Aris Spanos (2012)[i] – Comments on: L. Wasserman “Low Assumptions, High Dimensions (2011)*

I’m happy to play devil’s advocate in commenting on Larry’s very interesting and provocative (in a good way) paper on ‘how recent developments in statistical modeling and inference have [a] changed the intended scope of data analysis, and [b] raised new foundational issues that rendered the ‘older’ foundational problems more or less irrelevant’.

The new intended scope, ‘low assumptions, high dimensions’, is delimited by three characteristics:

“1. The number of parameters is larger than the number of data points.

2. Data can be numbers, images, text, video, manifolds, geometric objects, etc.

3. The model is always wrong. We use models, and they lead to useful insights but the parameters in the model are not meaningful.” (p. 1)

In the discussion that follows I focus almost exclusively on the ‘low assumptions’ component of the new paradigm. The discussion by David F. Hendry (2011), “Empirical Economic Model Discovery and Theory Evaluation,” RMM, 2: 115-145,  is particularly relevant to some of the issues raised by the ‘high dimensions’ component in a way that complements the discussion that follows.

My immediate reaction to the demarcation based on 1-3 is that the new intended scope, although interesting in itself, excludes the overwhelming majority of scientific fields where restriction 3 seems unduly limiting. In my own field of economics the substantive information comes primarily in the form of substantively specified mechanisms (structural models), accompanied with theory-restricted and substantively meaningful parameters.

In addition, I consider the assertion “the model is always wrong” an unhelpful truism when ‘wrong’ is used in the sense that “the model is not an exact picture of the ‘reality’ it aims to capture”. Worse, if ‘wrong’ refers to ‘the data in question could not have been generated by the assumed model’, then any inference based on such a model will be dubious at best! Continue reading

Categories: Philosophy of Statistics, Spanos, Statistics, U-Phil, Wasserman | Tags: , , , , | 5 Comments

Deconstructing Larry Wasserman

 Greek dancing lady gold SavoyLarry Wasserman (“Normal Deviate”) has announced he will stop blogging (for now at least). That means we’re losing one of the wisest blog-voices on issues relevant to statistical foundations (among many other areas in statistics). Whether this lures him back or reaffirms his decision to stay away, I thought I’d reblog my (2012) “deconstruction” of him (in relation to a paper linked below)[i]

Deconstructing Larry Wasserman [i] by D. Mayo

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from Lies and Lying Liars Who Tell Them: A Fair and Balanced Look at the Right), with which Larry Wasserman begins his paper “Low Assumptions, High Dimensions” (2011) in his contribution to Rationality, Markets and Morals (RMM) Special Topic: Statistical Science and Philosophy of Science:

Wasserman: There is a joke about media bias from the comedian Al Franken:
‘To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?’

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken meant if we follow up the quip with the next sentence in his text (which Wasserman omits): “The problem with al Qaeda is that they’re trying to kill us!” (p. 1). The rest of Franken’s opening chapter is not about al Qaeda but about bias in media. Conservatives, he says, decry what they claim is a liberal bias in mainstream media. Franken rejects their claim.

The mainstream media does not have a liberal bias. And for all their other biases . . . , the mainstream media . . . at least try to be fair. …There is, however, a right-wing media. . . . They are biased. And they have an agenda…The members of the right-wing media are not interested in conveying the truth… . They are an indispensable component of the right-wing machine that has taken over our country… .   We have to be vigilant.  And we have to be more than vigilant.  We have to fight back… . Let’s call them what they are: liars. Lying, lying, liars. (Franken, pp. 3-4)

When I read this in 2004 (when Bush was in office), I couldn’t have agreed more. How things change*. Now, of course, any argument that swerves from the politically correct is by definition unsound, irrelevant, and/ or biased. [ii](December 2016 update: This just shows how things get topsy-turvy every 5-8 years. Now we have extremes on both sides.)

But what does this have to do with Bayesian-frequentist foundations? What is Wasserman, deep down, really trying to tell us by way of this analogy (if only subliminally)? Such are my ponderings—and thus this deconstruction.  (I will invite your “U-Phils” at the end[a].) I will allude to passages from my contribution to  RMM (2011)  (in red).

A.What Is the Foundational Issue?

Wasserman: To me, the most pressing foundational question is: how do we reconcile the two most powerful needs in modern statistics: the need to make methods assumption free and the need to make methods work in high dimensions… . The Bayes-Frequentist debate is not irrelevant but it is not as central as it once was. (p. 201)

One may wonder why he calls this a foundational issue, as opposed to, say, a technical one. I will assume he means what he says and attempt to extract his meaning by looking through a foundational lens. Continue reading

Categories: Philosophy of Statistics, Statistics, U-Phil | Tags: , , , | 10 Comments

Mascots of Bayesneon statistics (rejected post)

bayes_theorem (see rejected posts)

Categories: Bayesian/frequentist, Rejected Posts | Leave a comment

“Bad Arguments” (a book by Ali Almossawi)

I received a new book today as a present[i]: “(An illustrated book of) Bad Arguments” (Ali Almossawi 2013) [ii]. I wish I’d had it for the critical thinking class I just completed! Here’s the illustration it gives for “hasty generalization”.

hasty 001

The author allows it to be accessed here, I just discovered.

But it’s not just a clever book of cartoons: it does a better job than most texts in its conception of bad inductive arguments. Recall my post, “A critical Look at Critical Thinking”–prior to the start of my class–in which I explained why critical thinking is actually a sophisticated affair that philosophers have never fully sorted out. (We may teach it before “baby (symbolic) logic”, but it’s really very grown-up.) I gave my recommendation there as to where probability ought to enter in understanding bad (inductive) arguments, and Almossawi’s conception is in sync with mine[iii]. The inductive qualification is on the mode of inferring, rather than on the conclusion (or inferential claim H) itself*. The difference might seem subtle, but I swear it’s at the heart of many contemporary controversies about statistical inference, and the most serious among them.

[i] From Aris Spanos—thanks Aris.

[ii] Ali Almossawi, whom I never heard of before, has masters degrees in engineering/CS from MIT and CMU, and is a data visualization designer. The illustrator is: Alejandro Giraldo
[iii] I haven’t read all of it, but I doubt I’ll find any howlers.

*About the mode of inferring: What’s its capability to have avoided (alerted us to) the ways it would be wrong to infer H (from the data).


Categories: critical thinking, Statistics | 1 Comment

U-Phil: Deconstructions [of J. Berger]: Irony & Bad Faith 3

Memory Lane: 2 years ago:
My efficient Errorstat Blogpeople1 have put forward the following 3 reader-contributed interpretive efforts2 as a result of the “deconstruction” exercise from December 11, (mine, from the earlier blog, is at the end) of what I consider:

“….an especially intriguing remark by Jim Berger that I think bears upon the current mindset (Jim is aware of my efforts):

Too often I see people pretending to be subjectivists, and then using “weakly informative” priors that the objective Bayesian community knows are terrible and will give ridiculous answers; subjectivism is then being used as a shield to hide ignorance. . . . In my own more provocative moments, I claim that the only true subjectivists are the objective Bayesians, because they refuse to use subjectivism as a shield against criticism of sloppy pseudo-Bayesian practice. (Berger 2006, 463)” (From blogpost, Dec. 11, 2011)
Andrew Gelman:

The statistics literature is big enough that I assume there really is some bad stuff out there that Berger is reacting to, but I think that when he’s talking about weakly informative priors, Berger is not referring to the work in this area that I like, as I think of weakly informative priors as specifically being designed to give answers that are _not_ “ridiculous.”

Keeping things unridiculous is what regularization’s all about, and one challenge of regularization (as compared to pure subjective priors) is that the answer to the question, What is a good regularizing prior?, will depend on the likelihood.  There’s a lot of interesting theory and practice relating to weakly informative priors for regularization, a lot out there that goes beyond the idea of noninformativity.

To put it another way:  We all know that there’s no such thing as a purely noninformative prior:  any model conveys some information.  But, more and more, I’m coming across applied problems where I wouldn’t want to be noninformative even if I could, problems where some weak prior information regularizes my inferences and keeps them sane and under control. Continue reading

Categories: Gelman, Irony and Bad Faith, J. Berger, Statistics, U-Phil | Tags: , , , | 3 Comments

A. Spanos lecture on “Frequentist Hypothesis Testing”


Aris Spanos

I attended a lecture by Aris Spanos to his graduate econometrics class here at Va Tech last week[i]. This course, which Spanos teaches every fall, gives a superb illumination of the disparate pieces involved in statistical inference and modeling, and affords clear foundations for how they are linked together. His slides follow the intro section. Some examples with severity assessments are also included.

Frequentist Hypothesis Testing: A Coherent Approach

Aris Spanos

1    Inherent difficulties in learning statistical testing

Statistical testing is arguably  the  most  important, but  also the  most difficult  and  confusing chapter of statistical inference  for several  reasons, including  the following.

(i) The need to introduce numerous new notions, concepts and procedures before one can paint —  even in broad brushes —  a coherent picture  of hypothesis  testing.

(ii) The current textbook discussion of statistical testing is both highly confusing and confused.  There  are several sources of confusion.

  • (a) Testing is conceptually one of the most sophisticated sub-fields of any scientific discipline.
  • (b) Inadequate knowledge by textbook writers who often do not have  the  technical  skills to read  and  understand the  original  sources, and  have to rely on second hand  accounts  of previous  textbook writers that are  often  misleading  or just  outright erroneous.   In most  of these  textbooks hypothesis  testing  is poorly  explained  as  an  idiot’s guide to combining off-the-shelf formulae with statistical tables like the Normal, the Student’s t, the chi-square,  etc., where the underlying  statistical  model that gives rise to the testing procedure  is hidden  in the background.
  • (c)  The  misleading  portrayal of Neyman-Pearson testing  as essentially  decision-theoretic in nature, when in fact the latter has much greater  affinity with the Bayesian rather than the frequentist inference.
  • (d)  A deliberate attempt to distort and  cannibalize  frequentist testing by certain  Bayesian drumbeaters who revel in (unfairly)  maligning frequentist inference in their  attempts to motivate their  preferred view on statistical inference.

(iii) The  discussion of frequentist testing  is rather incomplete  in so far as it has been beleaguered by serious foundational problems since the 1930s. As a result, different applied fields have generated their own secondary  literatures attempting to address  these  problems,  but  often making  things  much  worse!  Indeed,  in some fields like psychology  it has reached the stage where one has to correct the ‘corrections’ of those chastising  the initial  correctors!

In an attempt to alleviate  problem  (i),  the discussion  that follows uses a sketchy historical  development of frequentist testing.  To ameliorate problem (ii), the discussion includes ‘red flag’ pointers (¥) designed to highlight important points that shed light on certain  erroneous  in- terpretations or misleading arguments.  The discussion will pay special attention to (iii), addressing  some of the key foundational problems.

[i] It is based on Ch. 14 of Spanos (1999) Probability Theory and Statistical Inference. Cambridge[ii].

[ii] You can win a free copy of this 700+ page text by creating a simple palindrome!

Categories: Bayesian/frequentist, Error Statistics, Severity, significance tests, Statistics | Tags: | 36 Comments

Surprising Facts about Surprising Facts

Mayo mirror


A paper of mine on “double-counting” and novel evidence just came out: “Some surprising facts about (the problem of) surprising facts” in Studies in History and Philosophy of Science (2013),

ABSTRACT: A common intuition about evidence is that if data x have been used to construct a hypothesis H, then x should not be used again in support of H. It is no surprise that x fits H, if H was deliberately constructed to accord with x. The question of when and why we should avoid such ‘‘double-counting’’ continues to be debated in philosophy and statistics. It arises as a prohibition against data mining, hunting for significance, tuning on the signal, and ad hoc hypotheses, and as a preference for predesignated hypotheses and ‘‘surprising’’ predictions. I have argued that it is the severity or probativeness of the test—or lack of it—that should determine whether a double-use of data is admissible. I examine a number of surprising ambiguities and unexpected facts that continue to bedevil this debate.

Categories: double-counting, Error Statistics, philosophy of science, Statistics | 36 Comments

The error statistician has a complex, messy, subtle, ingenious, piece-meal approach

RMM: "A Conversation Between Sir David Cox & D.G. Mayo"A comment today by Stephen Senn leads me to post the last few sentences of my (2010) paper with David Cox, “Frequentist Statistics as a Theory of Inductive Inference”:

A fundamental tenet of the conception of inductive learning most at home with the frequentist philosophy is that inductive inference requires building up incisive arguments and inferences by putting together several different piece-meal results; we have set out considerations to guide these pieces[i]. Although the complexity of the issues makes it more difficult to set out neatly, as, for example, one could by imagining that a single algorithm encompasses the whole of inductive inference, the payoff is an account that approaches the kind of arguments that scientists build up in order to obtain reliable knowledge and understanding of a field.” (273)[ii]

A reread for Saturday night?

[i]The pieces hang together by dint of the rationale growing out of a severity criterion (or something akin but using a different term.)

[ii]Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D Mayo and A. Spanos eds.), Cambridge: Cambridge University Press: 1-27. This paper appeared in The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, pp. 247-275.

Categories: Bayesian/frequentist, Error Statistics | 20 Comments

Blog Contents for Oct and Nov 2013*


2 tough months in exile

October 2013

  • (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
  • (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
  • (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
  • (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?”
  • (10/19) Blog Contents: September 2013
  • (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
  • (10/25) Bayesian confirmation theory: example from last post…
  • (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)

November 2013

  • (11/02) Oxford Gaol: Statistical Bogeymen
  • (11/04) Forthcoming paper on the strong likelihood principle
  • (11/09) Null Effects and Replication
  • (11/09) Beware of questionable front page articles warning you to beware of questionable front page articles (iii)
  • (11/13) T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)
  • (11/16) PhilStock: No-pain bull
  • (11/16) S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)
  • (11/18) Lucien Le Cam: “The Bayesians hold the Magic”
  • (11/20) Erich Lehmann: Statistician and Poet
  • (11/23) Probability that it is a statistical fluke [i]
  • (11/27) “The probability that it be a statistical fluke” [iia]
  • (11/30) Saturday night comedy from a Bayesian diary (rejected post, see link)

*compiled by N. Jinn & J. Miller

Categories: blog contents, Statistics | Leave a comment

Why ecologists might want to read more philosophy of science

Jeremy Fox often publishes interesting blogposts like today’s. I’m “reblogging” straight from his site as an experiment.

Categories: Error Statistics | 12 Comments

FDA’S New Pharmacovigilance

FDA’s New Generic Drug Labeling Rule

The FDA is proposing an about-face on a controversial issue: to allow (or require? [1]) generic drug companies to alter the label on drugs, whereas they are currently  required to keep the identical label as used by the brand-name company (See earlier post here and here.) While it clearly makes sense to alert the public to newly found side-effects, this change, if adopted, will open generic companies to lawsuits to which they’d been immune (as determined by a 2011 Supreme Court decision).  Whether or not the rule passes, the FDA is ready with a training session for you!  The following is from the notice I received by e-mail: Continue reading

Categories: Announcement, PhilStatLaw, science communication | 4 Comments

Stephen Senn: Dawid’s Selection Paradox (guest post)

Stephen SennStephen Senn
Head, Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS),

“Dawid’s Selection Paradox”

You can protest, of course, that Dawid’s Selection Paradox is no such thing but then those who believe in the inexorable triumph of logic will deny that anything is a paradox. In a challenging paper published nearly 20 years ago (Dawid 1994), Philip Dawid drew attention to a ‘paradox’ of Bayesian inference. To describe it, I can do no better than to cite the abstract of the paper, which is available from Project Euclid, here:

 When the inference to be made is selected after looking at the data, the classical statistical approach demands — as seems intuitively sensible — that allowance be made for the bias thus introduced. From a Bayesian viewpoint, however, no such adjustment is required, even when the Bayesian inference closely mimics the unadjusted classical one. In this paper we examine more closely this seeming inadequacy of the Bayesian approach. In particular, it is argued that conjugate priors for multivariate problems typically embody an unreasonable determinism property, at variance with the above intuition.

I consider this to be an important paper not only for Bayesians but also for frequentists, yet it has only been cited 14 times as of 15 November 2013 according to Google Scholar. In fact I wrote a paper about it in the American Statistician a few years back (Senn 2008) and have also referred to it in a previous blogpost (12 May 2012). That I think it is important and neglected is excuse enough to write about it again.

Philip Dawid is not responsible for my interpretation of his paradox but the way that I understand it can be explained by considering what it means to have a prior distribution. First, as a reminder, if you are going to be 100% Bayesian, which is to say that all of what you will do by way of inference will be to turn a prior into a posterior distribution using the likelihood and the operation of Bayes theorem, then your prior distribution has to satisfy two conditions. First, it must be what you would use to bet now (that is to say at the moment it is established) and second no amount of subsequent data will change your prior qua prior. It will, of course, be updated by Bayes theorem to form a posterior distribution once further data are obtained but that is another matter. The relevant time here is your observation time not the time when the data were collected, so that data that were available in principle but only came to your attention after you established your prior distribution count as further data.

Now suppose that you are going to make an inference about a population mean, θ, using a random sample from the population and choose the standard conjugate prior distribution. Then in that case you will use a Normal distribution with known (to you) parameters μ and σ2. If σ2 is large compared to the random variation you might expect for the means in your sample, then the prior distribution is fairly uninformative and if it is small then fairly informative but being uninformative is not in itself a virtue. Being not informative enough runs the risk that your prior distribution is not one you might wish to use to bet now and being too informative that your prior distribution is one you might be tempted to change given further information. In either of these two cases your prior distribution will be wrong. Thus the task is to be neither too informative nor not informative enough. Continue reading

Categories: Bayesian/frequentist, selection effects, Statistics, Stephen Senn | 68 Comments

Blog at