Bayesian/frequentist

Getting Credit (or blame) for Something You Didn’t Do (BP oil spill, comedy hour)

Three years ago, many of us were glued to the “spill cam” showing, in real time, the gushing oil from the April 20, 2010 explosion that sank the Deepwater Horizon oil rig in the Gulf of Mexico, killing 11, and spewing oil until July 15. Trials have been taking place this month, as people try to meet the three-year deadline to sue BP and others. But what happened to the 200 million gallons of oil? (Is anyone up to date on this?) Has it vanished, or has it just been sunk to the bottom of the sea by dispersants, which may have caused hidden destruction of sea life? I don’t know, but given it’s Saturday night around the three-year anniversary, let’s listen in on a reblog of a spill-related variation on the second of two original “overheard at the comedy hour” jokes.

In effect, it accuses the frequentist error-statistical account of licensing the following (make-believe) argument after the 2010 oil spill:

Oil Exec: We had highly reliable evidence that H: the pressure was at normal levels on April 20, 2010!

Senator: But you conceded that whenever your measuring tool showed dangerous or ambiguous readings, you continually lowered the pressure, and that the stringent “cement bond log” test was entirely skipped.

 Oil Exec:  Granted, we omitted reliable checks on April 20, 2010, but usually we do a better job—I am reporting the average!  You see, we use a randomizer that most of the time directs us to run the gold-standard check on pressure. But, but April  20 just happened to be one of those times we did the nonstringent test; but on average we do ok.

Senator:  But you don’t know that your system would have passed the more stringent test you didn’t perform!

Oil Exec:  That’s the beauty of the frequentist test!

Even if we grant (for the sake of the joke) that overall, this “test” rarely errs in the report it outputs (pass or fail), that is irrelevant to appraising the inference from the data on April 20, 2010 (which would have differed had the more stringent test been run). That interpretation violates the severity criterion: the observed passing result was altogether common if generated from a source where the pressure level was unacceptably high. Therefore it misinterprets the actual data. The question is why anyone would saddle the frequentist with such shenanigans on averages?  … Lest anyone think I am inventing a criticism, here is a familiar statistical instantiation, where the probability of choosing each experiment is given to be .5 (Cox 1958).

Two Measuring Instruments with Different Precisions:

A single observation X is to be made on a normally distributed random variable with unknown mean µ, but the measurement instrument is chosen by a coin flip: with heads we use instrument E’ with a known small variance, say 10⁻⁴, while with tails, we use E”, with a known large variance, say 10⁴. The full data indicate whether E’ or E” was performed, and the particular value observed, which we can write as x’ and x”, respectively. (This example also comes up in “ton o’ bricks”.)

In applying our test T+ (see November 2011 blog post) to a null hypothesis, say, µ = 0, the “same” value of X would correspond to a much smaller p-value were it to have come from E’ than if it had come from E”. Denote the two p-values as p’ and p”, respectively. However, or so the criticism proceeds, the error statistician would report the average p-value: .5(p’ + p”).

But this would give a misleading assessment of the precision and corresponding severity with either measurement! Instead you should report the p-value of the result in the experiment actually run (this is Cox’s Weak Conditionality Principle, WCP).
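To make the contrast concrete, here is a minimal sketch in Python (it is not taken from Cox's paper). It assumes T+ is the one-sided Normal test of µ = 0 against µ > 0 with known standard deviation, takes the variances 10⁻⁴ and 10⁴ from the example, and uses a hypothetical observed value x = 0.03. The function and variable names are illustrative only.

```python
import math

def p_value_one_sided(x, sigma, mu0=0.0):
    """P(X >= x) when X ~ Normal(mu0, sigma^2): the p-value of the one-sided test."""
    z = (x - mu0) / sigma
    # Upper tail of the standard Normal, written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

SIGMA_PRECISE = math.sqrt(1e-4)    # instrument E' : variance 10^-4, sd 0.01
SIGMA_IMPRECISE = math.sqrt(1e4)   # instrument E'': variance 10^4,  sd 100

x = 0.03  # hypothetical observed value, assumed only for illustration

p_precise = p_value_one_sided(x, SIGMA_PRECISE)      # p'  : had E' been run
p_imprecise = p_value_one_sided(x, SIGMA_IMPRECISE)  # p'' : had E'' been run
p_average = 0.5 * (p_precise + p_imprecise)          # what the critic says we report

print(f"p'  (precise instrument)   = {p_precise:.5f}")
print(f"p'' (imprecise instrument) = {p_imprecise:.5f}")
print(f"average .5(p' + p'')       = {p_average:.5f}")
```

Under these assumptions the precise instrument gives a p-value near 0.001, the imprecise one a p-value near 0.5, and the average is about 0.25, a figure that describes neither measurement.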

But what could lead the critic to suppose the error statistician must average over experiments not even performed?  Rule #2 for legitimate criticism is to give the position being criticized the most generous construal one can think of.  Perhaps the critic supposes what is actually a distortion of even the most radical behavioristic construal:

  •   If you consider outcomes that could have occurred in hypothetical repetitions of this experiment, you must also consider other experiments you did not run (but could have been run) in reasoning from the data observed (from the test you actually ran), and report some kind of frequentist average!

The severity requirement makes explicit that such a construal is to be rejected—I would have thought it obvious, and not in need of identifying a special principle. Since it wasn’t, I articulated this special notion for interpreting tests and the corresponding severity criterion.

Let me now give a special (the first!) honorary mention to Christian Robert [2] on this point, as raised in Cox and Mayo (2010). He writes (p. 9, http://arxiv.org/abs/1111.5827):

A compelling section is the one about the weak conditionality principle (pp. 294-298), as it objects to the usual statement that a frequency approach breaks this principle. In a mixture experiment about the same parameter θ, inferences made conditional on the experiment “are appropriately drawn in terms of the sampling behaviour in the experiment known to have been performed” (p. 296). This seems hardly objectionable, as stated. And I must confess the sin of stating the opposite as The Bayesian Choice has this remark (Robert (2007), Example 1.3.7, p.18) that the classical confidence interval averages over the experiments. The term experiment validates the above conditioning in that several experiments could be used to measure θ, each with a different p-value. I will not argue with this.

He would want me to mention that he does raise some caveats:

I could, however, [argue] about ‘conditioning is warranted to achieve objective frequentist goals’ (p. 298) in that the choice of the conditioning, among other things, weakens the objectivity of the analysis. In a sense the above pirouette out of the conditioning principle paradox suffers from the same weakness, namely that when two distributions characterise the same data (the mixture and the conditional distributions), there is a choice to be made between “good” and “bad”.

But there is nothing arbitrary about regarding as “good” the only experiment actually run and from which the actual data arose.  The severity criterion only makes explicit what is/should be already obvious. Objectivity, for us, is directed by the goal of making correct and warranted inferences, not freedom from thinking. After all, any time an experiment E is performed, the critic could insist that the decision to perform E is the result of some chance circumstances and with some probability we might have felt differently that day and have run some other test, perhaps a highly imprecise test or a much more precise test or anything in between, and demand that we report whatever average properties they come up with.  The error statistician can only shake her head in wonder that this gambit is at the heart of criticisms of frequentist tests.

Still, we exiled ones can’t be too fussy, and Robert still gets the mention for conceding that we have a solid leg on which to pirouette.


[1] You can search the blog for connections between this event, the June 2010 conference at the LSE (especially the RMM volume), my introduction to deepwater drilling, and the blog’s “mascot” stock, Diamond Offshore, DO, which, incidentally, just had earnings.

[2] There have been around 4-5 others since then, not sure.


Categories: Bayesian/frequentist, Comedy, Statistics | 2 Comments

Does statistics have an ontology? Does it need one? (draft 2)

Chance, rational beliefs, decision, uncertainty, probability, error probabilities, truth, random sampling, resampling, opinion, expectations. These are some of the concepts we bandy about by giving various interpretations to mathematical statistics, to statistical theory, and to probabilistic models. But are they real? The question of “ontology” asks about such things, and given the “Ontology and Methodology” conference here at Virginia Tech (May 4, 5), I’d like to get your thoughts (for possible inclusion in a Mayo-Spanos presentation).*  Also, please consider attending**.

Interestingly, I noticed the posts that have garnered the most comments have touched on philosophical questions of the nature of entities and processes behind statistical idealizations (e.g., http://errorstatistics.com/2012/10/18/query/).

1. When an interpretation is supplied for a formal statistical account, its theorems may well turn out to express approximately true claims, and the interpretation may be deemed useful, but this does not mean the concepts give correct descriptions of reality. The interpreted axioms, and inference principles, are chosen to reflect a given philosophy, or set of intended aims: roughly, to use probabilistic ideas (i) to control error probabilities of methods (Neyman-Pearson, Fisher), or (ii) to assign and update degrees of belief, actual or rational (Bayesian).  But this does not mean its adherents have to take seriously the realism of all the concepts generated. In fact, we often (on this blog) see supporters of various stripes of frequentist and Bayesian accounts running far away from taking their accounts literally, even as those interpretations are, or at least were, the basis and motivation for the development of the formal edifice (“we never meant this literally”).  But are these caveats on the same order? Or do some threaten the entire edifice of the account?

Starting with the error statistical account, recall Egon Pearson in his “Statistical Concepts in Their Relation to Reality” making it clear to Fisher that the business of controlling erroneous actions in the long run, acceptance sampling in industry and 5-year plans, only arose with Wald, and was never really part of the original Neyman-Pearson tests (declaring that the behaviorist philosophy was Neyman’s, not his).  The paper itself may be found here. I was interested to hear (Mayo 2005) Neyman’s arch opponent, Bruno de Finetti, remark (quite correctly) that the expression “inductive behavior…that was for Neyman simply a slogan underlining and explaining the difference between his, the Bayesian and the Fisherian formulations” became with Abraham Wald’s work, “something much more substantial” (de Finetti 1972, 176).

Granted, it has not been obvious to people just how to interpret N-P tests “evidentially” or “inferentially”, the subject of my work over many years. But there always seemed to me to be enough hints and examples to see what was intended: A statistical hypothesis H assigns probabilities to possible outcomes, and the warrant for accepting H as adequate, for an error statistician, is in terms of how well corroborated H is: how well H has stood up to tests that would have detected flaws in H, at least with very high probability. So the grounds for holding or using H are error statistical. The control and assessment of error probabilities may be used inferentially to determine the capabilities of methods to detect the adequacy/inadequacy of models, and express the extent of the discrepancies that have been identified. We also employ these ideas to detect gambits that make it too easy to find evidence for claims, even if the claims have been subjected to weak tests and biased procedures. A recent post is here.

The account has never professed to supply a unified logic, or any kind of logic for inference. The idea that there was a single rational way to make inferences was ridiculed by Neyman (whose birthday is April 16).

2. Proposed (“we never meant this literally”) withdrawals from the Bayesian interpretations do not seem so innocuous. Perhaps some will say this just shows my bias. Let me grant that the popular idea of interpreting prior probability distributions as non-subjective, in some sense or other, is not so radical (though I’d still want to know how to interpret posteriors and why). But what we usually see now is some blurring of the two: touting the advantage of Bayesian methods because they incorporate background beliefs, while also advertising “conventional” (default, reference, or “objective”) priors as having minimal influence on inference. [1] See “Grace and amen Bayesianism” within this deconstruction. Also relevant: Irony and Bad Faith: Deconstructing Bayesians.

Perhaps the most popular view nowadays regards the prior as some kind of uninterpreted mathematical construct, merely serving to get a posterior. Some of these same Bayesians advocate “testing” the prior, but this is hard to grasp if we do not know what the priors are intended to be, or stand for.  Then there are those Bayesians, perhaps a radical (but influential) subgroup, who deny the machinery of updating by Bayes’ theorem altogether.  In Gelman (2011) (our special topic of RMM):

“Our key departure from the mainstream Bayesian view (as expressed, for example, [in Wikipedia]) is that we do not attempt to assign posterior probabilities to models or to select or average over them using posterior probabilities. Instead, we use predictive checks to compare models to data and use the information thus learned about anomalies to motivate model improvements.” (p. 71).

In Gelman and Robert (2013), we hear that a major source of Bayesian criticism comes from assuming “that Bayesians actually seem to believe their assumptions rather than merely treating them as counters in a mathematical game.” (p. 3) This comes as a surprise to those of us who thought the Bayesians really meant it. So what is the game being played?

[W]e make strong assumptions and use subjective knowledge in order to make inferences and predictions that can be tested by comparing to observed and new data (see Gelman and Shalizi, 2012, or Mayo, 1996 for a similar attitude coming from a non-Bayesian direction). (p. 3)

So maybe some kind of a “non-Bayesian checking of Bayesian models” would offer a more promising foundation, at least for Gelman’s brand of “Bayesian falsificationism” (Gelman 2011). See my 2013 Comments on Gelman and Shalizi [2]. On the face of it, any inference, whether to the adequacy of a model (for a given purpose), or to a posterior probability, can be said to be warranted just to the extent that the inference has withstood severe testing: one with a high probability of having found flaws were they present.  The ontology matters less than the epistemology.

Thus, the severity idea could conceivably illuminate what’s going on with Gelman’s model checking; I find the idea promising, but do not really know what he thinks.

But to pursue such an avenue still requires reckoning with a fundamental issue at the foundations of Bayesian method: the interpretation of and justification for the prior probability distribution. Error statisticians use idealizations, but they are tightly constrained by the need for error probabilities, in a statistical model, to approximate the actual ones, even if only hypothetical, or checked by simulation. We are modeling real processes, not knowledge of processes.

Gelman and Robert (2013) allow:

“that many Bayesians over the years have muddied the waters by describing parameters as random rather than fixed. Once again, for Bayesians as much as for any other statistician, parameters are (typically) fixed but unknown. It is the knowledge about these unknowns that Bayesians model as random” (p. 4).

Bayesians will …assign a probability distribution to a parameter that one could not possibly imagine to have been generated by a random process, parameters such as the coefficient of party identification in a regression on vote choice, or the overdispersion in a network model, or Hubble’s constant in cosmology. There is no inconsistency in this opposition once one realizes that priors are not reflections of a hidden “truth” but rather evaluations of the modeler’s uncertainty about the parameter. (p. 3)

The choice, of course, is not between modeling a “hidden ‘truth’” and modeling “the modeler’s uncertainty”. Actually, in the majority of the examples I have seen, it seems better to imagine the parameter being generated by a random process.  On the other hand, “the modeler’s uncertainty about the parameter” is one of the most unclear parts of Bayesian modeling. It is not that we can’t see measuring the degree of evidence, corroboration, severity of test, or the like, that is accorded a claim about a fixed parameter.  We can and do. It is just that those measures will not be well represented as posterior or prior probabilities, obeying the probability calculus.

Possibly an idea I once proposed, a variation on a view held by the frequentist Reichenbach, can work (EGEK 1996, ch. 4). Reichenbach suggested that scientists might eventually be able to assess the relative frequency with which a given type of hypothesis or theory is true. This might provide it a frequentist probability assignment. I don’t see how one could get such a relative frequency (or rather I can see many different reference sets that could be used), nor why knowing such quantities would be useful in appraising the evidence for a given hypothesis H. My variation (Chapter 4, “Duhem, Kuhn, and Bayes,” pp. 120-4) is to consider the relative frequency with which evidence of a certain strength (e.g., passing k tests with increasingly impressive error probabilities) is generated, despite H being false. This is attainable. But that of course takes us to an error probabilistic assessment!

Maybe this style of Bayesianism doesn’t need a clear ontology so long as it’s got a clear epistemology. But does it?***

What do readers think?

*To see the full list of speakers: “Ontology and Methodology” conference. Actually our presentation will likely take a different tack, but I still want to hear your thoughts.

**Registration is free, but required, by April 20-25.

***I should say right off (for those who do not know) that my work is not in metaphysics, but on philosophical problems about inductive-statistical inference, experiment and evidence. My colleague (and co-conference organizer) Ben Jantzen is the “ontology” guy, and the third colleague involved, Lydia Patton, does O & M as well as HPS.

For further references, see those within posts and papers linked here, or search this blog.

De Finetti, B. (1972), Probability, Induction, and Statistics: The Art of Guessing. NY, Wiley.

Gelman, A. (2011). Induction and deduction in Bayesian data analysis. Rationality, Markets and Morals (RMM) 2, 67–78.

Gelman, A. and C. Shalizi. (Article first published online: 24 Feb 2012). “Philosophy and the Practice of Bayesian statistics (with discussion)”. British Journal of Mathematical and Statistical Psychology (BJMSP).

Gelman, A, and Robert, C. (2013). Not only defended but also applied: The perceived absurdity of Bayesian inference.

http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf

Kass, R. and Wasserman, L. (1996). The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association 91, 1343-1370.

Mayo, D. G. (1996). [EGEK] Error and the growth of experimental knowledge. Chicago: University of Chicago Press.

_____ (2005). Evidence as passing severe tests: Highly probable vs. highly probed hypotheses. In P. Achinstein (Ed.), Scientific Evidence (pp. 95-127). Baltimore: Johns Hopkins University Press.

_____ (2011). Statistical science and philosophy of science: where do/should they meet in 2011 (and beyond)? Rationality, Markets and Morals (RMM) 2, Special Topic: Statistical Science and Philosophy of Science, 79–102.

_____ (2013). Comments on A. Gelman and C. Shalizi: Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, forthcoming.

Mayo, D. and Cox, D. (2010). Frequentist statistics as a theory of inductive inference. In D. Mayo and A. Spanos (Eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (pp. 247-275). Cambridge: Cambridge University Press. This paper appeared in The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, 247-275.

Mayo, D. and Spanos, A. (2011). Error statistics. In P. Bandyopadhyay and M. Forster (Volume Eds.); D. M.Gabbay, P. Thagard and J. Woods (General Eds.). Philosophy of statistics: Handbook of philosophy of science Vol 7 (pp. 1-46). The Netherlands: Elsevier.

Pearson, E. S. (1955). Statistical concepts in their relation to reality. Journal of the Royal Statistical Society B 17, 204-207.

Senn, S. (2011). You may believe you are a Bayesian but you are probably wrong. Rationality, Markets and Morals (RMM) 2, Special Topic: Statistical Science and Philosophy of Science, 48-66.


[1] For a thorough account of problems with the latter, see Kass and Wasserman (1996).

[2] I take Gelman-Shalizi (2012) to be an attempt at a meeting of the minds between Bayesian Gelman and error statistical Shalizi. I may be wrong.

Categories: Bayesian/frequentist, Error Statistics, Statistics | 59 Comments

Who is allowed to cheat? I.J. Good and that after dinner comedy hour….

It was from my Virginia Tech colleague I.J. Good (in statistics), who died four years ago (April 5, 2009), at 93, that I learned most of what I call “howlers” on this blog. His favorites were based on the “paradoxes” of stopping rules.

“In conversation I have emphasized to other statisticians, starting in 1950, that, in virtue of the ‘law of the iterated logarithm,’ by optional stopping an arbitrarily high sigmage, and therefore an arbitrarily small tail-area probability, can be attained even when the null hypothesis is true. In other words if a Fisherian is prepared to use optional stopping (which usually he is not) he can be sure of rejecting a true null hypothesis provided that he is prepared to go on sampling for a long time. The way I usually express this ‘paradox’ is that a Fisherian [but not a Bayesian] can cheat by pretending he has a plane to catch like a gambler who leaves the table when he is ahead” (Good 1983, 135) [*]

This paper came from a conference where we both presented, and he was extremely critical of my error statistical defense on this point. (I was a year out of grad school, and he a University Distinguished Professor.) 

One time, years later, after hearing Jack give this howler for the nth time, “a Fisherian [but not a Bayesian] can cheat, etc.,” I was driving him to his office, and suddenly blurted out what I really thought:

“You know Jack, as many times as I have heard you tell this, I’ve always been baffled as to its lesson about who is allowed to cheat. Error statisticians require the overall and not the ‘computed’ significance level be reported. To us, what would be cheating would be reporting the significance level you got after trying and trying again in just the same way as if the test had a fixed sample size. True, we are forced to fret about how stopping rules alter the error probabilities of tests, while the Bayesian is free to ignore them, but why isn’t the real lesson that the Bayesian is allowed to cheat?” (A published version of my remark may be found in EGEK p. 351: “As often as my distinguished colleague presents this point…”)

 To my surprise, or actually shock, after pondering this a bit, Jack said something like, “Hmm, I never thought of it this way.”
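The gap between the “computed” and the overall significance level can be illustrated by simulation. The following is a rough sketch under assumptions chosen purely for illustration: standard Normal data with known variance, a true null hypothesis µ = 0, a two-sided test at the nominal .05 level applied after every observation, and a cap of 100 observations. None of these settings come from Good's paper, and the function names are hypothetical.

```python
import math
import random

Z_CRIT = 1.96  # two-sided cutoff for a nominal .05 level

def rejects_with_optional_stopping(rng, max_n):
    """Test after every observation; stop at the first nominally significant result."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)  # the null is true: mean 0, sd 1
        if abs(total / math.sqrt(n)) >= Z_CRIT:
            return True  # "plane to catch": quit while ahead
    return False

def rejects_with_fixed_n(rng, n):
    """Test once, at the sample size fixed in advance."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(total / math.sqrt(n)) >= Z_CRIT

def rejection_rate(rule, trials, n, seed):
    """Relative frequency with which `rule` rejects the (true) null."""
    rng = random.Random(seed)
    return sum(rule(rng, n) for _ in range(trials)) / trials

optional_rate = rejection_rate(rejects_with_optional_stopping, 2000, 100, seed=1)
fixed_rate = rejection_rate(rejects_with_fixed_n, 2000, 100, seed=1)

print(f"fixed sample size:  rejects a true null in {fixed_rate:.3f} of trials")
print(f"optional stopping:  rejects a true null in {optional_rate:.3f} of trials")
```

In runs of this kind the fixed-sample test rejects a true null about 5% of the time, while the try-and-try-again procedure rejects far more often, and the excess grows as the cap on sample size is raised. Reporting the nominal .05 in the second case is what the error statistician counts as cheating.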

By the way, the story of the “after dinner Bayesian comedy hour” on this blog did not allude to Jack but to someone who gave a much more embellished version. Since it’s Saturday night, let’s once again listen in on the comedy hour that unfolded at my dinner table at an academic conference:

 Did you hear the one about the researcher who gets a phone call from the guy analyzing his data? First the guy congratulates him and says, “The results show a Read more »

Categories: Bayesian/frequentist, Comedy, Statistics | 68 Comments

From Gelman’s blog: philosophy and the practice of Bayesian statistics

I hadn’t read Gelman and Shalizi’s response to my comment on their paper in the British Journal of Mathematical and Statistical Psychology. I see the issue is posted on Gelman’s blog. Here’s the issue of the journal:

Philosophy and the practice of Bayesian statistics (with all the discussions!)

Philosophy and the practice of Bayesian statistics (pages 8–38)
Andrew Gelman and Cosma Rohilla Shalizi

How to practise Bayesian statistics outside the Bayesian church: What philosophy for Bayesian statistical modelling? (pages 39–44)
Denny Borsboom and Brian D. Haig

Posterior predictive checks can and should be Bayesian: Comment on Gelman and Shalizi, ‘Philosophy and the practice of Bayesian statistics’ (pages 45–56)
John K. Kruschke

The error-statistical philosophy and the practice of Bayesian statistics: Comments on Gelman and Shalizi: ‘Philosophy and the practice of Bayesian statistics’ (pages 57–64)
Deborah G. Mayo

Comment on Gelman and Shalizi (pages 65–67)
Stephen Senn

The humble Bayesian: Model checking from a fully Bayesian perspective (pages 68–75)
Richard D. Morey, Jan-Willem Romeijn and Jeffrey N. Rouder

Rejoinder to discussion of ‘Philosophy and the practice of Bayesian statistics’ (pages 76–80)
Andrew Gelman and Cosma Shalizi

Categories: Bayesian/frequentist, Error Statistics, Philosophy of Statistics | Leave a comment

Mayo on S. Senn: “How Can We Cultivate Senn’s-Ability?”–reblogs

Since Stephen Senn will be leading our seminar at the LSE tomorrow morning (see PH500 page), I’m reblogging my deconstruction of his paper (“You May Believe You Are a Bayesian But You Probably Are Wrong”) from Jan. 15, 2012 (though not his main topic tomorrow). At the end I link to other “U-Phils” on Senn’s paper (by Andrew Gelman, Andrew Jaffe, Christian Robert), Senn’s response, and my response to them. Queries, write me at: error@vt.edu

Mayo Philosophizes on Stephen Senn: “How Can We Cultivate Senn’s-Ability?”

Where’s Mayo?

Although, in one sense, Senn’s remarks echo the passage of Jim Berger’s that we deconstructed a few weeks ago, Senn at the same time seems to reach an opposite conclusion. He points out how, in practice, people who claim to have carried out a (subjective) Bayesian analysis have actually done something very different—but that then they heap credit on the Bayesian ideal. (See also “Who Is Doing the Work?”)

“A very standard form of argument I do object to is the one frequently encountered in many applied Bayesian papers where the first paragraphs laud the Bayesian approach on various grounds, in particular its ability to synthesize all sources of information, and in the rest of the paper the authors assume that because they have used the Bayesian machinery of prior distributions and Bayes theorem they have therefore done a good analysis. It is this sort of author who believes that he or she is Bayesian but in practice is wrong.” (Senn 58) Read more »

Categories: Bayesian/frequentist, U-Phil | 11 Comments
