Error Statistics

Guest Post. Kent Staley: On the Five Sigma Standard in Particle Physics

Kent Staley
Associate Professor
Department of Philosophy
Saint Louis University

Regular visitors to Error Statistics Philosophy may recall a discussion that broke out here and on other sites last summer when the CMS and ATLAS collaborations at the Large Hadron Collider announced that they had discovered a new particle in their search for the Higgs boson that had at least some of the properties expected of the Higgs. Both collaborations emphasized that they had results that were significant at the level of “five sigma,” and the press coverage presented this as a requirement in high energy particle physics for claiming a new discovery. Both the use of significance testing and the reliance on the five sigma standard became a matter of debate.

Mayo has already commented on the recent updates to the Higgs search results (here and here); these seem to have further solidified the evidence for a new boson and the identification of that boson with the Higgs of the Standard Model. I have been thinking recently about the five sigma standard of discovery and what we might learn from reflecting on its role in particle physics. (I gave a talk on this at a workshop sponsored by the “Epistemology of the Large Hadron Collider” project at Wuppertal [i], which included both philosophers of science and physicists associated with the ATLAS collaboration.)

Just to refresh our memories, back in July 2012, Tony O’Hagan posted at the ISBA forum (prompted by “a question from Dennis Lindley”) three questions regarding the five-sigma claim:

  1. “Why such an extreme evidence requirement? We know from a Bayesian perspective that this only makes sense if (a) the existence of the Higgs boson (or some other particle sharing some of its properties) has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme. Neither seems to be the case, so why 5-sigma?”
  2. “Rather than ad hoc justification of a p-value, it is of course better to do a proper Bayesian analysis. Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?”
  3. “We know that given enough data it is nearly always possible for a significance test to reject the null hypothesis at arbitrarily low p-values, simply because the parameter will never be exactly equal to its null value. And apparently the LHC has accumulated a very large quantity of data. So could even this extreme p-value be illusory?”
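For readers who want the numbers behind “five sigma,” here is a minimal sketch converting a sigma level into a tail-area p-value. It assumes the one-sided Normal tail convention commonly used in particle physics; the function name `sigma_to_p` is ours, for illustration only. Five sigma corresponds to roughly 2.9 × 10⁻⁷, about 1 in 3.5 million.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, sd 1


def sigma_to_p(k: float) -> float:
    """One-sided tail area P(Z >= k) for a standard Normal Z (hypothetical helper)."""
    # By symmetry the upper tail equals cdf(-k); this stays accurate for large k
    return STD_NORMAL.cdf(-k)


for k in (2, 3, 5):
    p = sigma_to_p(k)
    print(f"{k} sigma: p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

The figure is an error probability computed under the background-only null; it is not a posterior probability that the null is true.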

O’Hagan received a lot of responses to this post, and he very helpfully wrote up and posted a digest of those responses, discussed on this blog here and here.

Categories: Error Statistics, P-values, Statistics | 26 Comments

capitalizing on chance

Mayo playing the slots

Hardly a day goes by where I do not come across an article on the problems for statistical inference based on fallaciously capitalizing on chance: high-powered computer searches and “big” data trolling offer rich hunting grounds out of which apparently impressive results may be “cherry-picked”:

When the hypotheses are tested on the same data that suggested them and when tests of significance are based on such data, then a spurious impression of validity may result. The computed level of significance may have almost no relation to the true level. . . . Suppose that twenty sets of differences have been examined, that one difference seems large enough to test and that this difference turns out to be “significant at the 5 percent level.” Does this mean that differences as large as the one tested would occur by chance only 5 percent of the time when the true difference is zero? The answer is no, because the difference tested has been selected from the twenty differences that were examined. The actual level of significance is not 5 percent, but 64 percent! (Selvin 1970, 104)[1]
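Selvin’s 64 percent follows if the twenty differences are treated as independent tests of true null hypotheses, each at the 5 percent level: the chance that at least one comes out “significant” is 1 − 0.95²⁰ ≈ 0.64. A minimal sketch, assuming independence (the function names are hypothetical), computes this exactly and checks it by simulation, using the fact that a p-value from a true null is Uniform(0, 1).

```python
import random


def familywise_rate(k: int, alpha: float) -> float:
    """P(at least one of k independent true-null tests is 'significant' at level alpha)."""
    return 1 - (1 - alpha) ** k


def simulate(k: int, alpha: float, trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo check: report the smallest of k null p-values, as a cherry-picker would."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under a true null each p-value is Uniform(0, 1)
        if min(rng.random() for _ in range(k)) < alpha:
            hits += 1
    return hits / trials


print(familywise_rate(20, 0.05))  # exact: about 0.64
print(simulate(20, 0.05))
```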

…Oh wait, this is from a contributor to Morrison and Henkel way back in 1970! But there is one big contrast, I find, that makes current-day reports so much more worrisome: critics of the Morrison and Henkel ilk clearly report that to ignore a variety of “selection effects” results in a fallacious computation of the actual significance level associated with a given inference; clear terminology is used to distinguish the “computed” or “nominal” significance level on the one hand from the actual or warranted significance level on the other. Nowadays, writers make it much less clear that the fault lies with the fallacious use of significance tests and other error statistical methods. Instead, the tests are blamed for permitting or even encouraging such misuses. Criticisms to the effect that we should stop trying to teach these methods correctly have hardly helped. The situation is especially puzzling given the fact that these same statistical fallacies have trickled down to the public sphere, what with Ben Goldacre’s “Bad Pharma”, calls for “all trials” to be registered and reported, and the popular articles on the ills of ‘big data’.

Categories: Error Statistics, Statistics | 19 Comments

From Gelman’s blog: philosophy and the practice of Bayesian statistics

I hadn’t read Gelman and Shalizi’s response to my comment on their paper in the British Journal of Mathematical and Statistical Psychology. I see the issue is posted on Gelman’s blog. Here’s the issue of the journal:

Philosophy and the practice of Bayesian statistics (with all the discussions!)

Philosophy and the practice of Bayesian statistics (pages 8–38)
Andrew Gelman and Cosma Rohilla Shalizi

How to practise Bayesian statistics outside the Bayesian church: What philosophy for Bayesian statistical modelling? (pages 39–44)
Denny Borsboom and Brian D. Haig

Posterior predictive checks can and should be Bayesian: Comment on Gelman and Shalizi, ‘Philosophy and the practice of Bayesian statistics’ (pages 45–56)
John K. Kruschke

The error-statistical philosophy and the practice of Bayesian statistics: Comments on Gelman and Shalizi: ‘Philosophy and the practice of Bayesian statistics’ (pages 57–64)
Deborah G. Mayo

Comment on Gelman and Shalizi (pages 65–67)
Stephen Senn

The humble Bayesian: Model checking from a fully Bayesian perspective (pages 68–75)
Richard D. Morey, Jan-Willem Romeijn and Jeffrey N. Rouder

Rejoinder to discussion of ‘Philosophy and the practice of Bayesian statistics’ (pages 76–80)
Andrew Gelman and Cosma Shalizi

Categories: Bayesian/frequentist, Error Statistics, Philosophy of Statistics

P-values as posterior odds?

METABLOG QUERY: I don’t know how to explain to this economist blogger that he is erroneously using p-values when he claims that “the odds are” (1 – p)/p that a null hypothesis is false. Maybe others want to jump in here?

On significance and model validation (Lars Syll)

Let us suppose that we as educational reformers have a hypothesis that implementing a voucher system would raise the mean test results with 100 points (null hypothesis). Instead, when sampling, it turns out it only raises it with 75 points and having a standard error (telling us how much the mean varies from one sample to another) of 20.
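To see which numbers are in play in Syll’s example, here is a minimal sketch, assuming a Normal sampling distribution and a two-sided test (the excerpt does not say which). It computes the z statistic, the p-value, and the ratio (1 − p)/p that the blogger reads as “the odds.” That ratio is built entirely from a probability computed under the null hypothesis, so it is not a posterior odds that the null is false.

```python
from statistics import NormalDist

null_mean = 100.0  # hypothesized raise in mean test score (Syll's null hypothesis)
observed = 75.0    # observed raise in the sample
se = 20.0          # standard error of the sample mean

z = (observed - null_mean) / se
# Two-sided p-value: probability, computed under the null, of a deviation at least this large
p = 2 * NormalDist().cdf(-abs(z))
# The ratio the blogger calls "the odds"; it is NOT P(H1 | data) / P(H0 | data)
naive_odds = (1 - p) / p

print(f"z = {z:.2f}, p = {p:.3f}, (1 - p)/p = {naive_odds:.2f}")
```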

Categories: fallacy of non-significance, Severity, Statistics | 36 Comments

Severity Calculator

SEV calculator (with comparisons to p-values, power, CIs)

In the illustration in the Jan. 2 post,

H0: μ ≤ 0 vs. H1: μ > 0

and the standard deviation SD = 1, n = 25, so σx̄ = SD/√n = .2
Setting α to .025, the cut-off for rejection is .39 (can round to .4).

Let the observed mean x̄ = .2, a statistically insignificant result (p-value = .16)
SEV(μ < .2) = .5
SEV(μ < .3) = .7
SEV(μ < .4) = .84
SEV(μ < .5) = .93
SEV(μ < .6*) = .975
*rounding
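The numbers above can be reproduced in a few lines. This is a minimal sketch, not the SEV calculator itself; it assumes, as in the illustration, a Normal model with known SD, and uses the usual severity assessment for a statistically insignificant result in this one-sided test: SEV(μ ≤ μ1) = P(X̄ > x̄; μ = μ1). The helper name `sev_upper` is hypothetical. The exact value at μ1 = .6 is about .977; the post’s .975 corresponds to the rounded point x̄ + 1.96σx̄ = .592.

```python
from statistics import NormalDist

Z = NormalDist()

sd, n = 1.0, 25
se = sd / n ** 0.5                  # sigma_xbar = SD / sqrt(n) = 0.2
alpha = 0.025
cutoff = Z.inv_cdf(1 - alpha) * se  # reject H0: mu <= 0 when xbar exceeds about 0.39
xbar = 0.2                          # observed mean
p_value = Z.cdf(-xbar / se)         # P(Xbar >= 0.2; mu = 0), about 0.16


def sev_upper(mu1: float) -> float:
    """Severity for inferring mu <= mu1 after a non-significant result:
    P(Xbar > observed xbar; mu = mu1)."""
    return Z.cdf((mu1 - xbar) / se)


for mu1 in (0.2, 0.3, 0.4, 0.5, 0.6):
    print(f"SEV(mu <= {mu1}) = {sev_upper(mu1):.3f}")
```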

Some students asked about crunching some of the numbers, so here’s a rather rickety old SEV calculator*. It is limited and rather scruffy-looking (nothing like the pretty visuals others post), but it is very useful. It also shows the Normal curves, how shaded areas change with changed hypothetical alternatives, and gives contrasts with confidence intervals.

Categories: Severity, statistical tests

Severity as a ‘Metastatistical’ Assessment

Some weeks ago I discovered an error* in the upper severity bounds for the one-sided Normal test in section 5 of: “Statistical Science Meets Philosophy of Science Part 2” SS & POS 2.  The published article has been corrected.  The error was in section 5.3, but I am blogging all of 5.  

(* μ₀ was written where x₀ should have been!)

5. The Error-Statistical Philosophy

I recommend moving away, once and for all, from the idea that frequentists must ‘sign up’ for either the Neyman and Pearson paradigm or the Fisherian one. As a philosopher of statistics I am prepared to admit to supplying the tools with an interpretation and an associated philosophy of inference. I am not concerned to prove this is what any of the founders ‘really meant’.

Fisherian simple-significance tests, with their single null hypothesis and at most an idea of a directional alternative (and a corresponding notion of the ‘sensitivity’ of a test), are commonly distinguished from Neyman and Pearson tests, where the null and alternative exhaust the parameter space, and the corresponding notion of power is explicit. On the interpretation of tests that I am proposing, these are just two of the various types of testing contexts appropriate for different questions of interest. My use of a distinct term, ‘error statistics’, frees us from the bogeymen and bogeywomen often associated with ‘classical’ statistics, and it is to be hoped that the term ‘classical’ is shelved. (Even ‘sampling theory’, technically correct, does not seem to represent the key point: the sampling distribution matters in order to evaluate error probabilities, and thereby assess corroboration or severity associated with claims of interest.) Nor do I see that my comments turn on whether one replaces frequencies with ‘propensities’ (whatever they are).

Categories: Error Statistics, philosophy of science, Philosophy of Statistics, Severity, Statistics | 5 Comments

An established probability theory for hair comparison? “is not — and never was”

Hypothesis H: “person S is the source of this hair sample,” if indicated by a DNA match, has passed a more severe test than if it were indicated merely by a visual analysis under a microscope. There is a much smaller probability of an erroneous hair match using DNA testing than using the method of visual analysis used for decades by the FBI.

The Washington Post reported on its latest investigation into flawed statistics behind hair match testimony. “Thousands of criminal cases at the state and local level may have relied on exaggerated testimony or false forensic evidence to convict defendants of murder, rape and other felonies”. Below is an excerpt of the Post article by Spencer S. Hsu.

I asked John Byrd, forensic anthropologist and follower of this blog, what he thought. It turns out that “hair comparisons do not have a well-supported weight of evidence calculation.” (Byrd). I put Byrd’s note at the end of this post.

Categories: Severity, Statistics | 14 Comments

13 well-worn criticisms of significance tests (and how to avoid them)

2013 is right around the corner, and here are 13 well-known criticisms of statistical significance tests, and how they are addressed within the error statistical philosophy, as discussed in Mayo, D. G. and Spanos, A. (2011) “Error Statistics”.

  • (#1) Error statistical tools forbid using any background knowledge.
  • (#2) All statistically significant results are treated the same.
  • (#3) The p-value does not tell us how large a discrepancy is found.
  • (#4) With large enough sample size even a trivially small discrepancy from the null can be detected.
  • (#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.
  • (#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
  • (#7) Error probabilities are misinterpreted as posterior probabilities.
  • (#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
  • (#9) Specifying statistical tests is too arbitrary.
  • (#10) We should be doing confidence interval estimation rather than significance tests.
  • (#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
  • (#12) All models are false anyway.
  • (#13) Testing assumptions involves illicit data-mining.
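Criticism #4 is easy to make concrete. A minimal sketch, assuming a one-sided test of H0: μ ≤ 0 for a Normal mean with known σ = 1 (illustrative values, and a hypothetical helper name): the same tiny observed mean of 0.01 is unremarkable at n = 100 but yields an astronomically small p-value at n = 1,000,000. This is why the p-value alone does not say how large a discrepancy is indicated (#3).

```python
from statistics import NormalDist

Z = NormalDist()


def one_sided_p(xbar: float, sigma: float, n: int, mu0: float = 0.0) -> float:
    """P(Xbar >= xbar; mu = mu0) for a Normal mean with known sigma."""
    se = sigma / n ** 0.5
    # cdf of the negated z-score gives the upper tail accurately even far out
    return Z.cdf(-(xbar - mu0) / se)


# The same tiny observed discrepancy (0.01 SD units) at increasing sample sizes
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {one_sided_p(0.01, 1.0, n):.3g}")
```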

You can read how we avoid them in the full paper here.

Mayo, D. G. and Spanos, A. (2011) “Error Statistics,” in Philosophy of Statistics, Handbook of Philosophy of Science Volume 7 (General editors: Dov M. Gabbay, Paul Thagard and John Woods; Volume eds. Prasanta S. Bandyopadhyay and Malcolm R. Forster). Elsevier: 1–46.

Categories: Error Statistics, significance tests, Statistics

Error Statistics (brief overview)

In view of some questions about “behavioristic” vs “evidential” construals of frequentist statistics (from the last post), and how the error statistical philosophy tries to improve on Birnbaum’s attempt at providing the latter, I’m reblogging a portion of a post from Nov. 5, 2011 when I also happened to be in London. (The beginning just records a goofy mishap with a skeleton key, and so I leave it out in this reblog.) Two papers with much more detail are linked at the end.

Error Statistics

(1) There is a “statistical philosophy” and a philosophy of science. (a) An error-statistical philosophy refers to the methodological principles and foundations associated with frequentist error-statistical methods. (b) An error-statistical philosophy of science, on the other hand, involves using the error-statistical methods, formally or informally, to deal with problems of philosophy of science: to model scientific inference (actual or rational), to scrutinize principles of inference, and to address philosophical problems about evidence and inference (the problem of induction, underdetermination, warranting evidence, theory testing, etc.).

Categories: Error Statistics, Philosophy of Statistics, Statistics | 10 Comments

Comments on Wasserman’s “what is Bayesian/frequentist inference?”

What I like best about Wasserman’s blogpost (Normal Deviate) is his clear denial that merely using conditional probability makes the method Bayesian (even if one chooses to call the conditional probability theorem Bayes’s theorem, and even if one is using ‘Bayes’s’ nets). Else any use of probability theory is Bayesian, which trivializes the whole issue. Thus, the fact that conditional probability is used in an application with possibly good results is not evidence of (yet another) Bayesian success story [i].

But I do have serious concerns that in his understandable desire (1) to be even-handed (hammers and screwdrivers are for different purposes, both perfectly kosher tools), as well as (2) to give a succinct sum-up of methods, Wasserman may encourage misrepresenting positions. Speaking only for “frequentist” sampling theorists [ii], I would urge moving away from the recommended quick sum-up of “the goal” of frequentist inference: “Construct procedures with frequency guarantees”. If by this Wasserman means that the direct aim is to have tools with “good long run properties”, that rarely err in some long run series of applications, then I think it is misleading. In the context of scientific inference or learning, such a long-run goal, while necessary, is not at all sufficient; moreover, I claim that satisfying this goal is actually just a byproduct of deeper inferential goals (controlling and evaluating how severely given methods are capable of revealing/avoiding erroneous statistical interpretations of data in the case at hand). (So I deny that it is even the main goal to which frequentist methods direct themselves.) Even arch-behaviorist Neyman used power post-data to ascertain how well corroborated various hypotheses were, never mind long-run repeated applications (see one of my Neyman’s Nursery posts).

Categories: Error Statistics, Neyman's Nursery, Philosophy of Statistics, Statistics | 21 Comments

PhilStat: So you’re looking for a Ph.D dissertation topic?

Maybe you’ve already heard Hal Varian, Google’s chief economist: “The next sexy job in the next ten years will be statisticians.” Even Larry Wasserman declares that “statistics is sexy.” In that case, philosophy of statistics must be doubly so!

Thus one wonders at the decline of late in the lively and long-standing exchange between philosophers of science and statisticians. If you are a graduate student wondering how you might make your mark in a philosophy of science area, philosophy of statistical science, fairly brimming over with rich and open philosophical problems, may be the thing for you!* Surprising, pressing, intriguing, and novel philosophical twists on both traditional and cutting-edge controversies are going begging for analysis—they not only bear on many areas of popular philosophy but also may offer you ways of getting out in front of them.

I came across a spotty blog by Pitt graduate student Gregory Gandenberger a while back (not like his new, frequently updated one) where he was wrestling with a topic for his master’s thesis, and some years later, wrangling over dissertation topics in philosophy of statistics. After I started this blog, I looked for it again, and now I’ve invited him to post, on the topic of his choice, as he did here, and I invite other graduate students through the U-Phil call.

Categories: Error Statistics, philosophy of science, Philosophy of Statistics | 3 Comments

Reblogging: Oxford Gaol: Statistical Bogeymen

Reblogging from 1 year ago in Oxford: Oxford Jail is an entirely fitting place to be on Halloween!

Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba!  My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should I think be shelved, not that names should matter.

In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended. But for Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).

Categories: Error Statistics, Philosophy of Statistics

Mayo: (section 7) “StatSci and PhilSci: part 2”

Here is the final section (7) of my paper: “Statistical Science Meets Philosophy of Science Part 2” SS & POS 2.* Section 6 is in my last post.

7. Can/Should Bayesian and Error Statistical Philosophies Be Reconciled?

Stephen Senn makes a rather startling but doubtlessly true remark:

The late and great George Barnard, through his promotion of the likelihood principle, probably did as much as any statistician in the second half of the last century to undermine the foundations of the then dominant Neyman-Pearson framework and hence prepare the way for the complete acceptance of Bayesian ideas that has been predicted will be achieved by the De Finetti-Lindley limit of 2020. (Senn 2008, 459)

Many do view Barnard as having that effect, even though he himself rejected the likelihood principle (LP). One can only imagine Savage’s shock at hearing that contemporary Bayesians (save true subjectivists) are lukewarm about the LP! The 2020 prediction could come to pass, only to find Bayesians practicing in bad faith. Kadane, one of the last of the true Savage Bayesians, is left to wonder at what can only be seen as a Pyrrhic victory for Bayesians.

Categories: Error Statistics, Philosophy of Statistics, Statistics

Mayo: (section 6) “StatSci and PhilSci: part 2”

Here is section 6 of my paper: “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. Section 5 is in my last post.

6. Some Knock-Down Criticisms of Frequentist Error Statistics

With the error-statistical philosophy of inference under our belts, it is easy to run through the classic and allegedly damning criticisms of frequentist error-statistical methods. Open up Bayesian textbooks and you will find, endlessly reprised, the handful of ‘counterexamples’ and ‘paradoxes’ that make up the charges leveled against frequentist statistics, after which the Bayesian account is proffered as coming to the rescue. There is nothing about how frequentists have responded to these charges, nor is there evidence that frequentist theory endorses the applications or interpretations around which these ‘chestnuts’ revolve.

If frequentist and Bayesian philosophies are to find common ground, this should stop. The value of a generous interpretation of rival views should cut both ways. A key purpose of the forum out of which this paper arises is to encourage reciprocity.

Categories: Error Statistics, philosophy of science, Philosophy of Statistics

Mayo: (section 5) “StatSci and PhilSci: part 2”

Here is section 5 of my new paper: “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. Sections 1 and 2 are in my last post.*

Categories: Error Statistics, philosophy of science, Philosophy of Statistics, Severity | 5 Comments

Mayo: (first 2 sections) “StatSci and PhilSci: part 2”

Here are the first two sections of my new paper: “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. (Alternatively, go to the RMM page and scroll down to the Sept 26, 2012 entry.)

1. Comedy Hour at the Bayesian Retreat[i]

 Overheard at the comedy hour at the Bayesian retreat: Did you hear the one about the frequentist…

 “who defended the reliability of his radiation reading, despite using a broken radiometer, on the grounds that most of the time he uses one that works, so on average he’s pretty reliable?”

or

 “who claimed that observing ‘heads’ on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”

Such jests may work for an after-dinner laugh, but if it turns out that, despite being retreads of ‘straw-man’ fallacies, they form the basis of why some statisticians and philosophers reject frequentist methods, then they are not such a laughing matter. But surely the drubbing of frequentist methods could not be based on a collection of howlers, could it? I invite the reader to stay and find out.
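The coin “test” in the second jest rejects with probability .05 whether or not the treatment is any improvement, so its chance of detecting a real improvement is no better than its chance of a false alarm. A minimal simulation sketch (the function names are hypothetical, for illustration only) makes this explicit: the treatment effect is passed in but never consulted, and the rejection rate is the same .05 under no effect and under a large one.

```python
import random


def coin_test_rejects(rng: random.Random, treatment_effect: float, p_heads: float = 0.05) -> bool:
    """'Reject' the null (no improvement) whenever the biased coin lands heads.
    treatment_effect is accepted but never used: the coin ignores the treatment."""
    return rng.random() < p_heads


def rejection_rate(treatment_effect: float, trials: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(coin_test_rejects(rng, treatment_effect) for _ in range(trials)) / trials


no_effect = rejection_rate(0.0)    # type I error rate, about 0.05
big_effect = rejection_rate(10.0)  # "power" against a large improvement: also about 0.05
print(no_effect, big_effect)
```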

Categories: Error Statistics, Philosophy of Statistics, Severity | 2 Comments

Mayo Responds to U-Phils on Background Information

Thanks to Emrah Aktunc and Christian Hennig for their U-Phils on my September 12 post: “How should ‘prior information’ enter in statistical inference?” and my subsequent deconstruction of Gelman[i] (starting here, and ending with part 3).  I’ll begin with some remarks on Emrah Aktunc’s contribution.

First, we need to avoid an ambiguity that conflates prior information with prior probability. In a given experiment, prior information may be stronger than the data: to take but one example, say that we’ve already falsified Newton’s theory of gravity in several domains, but in our experiment the data (e.g., one of the sets of eclipse data from 1919) accords with the Newtonian prediction (of half the amount of deflection as that predicted by Einstein’s general theory of relativity [GTR]). The pro-Newton data, in and of itself, would be rejected because of all that we already know.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, U-Phil | 4 Comments

U-Phils: Hennig and Aktunc on Gelman 2012

I am posting two U-Phils I received in relation to the 9/12 call on Andrew Gelman’s (2012): “Ethics and the statistical use of prior information”

A Deconstruction of Gelman by Mayo in 3 parts:
(10/5/12) Part 1: “A Bayesian wants everybody else to be a non-Bayesian”
(10/7/12) Part 2: Using prior Information
(10/9/12) Part 3: beauty and the background knowledge

Comments on “How should prior information enter in statistical inference”

Christian Hennig 
Department of Statistical Science
University College London

Reading the blog entries on this topic, the Cox-Mayo Conversation and the linked paper by Gelman, I appreciate the valuable thoughts in both, which all make sense to me, specifying situations where prior information should preferably not enter, or should rather enter in the Bayesian way.

Thinking more about the issue, however, I find both the frequentist and the Bayesian approach seriously wanting in this respect (and I don’t have a better one myself either).

A difference between the approaches seems to be that Cox/Mayo rather look at the analysis of data in an isolated situation whereas Gelman rather writes about conclusions from not only analysing a particular data set, but from aggregating all the information available.

Cox/Mayo do not advocate ignoring prior knowledge, but they prefer to keep it out of the process of actually analysing the data. Mayo talks of a piecemeal approach in which results from different data analyses can be put together in order to get an overall picture.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, Testing Assumptions, U-Phil | 11 Comments

Last part (3) of the deconstruction: beauty and background knowledge

Please see parts 1 and 2 and links therein. The background began in my Sept 12 post.

Gelman (2012) considers a case where the overall available evidence, E, is at odds with the indication of the results x from a given study:

Consider the notorious study in which a random sample of a few thousand people was analyzed, and it was found that the most beautiful parents were 8 percentage points more likely to have girls, compared to less attractive parents. The result was statistically significant (p<.05) and published in a reputable journal. But in this case we have good prior information suggesting that the difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point. A (non-Bayesian) design analysis reveals that, with this level of true difference, any statistically-significant observed difference in the sample is likely to be noise. At this point, you might well say that the original analysis should never have been done at all—but, given that it has been done, it is essential to use prior information (even if not in any formal Bayesian way) to interpret the data and generalize from sample to population.

Where did Fisher’s principle go wrong here? The answer is simple—and I think Cox would agree with me here. We’re in a setting where the prior information is much stronger than the data. (p. 3)
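Gelman’s “design analysis” claim can be made concrete with a rough sketch. The inputs are assumptions, not figures from the study: we take the true difference to be 1 percentage point (the bound in the prior information) and a hypothetical standard error of 4 points, chosen only because an 8-point difference can reach p < .05 only if the standard error is below about 4.1 points. We borrow the label “type S” for significant results with the wrong sign. Under these assumptions the two-sided .05 test has power of only about 5.7%, roughly a quarter of significant results point the wrong way, and a significant estimate overstates the true difference by a factor of about nine on average.

```python
from statistics import NormalDist

Z = NormalDist()


def design_analysis(true_diff: float, se: float, alpha: float = 0.05):
    """Power, type S (wrong-sign) rate, and expected exaggeration ratio of a
    two-sided z-test, assuming the estimate is Normal(true_diff, se)."""
    c = Z.inv_cdf(1 - alpha / 2) * se  # significance threshold for |estimate|
    a = (c - true_diff) / se           # standardized upper cut-off
    b = (-c - true_diff) / se          # standardized lower cut-off
    p_hi, p_lo = 1 - Z.cdf(a), Z.cdf(b)
    power = p_hi + p_lo
    type_s = p_lo / power              # share of significant results with the wrong sign
    # E[|estimate| ; significant] from Normal partial expectations of each tail
    e_hi = true_diff * p_hi + se * Z.pdf(a)
    e_lo = -true_diff * p_lo + se * Z.pdf(b)
    exaggeration = (e_hi + e_lo) / power / true_diff
    return power, type_s, exaggeration


# Hypothetical inputs: true difference 1 point, standard error 4 points
power, type_s, exaggeration = design_analysis(true_diff=1.0, se=4.0)
print(f"power = {power:.3f}, type S = {type_s:.2f}, exaggeration = {exaggeration:.1f}")
```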

Let me simply grant Gelman that this prior information warrants (with severity) the hypothesis H:

H: “difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point,” (ibid.)

especially given my suspicions of the well-testedness of claims to show the effects of “beautiful to less-beautiful” on anything. I will simply take it as a given that it is well-tested background “knowledge.” Presumably, the well-tested claim goes beyond those individuals observed, and is generalizing at least to some degree. So we are given that the hypothesis H is one for which there is strong evidence.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, U-Phil | 14 Comments

Deconstructing Gelman part 2: Using prior Information

(Please see part 1 for links and references):

A Bayesian, Gelman tells us, “wants everybody else to be a non-Bayesian” (p. 5). Despite appearances, the claim need not be seen as self-contradictory, at least if we interpret it most generously, as Rule #2 (of this blog) directs. Whether or not “a Bayesian” refers to all Bayesians or only non-standard Bayesians (i.e., those wearing a hat of which Gelman approves), his meaning might be simply that when setting out with his own inquiry, he doesn’t want your favorite priors (be that beliefs or formally derived constructs) getting in the way. A Bayesian, says Gelman (in this article) is going to make inferences based on “trying to extract information from the data” in order to determine what to infer or believe (substitute your preferred form of output) about some aspect of a population (or mechanism) generating the data, as modeled. He just doesn’t want the “information from the data” muddied by your particular background knowledge. He would only have to subtract out all of this “funny business” to get at your likelihoods. He would only have to “divide away” your prior distributions before getting to his own analysis (p. 5). As in Gelman’s trial analogy (p. 5), he prefers to combine your “raw data,” and your likelihoods, with his own well-considered background information. We can leave open whether he will compute posteriors (at least in the manner he recommends here) or not (as suggested in other work). So perhaps we have arrived at a sensible deconstruction of Gelman, free of contradiction. Whether or not this leaves texts open to some charge of disingenuity, I leave entirely to one side.

Now at this point I wonder: do Bayesian reports provide the ingredients for such “dividing away”? I take it that they’d report the priors, which could be subtracted out, but how is the rest of the background knowledge communicated and used? It would seem to include assorted background knowledge of instruments, of claims that had been sufficiently well corroborated to count as knowledge, of information about that which was not previously well tested, of flaws and biases and threats of error to take into account in future designs, etc. (as in our ESP examples 9/22 and 9/25). The evidence for any background assumptions should also be made explicit and communicated (unless it consists of trivial common knowledge).
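In the simplest conjugate case, “dividing away” a reported prior is straightforward arithmetic. A minimal sketch, assuming a Normal prior and a Normal likelihood with known variance (all numbers and names, including `gelman_prior`, are hypothetical): precisions add under Bayes’s theorem, so the likelihood’s precision is the posterior precision minus the prior precision, and its mean follows from the precision-weighted means. Note what the sketch cannot recover: background knowledge that never entered the reported prior, such as knowledge of instruments or of threats of error, is not in these numbers at all.

```python
from dataclasses import dataclass


@dataclass
class Normal:
    mean: float
    var: float

    @property
    def precision(self) -> float:
        return 1.0 / self.var


def combine(prior: Normal, likelihood: Normal) -> Normal:
    """Conjugate Normal update: precisions add, means are precision-weighted."""
    prec = prior.precision + likelihood.precision
    mean = (prior.mean * prior.precision + likelihood.mean * likelihood.precision) / prec
    return Normal(mean, 1.0 / prec)


def divide_away(posterior: Normal, prior: Normal) -> Normal:
    """Recover the Normal likelihood by removing a reported Normal prior."""
    prec = posterior.precision - prior.precision
    if prec <= 0:
        raise ValueError("posterior must be more precise than the prior")
    mean = (posterior.mean * posterior.precision - prior.mean * prior.precision) / prec
    return Normal(mean, 1.0 / prec)


your_prior = Normal(mean=0.0, var=4.0)
data_likelihood = Normal(mean=1.5, var=1.0)
your_posterior = combine(your_prior, data_likelihood)

recovered = divide_away(your_posterior, your_prior)  # back to mean 1.5, var 1.0 (up to rounding)
gelman_prior = Normal(mean=1.0, var=0.5)             # hypothetical "own background information"
print(recovered, combine(gelman_prior, recovered))
```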

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics | 12 Comments

Blog at WordPress.com.