Statistics

U-Phils: Hennig and Aktunc on Gelman 2012

I am posting two U-Phils I received in response to the 9/12 call on Andrew Gelman’s (2012) “Ethics and the statistical use of prior information”.

A Deconstruction of Gelman by Mayo in 3 parts:
(10/5/12) Part 1: “A Bayesian wants everybody else to be a non-Bayesian”
(10/7/12) Part 2: Using prior Information
(10/9/12) Part 3: beauty and the background knowledge

Comments on “How should prior information enter in statistical inference”

Christian Hennig 
Department of Statistical Science
University College London

Reading the blog entries on this topic, the Cox-Mayo Conversation and the linked paper by Gelman, I appreciate the valuable thoughts in both, which to me all make sense, specifying situations in which prior information should preferably not enter, or should preferably enter in the Bayesian way.

Thinking more about the issue, however, I find both the frequentist and the Bayesian approach seriously wanting in this respect (and I don’t have a better one myself either).

A difference between the approaches seems to be that Cox/Mayo rather look at the analysis of data in an isolated situation whereas Gelman rather writes about conclusions from not only analysing a particular data set, but from aggregating all the information available.

Cox/Mayo do not advocate ignoring prior knowledge, but they prefer to keep it out of the process of actually analysing the data. Mayo talks of a piecemeal approach in which results from different data analyses can be put together in order to get an overall picture.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, Testing Assumptions, U-Phil | 11 Comments

Last part (3) of the deconstruction: beauty and background knowledge

Please see parts 1 and 2 and links therein. The background began in my Sept 12 post.

Gelman (2012) considers a case where the overall available evidence, E, is at odds with the indication of the results x from a given study:

Consider the notorious study in which a random sample of a few thousand people was analyzed, and it was found that the most beautiful parents were 8 percentage points more likely to have girls, compared to less attractive parents. The result was statistically significant (p<.05) and published in a reputable journal. But in this case we have good prior information suggesting that the difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point. A (non-Bayesian) design analysis reveals that, with this level of true difference, any statistically-significant observed difference in the sample is likely to be noise. At this point, you might well say that the original analysis should never have been done at all—but, given that it has been done, it is essential to use prior information (even if not in any formal Bayesian way) to interpret the data and generalize from sample to population.

Where did Fisher’s principle go wrong here? The answer is simple—and I think Cox would agree with me here. We’re in a setting where the prior information is much stronger than the data. (p. 3)
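The design-analysis point in the quoted passage can be illustrated numerically. What follows is only a sketch, not the calculation in Gelman's column: the true difference of 0.3 percentage points and the standard error of 3.3 points are assumed values chosen for illustration, the function name `design_analysis` is hypothetical, and the estimate is treated as normally distributed around the true difference.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def design_analysis(true_diff, se, z_crit=1.96, n_sims=200_000, seed=0):
    """Return (power, wrong-sign rate, exaggeration ratio) for a two-sided test.

    true_diff and se are in percentage points; true_diff is assumed positive.
    """
    # Probability of a significant estimate in each direction.
    p_pos = 1.0 - normal_cdf(z_crit - true_diff / se)
    p_neg = normal_cdf(-z_crit - true_diff / se)
    power = p_pos + p_neg
    # Among significant estimates, the share with the wrong (negative) sign.
    wrong_sign = p_neg / power
    # Average magnitude of the significant estimates, relative to the truth,
    # estimated by simulating hypothetical replications of the study.
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_diff, se)
        if abs(estimate) > z_crit * se:
            significant.append(abs(estimate))
    exaggeration = (sum(significant) / len(significant)) / true_diff
    return power, wrong_sign, exaggeration


if __name__ == "__main__":
    # Assumed, illustrative inputs (not taken from the study itself).
    power, wrong_sign, exaggeration = design_analysis(true_diff=0.3, se=3.3)
    print(f"probability of a significant result: {power:.3f}")
    print(f"share of significant results with the wrong sign: {wrong_sign:.3f}")
    print(f"average exaggeration of significant estimates: {exaggeration:.1f}x")
```

Under these assumed numbers an estimate must exceed 1.96 × 3.3 ≈ 6.5 points in magnitude to reach significance, so every significant estimate overstates the assumed 0.3-point difference by a factor of more than twenty, whatever its sign.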

Let me simply grant Gelman that this prior information warrants (with severity) the hypothesis H:

H: “difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point,” (ibid.)

especially given my suspicions of the well-testedness of claims to show the effects of “beautiful to less-beautiful” on anything. I will simply take it as a given that it is well-tested background “knowledge.” Presumably, the well-tested claim goes beyond those individuals observed, and is generalizing at least to some degree. So we are given that the hypothesis H is one for which there is strong evidence.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, U-Phil | 14 Comments

Deconstructing Gelman part 2: Using prior Information

(Please see part 1 for links and references):

A Bayesian, Gelman tells us, “wants everybody else to be a non-Bayesian” (p. 5). Despite appearances, the claim need not be seen as self-contradictory, at least if we interpret it most generously, as Rule #2 (of this blog) directs. Whether or not “a Bayesian” refers to all Bayesians or only non-standard Bayesians (i.e., those wearing a hat of which Gelman approves), his meaning might be simply that when setting out with his own inquiry, he doesn’t want your favorite priors (be they beliefs or formally derived constructs) getting in the way. A Bayesian, says Gelman (in this article) is going to make inferences based on “trying to extract information from the data” in order to determine what to infer or believe (substitute your preferred form of output) about some aspect of a population (or mechanism) generating the data, as modeled. He just doesn’t want the “information from the data” muddied by your particular background knowledge. He would only have to subtract out all of this “funny business” to get at your likelihoods. He would only have to “divide away” your prior distributions before getting to his own analysis (p. 5). As in Gelman’s trial analogy (p. 5), he prefers to combine your “raw data,” and your likelihoods, with his own well-considered background information. We can leave open whether he will compute posteriors (at least in the manner he recommends here) or not (as suggested in other work). So perhaps we have arrived at a sensible deconstruction of Gelman, free of contradiction. Whether or not this leaves texts open to some charge of disingenuity, I leave entirely to one side.

Now at this point I wonder: do Bayesian reports provide the ingredients for such “dividing away”? I take it that they’d report the priors, which could be subtracted out, but how is the rest of the background knowledge communicated and used? It would seem to include assorted background knowledge of instruments, of claims that had been sufficiently well corroborated to count as knowledge, of information about that which was not previously well tested, of flaws and biases and threats of error to take into account in future designs, etc. (as in our ESP examples 9/22 and 9/25). The evidence for any background assumptions should also be made explicit and communicated (unless it consists of trivial common knowledge).

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics | 12 Comments

Deconstructing Gelman, Part 1: “A Bayesian wants everybody else to be a non-Bayesian.”

I was to have philosophically deconstructed a few paragraphs from (the last couple of sections) in a column Andrew Gelman sent me on “Ethics and the statistical use of prior information”[i]. The discussion begins with my Sept 12 post, and follows through several posts over the second half of September (see [ii]), all by way of background. But I got called away before finishing the promised deconstruction, and it was only this evening that I tried to wade through a circuitous swamp of remarks. I will just post the first part (of 2 or perhaps 3?), which is already too long.

Since I have a tendency to read articles from back to front, on a first read at least, let me begin with his last section titled:  “A Bayesian wants everybody else to be a non-Bayesian.”  Surely that calls for philosophical deconstruction, if anything does. It seems at the very least an exceptional view. Whether it’s widely held I can’t say (please advise). But suppose it’s true: Bayesians are publicly calling on everybody to use Bayesian methods, even though, deep down, they really, really hope everybody else won’t blend everything together before they can use the valid parts from the data—and they really, really hope that everybody else will provide the full panoply of information about what happened in other experiments, and what background theories are well corroborated, and about the precision of the instruments relied upon, and about other experiments that appear to conflict with the current one and with each other, etc., etc. Suppose that Bayesians actually would prefer, and are relieved to find, that, despite their exhortations, “everybody else” doesn’t report their posterior probabilities (whichever version of Bayesianism they are using) because then they can introduce their own background and figure out what is and is not warranted (in whatever sense seems appropriate).

At first glance, I am tempted to say that I don’t think Gelman really believes this statement himself if it were taken literally. Since he calls himself a Bayesian, at least of a special sort, then if he is wearing his Bayesian hat when he advocates others be non-Bayesian, then the practice of advocating others be non-Bayesian would itself be a Bayesian practice (not a non-Bayesian practice). But we philosophers know the danger of suggesting that authors under our scrutiny do not mean what they say—we may be missing their meaning and interpreting their words in a manner that is implausible. Though we may think, through our flawed interpretation, that they cannot possibly mean what they say, what we have done is substitute a straw view for the actual view (the straw man fallacy). (Note: You won’t get that I am mirroring Gelman unless you look at the article that began this deconstruction here.) Rule #2 of this blog[iii] is to interpret any given position in the most generous way possible; to do otherwise is to weaken our critical evaluation of it. This requires that we try to imagine a plausible reading, taking into account valid background information (e.g., other writings) that might bolster plausibility. This, at any rate, is what we teach our students in philosophy. So to begin with, what does Gelman actually say in the passage (in Section 4)?

“Bayesian inference proceeds by taking the likelihoods from different data sources and then combining them with a prior distribution (or, more generally, a hierarchical model). The likelihood is key. . . . No funny stuff, no posterior distributions, just the likelihood. . . . I don’t want everybody coming to me with their posterior distribution—I’d just have to divide away their prior distributions before getting to my own analysis. Sort of like a trial, where the judge wants to hear what everybody saw—not their individual inferences, but their raw data.” (p.5)

So if this is what he means by being a non-Bayesian, then his assertion that “a Bayesian wants everybody else to be a non-Bayesian” seems to mean that Bayesians want others to basically report their likelihoods. But again, if Gelman is wearing his Bayesian hat when he advocates others not wear theirs, i.e., be non-Bayesian, then his advising that everybody else not be Bayesian (in the sense of not combining priors and likelihoods), is itself a Bayesian practice (not a non-Bayesian practice). So either Gelman is not wearing his Bayesian hat when he recommends this, or his claim is self-contradictory—and I certainly do not want to attribute an inconsistent position to him. Moreover, I am quite certain that he would not advance any such inconsistent position.

Now, I do have some background knowledge. To ignore it is to fail to supply the most generous interpretation. Our background information—that is, Gelman’s (2011) RMM paper [iv]—tells me that he rejects the classic inductive philosophy that he has (correctly) associated with the definition of Bayesianism found on Wikipedia:

“Our key departure from the mainstream Bayesian view (as expressed, for example, [in Wikipedia]) is that we do not attempt to assign posterior probabilities to models or to select or average over them using posterior probabilities. Instead, we use predictive checks to compare models to data and use the information thus learned about anomalies to motivate model improvements” (p. 71).

So now Gelman’s assertion that “a Bayesian wants everybody else to be a non-Bayesian” makes sense and is not self-contradictory. Bayesian, in the term non-Bayesian, would mean something like a standard inductive Bayesian (where priors can be subjective or non-subjective). Gelman’s non-standard Bayesian wants everybody else not to be standard inductive Bayesians, but rather, something more akin to a likelihoodist. (I don’t know whether he wants only the likelihoods rather than the full panoply of background information, but I will return to this.) If Gelman’s Bayesian is not going to assign posterior probabilities to models, or select or average over them using posterior probabilities, then it’s pretty clear he will not find it useful to hear a report of your posterior probabilities. To allude to his trial analogy, the judge surely doesn’t want to hear your posterior probability of Ralph’s guilt, if he doesn’t even think it’s the proper way of couching inferences. Perhaps the judge finds it essential to know whether mistaken judgments of the pieces of evidence surrounding Ralph’s guilt have been well or poorly ruled out. That would be to require an error probabilistic assessment.

But a question might be raised: By “a Bayesian,” doesn’t Gelman clearly mean Bayesians in general, and not just one? And if he means all Bayesians, it would be wrong to think, as I have, that he was alluding to non-standard Bayesians (i.e., those wearing a hat of which Gelman approves). But there is no reason to suppose he means all Bayesians rather than all Bayesians who reject standard, Wiki-style Bayesianism, but instead favor something closer to the view in Gelman 2011, among other places.

Having gotten this far, however, I worry about using the view in Gelman 2011 to deconstruct the passages in the current article, in which, speaking of a Bayesian combining prior distributions and likelihoods, Gelman sounds more like a standard Bayesian. It would not help that he may be alluding to Bayesians in general for purposes of the article, because it is in this article that we find the claim: “A Bayesian wants everybody else to be a non-Bayesian.” So despite my attempts to sensibly deconstruct him, it appears that we are back to the initial problem, in which his claim that a Bayesian wants everybody else to be a non-Bayesian looks self-contradictory or at best disingenuous—and this in a column on ethics in statistics!

But we are not necessarily led to that conclusion!  Stay tuned for part 2, and part 3…..

(On how to do a philosophical analysis see here.)

[i]Gelman, A. “Ethics and the statistical use of prior information”

[ii] The main posts, following the first one, were:

More on using background info (9/15/12)
Statistics and ESP research (Diaconis) (9/22/12)
Insevere tests and pseudoscience (9/25/12)
Levels of inquiry (9/26/12)

[iii] This is the philosopher’s rule of “generous interpretation”, first introduced in this post.

[iv] Gelman, A. (2011). “Induction and Deduction in Bayesian Data Analysis”, Rationality, Markets, and Morals (RMM) 2, 67-78.

Categories: Background knowledge, Philosophy of Statistics, Statistics | 2 Comments

PhilStatLaw: Infections in the court

Nathan Schachtman appropriately refers to the way in which “dicta infects Daubert” in his latest blogpost Siracusano Dicta Infects Daubert Decisions. Here the “dicta” (or dictum?) is a throwaway remark on (lack of) statistical significance and causal inference by the Supreme Court, in an earlier case involving the drug company Matrixx (Matrixx Initiatives, Inc. v. Siracusano). As I note in my post of last Feb,

“the ruling had nothing to do with what’s required to show cause and effect, but only what information a company is required to reveal to its shareholders in order not to mislead them (as regards information that could be of relevance to them in their cost-benefit assessments of the stock’s value and future price).”(See “Distortions in the Court”)

obiter dicta

  1. A judge’s incidental expression of opinion, not essential to the decision and not establishing precedent.
  2. An incidental remark.

It was already surprising that the Supreme Court took up that earlier case; the way they handled the irrelevant statistical issues was more so.

Categories: PhilStatLaw, Statistics | 5 Comments

Stephen Senn: On the (ir)relevance of stopping rules in meta-analysis

Senn in China

Stephen Senn

Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

George Barnard has had an important influence on the way I think about statistics. It was hearing him lecture in Aberdeen (I think) in the early 1980s (I think) on certain problems associated with Neyman confidence intervals that woke me to the problem of conditioning. Later as a result of a lecture he gave to the International Society of Clinical Biostatistics meeting in Innsbruck in 1988 we began a correspondence that carried on at irregular intervals until 2000. I continue to have reasons to be grateful for the patience an important and senior theoretical statistician showed to a junior and obscure applied one.

One of the things Barnard was adamant about was that you had to look at statistical problems with various spectacles. This is what I propose to do here, taking as an example meta-analysis. Suppose that it is the case that a meta-analyst is faced with a number of trials in a given field and that these trials have been carried out sequentially. In fact, to make the problem both simpler and more acute, suppose that no stopping rule adjustments have been made. Suppose, unrealistically, that each trial has identical planned maximum size but that a single interim analysis is carried out after a fraction f of information has been collected. For simplicity we suppose this fraction f to be the same for every trial. The question is ‘should the meta-analyst ignore the stopping rule employed’? The answer is ‘yes’ or ‘no’ depending on how (s)he combines the information and, interestingly, this is not a question of whether the meta-analyst is Bayesian or not.

Categories: Philosophy of Statistics, Statistics | 2 Comments

Levels of Inquiry

levels: data-statistics-theory

Many fallacious uses of statistical methods result from supposing that the statistical inference licenses a jump to a substantive claim that is ‘on a different level’ from the statistical one being probed. Given the familiar refrain that statistical significance is not substantive significance, it may seem surprising how often criticisms of significance tests depend on running the two together! But it is not just two, but a great many levels that need distinguishing in linking the collecting, modeling and analyzing of data to a variety of substantive claims of inquiry (though for simplicity I often focus on the three depicted, described in various ways).

A question that continues to arise revolves around a blurring of levels, and is behind my recent ESP post.  It goes roughly like this:

If we are prepared to take a statistically significant proportion of successes (greater than .5) in n Binomial trials as grounds for inferring a real (better than chance) effect (perhaps of two teaching methods) but not as grounds for inferring Uri’s ESP (at guessing outcomes, say), then aren’t we implicitly invoking a difference in prior probabilities?  The answer is no, but there are two very different points to be made:

First, merely finding evidence of a non-chance effect is at a different “level” from a subsequent question about the explanation or cause of a non-chance effect. To infer from the former to the latter is an example of a fallacy of rejection.[1] The nature and threats of error in the hypothesis about a specific cause of an effect are very different from those in merely inferring a real effect. There are distinct levels of inquiry and distinct errors at each given level. The severity analysis for the respective claims makes this explicit.[ii] Even a test that did a good job distinguishing and ruling out threats to a hypothesis of “mere chance” would not thereby have probed errors about specific causes or potential explanations. Nor does an “isolated record” of statistically significant results suffice. Recall Fisher: “In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result” (1935, 14). PSI researchers never managed to demonstrate this.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics | 2 Comments

Statistics and ESP research (Diaconis)

In the early ‘80s, fresh out of graduate school, I persuaded Persi Diaconis, Jack Good, and Patrick Suppes to participate in a session I wanted to organize on ESP and statistics. It seems remarkable to me now—not only that they agreed to participate*, but the extent to which PSI research was taken seriously at the time. It wasn’t much later that all the recurring errors and loopholes, and the persistent self-delusion—despite earnest attempts to trigger and analyze the phenomena—would lead nearly everyone to label PSI research a “degenerating programme” (in the Popperian-Lakatosian sense).

(Though I’d have to check names and dates, I seem to recall that the last straw was when some of the Stanford researchers were found guilty of (unconscious) fraud. Jack Good continued to be interested in the area, but less so, I think. I do not know about the others.)

It is interesting to see how background information enters into inquiry here. So, even though it’s late on a Saturday night, here’s a snippet from one of the papers that caught my interest in graduate school: Diaconis’s (1978) “Statistical Problems in ESP Research”, in Science, along with some critical “letters”.

Summary. In search of repeatable ESP experiments, modern investigators are using more complex targets, richer and freer responses, feedback, and more naturalistic conditions. This makes tractable statistical models less applicable. Moreover, controls often are so loose that no valid statistical analysis is possible. Some common problems are multiple end points, subject cheating, and unconscious sensory cueing. Unfortunately, such problems are hard to recognize from published records of the experiments in which they occur; rather, these problems are often uncovered by reports of independent skilled observers who were present during the experiment. This suggests that magicians and psychologists be regularly used as observers. New statistical ideas have been developed for some of the new experiments. For example, many modern ESP studies provide subjects with feedback—partial information about previous guesses—to reward the subjects for correct guesses in hope of inducing ESP learning. Some feedback experiments can be analyzed with the use of skill-scoring, a statistical procedure that depends on the information available and the way the guessing subject uses this information. (p. 131)

Categories: philosophy of science, Philosophy of Statistics, Statistics | 13 Comments

More on using background info

For the second* bit of background on the use of background info (for the new U-Phil for 9/25/12), I’ll reblog:

Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs

…I am discovering that one of the biggest sources of confusion about the foundations of statistics has to do with what it means or should mean to use “background knowledge” and “judgment” in making statistical and scientific inferences. David Cox and I address this in our “Conversation” in RMM (2011)….

Insofar as humans conduct science and draw inferences, and insofar as learning about the world is not reducible to a priori deductions, it is obvious that “human judgments” are involved. True enough, but too trivial an observation to help us distinguish among the very different ways judgments should enter according to contrasting inferential accounts. When Bayesians claim that frequentists do not use or are barred from using background information, what they really mean is that frequentists do not use prior probabilities of hypotheses, at least when those hypotheses are regarded as correct or incorrect, if only approximately. So, for example, we would not assign relative frequencies to the truth of hypotheses such as (1) prion transmission is via protein folding without nucleic acid, or (2) the deflection of light is approximately 1.75” (as if, as Peirce puts it, “universes were as plenty as blackberries”). How odd it would be to try to model these hypotheses as themselves having distributions: to us, statistical hypotheses assign probabilities to outcomes or values of a random variable.

Categories: Background knowledge, philosophy of science, Philosophy of Statistics, Statistics | 21 Comments

Return to the comedy hour…(on significance tests)

These days, so many theater productions are updated reviews of older standards. Same with the comedy hours at the Bayesian retreat, and task force meetings of significance test reformers. So (on the 1-year anniversary of this blog) let’s listen in to one of the earliest routines (with highest blog hits), but with some new reflections (first considered here and here).

‘ “Did you hear the one about the frequentist . . .

“who claimed that observing “heads” on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”

The joke came from J. Kadane’s Principles of Uncertainty (2011, CRC Press*).

 “Flip a biased coin that comes up heads with probability 0.95, and tails with probability 0.05.  If the coin comes up tails reject the null hypothesis.  Since the probability of rejecting the null hypothesis if it is true is 0.05, this is a valid 5% level test.  It is also very robust against data errors; indeed it does not depend on the data at all.  It is also nonsense, of course, but nonsense allowed by the rules of significance testing.” (439)

Much laughter.

___________________

But is it allowed? I say no. The null hypothesis in the joke can be in any field, perhaps it concerns mean transmission of Scrapie in mice (as in my early Kuru post). I know some people view significance tests as merely rules that rarely reject erroneously, but I claim this is mistaken. Both in significance tests and in scientific hypothesis testing more generally, data indicate inconsistency with H only by being counter to what would be expected under the assumption that H is correct (as regards a given aspect observed). Were someone to tell Prusiner that the testing methods he follows actually allow any old “improbable” event (a stock split in Apple?) to reject a hypothesis about prion transmission rates, Prusiner would say that person didn’t understand the requirements of hypothesis testing in science. Since the criticism would hold no water in the analogous case of Prusiner’s test, it must equally miss its mark in the case of significance tests**. That, recall, was Rule #1.
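A small calculation may help show what the coin-flip “test” lacks. The sketch below is an illustration only: the function names, and the comparison with a one-sided z test using n = 25 and sigma = 1, are assumptions introduced here, not anything taken from Kadane or from the post. It computes the probability of rejecting the null under different true values of the parameter.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def coin_test_rejection_prob(true_mean, alpha=0.05):
    """Rejection probability of the coin-flip 'test'.

    The coin never looks at the data, so the answer cannot depend on
    true_mean: the 'test' rejects with probability alpha whatever is true.
    """
    return alpha


def z_test_rejection_prob(true_mean, null_mean=0.0, sigma=1.0, n=25,
                          z_crit=1.645):
    """Rejection probability of a one-sided z test of H0: mean <= null_mean.

    The test rejects when the standardized sample mean exceeds z_crit, so
    the rejection probability grows as true_mean moves above null_mean.
    """
    shift = (true_mean - null_mean) / (sigma / math.sqrt(n))
    return 1.0 - normal_cdf(z_crit - shift)


if __name__ == "__main__":
    for mu in (0.0, 0.2, 0.5):
        print(f"true mean {mu}: coin {coin_test_rejection_prob(mu):.3f}, "
              f"z test {z_test_rejection_prob(mu):.3f}")
```

Both procedures reject about 5% of the time when the null is true. The coin rejects at that same rate whatever the truth, so a rejection is no more to be expected when H is false than when it is correct; the z test rejects more and more often as the true mean departs from the null.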

Categories: Comedy, Philosophy of Statistics, Statistics | 8 Comments

Stephen Senn: The nuisance parameter nuisance

Senn in China

Stephen Senn

Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

“The nuisance parameter nuisance”

 A great deal of statistical debate concerns ‘univariate’ error, or disturbance, terms in models. I put ‘univariate’ in inverted commas because as soon as one writes a model of the form (say) Y_i = X_i β + ε_i, i = 1, …, n, and starts to raise questions about the distribution of the disturbance terms ε_i, one is frequently led into multivariate speculations, such as, ‘is the variance identical for every disturbance term?’ and, ‘are the disturbance terms independent?’ and not just speculations such as, ‘is the distribution of the disturbance terms Normal?’. Aris Spanos might also want me to put inverted commas around ‘disturbance’ (or ‘error’) since what I ought to be thinking about is the joint distribution of the outcomes Y_i, conditional on the predictors.

However, in my statistical world of planning and analysing clinical trials, the differences made to inferences according to whether one uses parametric versus non-parametric methods are often minor. Of course, using non-parametric methods does nothing to answer the problem of non-independent observations but for experiments, as opposed to observational studies, you can frequently design-in independence. That is a major potential pitfall avoided but then there is still the issue of Normality. However, in my experience, this is rarely where the action is. Inferences rarely change dramatically on using ‘robust’ approaches (although one can always find examples with gross-outliers where they do). However, there are other sorts of problem that can affect data which can make a very big difference.

Categories: Philosophy of Statistics, Statistics | 3 Comments

After dinner Bayesian comedy hour….

Given it’s the first anniversary of this blog, which opened with the howlers in “Overheard at the comedy hour …” let’s listen in as a Bayesian holds forth on one of the most famous howlers of the lot: the mysterious role that psychological intentions are said to play in frequentist methods such as statistical significance tests. Here it is, essentially as I remember it (though shortened), in the comedy hour that unfolded at my dinner table at an academic conference:

 Did you hear the one about the researcher who gets a phone call from the guy analyzing his data? First the guy congratulates him and says, “The results show a statistically significant difference at the .05 level—p-value .048.” But then, an hour later, the phone rings again. It’s the same guy, but now he’s apologizing. It turns out that the experimenter intended to keep sampling until the result was 1.96 standard deviations away from the 0 null—in either direction—so they had to reanalyze the data (n=169), and the results were no longer statistically significant at the .05 level.

 Much laughter.

 So the researcher is tearing his hair out when the same guy calls back again. “Congratulations!” the guy says. “I just found out that the experimenter actually had planned to take n=169 all along, so the results are statistically significant.”

 Howls of laughter.

 But then the guy calls back with the bad news . . .

It turns out that failing to score a sufficiently impressive effect after n’ trials, the experimenter went on to n” trials, and so on and so forth until finally, say, on trial number 169, he obtained a result 1.96 standard deviations from the null.

It continues this way, and every time the guy calls in and reports a shift in the p-value, the table erupts in howls of laughter! From everyone except me, sitting in stunned silence, staring straight ahead. The hilarity ensues from the idea that the experimenter’s reported psychological intentions about when to stop sampling are altering the statistical results.
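What is at stake in the choice of stopping rule can be seen in a small simulation. The sketch below is illustrative only: it assumes standard normal observations with a true null mean of 0 and known unit variance, a look at the data after every single observation up to n = 169, and hypothetical names (`simulate_stopping`, its parameters and the seed). It compares how often a true null reaches the nominal 1.96 cutoff under a fixed sample size and under “keep sampling until significant.”

```python
import math
import random


def simulate_stopping(n_max=169, z_crit=1.96, n_sims=4000, seed=1):
    """Return (try-and-try-again rate, fixed-n rate) when the null is true.

    Each simulated experiment draws n_max standard normal observations.
    'Try and try again' counts the experiment as a success if the
    standardized mean reaches z_crit at ANY sample size up to n_max;
    the fixed-n analysis only looks once, at n_max.
    """
    rng = random.Random(seed)
    hits_sequential = 0
    hits_fixed = 0
    for _ in range(n_sims):
        total = 0.0
        crossed = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            # Standardized distance of the running mean from the null of 0.
            if abs(total) / math.sqrt(n) >= z_crit:
                crossed = True
        hits_sequential += crossed
        hits_fixed += abs(total) / math.sqrt(n_max) >= z_crit
    return hits_sequential / n_sims, hits_fixed / n_sims


if __name__ == "__main__":
    sequential_rate, fixed_rate = simulate_stopping()
    print(f"fixed n = 169: nominal significance reached in {fixed_rate:.3f} of runs")
    print(f"sampling until significant (up to 169): {sequential_rate:.3f} of runs")
```

In this setup the fixed-n rate stays near the nominal 5%, while the try-and-try-again rate is several times larger, even though the final data set and the reported “1.96 standard deviations” can look identical in the two cases.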

Categories: Comedy, philosophy of science, Philosophy of Statistics, Statistics | 3 Comments

Failing to Apply vs Violating the Likelihood Principle

In writing a new chapter on the Strong Likelihood Principle [i] the past few weeks, I noticed a passage in G. Casella and R. Berger (2002) that in turn recalled a puzzling remark noted in my Jan. 3, 2012 post. The post began:

A question arose from a Bayesian acquaintance:

“Although the Birnbaum result is of primary importance for sampling theorists, I’m still interested in it because many Bayesian statisticians think that model checking violates the (strong) likelihood principle (SLP), as if this principle is a fundamental axiom of Bayesian statistics”.

But this is puzzling for two reasons. First, if the LP does not preclude testing for assumptions (and he is right that it does not[ii]), then why not simply explain that rather than appeal to a disproof of something that actually never precluded model testing?   To take the disproof of the LP as grounds to announce: “So there! Now even Bayesians are free to test their models” would seem only to ingrain the original fallacy.

You can read the rest of the original post here.

The remark in G. Casella and R. Berger seems to me equivocal on this point: Continue reading

Categories: Likelihood Principle, Philosophy of Statistics, Statistics | 2 Comments

“Did Higgs Physicists Miss an Opportunity by Not Consulting More With Statisticians?”

On August 20 I posted the start of “Discussion and Digest” by Bayesian statistician Tony O’Hagan – an overview of responses to his letter (ISBA website) on the use of p-values in analyzing the Higgs data, prompted, in turn, by a query of subjective Bayesian Dennis Lindley. I now post the final section in which he discusses his own view. I think it raises many questions of interest both as regards this case, and more generally about statistics and science. My initial July 11 post is here.

“Higgs Boson – Digest and Discussion” By Tony O’Hagan

Discussion

So here are some of my own views on this.

There are good reasons for being cautious and demanding a very high standard of evidence before announcing something as momentous as H. It is acknowledged by those who use it that the 5-sigma standard is a fudge, though. They would surely be willing to make such an announcement if they were, for instance, 99.99% certain of H’s existence, as long as that 99.99% were rigorously justified. 5-sigma is used because they don’t feel able to quantify the probability of H rigorously. So they use the best statistical analysis that they know how to do, but because they also know there are numerous factors not taken into account by this analysis – the multiple testing, the likelihood of unrecognised or unquantified deficiencies in the data, experiment or statistics, and the possibility of other explanations – they ask for what on the face of it is an absurdly high level of significance from that analysis. Continue reading
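To give a sense of scale for the 5-sigma standard, here is a small sketch that converts a number of standard deviations into a tail area. It assumes the test statistic is standard Normal under the null and that the one-sided upper tail is the relevant quantity; both are assumptions made for illustration, and the function name is hypothetical. Note that the result is a probability computed under the null hypothesis, not a degree of certainty in H of the kind discussed above.

```python
import math


def one_sided_p_value(n_sigma):
    """Upper-tail probability P(Z >= n_sigma) for a standard Normal Z."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (2, 3, 5):
        print(f"{k}-sigma: one-sided p = {one_sided_p_value(k):.3g}")
```

Under these assumptions, 5 sigma corresponds to a tail area of roughly 3 in 10 million.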

Categories: philosophy of science, Philosophy of Statistics, Statistics | 8 Comments

Higgs Boson: Bayesian “Digest and Discussion”

Professor O’Hagan sent around (to the ISBA list) his summary of the comments he received in response to his request for information about the use of p-values in relation to the Higgs boson data. My original July 11 post including O’Hagan’s initial letter is here. His “digest” begins:

Before going further, I should say that the wording of this message, including the somewhat inflammatory nature of some parts of it, was mine; I was not quoting Dennis Lindley directly. The wording was, though, quite deliberately intended to provoke discussion. In that objective it was successful – I received more than 30 substantive comments in reply. All of these were thoughtful and I learnt a great deal from them. I promised to construct a digest of the discussion. This document is that digest and a bit more – it includes some personal reflections on the issues. Continue reading

Categories: Philosophy of Statistics, Statistics | 1 Comment

A. Spanos: Egon Pearson’s Neglected Contributions to Statistics

Continuing with the discussion of E.S. Pearson:

Egon Pearson’s Neglected Contributions to Statistics

by Aris Spanos

Egon Pearson (11 August 1895 – 12 June 1980) is widely known today for his contribution in recasting Fisher’s significance testing into the Neyman-Pearson (1933) theory of hypothesis testing. Occasionally, he is also credited with contributions in promoting statistical methods in industry and in the history of modern statistics; see Bartlett (1981). What is rarely mentioned is Egon’s early pioneering work on:

(i) specification: the need to state explicitly the inductive premises of one’s inferences,

(ii) robustness: evaluating the ‘sensitivity’ of inferential procedures to departures from the Normality assumption, as well as

(iii) Mis-Specification (M-S) testing: probing for potential departures from the Normality  assumption. Continue reading

Categories: Philosophy of Statistics, Statistics | 6 Comments

E.S. Pearson’s Statistical Philosophy

E.S. Pearson on the gate,
D. Mayo sketch

Egon Sharpe (E.S.) Pearson’s birthday was August 11.  This slightly belated birthday discussion is directly connected to the question of the uses to which frequentist methods may be put in inquiry.  Are they limited to supplying procedures which will not err too frequently in some vast long run? Or are these long run results of crucial importance for understanding and learning about the underlying causes in the case at hand?   I say no to the former and yes to the latter.  This was also the view of Egon Pearson (of Neyman and Pearson).

(i) Cases of Type A and Type B

“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)

Pearson considers the rationale that might be given to N-P tests in two types of cases, A and B:

“(A) At one extreme we have the case where repeated decisions must be made on results obtained from some routine procedure…

(B) At the other is the situation where statistical tools are applied to an isolated investigation of considerable importance…?” (ibid., 170)

In cases of type A, long-run results are clearly of interest, while in cases of type B, repetition is impossible and may be irrelevant:

“In other and, no doubt, more numerous cases there is no repetition of the same type of trial or experiment, but all the same we can and many of us do use the same test rules to guide our decision, following the analysis of an isolated set of numerical data. Why do we do this? What are the springs of decision? Is it because the formulation of the case in terms of hypothetical repetition helps to that clarity of view needed for sound judgment? Continue reading

Categories: Philosophy of Statistics, Statistics | 2 Comments

U-Phil: (concluding the deconstruction) Wasserman / Mayo

It is traditional to end the U-Phil deconstruction discussion with the author’s remarks on the deconstruction itself.  I take this from Wasserman’s initial comment on 7/28/12, and my brief reply. I especially want to highlight the question of goals that arises.

Wasserman:

I thank Deborah Mayo for deconstructing me and Al Franken. (And for the record, I couldn’t be further from Franken politically; I just liked his joke.)

I have never been deconstructed before. I feel a bit like Humpty Dumpty. Anyway, I think I agree with everything Deborah wrote. I’ll just clarify two points.

First, my main point was just that the cutting edge of statistics today is dealing with complex, high-dimensional data. My essay was an invitation to Philosophers to turn their analytical skills towards the problems that arise in these modern statistical problems.

Deborah wonders whether these are technical rather than foundational issues. I don’t know. When physicists went from studying medium sized, slow-moving objects to studying the very small, the very fast and the very massive, they found a plethora of interesting questions, both technical and foundational. Perhaps inference for high-dimensional, complex data can also serve as a venue for both technical and foundational questions.

Second, I downplayed the Bayes-Frequentist debate perhaps more than I should have. Indeed, this debate still persists. But I also feel that only a small subset of statisticians care about the debate (because they do what they were taught to do, without questioning it) and those that do care will never be swayed by debate. The way I see it is that there are basically two goals:

  • Goal 1: Find ways to quantify your subjective degrees of belief.
  • Goal 2: Find procedures with good frequency properties. Continue reading
Categories: Statistics | Leave a comment

U-PHIL: Wasserman Replies to Spanos and Hennig

Wasserman on Spanos and Hennig on  “Low Assumptions, High Dimensions” (2011)

(originating U-PHIL: “Deconstructing Larry Wasserman” by Mayo)

________

Thanks to Aris and others for comments.

Response to Aris Spanos:

1. You don’t prefer methods based on weak assumptions? Really? I suspect Aris is trying to be provocative. Yes, such inferences can be less precise. Good. Accuracy is an illusion if it comes from assumptions, not from data.

2. I do not think I was promoting inferences based on “asymptotic grounds.” If I did, that was not my intent. I want finite sample, distribution free methods. As an example, consider the usual finite sample (order statistics based) confidence interval for the median. No regularity assumptions, no asymptotics, no approximations. What is there to object to?

3. Indeed, I do have to make some assumptions. For simplicity, and because it is often reasonable, I assumed iid in the paper (as I will here). Other than that, where am I making any untestable assumptions in the example of the median?

4. I gave a very terse and incomplete summary of Davies’ work. I urge readers to look at Davies’ papers; my summary does not do the work justice. He certainly did not advocate eyeballing the data. Continue reading
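The order-statistics interval for the median mentioned in point 2 can be sketched in a few lines. This is an illustrative implementation, not code from Wasserman’s paper: the function name and return format are assumptions. It relies only on the iid assumption from point 3, under which the number of observations falling below the true median is Binomial(n, 1/2), so the interval from the k-th smallest to the k-th largest observation has coverage at least 1 − 2·P(Bin(n, 1/2) ≤ k − 1), with equality when the distribution is continuous.

```python
import math


def median_confidence_interval(sample, alpha=0.05):
    """Finite-sample, distribution-free confidence interval for the median.

    Returns (lower, upper, coverage), where lower and upper are the k-th
    smallest and k-th largest observations and k is the largest index whose
    guaranteed coverage is still at least 1 - alpha.
    """
    xs = sorted(sample)
    n = len(xs)

    def lower_tail(j):
        # P(Bin(n, 1/2) <= j), computed exactly with integer binomials
        return sum(math.comb(n, i) for i in range(j + 1)) / 2 ** n

    if 2 * lower_tail(0) > alpha:
        raise ValueError("sample too small for the requested confidence level")

    k = 1
    # Move inward while the narrower interval still has enough coverage.
    while k + 1 <= n // 2 and 2 * lower_tail(k) <= alpha:
        k += 1

    coverage = 1 - 2 * lower_tail(k - 1)
    return xs[k - 1], xs[n - k], coverage


if __name__ == "__main__":
    data = [4.1, 2.7, 3.3, 5.9, 1.8, 6.4, 3.0, 4.8, 2.2, 5.1]
    print(median_confidence_interval(data))
```

For n = 10 and alpha = 0.05 this picks the second smallest and second largest observations, with coverage of about 0.9785. The interval is conservative because the Binomial distribution is discrete, so the nominal 95% level usually cannot be hit exactly.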

Categories: Philosophy of Statistics, Statistics, U-Phil | 3 Comments

E.S. Pearson Birthday

Egon Pearson on a Gate (by D. Mayo)

Today is Egon Pearson’s birthday, but I will postpone some discussion of his work for a few days. He is, as Erich Lehmann noted in his review of EGEK (1996)[i]*, “the hero of Mayo’s story” because one may find throughout his work, if only in side discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of Neyman-Pearson theory of statistics.  Pearson and Pearson statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect.[i]


[i] Mayo (1996), Error and the Growth of Experimental Knowledge.

*If you have items relating to E.S. Pearson you think might be relevant for this blog, please send them to: error@vt.edu until the end of August.

Categories: Statistics | Leave a comment
