Deconstructing Gelman, Part 2: Using Prior Information

(Please see part 1 for links and references.)

A Bayesian, Gelman tells us, “wants everybody else to be a non-Bayesian” (p. 5). Despite appearances, the claim need not be seen as self-contradictory, at least if we interpret it most generously, as Rule #2 (of this blog) directs. Whether “a Bayesian” refers to all Bayesians or only non-standard Bayesians (i.e., those wearing a hat of which Gelman approves), his meaning might be simply that, when setting out on his own inquiry, he doesn’t want your favorite priors (be they beliefs or formally derived constructs) getting in the way. A Bayesian, says Gelman (in this article), is going to make inferences based on “trying to extract information from the data” in order to determine what to infer or believe (substitute your preferred form of output) about some aspect of a population (or mechanism) generating the data, as modeled. He just doesn’t want the “information from the data” muddied by your particular background knowledge. He would have to subtract out all of this “funny business” to get at your likelihoods; he would have to “divide away” your prior distributions before getting to his own analysis (p. 5). As in Gelman’s trial analogy (p. 5), he prefers to combine your “raw data” and your likelihoods with his own well-considered background information. We can leave open whether he will compute posteriors (at least in the manner he recommends here) or not (as suggested in other work). So perhaps we have arrived at a sensible deconstruction of Gelman, free of contradiction. Whether this leaves his texts open to some charge of disingenuousness, I leave entirely to one side.

Now at this point I wonder: do Bayesian reports provide the ingredients for such “dividing away”?  I take it that they’d report the priors, which could be subtracted out, but how is the rest of the background knowledge communicated and used? It would seem to include assorted background knowledge of instruments, of claims that had been sufficiently well corroborated to count as knowledge, of information about that which was not previously well tested, of flaws and biases and threats of error to take into account in future designs, etc. (as in our ESP examples 9/22 and 9/25). The evidence for any background assumptions should also be made explicit and communicated (unless it consists of trivial common knowledge). Continue reading

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics | 12 Comments

Deconstructing Gelman, Part 1: “A Bayesian wants everybody else to be a non-Bayesian.”

I was to have philosophically deconstructed a few paragraphs (from the last couple of sections) of a column Andrew Gelman sent me, “Ethics and the statistical use of prior information”[i]. The discussion begins with my Sept 12 post and follows through several posts over the second half of September (see [ii]), all by way of background. But I got called away before finishing the promised deconstruction, and it was only this evening that I tried to wade through a circuitous swamp of remarks. I will just post the first part (of 2, or perhaps 3?), which is already too long.

Since I have a tendency to read articles from back to front, on a first read at least, let me begin with his last section titled:  “A Bayesian wants everybody else to be a non-Bayesian.”  Surely that calls for philosophical deconstruction, if anything does. It seems at the very least an exceptional view. Whether it’s widely held I can’t say (please advise). But suppose it’s true: Bayesians are publicly calling on everybody to use Bayesian methods, even though, deep down, they really, really hope everybody else won’t blend everything together before they can use the valid parts from the data—and they really, really hope that everybody else will provide the full panoply of information about what happened in other experiments, and what background theories are well corroborated, and about the precision of the instruments relied upon, and about other experiments that appear to conflict with the current one and with each other, etc., etc. Suppose that Bayesians actually would prefer, and are relieved to find, that, despite their exhortations, “everybody else” doesn’t report their posterior probabilities (whichever version of Bayesianism they are using) because then they can introduce their own background and figure out what is and is not warranted (in whatever sense seems appropriate).

At first glance, I am tempted to say that I don’t think Gelman himself really believes this statement, taken literally. Since he calls himself a Bayesian, at least of a special sort, then if he is wearing his Bayesian hat when he advocates that others be non-Bayesian, the practice of advocating that others be non-Bayesian would itself be a Bayesian practice (not a non-Bayesian practice). But we philosophers know the danger of suggesting that authors under our scrutiny do not mean what they say—we may be missing their meaning and interpreting their words in a manner that is implausible. Though we may think, through our flawed interpretation, that they cannot possibly mean what they say, what we have done is substitute a straw view for the actual view (the straw man fallacy). (Note: You won’t get that I am mirroring Gelman unless you look at the article that began this deconstruction here.) Rule #2 of this blog[iii] is to interpret any given position in the most generous way possible; to do otherwise is to weaken our critical evaluation of it. This requires that we try to imagine a plausible reading, taking into account valid background information (e.g., other writings) that might bolster plausibility. This, at any rate, is what we teach our students in philosophy. So to begin with, what does Gelman actually say in the passage (in Section 4)?

“Bayesian inference proceeds by taking the likelihoods from different data sources and then combining them with a prior distribution (or, more generally, a hierarchical model). The likelihood is key. . . . No funny stuff, no posterior distributions, just the likelihood. . . . I don’t want everybody coming to me with their posterior distribution—I’d just have to divide away their prior distributions before getting to my own analysis. Sort of like a trial, where the judge wants to hear what everybody saw—not their individual inferences, but their raw data.” (p.5)

So if this is what he means by being a non-Bayesian, then his assertion that “a Bayesian wants everybody else to be a non-Bayesian” seems to mean that Bayesians want others to basically report their likelihoods. But again, if Gelman is wearing his Bayesian hat when he advocates others not wear theirs, i.e., be non-Bayesian, then his advising that everybody else not be Bayesian (in the sense of not combining priors and likelihoods), is itself a Bayesian practice (not a non-Bayesian practice). So either Gelman is not wearing his Bayesian hat when he recommends this, or his claim is self-contradictory—and I certainly do not want to attribute an inconsistent position to him. Moreover, I am quite certain that he would not advance any such inconsistent position.

Now, I do have some background knowledge. To ignore it is to fail to supply the most generous interpretation. Our background information—that is, Gelman’s (2011) RMM paper [iv]—tells me that he rejects the classic inductive philosophy that he has (correctly) associated with the definition of Bayesianism found on Wikipedia:

“Our key departure from the mainstream Bayesian view (as expressed, for example, [in Wikipedia]) is that we do not attempt to assign posterior probabilities to models or to select or average over them using posterior probabilities. Instead, we use predictive checks to compare models to data and use the information thus learned about anomalies to motivate model improvements” (p. 71).

So now Gelman’s assertion that “a Bayesian wants everybody else to be a non-Bayesian” makes sense and is not self-contradictory. “Bayesian,” in the term “non-Bayesian,” would mean something like a standard inductive Bayesian (where priors can be subjective or non-subjective). Gelman’s non-standard Bayesian wants everybody else not to be a standard inductive Bayesian, but rather something more akin to a likelihoodist. (I don’t know whether he wants only the likelihoods rather than the full panoply of background information, but I will return to this.) If Gelman’s Bayesian is not going to assign posterior probabilities to models, or select or average over them using posterior probabilities, then it’s pretty clear he will not find it useful to hear a report of your posterior probabilities. To allude to his trial analogy, the judge surely doesn’t want to hear your posterior probability of Ralph’s guilt if he doesn’t even think it’s the proper way of couching inferences. Perhaps the judge finds it essential to know whether mistaken judgments of the pieces of evidence surrounding Ralph’s guilt have been well or poorly ruled out. That would be to require an error-probabilistic assessment.

But a question might be raised: By “a Bayesian,” doesn’t Gelman clearly mean Bayesians in general, and not just one? And if he means all Bayesians, it would be wrong to think, as I have, that he was alluding to non-standard Bayesians (i.e., those wearing a hat of which Gelman approves). But there is no reason to suppose he means all Bayesians rather than all Bayesians who reject standard, Wiki-style Bayesianism, but instead favor something closer to the view in Gelman 2011, among other places.

Having gotten this far, however, I worry about using the view in Gelman 2011 to deconstruct the passages in the current article, in which, speaking of a Bayesian combining prior distributions and likelihoods, Gelman sounds more like a standard Bayesian. It would not help that he may be alluding to Bayesians in general for purposes of the article, because it is in this article that we find the claim: “A Bayesian wants everybody else to be a non-Bayesian.” So despite my attempts to sensibly deconstruct him, it appears that we are back to the initial problem, in which his claim that a Bayesian wants everybody else to be a non-Bayesian looks self-contradictory or at best disingenuous—and this in a column on ethics in statistics!

But we are not necessarily led to that conclusion!  Stay tuned for part 2, and part 3…..

(On how to do a philosophical analysis see here.)

[i] Gelman, A., “Ethics and the statistical use of prior information.”

[ii] The main posts, following the first one, were:

More on using background info (9/15/12)
Statistics and ESP research (Diaconis) (9/22/12)
Insevere tests and pseudoscience (9/25/12)
Levels of inquiry (9/26/12)

[iii] This is the Philosopher’s rule of “generous interpretation,” first introduced in this post.

[iv] Gelman, A. (2011). “Induction and Deduction in Bayesian Data Analysis,” Rationality, Markets and Morals (RMM) 2: 67–78.

Categories: Background knowledge, Philosophy of Statistics, Statistics | 2 Comments

Metablog: Rejected posts (blog within a blog)

I’ve been speculating for a while on the idea of creating a blog within a blog, and now it exists. From now on, items under “rejected posts” (on any topic, including phil stat), “msc kvetches,” “phil stock,” and assorted other irrelevant, irreverent, absurd, or dangerous meanderings that I feel like writing will all be banished to: http://rejectedpostsofdmayo.com/

I am not recommending it, and in all likelihood will only announce additions to it under the “rejected posts” page on this blog, if that.  I’m guessing that readers haven’t even noticed that all the entries under the pages Msc Kvetchs, Rejected posts, and others, have been stripped from this blog. Most, but not all, made it over the very low hurdle of the official “rejected posts” blog (others were rejected, by me, from even that).
Of course, it’s just like a regular wordpress blog with its usual features.
Categories: Announcement, Metablog | Tags: , , , | Leave a comment

PhilStatLaw: Infections in the court

Nathan Schachtman appropriately refers to the way in which “dicta infects Daubert” in his latest blogpost Siracusano Dicta Infects Daubert Decisions. Here the “dicta” (or dictum?) is a throwaway remark on (lack of) statistical significance and causal inference by the Supreme Court, in an earlier case involving the drug company Matrixx (Matrixx Initiatives, Inc. v. Siracusano). As I note in my post of last Feb,

“the ruling had nothing to do with what’s required to show cause and effect, but only what information a company is required to reveal to its shareholders in order not to mislead them (as regards information that could be of relevance to them in their cost-benefit assessments of the stock’s value and future price).” (See “Distortions in the Court.”)

obiter dicta

  1. A judge’s incidental expression of opinion, not essential to the decision and not establishing precedent.
  2. An incidental remark.

It was already surprising that the Supreme Court took up that earlier case; the way they handled the irrelevant statistical issues was more so. Continue reading

Categories: PhilStatLaw, Statistics | Tags: , , , , | 5 Comments

Letter from George (Barnard)

George Barnard sent me this note on hearing of my Lakatos Prize. He was to have been at my Lakatos dinner at the LSE (March 17, 1999)—winners are permitted to invite ~2-3 guests—but he called me at the LSE at the last minute to say he was too ill to come to London.  Instead we had a long talk on the phone the next day, which I can discuss at some point.

Categories: phil/history of stat | Tags: , | Leave a comment

Stephen Senn: On the (ir)relevance of stopping rules in meta-analysis

Senn in China

Stephen Senn

Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

George Barnard has had an important influence on the way I think about statistics. It was hearing him lecture in Aberdeen (I think) in the early 1980s (I think) on certain problems associated with Neyman confidence intervals that woke me to the problem of conditioning. Later as a result of a lecture he gave to the International Society of Clinical Biostatistics meeting in Innsbruck in 1988 we began a correspondence that carried on at irregular intervals until 2000. I continue to have reasons to be grateful for the patience an important and senior theoretical statistician showed to a junior and obscure applied one.

One of the things Barnard was adamant about was that you had to look at statistical problems with various spectacles. This is what I propose to do here, taking as an example meta-analysis. Suppose that it is the case that a meta-analyst is faced with a number of trials in a given field and that these trials have been carried out sequentially. In fact, to make the problem both simpler and more acute, suppose that no stopping rule adjustments have been made. Suppose, unrealistically, that each trial has identical planned maximum size but that a single interim analysis is carried out after a fraction f of information has been collected. For simplicity we suppose this fraction f to be the same for every trial. The question is: ‘should the meta-analyst ignore the stopping rule employed?’ The answer is ‘yes’ or ‘no’ depending on how (s)he combines the information and, interestingly, this is not a question of whether the meta-analyst is Bayesian or not. Continue reading
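Senn’s “yes or no, depending on how (s)he combines the information” can be illustrated with a small simulation. The sketch below is mine, not Senn’s (the trial size, boundary, and function names are made up for illustration): many trials run under the null, each with planned size 100, take a single one-sided interim look at f = 0.5 and stop early if the interim z-statistic exceeds 1.96; the meta-analyst then combines them in two ways.

```python
import random
from math import sqrt

random.seed(42)

def run_trial(n_max=100, f=0.5, crit=1.96, rng=random):
    """One null trial (true effect zero) with a single one-sided interim
    look after a fraction f of the planned observations."""
    n_interim = int(f * n_max)
    xs = [rng.gauss(0, 1) for _ in range(n_interim)]
    z = sum(xs) / sqrt(n_interim)
    if z > crit:                      # stop early 'for efficacy'
        return sum(xs), n_interim
    xs += [rng.gauss(0, 1) for _ in range(n_max - n_interim)]
    return sum(xs), n_max

trials = [run_trial() for _ in range(50_000)]

# Two ways the meta-analyst might combine the trials:
avg_of_estimates = sum(s / n for s, n in trials) / len(trials)   # unweighted
pooled_estimate = sum(s for s, _ in trials) / sum(n for _, n in trials)

print(f"average of per-trial estimates: {avg_of_estimates:+.4f}")
print(f"pooled (size-weighted) estimate: {pooled_estimate:+.4f}")
```

Trials that stop early carry inflated estimates, so the unweighted average of per-trial estimates drifts upward even though every trial was run under the null; weighting by the number of observations actually collected largely removes the drift. Whether the stopping rule can be ignored really does depend on how the information is combined.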

Categories: Philosophy of Statistics, Statistics | Tags: , , , | 2 Comments

Levels of Inquiry

levels: data-statistics-theory

Many fallacious uses of statistical methods result from supposing that the statistical inference licenses a jump to a substantive claim that is ‘on a different level’ from the statistical one being probed. Given the familiar refrain that statistical significance is not substantive significance, it may seem surprising how often criticisms of significance tests depend on running the two together! But it is not just two levels, but a great many, that need distinguishing in linking the collecting, modeling, and analyzing of data to a variety of substantive claims of inquiry (though for simplicity I often focus on the three depicted, described in various ways).

A question that continues to arise revolves around a blurring of levels, and is behind my recent ESP post.  It goes roughly like this:

If we are prepared to take a statistically significant proportion of successes (greater than .5) in n Binomial trials as grounds for inferring a real (better than chance) effect (perhaps of two teaching methods) but not as grounds for inferring Uri’s ESP (at guessing outcomes, say), then aren’t we implicitly invoking a difference in prior probabilities?  The answer is no, but there are two very different points to be made:

First, merely finding evidence of a non-chance effect is at a different “level” from a subsequent question about the explanation or cause of a non-chance effect. To infer from the former to the latter is an example of a fallacy of rejection.[1] The nature and threats of error in the hypothesis about a specific cause of an effect are very different from those in merely inferring a real effect. There are distinct levels of inquiry and distinct errors at each given level. The severity analysis for the respective claims makes this explicit.[ii] Even a test that did a good job distinguishing and ruling out threats to a hypothesis of “mere chance” would not thereby have probed errors about specific causes or potential explanations. Nor does an “isolated record” of  statistically significant results suffice. Recall Fisher: “In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result”(1935, 14).  PSI researchers never managed to demonstrate this. Continue reading
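The ‘real effect’ question at the first level is just binomial arithmetic. As a minimal sketch (mine, not from the post; the numbers are made up), the exact significance level attained by k successes in n Binomial trials against the chance hypothesis p = .5 can be computed directly:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the significance level
    attained by observing k or more successes under the chance null."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# 60 successes in 100 trials against the 'mere chance' hypothesis p = 0.5:
print(round(binom_tail(60, 100), 4))  # ≈ 0.0284
```

The identical computation licenses, at most, ‘some non-chance effect’; whether that effect reflects a better teaching method or Uri’s ESP is the further, different-level question, and nothing in the arithmetic requires a prior to keep the levels apart.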

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics | 2 Comments

Insevere tests and pseudoscience

Against the PSI skeptics of this period (discussed in my last post), defenders of PSI would often erect means to take experimental results as success stories (e.g., if he failed to correctly predict the next card, maybe he was aiming at the second or third card). If the data could not be made to fit some ESP claim or other (e.g., through multiple end points) it might, as a last resort, be explained away as due to negative energy of nonbelievers (or being on the Carson show). They manage to get their ESP hypothesis H to “pass,” but the “test” had little or no capability of finding (uncovering, admitting) the falsity of H, even if H is false. (This is the basis for my term “Gellerization”.) In such cases, I would deny that the results afford any evidence for H. They are terrible evidence for H. Now any domain will have some terrible tests, but a field that routinely passes off terrible tests as success stories I would deem pseudoscientific. 

We get a kind of minimal requirement for a test result to afford any evidence for an assertion H, however partial and approximate H may be: If a hypothesis H is assured of having* “passed” a test T, even if H is false, then test T is a terrible test or no test at all.**

Far from trying to reveal flaws, it masks them or prevents them from being uncovered. No one would be impressed to learn their bank had passed a “stress test” if it turns out that the test had little or no chance of giving a failing score to any bank, regardless of its ability to survive a stressed economy. (Would they?)

There are a million different ways to flesh out the idea, and I welcome hearing others. Now you might say that no one would disagree with this. Great. Because a core requirement for an adequate account of inquiry, as I see it, is that it be able to capture this rationale for deeming evidence pretty terrible and inquiry fairly pseudoscientific – and it should do so in such a way that affords a starting point for not-so-awful tests, and rather reliable learning.

* or very probably would have passed.

**QUESTION: I seek your input: which sounds better, or is more accurate: saying a test T passes a hypothesis H, or that a hypothesis H passes a test T? I’ve used both and want to settle on one.

Categories: Error Statistics, philosophy of science | 5 Comments

Statistics and ESP research (Diaconis)

In the early ’80s, fresh out of graduate school, I persuaded Persi Diaconis, Jack Good, and Patrick Suppes to participate in a session I wanted to organize on ESP and statistics. It seems remarkable to me now—not only that they agreed to participate*, but the extent to which PSI research was taken seriously at the time. It wasn’t much later that all the recurring errors and loopholes, and the persistent cheating and self-delusion—despite earnest attempts to trigger and analyze the phenomena—would lead nearly everyone to label PSI research a “degenerating programme” (in the Popperian-Lakatosian sense).

(Though I’d have to check names and dates, I seem to recall that the last straw was when some of the Stanford researchers were found guilty of (unconscious) fraud. Jack Good continued to be interested in the area, but less so, I think. I do not know about the others.)

It is interesting to see how background information enters into inquiry here. So, even though it’s late on a Saturday night, here’s a snippet from one of the papers that caught my interest in graduate school, Diaconis’s (1978) “Statistical Problems in ESP Research” in Science, along with some critical “letters”:

Summary. In search of repeatable ESP experiments, modern investigators are using more complex targets, richer and freer responses, feedback, and more naturalistic conditions. This makes tractable statistical models less applicable. Moreover, controls often are so loose that no valid statistical analysis is possible. Some common problems are multiple end points, subject cheating, and unconscious sensory cueing. Unfortunately, such problems are hard to recognize from published records of the experiments in which they occur; rather, these problems are often uncovered by reports of independent skilled observers who were present during the experiment. This suggests that magicians and psychologists be regularly used as observers. New statistical ideas have been developed for some of the new experiments. For example, many modern ESP studies provide subjects with feedback—partial information about previous guesses—to reward the subjects for correct guesses in hope of inducing ESP learning. Some feedback experiments can be analyzed with the use of skill-scoring, a statistical procedure that depends on the information available and the way the guessing subject uses this information. (p. 131) Continue reading

Categories: philosophy of science, Philosophy of Statistics, Statistics | 13 Comments

Barnard, background info/ intentions

G.A. Barnard: 23 Sept.1915 – 9 Aug.2002

G.A. Barnard’s birthday is 9/23, so here’s a snippet of his discussion with Savage (1962) (link below [i]) that connects to our two recent issues, stopping rules and background information (at least of one type), here and here.

Barnard: I have been made to think further about this issue of the stopping rule since I first suggested that the stopping rule was irrelevant (Barnard 1947a,b). This conclusion does not follow only from the subjective theory of probability; it seems to me that the stopping rule is irrelevant in certain circumstances. Since 1947 I have had the great benefit of a long correspondence—not many letters because they were not very frequent, but it went on over a long time—with Professor Bartlett, as a result of which I am considerably clearer than I was before. My feeling is that, as I indicated [on p. 42], we meet with two sorts of situation in applying statistics to data. One is where we want to have a single hypothesis with which to confront the data. Do they agree with this hypothesis or do they not? Now in that situation you cannot apply Bayes’s theorem because you have not got any alternatives to think about and specify—not yet. I do not say they are not specifiable—they are not specified yet. And in that situation it seems to me the stopping rule is relevant. Continue reading

Categories: Background knowledge, Error Statistics, Philosophy of Statistics | Leave a comment

More on using background info

For the second* bit of background on the use of background info (for the new U-Phil, due 9/25/12), I’ll reblog:

Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs

…I am discovering that one of the biggest sources of confusion about the foundations of statistics has to do with what it means or should mean to use “background knowledge” and “judgment” in making statistical and scientific inferences. David Cox and I address this in our “Conversation” in RMM (2011)….

Insofar as humans conduct science and draw inferences, and insofar as learning about the world is not reducible to a priori deductions, it is obvious that “human judgments” are involved. True enough, but too trivial an observation to help us distinguish among the very different ways judgments should enter according to contrasting inferential accounts. When Bayesians claim that frequentists do not use or are barred from using background information, what they really mean is that frequentists do not use prior probabilities of hypotheses, at least when those hypotheses are regarded as correct or incorrect, if only approximately. So, for example, we would not assign relative frequencies to the truth of hypotheses such as (1) prion transmission is via protein folding without nucleic acid, or (2) the deflection of light is approximately 1.75” (as if, as Peirce puts it, “universes were as plenty as blackberries”). How odd it would be to try to model these hypotheses as themselves having distributions: to us, statistical hypotheses assign probabilities to outcomes or values of a random variable. Continue reading

Categories: Background knowledge, philosophy of science, Philosophy of Statistics, Statistics | Tags: , | 21 Comments

U-Phil (9/25/12) How should “prior information” enter in statistical inference?

Andrew Gelman sent me an interesting note of his, “Ethics and the statistical use of prior information”[i]. In section 3 he comments on some of David Cox’s remarks in a conversation we recorded:

 A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo, published in Rationality, Markets and Morals [iii] (Section 2 has some remarks on L. Wasserman.)

This was a part of a highly informal, frank, and entirely unscripted conversation, with minimal editing from the tape-recording [ii]. It was first posted on this blog on Oct. 19, 2011. A related, earlier discussion on Gelman’s blog is here.

I want to open this for your informal comments (“U-Phil,” ~750 words, by September 25)[iv]. (Send to error@vt.edu.)

Before I give my own “deconstruction” of Gelman on the relevant section, I will post a bit of background to the question of background. For starters, here’s the relevant portion of the conversation:

COX: Deborah, in some fields foundations do not seem very important, but we both think foundations of statistical inference are important; why do you think that is?

MAYO: I think because they ask about fundamental questions of evidence, inference, and probability. I don’t think that foundations of different fields are all alike; because in statistics we’re so intimately connected to the scientific interest in learning about the world, we invariably cross into philosophical questions about empirical knowledge and inductive inference.

COX: One aspect of it is that it forces us to say what it is that we really want to know when we analyze a situation statistically. Do we want to put in a lot of information external to the data, or as little as possible? It forces us to think about questions of that sort.

MAYO: But key questions, I think, are not so much a matter of putting in a lot or a little information. …What matters is the kind of information, and how to use it to learn. This gets to the question of how we manage to be so successful in learning about the world, despite knowledge gaps, uncertainties and errors. To me that’s one of the deepest questions and it’s the main one I care about. I don’t think a (deductive) Bayesian computation can adequately answer it. Continue reading

Categories: Background knowledge, Philosophy of Statistics, U-Phil | Tags: , | 2 Comments

Return to the comedy hour…(on significance tests)

These days, so many theater productions are updated revivals of older standards. Same with the comedy hours at the Bayesian retreat, and the task force meetings of significance test reformers. So (on the one-year anniversary of this blog) let’s listen in to one of the earliest routines (the one with the highest blog hits), but with some new reflections (first considered here and here).

“Did you hear the one about the frequentist . . .

“who claimed that observing “heads” on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”

The joke came from J. Kadane’s Principles of Uncertainty (2011, CRC Press*).

 “Flip a biased coin that comes up heads with probability 0.95, and tails with probability 0.05.  If the coin comes up tails reject the null hypothesis.  Since the probability of rejecting the null hypothesis if it is true is 0.05, this is a valid 5% level test.  It is also very robust against data errors; indeed it does not depend on the data at all.  It is also nonsense, of course, but nonsense allowed by the rules of significance testing.” (439)

Much laughter.

___________________

But is it allowed?  I say no. The null hypothesis in the joke can be in any field, perhaps it concerns mean transmission of Scrapie in mice (as in my early Kuru post).  I know some people view significance tests as merely rules that rarely reject erroneously, but I claim this is mistaken. Both in significance tests and in scientific hypothesis testing more generally, data indicate inconsistency with H only by being counter to what would be expected under the assumption that H is correct (as regards a given aspect observed). Were someone to tell Prusiner that the testing methods he follows actually allow any old “improbable” event (a stock split in Apple?) to reject a hypothesis about prion transmission rates, Prusiner would say that person didn’t understand the requirements of hypothesis testing in science. Since the criticism would hold no water in the analogous case of Prusiner’s test, it must equally miss its mark in the case of significance tests**.  That, recall, was Rule #1. Continue reading
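Kadane’s coin “test” is easy to simulate, and the simulation makes the complaint concrete (a sketch of the joke’s procedure only; the function name is mine):

```python
import random

def kadane_test(data=None, rng=random):
    """Kadane's 'valid 5% level test': flip a coin that lands tails with
    probability .05 and reject the null iff tails; the data are ignored."""
    return rng.random() < 0.05  # True = reject

random.seed(1)
n = 100_000
rate = sum(kadane_test() for _ in range(n)) / n
print(round(rate, 3))
```

The long-run rejection rate is about .05 whether the null is true or false, precisely because the data play no role; a procedure whose rejections are not counter to anything expected under H indicates nothing about H, which is the point of denying that the “rules of significance testing” allow it.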

Categories: Comedy, Philosophy of Statistics, Statistics | Tags: , , , | 8 Comments

Metablog: One-Year Anniversary

Some of you may remember when I first began experimenting with a “frequentists in exile” blog on Google “blogspot” a year ago. That was a pretty rag-tag blog, but knowing there was just a teeny-tiny handful of readers also made it more informal and slightly less self-conscious. I even posted a picture of the wheelchair I needed to use for a short time when, a couple of weeks in, I injured my knee at an airport rescuing my computer bag from potential theft.  Amazingly enough, some of the posts with the highest hits of the year are the ones where I shared misadventures with the TSA (and the European equivalent) while traveling with the knee brace over a few months.*  For the past several months, anything but fairly direct discussions of matters philo-statistical are banished into semi-hidden pages, which now make up a blog within a blog of “rejected posts” (soon to be public). But surely the current, more professional blog represents progress, and reviewing the blog over this past week—wow, I see where my time went!  Anyway, I will revisit some posts from time to time, especially where they link to ongoing and new issues that have cropped up in my work, and/or where they deal with unresolved issues. Continue reading

Categories: Metablog

Stephen Senn: The nuisance parameter nuisance

Senn in China

Stephen Senn

Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

“The nuisance parameter nuisance”

A great deal of statistical debate concerns ‘univariate’ error, or disturbance, terms in models. I put ‘univariate’ in inverted commas because as soon as one writes a model of the form (say) Y_i = X_i β + ε_i, i = 1, …, n and starts to raise questions about the distribution of the disturbance terms, ε_i, one is frequently led into multivariate speculations, such as, ‘is the variance identical for every disturbance term?’ and, ‘are the disturbance terms independent?’, and not just speculations such as, ‘is the distribution of the disturbance terms Normal?’. Aris Spanos might also want me to put inverted commas around ‘disturbance’ (or ‘error’), since what I ought to be thinking about is the joint distribution of the outcomes, Y_i, conditional on the predictors.

However, in my statistical world of planning and analysing clinical trials, the difference that using parametric versus non-parametric methods makes to inferences is often minor. Of course, using non-parametric methods does nothing to answer the problem of non-independent observations, but for experiments, as opposed to observational studies, you can frequently design-in independence. That is a major potential pitfall avoided, but then there is still the issue of Normality. However, in my experience, this is rarely where the action is. Inferences rarely change dramatically on using ‘robust’ approaches (although one can always find examples with gross outliers where they do). However, there are other sorts of problem that can affect data which can make a very big difference. Continue reading
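Senn's point that non-independence, rather than non-Normality, is usually where the action is can be illustrated with a minimal simulation (my sketch, not from Senn's post; the sample size, autocorrelation level, and number of replications are all illustrative). Fit ordinary least squares to data whose disturbance terms follow an AR(1) process, and compare the naive model-based standard error of the slope, which assumes independent disturbances, with the slope estimate's actual sampling variability:

```python
import math
import random

def ols_slope_and_naive_se(x, y):
    """OLS slope and its model-based standard error, which assumes iid errors."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    return b, math.sqrt(s2 / sxx)

random.seed(7)
n, rho, sims = 50, 0.8, 1000
x = list(range(n))
slopes, naive_ses = [], []
for _ in range(sims):
    # AR(1) disturbances: each term correlated with the last (true slope is 0)
    e, errs = 0.0, []
    for _ in range(n):
        e = rho * e + random.gauss(0.0, 1.0)
        errs.append(e)
    b, se = ols_slope_and_naive_se(x, errs)
    slopes.append(b)
    naive_ses.append(se)

mean_b = sum(slopes) / sims
emp_sd = math.sqrt(sum((b - mean_b) ** 2 for b in slopes) / (sims - 1))
avg_naive = sum(naive_ses) / sims
print("actual sd of slope estimate:", emp_sd)
print("average naive (iid) se:     ", avg_naive)  # substantially smaller
```

The empirical spread of the slope estimate comes out several times larger than the naive standard error, so confidence intervals built on the iid assumption would be badly overconfident, regardless of how Normal the disturbances look.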

Categories: Philosophy of Statistics, Statistics

After dinner Bayesian comedy hour….

Given it’s the first anniversary of this blog, which opened with the howlers in “Overheard at the comedy hour …” let’s listen in as a Bayesian holds forth on one of the most famous howlers of the lot: the mysterious role that psychological intentions are said to play in frequentist methods such as statistical significance tests. Here it is, essentially as I remember it (though shortened), in the comedy hour that unfolded at my dinner table at an academic conference:

 Did you hear the one about the researcher who gets a phone call from the guy analyzing his data? First the guy congratulates him and says, “The results show a statistically significant difference at the .05 level—p-value .048.” But then, an hour later, the phone rings again. It’s the same guy, but now he’s apologizing. It turns out that the experimenter intended to keep sampling until the result was 1.96 standard deviations away from the 0 null—in either direction—so they had to reanalyze the data (n=169), and the results were no longer statistically significant at the .05 level.

 Much laughter.

 So the researcher is tearing his hair out when the same guy calls back again. “Congratulations!” the guy says. “I just found out that the experimenter actually had planned to take n=169 all along, so the results are statistically significant.”

 Howls of laughter.

 But then the guy calls back with the bad news . . .

It turns out that failing to score a sufficiently impressive effect after n’ trials, the experimenter went on to n” trials, and so on and so forth until finally, say, on trial number 169, he obtained a result 1.96 standard deviations from the null.

It continues this way, and every time the guy calls in and reports a shift in the p-value, the table erupts in howls of laughter! From everyone except me, sitting in stunned silence, staring straight ahead. The hilarity ensues from the idea that the experimenter’s reported psychological intentions about when to stop sampling are altering the statistical results. Continue reading
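The serious point behind the joke is easy to check: under a true null, “try and try again” sampling until a 1.96-standard-deviation result appears really does inflate the rejection rate far beyond .05, while the fixed-n test holds it. A minimal simulation (my sketch; the cap of 100 observations and the number of trials are illustrative):

```python
import math
import random

random.seed(1)
trials, n_max = 4000, 100

fixed_rejects = 0      # analyze once, at n = n_max, as planned
optional_rejects = 0   # peek after every observation, count any crossing

for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n_max)]  # null is true
    s, crossed = 0.0, False
    for n, x in enumerate(xs, 1):
        s += x
        if abs(s) / math.sqrt(n) >= 1.96:  # z-test with known sigma = 1
            crossed = True
    if crossed:
        optional_rejects += 1
    if abs(sum(xs)) / math.sqrt(n_max) >= 1.96:
        fixed_rejects += 1

print("fixed-n rejection rate:    ", fixed_rejects / trials)     # near 0.05
print("try-and-try-again rate:    ", optional_rejects / trials)  # far above 0.05
```

So the stopping rule is not a mere psychological report: it changes the sampling distribution of the test statistic, and with it the actual error probabilities, which is exactly why the error statistician at the table isn't laughing.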

Categories: Comedy, philosophy of science, Philosophy of Statistics, Statistics

Failing to Apply vs Violating the Likelihood Principle

In writing a new chapter on the Strong Likelihood Principle [i] over the past few weeks, I noticed a passage in G. Casella and R. Berger (2002) that in turn recalled a puzzling remark noted in my Jan. 3, 2012 post. The post began:

A question arose from a Bayesian acquaintance:

“Although the Birnbaum result is of primary importance for sampling theorists, I’m still interested in it because many Bayesian statisticians think that model checking violates the (strong) likelihood principle (SLP), as if this principle is a fundamental axiom of Bayesian statistics”.

But this is puzzling for two reasons. First, if the LP does not preclude testing for assumptions (and he is right that it does not[ii]), then why not simply explain that rather than appeal to a disproof of something that actually never precluded model testing?   To take the disproof of the LP as grounds to announce: “So there! Now even Bayesians are free to test their models” would seem only to ingrain the original fallacy.

You can read the rest of the original post here.

The remark in G. Casella and R. Berger seems to me equivocal on this point: Continue reading

Categories: Likelihood Principle, Philosophy of Statistics, Statistics

Frequentist Pursuit

A couple of readers sent me notes about a recent post on Normal Deviate* that introduces the term “frequentist pursuit”:

“If we manipulate the data to get a posterior that mimics the frequentist answer, is this really a success for Bayesian inference? Is it really Bayesian inference at all? Similarly, if we choose a carefully constructed prior just to mimic a frequentist answer, is it really Bayesian inference? We call Bayesian inference which is carefully manipulated to force an answer with good frequentist behavior, frequentist pursuit. There is nothing wrong with it, but why bother?

If you want good frequentist properties just use the frequentist estimator.”(Robins and Wasserman)

I take it that the Bayesian response to the question (“why bother?”) is that the computations yield that magical posterior (never mind just how to interpret it).

Cox and Mayo 2010 say, about a particular example of  “frequentist envy pursuit”:

 “Reference priors yield inferences with some good frequentist properties, at least in one-dimensional problems – a feature usually called matching. … First, as is generally true in science, the fact that a theory can be made to match known successes does not redound as strongly to that theory as did the successes that emanated from first principles or basic foundations. This must be especially so where achieving the matches seems to impose swallowing violations of its initial basic theories or principles.

Even if there are some cases where good frequentist solutions are more neatly generated through Bayesian machinery, it would show only their technical value for goals that differ fundamentally from their own.” (301)

Imitation, some say, is the most sincere form of flattery.  I don’t agree, but doubtless it is a good thing that we see a degree of self-imposed and/or subliminal frequentist constraints on much Bayesian work in practice.  Some (many?) Bayesians suggest that this is merely a nice extra rather than necessary, forfeiting the (non-trivial) pursuit of frequentist (error-statistical) foundations for Bayesian pursuits**.

*I had noticed this, but had no time to work through the thicket of the example he considers. I welcome a very simple upshot.

**At least some of them.

Categories: Philosophy of Statistics

knowledge/evidence not captured by mathematical prob.

Mayo mirror

Equivocations between informal and formal uses of “probability” (as well as “likelihood” and “confidence”) are responsible for much confusion in statistical foundations, as Allan Birnbaum remarks in a famous paper I was rereading today:

“It is of course common nontechnical usage to call any proposition probable or likely if it is supported by strong evidence of some kind. … However such usage is to be avoided as misleading in this problem-area, because each of the terms probability, likelihood and confidence coefficient is given a distinct mathematical and extramathematical usage.” (1969, 139, Note 4).

For my part, I find that I never use probabilities to express degrees of evidence (either in mathematical or extramathematical uses), but I realize others might. Even so, I agree with Birnbaum “that such usage is to be avoided as misleading in” foundational discussions of evidence. We know, infer, accept, and detach from evidence, all kinds of claims without any inclination to add an additional quantity such as a degree of probability or belief arrived at via, and obeying, the formal probability calculus.

It is interesting, as a little exercise, to examine scientific descriptions of the state of knowledge in a field. A few days ago, I posted something from Weinberg on the Higgs particle. Here are some statements, with some terms emphasized:

The general features of the electroweak theory have been well tested; their validity is not what has been at stake in the recent experiments at CERN and Fermilab, and would not be seriously in doubt even if no Higgs particle had been discovered.

I see no suggestion of a formal application of Bayesian probability notions. Continue reading

Categories: philosophy of science, Philosophy of Statistics

“Did Higgs Physicists Miss an Opportunity by Not Consulting More With Statisticians?”

On August 20 I posted the start of “Discussion and Digest” by Bayesian statistician Tony O’Hagan, an overview of responses to his letter (ISBA website) on the use of p-values in analyzing the Higgs data, prompted, in turn, by a query of subjective Bayesian Dennis Lindley.  I now post the final section, in which he discusses his own view. I think it raises many questions of interest, both as regards this case and, more generally, about statistics and science. My initial July 11 post is here.

“Higgs Boson – Digest and Discussion” By Tony O’Hagan

Discussion

So here are some of my own views on this.

There are good reasons for being cautious and demanding a very high standard of evidence before announcing something as momentous as H. It is acknowledged by those who use it that the 5-sigma standard is a fudge, though. They would surely be willing to make such an announcement if they were, for instance, 99.99% certain of H’s existence, as long as that 99.99% were rigorously justified. 5-sigma is used because they don’t feel able to quantify the probability of H rigorously. So they use the best statistical analysis that they know how to do, but because they also know there are numerous factors not taken into account by this analysis – the multiple testing, the likelihood of unrecognised or unquantified deficiencies in the data, experiment or statistics, and the possibility of other explanations – they ask for what on the face of it is an absurdly high level of significance from that analysis. Continue reading
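For reference, the arithmetic behind the 5-sigma standard O’Hagan is discussing: the one-sided tail probability of a standard Normal beyond 5 is about 2.9 × 10⁻⁷, roughly 1 in 3.5 million. This can be checked directly with the complementary error function (a sketch; conventions differ on whether the one-sided or two-sided tail is quoted, and the two-sided figure would be double this):

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_5sigma = normal_upper_tail(5.0)
print(p_5sigma)                  # about 2.87e-07, i.e. roughly 1 in 3.5 million
print(normal_upper_tail(1.96))   # about 0.025, the familiar one-sided level
```

Comparing the two printed values makes O’Hagan’s point vivid: the physicists’ threshold is several orders of magnitude more stringent than the ordinary .05 standard, precisely because it is meant to absorb multiple testing and unquantified systematic errors rather than to express a rigorous probability of H.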

Categories: philosophy of science, Philosophy of Statistics, Statistics
