Statistics

LSE Summer Seminar: Contemporary Problems in Philosophy of Statistics

Posted on May 8, 2012 by Mayo

As a visitor of the Centre for Philosophy of Natural and Social Science (CPNSS) at the London School of Economics and Political Science, I am planning to lead 5 seminars in the department of Philosophy, Logic, and Scientific Method this summer (2) and autumn (3) on Contemporary Philosophy of Statistics under the PH500 rubric (listed under summer term). This will be rather informal, based on the book I am writing with this name. There will be at least one guest seminar leader in the fall. Anyone interested in attending or finding out more may write to me: error@vt.edu.*

Wednesday 6th June 3-5pm T206

Wednesday 13th June 3-5pm T206

Autumn term dates: To Be Announced

LSE contact person: c.j.thompson@lse.ac.uk.

PH 500. Contemporary Problems in Philosophy of Statistical Science Continue reading →

Categories: Announcement, philosophy of science, Statistics | Tags: error statistical philosophy, LSE seminar | Leave a comment

Comedy Hour at the Bayesian (Epistemology) Retreat: Highly Probable vs Highly Probed

Posted on May 5, 2012 by Mayo

Bayesian philosophers (among others) have analogous versions of the criticism in my April 28 blogpost: error probabilities (associated with inferences to hypotheses) may conflict with chosen posterior probabilities in hypotheses. Since it’s Saturday night let’s listen in to one of the comedy hours at the Bayesian retreat (note the sedate philosopher’s comedy club backdrop):

Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?

The problem was, the epistemic probability in H was so low that H couldn’t be believed! Instead we believe its denial H’! So, she will infer hypotheses that are simply unbelievable!

So clearly the error statistical testing account fails to serve in an account of knowledge or inference (i.e., an epistemic account). However severely I might wish to say that a hypothesis H has passed a test, the Bayesian critic assigns a sufficiently low prior probability to H so as to yield a low posterior probability in H[i]. But this is no argument about why this counts in favor of, rather than against, their Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis H.

To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.” This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected—for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true. This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also there are just two outcomes, say s and ~s, and no degrees of discrepancy from H. Continue reading →

Categories: philosophy of science, Statistics | Tags: Comedy club, criticism of frequentist methods, epistemic probabilsm, Frequentist inference, Peter Achinstein | 34 Comments

Stephen Senn: A Paradox of Prior Probabilities

Posted on May 1, 2012 by Mayo

Stephen Senn

Head of the Methodology and Statistics Group,

Competence Center for Methodology and Statistics (CCMS), Luxembourg

This paradox is clearly inspired by and in a sense is just another form of Philip Dawid’s selection paradox[1]. See my paper in The American Statistician for a discussion of this[2]. However, I rather like this concrete example of it.

Imagine that you are about to carry out a Bayesian analysis of a new treatment for rheumatism. However, just to avoid various complications I am going to assume that you are looking at a potential side effect of the treatment. I am going to take the effect on diastolic blood pressure (DBP) as the example of a side-effect one might look at.

Now, to be truly Bayesian I think that you ought to have a look at a long list of previous treatments for rheumatism but time is short and this is not always so easy. So instead you argue like this.

I know from the results of the WHO Monica project that the standard deviation of DBP is about 11 mmHg in a general population.
I have no prior opinion as to whether anti-rheumatics as a class have a beneficial or harmful effect on DBP.
I think that large effects on DBP, whether harmful or beneficial, are rather improbable for a drug designed to treat rheumatism.
I believe the data are approximately Normal.
I am going to use a conjugate prior for the effect of treatment with mean 0 and standard deviation = 4 mmHg. This makes very large beneficial or harmful effects unlikely but still allows reasonable play for the data. This means that the prior variance is 16 mmHg² compared to a data variance I am expecting to be about 120 mmHg². This means that as soon as I have treated 8 subjects the variance of the data mean should be smaller (about 15 mmHg²) than the prior variance, and so I will actually be weighting the data more than the prior at that point. This seems about reasonable to me.
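The arithmetic in the last step can be checked with a few lines of Python. This is only a sketch of the standard Normal conjugate weighting, using the figures quoted above (prior SD 4 mmHg, data variance about 120 mmHg²); the function names are mine and purely illustrative.

```python
# Figures taken from the text: prior SD of 4 mmHg, data variance ~120 mmHg^2.
PRIOR_VAR = 4.0 ** 2      # 16 mmHg^2
DATA_VAR = 120.0          # roughly 11^2 mmHg^2


def data_mean_variance(n):
    """Variance of the mean of n observations."""
    return DATA_VAR / n


def weight_on_data(n):
    """Share of the posterior mean contributed by the data mean
    under a Normal conjugate analysis (precision weighting)."""
    prior_precision = 1.0 / PRIOR_VAR
    data_precision = n / DATA_VAR
    return data_precision / (prior_precision + data_precision)


def first_n_favouring_data():
    """Smallest n at which the data mean is less variable than the prior."""
    return next(n for n in range(1, 1000) if data_mean_variance(n) < PRIOR_VAR)


if __name__ == "__main__":
    for n in (7, 8):
        print(n, data_mean_variance(n), round(weight_on_data(n), 3))
    print("data outweigh prior from n =", first_n_favouring_data())
```

With these numbers the crossover falls between 7 and 8 subjects, which is where the "8 subjects" in the text comes from.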

You can choose different figures if you want but here I am attempting to apply a standard Bayesian analysis in a reasonably honest manner. Continue reading →

Categories: Statistics | Tags: Bayesian inference, Jack Good, Philip Dawid, Prior probability | 13 Comments

Comedy Hour at the Bayesian Retreat: P-values versus Posteriors

Posted on April 28, 2012 by Mayo

The natural follow-up question to my last blog post is: What happens when numbers disagree across different philosophies? Since it’s Saturday night let’s listen in to one of the comedy hours at the Bayesian retreat:

Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?

JB: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!

Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!

Raucous laughter ensues!

(Hah, hah,…. I feel I’m back in high school: “So funny, I forgot to laugh!”)

The frequentist tester should retort:

Frequentist significance tester: But you assumed 50% of the null hypotheses are true, and computed P(H₀|x) (imagining P(H₀)= .5)—and then assumed my p-value should agree with the number you get!

But, our significance tester is not heard from as they move on to the next joke….

Of course it is well known that for a fixed p-value, with a sufficiently large n, even a statistically significant result can correspond to large posteriors in H₀.[i] Somewhat more recent work generalizes the result, e.g., J. Berger and Sellke, 1987. Although from their Bayesian perspective it appears that p-values come up short as measures of evidence, the significance testers balk at the fact that use of the recommended priors allows highly significant results to be interpreted as no evidence against the null, or even as evidence for it! An interesting twist in recent work is to try to “reconcile” the p-value and the posterior, e.g., Berger 2003.[ii]

The conflict between p-values and Bayesian posteriors is usually illustrated with the two-sided test of the Normal mean, H₀: μ = μ₀ versus H₁: μ ≠ μ₀.

“If n = 50 one can classically ‘reject H₀ at significance level p = .05,’ although Pr (H₀|x) = .52 (which would actually indicate that the evidence favors H₀).” (Berger and Sellke, 1987, p. 113).

If n = 1000, a result statistically significant at the .05 level leads to a posterior probability for the null of .82!

CHART

Table 1 (modified) from J.O. Berger and T. Sellke (1987) “Testing a Point Null Hypothesis,” JASA 82(397): 113.
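For readers who want to see where the .52 and .82 come from, here is a minimal sketch. It assumes the usual setup behind these figures: prior mass .5 on H₀, with the remaining .5 spread over the alternative as a Normal prior centred at μ₀ with the same standard deviation σ as the data. That choice of alternative prior is my assumption for the illustration, and the function name is hypothetical.

```python
import math


def posterior_null(z, n, prior_null=0.5):
    """Posterior probability of the point null H0: mu = mu0.

    z          -- observed standardized statistic sqrt(n) * (xbar - mu0) / sigma
    n          -- sample size
    prior_null -- prior mass placed on H0 (the 'spike')

    Assumes a N(mu0, sigma^2) prior on mu under the alternative, so the
    marginal of xbar under H1 is N(mu0, sigma^2 * (1 + 1/n)).
    """
    # Bayes factor in favour of H0: ratio of the two marginal densities of xbar.
    bayes_factor = math.sqrt(1 + n) * math.exp(-0.5 * z ** 2 * n / (1 + n))
    posterior_odds = bayes_factor * prior_null / (1 - prior_null)
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    # z = 1.96 is the two-sided .05 cutoff.
    for n in (50, 1000):
        print(n, round(posterior_null(1.96, n), 2))
```

Holding z at 1.96, the posterior for the null climbs with n; lowering `prior_null` pulls it back down, which is the point at issue in what follows.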

Many find the example compelling evidence that the p-value “overstates evidence against a null” because it claims to use an “impartial” or “uninformative”(?) Bayesian prior probability assignment of .5 to H₀, the remaining .5 being spread out over the alternative parameter space. Others charge that the problem is not p-values but the high prior (Casella and R. Berger, 1987). Moreover, the “spiked concentration of belief in the null” is at odds with the prevailing view “we know all nulls are false”. Note too the conflict with confidence interval reasoning since the value zero (0) lies outside the corresponding confidence interval (Mayo 2005).

But often, as in the opening joke, the prior assignment is claimed to be keeping to the frequentist camp and frequentist error probabilities: it is imagined that we sample randomly from a population of hypotheses, some proportion of which are assumed to be true (50% is a common figure). We randomly draw a hypothesis and get this particular one; maybe it concerns the mean deflection of light, or perhaps it is an assertion of bioequivalence of two drugs, or whatever. The percentage “initially true” (in this urn of nulls) serves as the prior probability for H₀. I see this gambit in statistics, psychology, philosophy and elsewhere, and yet it commits a fallacious instantiation of probabilities:

50% of the null hypotheses in a given pool of nulls are true.

This particular null H₀ was randomly selected from this urn (and, it may be added, nothing else is known, or the like).

Therefore P(H₀ is true) = .5.

It isn’t that one cannot play a carnival game of reaching into an urn of nulls (and one can imagine lots of choices for what to put in the urn), and use a Bernoulli model for the chance of drawing a true hypothesis (assuming we could even tell), but this “generic hypothesis” is no longer the particular hypothesis one aims to use in computing the probability of data x₀ (be it on eclipse data, risk rates, or whatever) under hypothesis H₀. [iii] In any event .5 is not the frequentist probability that the chosen null H₀ is true. (Note the selected null would get the benefit of being selected from an urn of nulls where few have been shown false yet: “innocence by association”.)
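To make the carnival game concrete, here is a toy simulation of the urn. It is a hypothetical sketch, not a reconstruction of anyone's applet: the proportion of true nulls, the sample size, and the N(0, 1) spread of effects under the alternative (in σ units) are all assumptions I have plugged in. It reports the fraction of true nulls among draws whose p-value lands right around .05.

```python
import math
import random


def fraction_true_nulls(prior_null, n, trials, seed=0, z_lo=1.96, z_hi=2.0):
    """Among simulated tests with |z| just past 1.96 (two-sided p near .05),
    return the fraction whose null hypothesis was true.

    prior_null -- assumed proportion of true nulls in the urn
    n          -- sample size of each test
    Effects under the alternative are drawn from N(0, 1) in sigma units
    (an assumption made purely for illustration).
    """
    rng = random.Random(seed)
    true_hits = 0
    total_hits = 0
    for _ in range(trials):
        null_is_true = rng.random() < prior_null
        theta = 0.0 if null_is_true else rng.gauss(0.0, 1.0)
        # Standardized test statistic: N(theta * sqrt(n), 1).
        z = rng.gauss(theta * math.sqrt(n), 1.0)
        if z_lo <= abs(z) <= z_hi:
            total_hits += 1
            true_hits += null_is_true
    return true_hits / total_hits


if __name__ == "__main__":
    for prior_null in (0.5, 0.1):
        frac = fraction_true_nulls(prior_null, n=50, trials=300_000)
        print(prior_null, round(frac, 2))
```

The fraction that comes out tracks whatever proportion of true nulls was put into the urn in the first place, so it is a fact about the urn, not about the particular H₀ under test.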

Yet J. Berger claims his applets are perfectly frequentist, and by adopting his recommended O-priors, we frequentists can become more frequentist (than using our flawed p-values)[iv]. We get what he calls conditional p-values (of a special sort). This is a reason for coining a different name, e.g., frequentist error statistician.

Upshot: Berger and Sellke tell us they will cure the significance tester’s tendency to exaggerate the evidence against the null (in two-sided testing) by using some variant on a spiked prior. But the result of their “cure” is that outcomes may too readily be taken as no evidence against, or even evidence for, the null hypothesis, even if it is false. We actually don’t think we need a cure. Faced with conflicts between error probabilities and Bayesian posterior probabilities, the error statistician may well conclude that the flaw lies with the latter measure. This is precisely what Fisher argued:

Discussing a test of the hypothesis that the stars are distributed at random, Fisher takes the low p-value (about 1 in 33,000) to “exclude at a high level of significance any theory involving a random distribution” (Fisher, 1956, page 42). Even if one were to imagine that H₀ had an extremely high prior probability, Fisher continues (never minding “what such a statement of probability a priori could possibly mean”), the resulting high posterior probability for H₀, he thinks, would only show that “reluctance to accept a hypothesis strongly contradicted by a test of significance” (44) . . . “is not capable of finding expression in any calculation of probability a posteriori” (43). Sampling theorists do not deny that there is sometimes a legitimate frequentist prior probability distribution for a statistical hypothesis: one may consider hypotheses about such distributions and subject them to probative tests. Indeed, Fisher says, if one were to consider the claim about the a priori probability to be itself a hypothesis, it would be rejected by the data!

[i] A result my late colleague I.J. Good wanted me to call the Jeffreys-Good-Lindley Paradox.

[ii] An applet is available at http://www.stat.duke.edu/~berger

[iii] Bayesian philosophers, e.g., Achinstein, allow that this does not yield a frequentist prior, but Achinstein claims it yields an acceptable prior for the epistemic probabilist (see, e.g., Error and Inference 2010).

[iv] Does this remind you of how the Bayesian is said to become more subjective by using the Berger O-Bayesian prior? See Berger deconstruction.

References & Related articles

Berger, J. O. (2003). “Could Fisher, Jeffreys and Neyman have Agreed on Testing?” Statistical Science 18: 1-12.

Berger, J. O. and Sellke, T. (1987). “Testing a point null hypothesis: The irreconcilability of p values and evidence,” (with discussion). J. Amer. Statist. Assoc. 82: 112–139.

Casella, G. and Berger, R. (1987). “Reconciling Bayesian and Frequentist Evidence in the One-sided Testing Problem,” (with discussion). J. Amer. Statist. Assoc. 82: 106–111, 123–139.

Fisher, R. A., (1956) Statistical Methods and Scientific Inference, Edinburgh: Oliver and Boyd.

Jeffreys, H. (1939). Theory of Probability, Oxford: Oxford University Press.

Mayo, D. (2003). Comment on J. O. Berger’s “Could Fisher, Jeffreys and Neyman Have Agreed on Testing?”, Statistical Science 18: 19-24.

Mayo, D. (2004). “An Error-Statistical Philosophy of Evidence,” in M. Taper and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press: 79-118.

Mayo, D.G. and Cox, D. R. (2006). “Frequentist Statistics as a Theory of Inductive Inference,” Optimality: The Second Erich L. Lehmann Symposium (ed. J. Rojo), Lecture Notes-Monograph series, Institute of Mathematical Statistics (IMS), Vol. 49: 77-97.

Mayo, D. and Kruse, M. (2001). “Principles of Inference and Their Consequences,” in D. Corfield and J. Williamson (eds.) Foundations of Bayesianism. Dordrecht: Kluwer Academic Publishers: 381-403.

Mayo, D. and Spanos, A. (2011). “Error Statistics,” in Philosophy of Statistics, Handbook of Philosophy of Science Volume 7 (General editors: Dov M. Gabbay, Paul Thagard and John Woods; Volume eds. Prasanta S. Bandyopadhyay and Malcolm R. Forster). Elsevier: 1-46.

Categories: Statistics | Tags: Comedy club, default priors, Null hypothesis, p-value vs posterior, Prior probability, statistical significance | 53 Comments

Matching Numbers Across Philosophies

Posted on April 25, 2012 by Mayo

The search for an agreement on numbers across different statistical philosophies is an understandable pastime in foundations of statistics. Perhaps identifying matching or unified numbers, apart from what they might mean, would offer a glimpse as to shared underlying goals? Jim Berger (2003) assures us there is no sacrilege in agreeing on methodology without philosophy, claiming “while the debate over interpretation can be strident, statistical practice is little affected as long as the reported numbers are the same” (Berger, 2003, p. 1).

Do readers agree?

Neyman and Pearson (or perhaps it was mostly Neyman) set out to determine when tests of statistical hypotheses may be considered “independent of probabilities a priori” (p. 201). In such cases, frequentist and Bayesian may agree on a critical or rejection region.

The agreement between “default” Bayesians and frequentists in the case of one-sided Normal (IID) testing (known σ) is very familiar. As noted in Ghosh, Delampady, and Samanta (2006, p. 35), if we wish to reject a null value when “the posterior odds against it are 19:1 or more, i.e., if posterior probability of H₀ is < .05” then the rejection region matches that of the corresponding test of H₀ (at the .05 level) if that were the null hypothesis. By contrast, they go on to note the also familiar fact that there would be disagreement between the frequentist and Bayesian if one were instead testing the two-sided H₀: μ=μ₀ vs. H₁: μ≠μ₀ with known σ. In fact, the same outcome that would be regarded as evidence against the null in the one-sided test (for the default Bayesian and frequentist) can result in statistically significant results being construed as no evidence against the null (for the Bayesian) or even evidence for it (due to a spiked prior).[i] Continue reading →
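The one-sided agreement is easy to verify numerically. The sketch below assumes the null is H₀: μ ≤ μ₀ and that the "default" prior is flat on μ, so that the posterior for μ is Normal with mean x̄ and standard deviation σ/√n; both are my assumptions for the illustration, as are the function names.

```python
import math
from statistics import NormalDist


def one_sided_p_value(xbar, mu0, sigma, n):
    """Frequentist p-value for H0: mu <= mu0 against mu > mu0 (known sigma)."""
    z = (xbar - mu0) * math.sqrt(n) / sigma
    return 1.0 - NormalDist().cdf(z)


def posterior_prob_null(xbar, mu0, sigma, n):
    """P(mu <= mu0 | xbar) under a flat prior on mu (assumed default prior),
    for which the posterior of mu is N(xbar, sigma^2 / n)."""
    posterior = NormalDist(mu=xbar, sigma=sigma / math.sqrt(n))
    return posterior.cdf(mu0)


if __name__ == "__main__":
    for xbar in (0.1, 0.3, 0.4):
        p = one_sided_p_value(xbar, mu0=0.0, sigma=1.0, n=25)
        post = posterior_prob_null(xbar, mu0=0.0, sigma=1.0, n=25)
        print(xbar, round(p, 4), round(post, 4))
```

The two columns coincide, so "posterior probability of H₀ below .05" and "reject at the .05 level" pick out the same outcomes here, though what the matching numbers mean is another matter.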

Categories: Statistics | Tags: Bayesian probability, Frequentist inference, frequentist-Bayesian unifications, Statistical hypothesis testing | 7 Comments

U-Phil: Jon Williamson: Deconstructing Dynamic Dutch Books

Posted on April 23, 2012 by Mayo

Jon Williamson

I am posting Jon Williamson’s* (Philosophy, Kent) U-Phil from 4-15-12

In this paper http://www.springerlink.com/content/q175036678w17478 (Synthese 178:67–85) I identify four ways in which Bayesian conditionalisation can fail. Of course not all Bayesians advocate conditionalisation as a universal rule, and I argue that objective Bayesianism as based on the maximum entropy principle should be preferred to subjective Bayesianism as based on conditionalisation, where the two disagree.

Conditionalisation is just one possible way of updating probabilities and I think it’s interesting to see how different formal approaches compare.

*Williamson participated in our June 2010 “Phil-Stat Meets Phil Sci” conference at the LSE, and we jointly ran a conference at Kent in June 2009.

Categories: Statistics, U-Phil | Tags: Bayesian probability, Deconstruction, Dutch books, Jon Williamson, Principle of maximum entropy | 10 Comments

Jean Miller: Happy Sweet 16 to EGEK #2 (Hasok Chang Review of EGEK)

Posted on April 21, 2012 by Mayo

Jean Miller here, reporting back from the island. Tonight we complete our “sweet sixteen” celebration of Mayo’s EGEK (1996) with the book review by Dr. Hasok Chang (currently the Hans Rausing Professor of History and Philosophy of Science at the University of Cambridge). His was chosen as our top favorite in the category of ‘reviews by philosophers’. Enjoy!

REVIEW: British Journal for the Philosophy of Science 48 (1997), 455-459
DEBORAH MAYO Error and the Growth of Experimental Knowledge,
The University of Chicago Press, 1996
By: Hasok Chang

Deborah Mayo’s Error and the Growth of Experimental Knowledge is a rich, useful, and accessible book. It is also a large volume which few people can realistically be expected to read cover to cover. Considering those factors, the main focus of this review will be on providing various potential readers with guidelines for making the best use of the book.

As the author herself advises, the main points can be grasped by reading the first and the last chapters. The real benefit, however, would only come from studying some of the intervening chapters closely. Below I will offer comments on several of the major strands that can be teased apart, though they are found rightly intertwined in the book. Continue reading →

Categories: philosophy of science, Statistics | Tags: Error and the Growth of Experimental Knowledge, error statistical philosophy, Hasok Chang, severity | 2 Comments

Jean Miller: Happy Sweet 16 to EGEK! (Shalizi Review: “We Have Ways of Making You Talk”)

Posted on April 18, 2012 by Mayo

Jean Miller here. (I obtained my PhD with D. Mayo in Phil/STS at VT.) Some of us “island philosophers” have been looking to pick our favorite book reviews of EGEK (Mayo 1996; Lakatos Prize 1999) to celebrate its “sweet sixteen” this month. This review, by Dr. Cosma Shalizi (CMU, Stat) has been chosen as the top favorite (in the category of reviews outside philosophy). Below are some excerpts–it was hard to pick, as each paragraph held some new surprise, or unique way to succinctly nail down the views in EGEK. You can read the full review here. Enjoy.

“We Have Ways of Making You Talk, or, Long Live Peircism-Popperism-Neyman-Pearson Thought!”
by Cosma Shalizi

After I’d bungled teaching it enough times to have an idea of what I was doing, one of the first things students in my introductory physics classes learned (or anyway were taught), and which I kept hammering at all semester, was error analysis: estimating the uncertainty in measurements, propagating errors from measured quantities into calculated ones, and some very quick and dirty significance tests, tests for whether or not two numbers agree, within their associated margins of error. I did this for purely pragmatic reasons: it seemed like one of the most useful things we were supposed to teach, and also one of the few areas where what I did had any discernible effect on what they learnt. Now that I’ve read Mayo’s book, I’ll be able to offer another excuse to my students the next time I teach error analysis, namely, that it’s how science really works.

I exaggerate her conclusion slightly, but only slightly. Mayo is a dues-paying philosopher of science (literally, it seems), and like most of the breed these days is largely concerned with questions of method and justification, of “ampliative inference” (C. S. Peirce) or “non-demonstrative inference” (Bertrand Russell). Put bluntly and concretely: why, since neither can be deduced rigorously from unquestionable premises, should we put more trust in David Grinspoon‘s ideas about Venus than in those of Immanuel Velikovsky? A nice answer would be something like, “because good scientific theories are arrived at by employing thus-and-such a method, which infallibly leads to the truth, for the following self-evident reasons.” A nice answer, but not one which is seriously entertained by anyone these days, apart from some professors of sociology and literature moonlighting in the construction of straw men. In the real world, science is alas fallible, subject to constant correction, and very messy. Still, mess and all, we somehow or other come up with reliable, codified knowledge about the world, and it would be nice to know how the trick is turned: not only would it satisfy curiosity (“the most agreeable of all vices” — Nietzsche), and help silence such people as do, in fact, prefer Velikovsky to Grinspoon, but it might lead us to better ways of turning the trick. Asking scientists themselves is nearly useless: you’ll almost certainly just get a recital of whichever school of methodology we happened to blunder into in college, or impatience at asking silly questions and keeping us from the lab. If this vice is to be indulged in, someone other than scientists will have to do it: namely, the methodologists. Continue reading →

Categories: philosophy of science, Statistics | Tags: Bayesian foundations in shambles, Error and the Growth of Experimental Knowledge, error statistical philosophy, frequentists foundations, severity | 33 Comments

A. Spanos: Jerzy Neyman and his Enduring Legacy

Posted on April 16, 2012 by Mayo

A Statistical Model as a Chance Mechanism

Aris Spanos

Jerzy Neyman (April 16, 1894 – August 5, 1981), was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals.

One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Fisher’s original parametric statistical model M_θ(x) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data x₀:=(x₁,x₂,…,x_n) can be viewed as a ‘truly representative sample’ from that ‘population’:

“The postulate of randomness thus resolves itself into the question, ‘Of what population is this a random sample?’” (ibid., p. 313), underscoring that “the adequacy of our choice may be tested a posteriori” (p. 314).

In cases where data x₀ come from sample surveys or can be viewed as a typical realization of a random sample X:=(X₁,X₂,…,X_n), i.e. Independent and Identically Distributed (IID) random variables, the ‘population’ metaphor can be helpful in adding some intuitive appeal to the inductive dimension of statistical inference, because one can imagine using a subset of a population (the sample) to draw inferences pertaining to the whole population.

This ‘infinite population’ metaphor, however, is of limited value in most applied disciplines relying on observational data. To see how inept this metaphor is, consider the question: what is the hypothetical ‘population’ when modeling the gyrations of stock market prices? More generally, what is observed in such cases is a certain on-going process and not a fixed population from which we can select a representative sample. For that very reason, most economists in the 1930s considered Fisher’s statistical modeling irrelevant for economic data! Continue reading →

Categories: Statistics | Tags: chance mechanism, frequentist probability, induction, Jerzy Neyman, Ronald Fisher, statistical Generating Mechanism, statistical model | 2 Comments

U-Phil: Deconstructing Dynamic Dutch-Books?

Posted on April 15, 2012 by Mayo

Oh, she takes care of herself, she can wait if she wants,
She’s ahead of her time.
Oh, and she never gives out and she never gives in,
She just changes her mind.

(Billy Joel, “She’s Always a Woman”)

If we agree that we have degrees of belief in any and all propositions, then, it is often argued (by Bayesians), if your beliefs do not conform to the probability calculus, you are being incoherent, and will lose money for sure (to a clever enough bookie). We can accept the claim that, were we required to take bets on our degrees of belief, then given that we prefer not to lose, we would not accept bets that ensured our losing. But this is a tautology, as others have pointed out, and entails nothing about degree of belief assignments. “That an agent ought not to accept a set of wagers according to which she loses come what may, if she would prefer not to lose, is a matter of deductive logic and not a property of beliefs” (Bacchus, Kyburg, and Thalos 1990: 476).[i] Nor need coerced (or imaginary) betting rates actually measure an agent’s degrees of belief in the truth of scientific hypotheses.
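For anyone who has not seen the sure-loss argument spelled out, here is a bare-bones version of the static case. The numbers are made up for illustration: an agent whose degrees of belief in A and in not-A sum to more than 1, and who is assumed (this is the contested step) to treat those degrees of belief as fair prices for $1 bets.

```python
def net_payoffs(belief_a, belief_not_a, stake=1.0):
    """Agent buys a bet on A and a bet on not-A, each paying `stake` if it wins,
    at prices equal to belief * stake (the assumed link between belief and betting).

    Returns the agent's net payoff (if A occurs, if not-A occurs).
    """
    cost = (belief_a + belief_not_a) * stake
    # Exactly one of the two bets pays out, whichever way things turn out.
    payoff_if_a = stake - cost
    payoff_if_not_a = stake - cost
    return payoff_if_a, payoff_if_not_a


if __name__ == "__main__":
    print("incoherent (0.6, 0.6):", net_payoffs(0.6, 0.6))
    print("coherent   (0.6, 0.4):", net_payoffs(0.6, 0.4))
```

With beliefs of 0.6 and 0.6 the agent is down about 20 cents whatever happens. Note that the code only delivers the loss once the agent is made to bet at those rates, which is exactly the assumption questioned above.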

Nowadays, surprisingly, most Bayesian philosophers seem to dismiss as irrelevant the variety of threats of being Dutch-booked. Confronted with counterexamples in which violating Bayes’s rule seems perfectly rational on intuitive grounds, Bayesians contort themselves into a great many knots in order to retain the underlying Bayesian philosophy while sacrificing updating rules, long held to be the very essence of Bayesian reasoning. To face contemporary positions squarely calls for rather imaginative deconstructions. I invite your deconstructions (to error@vt.edu) by April 23 (see So You Want to Do a Philosophical Analysis). Says Howson:

“It is the entirely rational claim that I may be induced to act irrationally that the dynamic Dutch book argument, absurdly, would condemn as incoherent”. (Howson 1997: 287)[ii] [iii]

It used to be that frequentists and others who sounded the alarm about temporal incoherency were declared irrational. Now, it is the traditional insistence on updating by Bayes’s rule that was irrational all along. Continue reading →

Categories: Statistics, U-Phil | Tags: Deconstruction, Dutch books (dynamic) | 22 Comments

That Promissory Note From Lehmann’s Letter; Schmidt to Speak

Posted on April 12, 2012 by Mayo

Juliet Shaffer and Erich Lehmann

Monday, April 16, is Jerzy Neyman’s birthday, but this post is not about Neyman (that comes later, I hope). But in thinking of Neyman, I’m reminded of Erich Lehmann, Neyman’s first student, and a promissory note I gave in a post on September 15, 2011. I wrote:

“One day (in 1997), I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that). …. I remember it contained two especially noteworthy pieces of information, one intriguing, the other quite surprising. The intriguing one (I’ll come back to the surprising one another time, if reminded) was this: He told me he was sitting in a very large room at an ASA meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, dark table sat just one book, all alone, shiny red. He said he wondered if it might be of interest to him! So he walked up to it…. It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after.”

But what about the “surprising one” that I was to come back to “if reminded”? (yes, one person did remind me last month). The surprising one is that Lehmann’s letter—this is his first letter to me– asked me to please read a paper by Frank Schmidt to appear in his wife Juliet Shaffer’s new (at the time) journal, Psychological Methods, as he wondered if I had any ideas as to what may be done to answer such criticisms of frequentist tests! But, clearly, few people could have been in a better position than Lehmann to “do something about” these arguments …hence my surprise. But I think he was reluctant…. Continue reading →

Categories: Statistics | Tags: criticism of frequentist methods, Frank L. Schmidt, Jerzy Neyman, Lehmann, significance tests | 1 Comment

N. Schachtman: Judge Posner’s Digression on Regression

Posted on April 9, 2012 by Mayo

I am pleased to post Nathan Schachtman’s most recent blog entry on statistics in the law: he has graciously agreed to respond to comments and queries on this blog*.

Judge Posner’s Digression on Regression

April 6th, 2012

Cases that deal with linear regression are not particularly exciting except to a small brand of “quant” lawyers who see such things “differently.” Judge Posner, the author of several books, including Economic Analysis of Law (8th ed. 2011), is a judge who sees things differently as well.

In a case decided late last year, Judge Posner took the occasion to chide the district court and the parties’ legal counsel for failing to assess critically a regression analysis offered by an expert witness on the quantum of damages in a contract case. ATA Airlines Inc. (ATA), a subcontractor of Federal Express Corporation, sued FedEx for breaching an alleged contract to include ATA in a lucrative U.S. military deal.

Remarkably, the contract liability was a non-starter; the panel of the Seventh Circuit reversed and rendered the judgment in favor of the plaintiff. There never was a contract, and so the case should never have gone to trial. ATA Airlines, Inc. v. Federal Exp. Corp., 665 F.3d 882, 888-89 (2011).

End of Story?

In a diversity case, based upon state law, with no liability, you would think that the panel would and perhaps should stop once it reached the conclusion that there was no contract upon which to predicate liability. Anything more would be, of course, pure obiter dictum, but Judge Posner could not resist the teaching moment, both for the trial judge below, the parties, their counsel, and the bar: Continue reading →

Categories: Statistics | Tags: evidence based policy, law and statistics, N.Schachtman, Regression analysis, Richard Posner, United States Court of Appeals for the Seventh Circuit | 2 Comments

Going Where the Data Take Us

Posted on April 6, 2012 by Mayo

A reader, Cory J, sent me a question in relation to a talk of mine he once attended:

I have the vague ‘memory’ of an example that was intended to bring out a central difference between broadly Bayesian methodology and broadly classical statistics. I had thought it involved a case in which a Bayesian would say that the data should be conditionalized on, and supports H, whereas a classical statistician effectively says that the data provides no support to H. …We know the data, but we also know of the data that only ‘supporting’ data would be given us. A Bayesian was then supposed to say that we should conditionalize on the data that we have, even if we know that we wouldn’t have been given contrary data had it been available.

That only “supporting” data would be presented need not be problematic in itself; it all depends on how this is interpreted. There might be no negative results to be had (H might be true), and thus none to “be given us”. Your last phrase, however, does describe a pejorative case for a frequentist error statistician, in that, if “we wouldn’t have been given contrary data” to H (in the sense of data in conflict with what H asserts), even “had it been available” then the procedure had no chance of finding or reporting flaws in H. Thus only data in accordance with H would be presented, even if H is false; so H passes a “test” with minimal stringency or severity. I discuss several examples in papers below (I think the reader had in mind Mayo and Kruse 2001). Continue reading →

Categories: double-counting, Statistics | Tags: double-counting, severity, Velikovsky dodge | 4 Comments

Fallacy of Rejection and the Fallacy of Nouvelle Cuisine

Posted on April 4, 2012 by Mayo

In February, in London, criminologist Katrin H. and I went to see Jackie Mason do his shtick, a one-man show billed as his swan song to England. It was like a repertoire of his “Greatest Hits” without a new or updated joke in the mix. Still, hearing his rants for the nth time was often quite hilarious.

A sample: If you want to eat nothing, eat nouvelle cuisine. Do you know what it means? No food. The smaller the portion the more impressed people are, so long as the food’s got a fancy French name, haute cuisine. An empty plate with sauce!

As one critic wrote, Mason’s jokes “offer a window to a different era,” one whose caricatures and biases one can only hope we’ve moved beyond: http://www.guardian.co.uk/stage/2012/feb/21/jackie-mason-live-review

But it’s one thing for Jackie Mason to scowl at a seat in the front row and yell to the shocked audience member in his imagination, “These are jokes! They are just jokes!” and another to reprise statistical howlers, which are not jokes, to me. This blog found its reason for being partly as a place to expose, understand, and avoid them. Recall the September 26, 2011 post “Whipping Boys and Witch Hunters”: https://errorstatistics.com/2011/09/26/whipping-boys-and-witch-hunters-comments-are-now-open/: [i]

Fortunately, philosophers of statistics would surely not reprise decades-old howlers and fallacies. After all, it is the philosopher’s job to clarify and expose the conceptual and logical foibles of others; and even if we do not agree, we would never merely disregard and fail to address the criticisms in published work by other philosophers. Oh wait, ….one of the leading texts repeats the fallacy in their third edition: Continue reading →

Categories: Statistics | Tags: criticism of frequentist methods, fallacy of rejection, Jackie Mason, sample size, significance tests | 1 Comment

Philosophy of Statistics: Retraction Watch, Vol. 1, No. 1

Posted on April 1, 2012 by Mayo

This morning I received a paper I have been asked to review (anonymously as is typical). It is to head up a forthcoming issue of a new journal called Philosophy of Statistics: Retraction Watch. This is the first I’ve heard of the journal, and I plan to recommend they publish the piece, conditional on revisions. I thought I would post the abstract here. It’s that interesting.

“Some Slightly More Realistic Self-Criticism in Recent Work in Philosophy of Statistics,” Philosophy of Statistics: Retraction Watch, Vol. 1, No. 1 (2012), pp. 1-19.

In this paper we delineate some serious blunders that we and others have made in published work on frequentist statistical methods. First, although we have claimed repeatedly that a core thesis of the frequentist testing approach is that a hypothesis may be rejected with increasing confidence as the power of the test increases, we now see that this is completely backwards, and we regret that we have never addressed, or even fully read, the corrections found in Deborah Mayo’s work since at least 1983, and likely even before that.

Second, we have been wrong to claim that Neyman-Pearson (N-P) confidence intervals are inconsistent because in special cases it is possible for a specific 95% confidence interval to be known to be correct. Not only are the examples required to show this absurdly artificial, but the frequentist could simply interpret this “vacuous interval” “as a statement that all parameter values are consistent with the data at a particular level,” which, as Cox and Hinkley note, is an informative statement about the limitations in the data (Cox and Hinkley 1974, 226). Continue reading →

Categories: Comedy, Statistics | Tags: April's Fools, confidence intervals, criticism of frequentist methods, humor, Likelihood Principle, Retraction Watch | 4 Comments

Comment on the Barnard and Copas (2002) Empirical Example: Aris Spanos

Posted on March 28, 2012 by Mayo

I am grateful to A. Spanos for letting me post a link to his comments on a paper S. Senn shared last week. You can find a pdf of his comments here.

You can read the original Barnard and Copas (2002) article here

Categories: Statistics | Tags: Barnard, Copas, likelihood ratio, post-data severity evaluation, Senn | 21 Comments

The New York Times Goes to War Against Generic Drug Manufacturers: Schachtman

Posted on March 25, 2012 by Mayo

Schachtman gives an interesting legal update today on his blog concerning the issue in my post “Generic Drugs Resistant to Lawsuits” (Mar. 22, 2012). I post it here:

The New York Times Goes to War Against Generic Drug Manufacturers

By: Nathan Schachtman, Esq., PC*

Last week marked the New York Times’s launch of a rhetorically fevered, legally sophomoric campaign against generic drug preemption. Saturday saw an editorial, “A Bizarre Outcome on Generic Drugs,” New York Times (March 24, 2012), which screamed, “Bizarre”! “Outrageous”!

The New York Times editorialists have their knickers in a knot over the inability of people, who are allegedly harmed by adverse drug reactions from generic medications, to sue the generic manufacturers. The editorial follows a front-page article, from earlier last week, which decried the inability to sue generic drug sellers. See Katie Thomas, “Generic Drugs Proving Resistant to Damage Suits,” New York Times (Mar. 21, 2012).

The Times’ writers think that it is “bizarre” and “outrageous” that these people are out of court due to federal preemption of state court tort laws that might have provided a remedy.

In particular, the Times suggests that the law is irrational for allowing Ms. Diana Levine to recover against Wyeth for the loss of her arm to gangrene after receiving Phenergan by intravenous push, while another plaintiff, Ms. Schork, cannot recover for a similar injury, from a generic manufacturer of promethazine, the same medication. Wyeth v. Levine, 555 U.S. 555 (2009). See also Brief of Petitioner Wyeth, in Wyeth v. Levine (May 2008).

Of course, both Ms. Levine and Ms. Schork received compensation from their healthcare providers, who deviated from their standard of care when they carelessly injected the medication into arteries, contrary to clear instructions. At the time that Levine received her treatment, the Phenergan package insert contained four separate warnings about the risk of gangrene from improper injection of the medication into an artery. For instance, the “Adverse Reactions” section of the Phenergan label indicated: “INTRA-ARTERIAL INJECTION [CAN] RESULT IN GANGRENE OF THE AFFECTED EXTREMITY.” Continue reading →

Categories: Statistics | Tags: evidence based policy, Generic drug, law and statistics, Lawsuit, malpractice, Tort law | Leave a comment

Generic Drugs Resistant to Lawsuits

Posted on March 22, 2012 by Mayo

Waiting for my plane at La Guardia, I see that the NYT has an article on page one about the disparity between suing brand name vs. generic drug makers for failure to adequately warn of serious side effects on their drug labels. Can it be that no one is responsible for monitoring/updating drug label warnings once a drug becomes generic?

Debbie Schork, a deli worker at a supermarket in Indiana, had to have her hand amputated after an emergency room nurse injected her with an anti-nausea drug, causing gangrene. She sued the manufacturer named in the hospital’s records for failing to warn about the risks of injecting it. Her case was quietly thrown out of court last fall.

That result stands in sharp contrast to the highly publicized case of Diana Levine, a professional musician from Vermont. Her hand and forearm were amputated because of gangrene after a physician assistant at a health clinic injected her with the same drug. She sued the drug maker, Wyeth, and won $6.8 million.

The financial outcomes were radically different for one reason: Ms. Schork had received the generic version of the drug, known as promethazine, while Ms. Levine had been given the brand name, Phenergan.

“Explain the difference between the generic and the real one — it’s just a different company making the same thing,” Ms. Schork said.

Continue reading →

Categories: Statistics | Tags: evidence based policy, Food and Drug Administration, Generic drug, Pharmaceutical industry, Supreme Court | 4 Comments

Objectivity (#5): Three Reactions to the Challenge of Objectivity (in inference):

Posted on March 18, 2012 by Mayo

(1) If discretionary judgments are thought to introduce subjectivity in inference, a classic strategy thought to achieve objectivity is to extricate such choices, replacing them with purely formal a priori computations or agreed-upon conventions (see March 14). If leeway for discretion introduces subjectivity, then cutting off discretion must yield objectivity! Or so some argue. Such strategies may be found, to varying degrees, across the different approaches to statistical inference.

The inductive logics of the type developed by Carnap promised to be an objective guide for measuring degrees of confirmation in hypotheses, despite much-discussed problems, paradoxes, and conflicting choices of confirmation logics. In Carnapian inductive logics, initial assignments of probability are based on a choice of language and on intuitive, logical principles. The consequent logical probabilities can then be updated (given the statements of evidence) with Bayes’s Theorem. The fact that the resulting degrees of confirmation are at the same time analytical and a priori (giving them an air of objectivity) reveals the central weakness of such confirmation theories as “guides for life”: as guides, say, for empirical frequencies or for finding things out in the real world. Something very similar happens with the varieties of “objective” Bayesian accounts, both in statistics and in formal Bayesian epistemology in philosophy (a topic to which I will return; if interested, see my RMM contribution).

A related way of trying to remove latitude for discretion might be to define objectivity in terms of the consensus of a specified group, perhaps of experts, or of agents with “diverse” backgrounds. Once again, such a convention may enable agreement yet fail to have the desired link-up with the real world. It would be necessary to show why consensus reached by the particular choice of group (another area for discretion) achieves the learning goals of interest.

Continue reading →

Categories: Objectivity, Statistics | Tags: Bayesianism, Kyburg, Savage | Leave a comment

Objectivity (#4) and the “Argument From Discretion”

Posted on March 14, 2012 by Mayo

We constantly hear that procedures of inference are inescapably subjective because of the latitude of human judgment as it bears on the collection, modeling, and interpretation of data. But this is seriously equivocal: Being the product of a human subject is hardly the same as being subjective, at least not in the sense we are speaking of—that is, as a threat to objective knowledge. Are all these arguments about the allegedly inevitable subjectivity of statistical methodology rooted in equivocations? I argue that they are!

Insofar as humans conduct science and draw inferences, it is obvious that human judgments and human measurements are involved. True enough, but too trivial an observation to help us distinguish among the different ways judgments should enter, and how, nevertheless, to avoid introducing bias and unwarranted inferences. The issue is not that a human is doing the measuring, but whether we can reliably use the thing being measured to find out about the world.

Continue reading →

Categories: Objectivity, Statistics | Tags: dirty hands argument, evidence based policy | 29 Comments


© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.