Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”

Memory lane: Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize: only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.) Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

The first one came to me in autumn 2008 while I was giving a series of seminars on philosophy of statistics at the LSE. Modeled on a disappointing (to me) performance of The Woman in Black, “A Funny Thing Happened at the [1959] Savage Forum” relates Savage’s horror at George Barnard’s announcement of having rejected the Likelihood Principle!

The current piece also features George Barnard and since Monday (9/23) is Barnard’s birthday, I’m digging it out of “rejected posts” to reblog it. It recalls our first meeting in London in 1986. I’d sent him a draft of my paper “Why Pearson Rejected the Neyman-Pearson Theory of Statistics” (later adapted as chapter 11 of EGEK) to see whether I’d gotten Pearson right. He’d traveled quite a ways, from Colchester, I think. It was June and hot, and we were up on some kind of a semi-enclosed rooftop. Barnard was sitting across from me looking rather bemused.

The curtain opens with Barnard and Mayo on the roof, lit by a spot mid-stage. He’s drinking (hot) tea; she, a Diet Coke. The dialogue is as I recall it from the time[i]:

 Barnard: I read your paper. I think it is quite good.  Did you know that it was I who told Fisher that Neyman-Pearson statistics had turned his significance tests into little more than acceptance procedures?

Mayo:  Thank you so much for reading my paper.  I recall a reference to you in Pearson’s response to Fisher, but I didn’t know the full extent.

Barnard: I was the one who told Fisher that Neyman was largely to blame. He shouldn’t be too hard on Egon.  His statistical philosophy, you are aware, was different from Neyman’s.

Mayo:  That’s interesting.  I did quote Pearson, at the end of his response to Fisher, as saying that inductive behavior was “Neyman’s field, not mine”.  I didn’t know your role in his laying the blame on Neyman!

Fade to black. The lights go up on Fisher, stage left, flashing back some 30 years earlier . . .

Fisher: Now, acceptance procedures are of great importance in the modern world.  When a large concern like the Royal Navy receives material from an engineering firm it is, I suppose, subjected to sufficiently careful inspection and testing to reduce the frequency of the acceptance of faulty or defective consignments. . . . I am casting no contempt on acceptance procedures, and I am thankful, whenever I travel by air, that the high level of precision and reliability required can really be achieved by such means.  But the logical differences between such an operation and the work of scientific discovery by physical or biological experimentation seem to me so wide that the analogy between them is not helpful . . . . [Advocates of behavioristic statistics are like] Russians [who] are made familiar with the ideal that research in pure science can and should be geared to technological performance, in the comprehensive organized effort of a five-year plan for the nation. . . .

In the U.S. also the great importance of organized technology has I think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed rather at, let us say, speeding production, or saving money. (Fisher 1955, 69-70)

Fade to black.  The lights go up on Egon Pearson stage right (who looks like he does in my sketch [frontispiece] from EGEK 1996, a bit like a young C. S. Peirce):

Pearson: There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. . . . Indeed, to dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot . . . . To the best of my ability I was searching for a way of expressing in mathematical terms what appeared to me to be the requirements of the scientist in applying statistical tests to his data.  (Pearson 1955, 204)

Fade to black. The spotlight returns to Barnard and Mayo, but brighter. It looks as if it’s gotten hotter.  Barnard wipes his brow with a white handkerchief.  Mayo drinks her Diet Coke.

Barnard (ever so slightly angry): You have made one blunder in your paper. Fisher would never have made that remark about Russia.

There is a tense silence.

Mayo: But—it was a quote.

End of Act 1.

Given this was pre-internet, we couldn’t go to the source then and there, so we agreed to search for the paper in the library. Well, you get the idea. Maybe I could call the piece “Stat on a Hot Tin Roof.”

If you go see it, don’t say I didn’t warn you.

I’ve gotten various new speculations over the years as to why he had this reaction to the mention of Russia (check discussions in earlier posts with this play). Feel free to share yours. Some new (to me) information on Barnard is in George Box’s recent autobiography.


[i] We had also discussed this many years later, in 1999.

 

Categories: Barnard, phil/history of stat, rejected post, Statistics

Uncle Sam wants YOU to help with scientific reproducibility!

You still have a few days to respond to the call of your country to solve problems of scientific reproducibility!

The following passages come from Retraction Watch, with my own recommendations at the end.

“White House takes notice of reproducibility in science, and wants your opinion”

The White House’s Office of Science and Technology Policy (OSTP) is taking a look at innovation and scientific research, and issues of reproducibility have made it onto its radar.

Here’s the description of the project from the Federal Register:

The Office of Science and Technology Policy and the National Economic Council request public comments to provide input into an upcoming update of the Strategy for American Innovation, which helps to guide the Administration’s efforts to promote lasting economic growth and competitiveness through policies that support transformative American innovation in products, processes, and services and spur new fundamental discoveries that in the long run lead to growing economic prosperity and rising living standards.

I wonder what Steven Pinker would say about some of the above verbiage?

And here’s what’s catching the eye of people interested in scientific reproducibility:

(11) Given recent evidence of the irreproducibility of a surprising number of published scientific findings, how can the Federal Government leverage its role as a significant funder of scientific research to most effectively address the problem?

The OSTP is the same office that, in 2013, took what Nature called “a long-awaited leap forward for open access” when it said “that publications from taxpayer-funded research should be made free to read after a year’s delay.” That OSTP memo came after more than 65,000 people “signed a We the People petition asking for expanded public access to the results of taxpayer-funded research.”

Have ideas on improving reproducibility? Emails to innovationstrategy@ostp.gov are preferred, according to the notice, which also explains how to fax or mail comments. The deadline is September 23.

Off the top of my head, how about:

Promote the use of methodologies that:

  • control and assess the capabilities of methods to avoid mistaken inferences from data;
  • require demonstrated self-criticism at every stage, from data collection through modelling to interpretation (statistical and substantive);
  • describe what is especially shaky or poorly probed thus far (and spell out how subsequent studies are most likely to locate those flaws).[i]

Institute penalties for QRPs and fraud?

Please offer your suggestions in the comments, or directly to Uncle Sam.

[i] It may require a certain courage on the part of researchers, journalists, and referees.

Categories: Announcement, reproducibility

A crucial missing piece in the Pistorius trial? (2): my answer (Rejected Post)


Time for a break with a “Rejected Post”[i]

There’s one crucial point that Prosecutor Nel overlooked and failed to employ in the Oscar Pistorius trial, or so it appears. In fact I haven’t heard anyone mention it, so maybe it’s not as critical as I think it is. Before revealing (what I regard as) an important missing piece, I ask readers and legal beagles out there for their informal take.

Here are some items from the announced verdict (which do not directly give away the missing piece, but may be enough to deduce it). (A general article is here.)

Oscar Pistorius ‘not guilty’ of girlfriend’s murder, rules judge Thokozile Masipa

AP | September 11, 2014, 17.09 pm IST

Before the break, Judge Masipa ruled out “dolus eventualis”[ii], saying Mr Pistorius could not have foreseen he would kill the person behind the toilet door.

“How could the accused have reasonably foreseen the shot he fired would have killed the deceased? Clearly he did not subjectively foresee this, that he would have killed the person behind the door, let alone the deceased,” said Judge Masipa.

The judge said the defence argues it is highly improbable the accused could have made this up so quickly and consistently, even in his bail application, [really?]

….Evidence shows that at time he fired shots at toilet door, Mr Pistorius believed the deceased was in the bedroom, the judge says. This belief was communicated to a number of people shortly after the incident, she added.

The judge said there is “nothing in the evidence to suggest that Mr Pistorius’ belief was not genuinely entertained”. She cites reasons including the bathroom window being open, and the toilet door being shut.

… He… [said] he genuinely, though erroneously, believed that his life and that of the deceased was in danger,” the judge said….

The starting point is “whether accused had intention to kill person behind toilet door,” the judge said.

“The blow was meant for the person behind the toilet door who the accused believed was an intruder. The blow struck and killed the person behind the door. The fact that the person behind the door turned out to be the deceased and not an intruder [is] irrelevant,” the judge said.…

The accused was clearly not candid with the court when he said he did not want to shoot anyone, as he had a loaded firearm and was ready to shoot, the judge said.

The deceased was killed under “very peculiar” circumstances, the judge said. It makes no sense that Ms Steenkamp did not hear Mr Pistorius scream “get out”. The other question is why the accused fired not one, but four shots before he ran back to the room to try and find Ms Steenkamp, the judge says.….

Mr Pistorius had approached the bathroom with a gun, the judge said. However, the state has argued that if Mr Pistorius had no intention to shoot anyone, he cannot use self-defence as a defence, the judge said.

The essence of the accused’s defence is that he had no intention to shoot anyone, but if he is found to have such an intention, it was because he believed he was under threat from an intruder, the judge added.….

Judge Masipa: “[Mr Pistorius] stated that if he wanted to shoot the intruder, he would have shot higher up and more in the direction where the opening of the door would be to the far right of the door and at chest height. I pause to state that this… is inconsistent with someone who shot without thinking. I shall revert to this later in my judgement.”

“The accused stated that he never thought of the possibility that he could kill people in the toilet. He considered however that thinking back retrospectively, it would be a probability that someone could be killed in the toilet.”…

The prosecution has argued that the presence of partly digested food in Ms Steenkamp’s stomach indicated that Mr Pistorius’ testimony of when the couple had their last meal was not true, the judge noted.

However, she said: “The experts agreed that gastric emptying was not an exact science. It would therefore be unwise for this court to figure out what the presence of partially digested food might mean.”

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Brief background of what Oscar claimed: He is right next to Reeva in bed (she asks if he can’t sleep), he gets up, hears someone in the bathroom, gets his gun, tells Reeva to call the cops, and get down on the floor (!) even though he doesn’t see her, even though he gets no response at all, none. Well there’s lots more about furniture that couldn’t have been where he said, he alleged the cops moved everything to screw up his story…He begins to stalk “the intruder”:

^^^^^^^^^^^^^

Oscar repeatedly emphasized (through the trial) his fear of this (alleged) intruder: he stayed quiet, moved stealthily into the bathroom, terrified that this intruder would discern his position and overpower him at any moment. He couldn’t let that happen. He had to get the intruder before he got him. He doesn’t call the nearby guard for help, doesn’t run out of the house with Reeva, nothing like that. Oscar says he could only think of his terrible fear of the intruder…. It’s his style to confront danger until he himself quashes it. He could never turn his back on a threat of danger to himself.

As he rounds the bend of the bathroom, he hears the intruder shut the door of the toilet area. He recounts (at length) how frightened he was that the intruder will come out any second and get him! So he shoots, once. A pause maybe…then shoots 3 more times and stops. Why did he stop?

There’s no sound from the locked toilet now, but he doesn’t know he’d killed the intruder. If the intruder is maybe standing on the toilet he might have escaped all the bullets. He doesn’t find out, he doesn’t shoot any more. Yet he has no reason to think the danger is gone. (The situation’s not really different from before he fired the shots, from his alleged perspective.)

If the intruder survived the blasts, he’s going to come get him. It would not have been safe to turn his back. Yet he leaves the bathroom to (allegedly) start looking behind curtains (!) for Reeva….

So why did he stop shooting?

Answer: because the screaming stopped.

 

[i] This is a “rejected post” since it is off the topic of PhilStat/PhilSci. It moves over to a distinct Rejected Post blog (eventually). I use it less often now that Twitter is available for micro versions of such things.

[ii]“Dolus eventualis murder, also known as common murder, is a lesser charge. In essence, it means that you are responsible for the foreseeable consequences of your actions.

According to South African law, intent in the form of dolus eventualis means it is enough to find someone guilty of murder if the perpetrator objectively foresees the possibility of his or her act causing death and persists regardless of the consequences.”

http://www.independent.co.uk/news/world/africa/dolus-eventualis-what-is-it-and-how-has-it-affected-the-oscar-pistorius-trial-9726275.html

Categories: rejected post

“The Supernal Powers Withhold Their Hands And Let Me Alone” : C.S. Peirce

C. S. Peirce: 10 Sept, 1839-19 April, 1914


Memory Lane* in Honor of C.S. Peirce’s Birthday:
(Part 3) of “Peircean Induction and the Error-Correcting Thesis”

Deborah G. Mayo
Transactions of the Charles S. Peirce Society 41(2) 2005: 299-319

(9/10) Peircean Induction and the Error-Correcting Thesis (Part I)

(9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis

8. Random sampling and the uniformity of nature

We are now at the point to address the final move in warranting Peirce’s [self-correcting thesis] SCT. The severity or trustworthiness assessment, on which the error correcting capacity depends, requires an appropriate link (qualitative or quantitative) between the data and the data generating phenomenon, e.g., a reliable calibration of a scale in a qualitative case, or a probabilistic connection between the data and the population in a quantitative case. Establishing such a link, however, is regarded as assuming observed regularities will persist, or making some “uniformity of nature” assumption—the bugbear of attempts to justify induction.

But Peirce contrasts his position with those favored by followers of Mill, and “almost all logicians” of his day, who “commonly teach that the inductive conclusion approximates to the truth because of the uniformity of nature” (2.775). Inductive inference, as Peirce conceives it (i.e., severe testing) does not use the uniformity of nature as a premise. Rather, the justification is sought in the manner of obtaining data. Justifying induction is a matter of showing that there exist methods with good error probabilities. For this it suffices that randomness be met only approximately, that inductive methods check their own assumptions, and that they can often detect and correct departures from randomness.

… It has been objected that the sampling cannot be random in this sense. But this is an idea which flies far away from the plain facts. Thirty throws of a die constitute an approximately random sample of all the throws of that die; and that the randomness should be approximate is all that is required. (1.94)

Peirce backs up his defense with robustness arguments. For example, in an (attempted) Binomial induction, Peirce asks, “what will be the effect upon inductive inference of an imperfection in the strictly random character of the sampling” (2.728). What if, for example, a certain proportion of the population had twice the probability of being selected? He shows that “an imperfection of that kind in the random character of the sampling will only weaken the inductive conclusion, and render the concluded ratio less determinate, but will not necessarily destroy the force of the argument completely” (2.728). This is particularly so if the sample mean is near 0 or 1. In other words, violating experimental assumptions may be shown to weaken the trustworthiness or severity of the proceeding, but this may only mean we learn a little less.
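As a rough illustration of this kind of robustness argument, here is a minimal sketch under assumptions that are not Peirce's own: a toy two-group model in which a share `a` of the population is `weight` times as likely to be drawn, with the property of interest occurring at rate `p_a` in that group and `p_b` in the rest. All function and parameter names are hypothetical.

```python
def true_ratio(a, p_a, p_b):
    """Population proportion with the property, for group shares a and 1 - a."""
    return a * p_a + (1 - a) * p_b


def sampled_ratio(a, p_a, p_b, weight=2.0):
    """Expected sample proportion when group A is `weight` times as likely to be drawn."""
    numerator = weight * a * p_a + (1 - a) * p_b
    denominator = weight * a + (1 - a)
    return numerator / denominator


def bias(a, p_a, p_b, weight=2.0):
    """Expected sample proportion minus the true population proportion."""
    return sampled_ratio(a, p_a, p_b, weight) - true_ratio(a, p_a, p_b)


def worst_case_bias(weight=2.0, steps=1000):
    """Largest |bias| over a grid of group shares, at the extreme p_a = 1, p_b = 0."""
    return max(abs(bias(i / steps, 1.0, 0.0, weight)) for i in range(steps + 1))
```

In this toy model the bias works out to a(1 - a)(weight - 1)(p_a - p_b) / (1 + (weight - 1)a). With half the population doubly likely to be drawn and rates of 0.6 and 0.4, `bias(0.5, 0.6, 0.4)` is about 0.033. Even in the most extreme case the doubled selection probability shifts the expected ratio by less than 0.18, and the shift shrinks with the gap between `p_a` and `p_b`, so it is small when the property is nearly universal or nearly absent in both groups. The imperfection blurs the concluded ratio without wiping out the inference.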

Yet a further safeguard is at hand:

Nor must we lose sight of the constant tendency of the inductive process to correct itself. This is of its essence. This is the marvel of it. …even though doubts may be entertained whether one selection of instances is a random one, yet a different selection, made by a different method, will be likely to vary from the normal in a different way, and if the ratios derived from such different selections are nearly equal, they may be presumed to be near the truth. (2.729)

Here, the marvel is an inductive method’s ability to correct the attempt at random sampling. Still, Peirce cautions, we should not depend so much on the self-correcting virtue that we relax our efforts to get a random and independent sample. But if our effort is not successful, and neither is our method robust, we will probably discover it. “This consideration makes it extremely advantageous in all ampliative reasoning to fortify one method of investigation by another” (ibid.).

“The Supernal Powers Withhold Their Hands And Let Me Alone”

Peirce turns the tables on those skeptical about satisfying random sampling—or, more generally, satisfying the assumptions of a statistical model. He declares himself “willing to concede, in order to concede as much as possible, that when a man draws instances at random, all that he knows is that he tried to follow a certain precept” (2.749). There might be a “mysterious and malign connection between the mind and the universe” that deliberately thwarts such efforts. He considers betting on the game of rouge et noire: “could some devil look at each card before it was turned, and then influence me mentally” to bet or not, the ratio of successful bets might differ greatly from 0.5. But, as Peirce is quick to point out, this would equally vitiate deductive inferences about the expected ratio of successful bets.

Consider our informal example of weighing with calibrated scales. If I check the properties of the scales against known, standard weights, then I can check if my scales are working in a particular case. Were the scales infected by systematic error, I would discover this by finding systematic mismatches with the known weights; I could then subtract it out in measurements. That scales have given properties where I know the object’s weight indicates they have the same properties when the weights are unknown, lest I be forced to assume that my knowledge or ignorance somehow influences the properties of the scale. More generally, Peirce’s insightful argument goes, the experimental procedure thus confirmed where the measured property is known must work as well when it is unknown unless a mysterious and malign demon deliberately thwarts my efforts.
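The calibration step can be sketched in a few lines. This is only an illustration, assuming the systematic error is a constant additive offset; the function names and the numbers in the usage note are hypothetical.

```python
from statistics import mean


def estimate_offset(readings, known_weights):
    """Average mismatch between scale readings and known standard weights."""
    return mean(r - k for r, k in zip(readings, known_weights))


def corrected(reading, offset):
    """Subtract the estimated systematic error from a new reading."""
    return reading - offset
```

For instance, if standards of 10, 20 and 50 units read 10.5, 20.5 and 50.5, the estimated offset is 0.5, and a later reading of 33.5 on an object of unknown weight would be corrected to 33.0. The correction carries over to the unknown case only on the assumption that the scale behaves the same whether or not I know the weight, which is just the point of the argument.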

Peirce therefore grants that the validity of induction is based on assuming “that the supernal powers withhold their hands and let me alone, and that no mysterious uniformity … interferes with the action of chance” (ibid.). But this is very different from the uniformity of nature assumption.

…the negative fact supposed by me [no mysterious force interferes with the action of chance] is merely the denial of any major premise from which the falsity of the inductive conclusion could be deduced. Actually so long as the influence of this mysterious source not be overwhelming, the wonderful self-correcting nature of the ampliative inference would enable us, even so, to detect and make allowance for them. (2.749)

Not only do we not need the uniformity of nature assumption, Peirce declares “That there is a general tendency toward uniformity in nature is not merely an unfounded, it is an absolutely absurd, idea in any other sense than that man is adapted to his surroundings” (2.750). In other words, it is not nature that is uniform, it is we who are able to find patterns enough to serve our needs and interests. But the validity of inductive inference does not depend on this.

9. Conclusion

For Peirce, “the true guarantee of the validity of induction” is that it is a method of reaching conclusions which corrects itself; inductive methods—understood as methods of severe testing—are justified to the extent that they are error-correcting methods (SCT). I have argued that the well-known skepticism as regards Peirce’s SCT is based on erroneous views concerning the nature of inductive testing as well as what is required for a method to be self-correcting. By revisiting these two theses, justifying the SCT boils down to showing that severe testing methods exist and that they enable reliable means for learning from error.

An inductive inference to hypothesis H is warranted to the extent that H passes a severe test, that is, one which, with high probability, would have detected a specific flaw or departure from what H asserts, and yet it did not. Deliberately making use of known flaws and fallacies in reasoning with limited and uncertain data, tests may be constructed that are highly trustworthy probes in detecting and discriminating errors in particular cases. Modern statistical methods (e.g., statistical significance tests) based on controlling a test’s error probabilities provide tools which, when properly interpreted, afford severe tests. While on the one hand, contemporary statistical methods increase the mathematical rigor and generality of Peirce’s SCT, on the other, Peirce provides something current statistical methodology lacks: an account of inductive inference and a philosophy of experiment that links the justification for statistical tests to a more general rationale for scientific induction. Combining the mathematical contributions of modern statistics with the inductive philosophy of Peirce sets the stage for developing an adequate solution to the age-old problem of induction. To carry out this project fully is a topic for future work.**
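To make the severity requirement concrete, here is a minimal sketch under assumptions that go beyond the paper: a one-sided test of a Normal mean (H0: μ ≤ μ0 vs. H1: μ > μ0) with known σ, where the severity for the claim μ > μ1 is taken to be the probability of a sample mean no larger than the one observed, computed under μ = μ1. The function name and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def severity_mu_greater(xbar, mu1, sigma, n):
    """SEV(mu > mu1): P(sample mean <= observed xbar; mu = mu1), Normal model, known sigma."""
    standard_error = sigma / sqrt(n)
    return NormalDist().cdf((xbar - mu1) / standard_error)
```

With σ = 2, n = 100 and an observed mean of 0.4, the claim μ > 0.2 passes with severity of about 0.84, while μ > 0.4 passes with severity 0.5. The same data warrant the weaker claim well and the stronger claim poorly, which is the sense in which severity attaches to a particular inference rather than to the test as a whole.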

[You can find a pdf version of this paper here.]

REFERENCES and Notes (see part 1)

*This is reblogged from 1 year ago. If there’s a single philosopher I’d say will reward your careful rereading, it’s Peirce. Maybe a glimmer of the idea of inductive-statistical inference as severe testing will shine through…

**That was 2005; I think (hope) I’ve made headway since then.

Categories: C.S. Peirce, Error Statistics, phil/history of stat

Statistical Science: The Likelihood Principle issue is out…!

Abbreviated Table of Contents:

Here are some items for your Saturday-Sunday reading.

Link to complete discussion: 

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29 (2014), no. 2, 227-266.

Links to individual papers:

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle. Statistical Science 29 (2014), no. 2, 227-239.

Dawid, A. P. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 240-241.

Evans, Michael. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 242-246.

Martin, Ryan; Liu, Chuanhai. Discussion: Foundations of Statistical Inference, Revisited. Statistical Science 29 (2014), no. 2, 247-251.

Fraser, D. A. S. Discussion: On Arguments Concerning Statistical Principles. Statistical Science 29 (2014), no. 2, 252-253.

Hannig, Jan. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 254-258.

Bjørnstad, Jan F. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 259-260.

Mayo, Deborah G. Rejoinder: “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 261-266.

Abstract: An essential component of inference based on familiar frequentist notions, such as p-values, significance and confidence levels, is the relevant sampling distribution. This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes x and y from experiments E1 and E2 (both with unknown parameter θ) have different probability models f1( . ), f2( . ), then even though f1(x; θ) = c f2(y; θ) for all θ, outcomes x and y may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox [Ann. Math. Statist. 29 (1958) 357–372] proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of Ei. The surprising upshot of Allan Birnbaum’s [J. Amer. Statist. Assoc. 57 (1962) 269–306] argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP) entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases also refute [WCP entails SLP].

Key words: Birnbaumization, likelihood principle (weak and strong), sampling theory, sufficiency, weak conditionality
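For readers who want to see an SLP violation in numbers, here is a minimal sketch using the familiar binomial versus negative binomial pair. The example is a standard textbook one and is not taken from the abstract; the function names and the choice of 9 heads and 3 tails are assumptions made for illustration. E1 fixes 12 tosses in advance, E2 tosses until the 3rd tail, and both happen to yield 9 heads and 3 tails. The test is of θ = 0.5 against θ > 0.5, where θ is the probability of heads.

```python
from fractions import Fraction
from math import comb


def binom_likelihood(theta, n=12, heads=9):
    """E1: number of tosses fixed in advance at n."""
    return comb(n, heads) * theta**heads * (1 - theta) ** (n - heads)


def negbin_likelihood(theta, tails=3, heads=9):
    """E2: toss until the `tails`-th tail; the last toss is necessarily a tail."""
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta) ** tails


def binom_pvalue(n=12, heads=9):
    """P(at least `heads` heads in n tosses; theta = 1/2)."""
    return sum(comb(n, k) for k in range(heads, n + 1)) * Fraction(1, 2) ** n


def negbin_pvalue(tails=3, heads=9):
    """P(at least `heads` heads before the `tails`-th tail; theta = 1/2).

    Equivalent to at most tails - 1 tails in the first heads + tails - 1 tosses.
    """
    m = heads + tails - 1
    return sum(comb(m, j) for j in range(tails)) * Fraction(1, 2) ** m
```

The two likelihoods are proportional for every θ, with constant c = 4, yet the p-values differ: 299/4096 (about 0.073) under E1 and 67/2048 (about 0.033) under E2. The difference comes entirely from the sampling distributions, which is what the SLP would have us ignore.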

Regular readers of this blog know that the topic of the “Strong Likelihood Principle (SLP)” has come up quite frequently. Numerous informal discussions of earlier attempts to clarify where Birnbaum’s argument for the SLP goes wrong may be found on this blog. [SEE PARTIAL LIST BELOW.[i]] These mostly stem from my initial paper Mayo (2010) [ii]. I’m grateful for the feedback.

In the months since this paper was accepted for publication, I’ve been asked, from time to time, to reflect informally on the overall journey: (1) Why was/is the Birnbaum argument so convincing for so long? (Are there points being overlooked, even now?) (2) What would Birnbaum have thought? (3) What is the likely upshot for the future of statistical foundations (if any)?

I’ll try to share some responses over the next week. (Naturally, additional questions are welcome.)

[i] A quick take on the argument may be found in the appendix to: “A Statistical Scientist Meets a Philosopher of Science: A conversation between David Cox and Deborah Mayo (as recorded, June 2011)”

 UPhils and responses

 

 

Categories: Birnbaum, Birnbaum Brakes, frequentist/Bayesian, Likelihood Principle, phil/history of stat, Statistics

All She Wrote (so far): Error Statistics Philosophy Contents-3 years on

 


Error Statistics Philosophy: Blog Contents
By: D. G. Mayo[i]

Each month, I will mark (in red) 3 relevant posts (from that month 3 yrs ago) for readers wanting to catch-up or review central themes and discussions.

September 2011

October 2011

November 2011

December 2011

January 2012

February 2012

March 2012

April 2012

May 2012

June 2012

July 2012

August 2012

September 2012

October 2012

November 2012

December 2012

January 2013

  • (1/2) Severity as a ‘Metastatistical’ Assessment
  • (1/4) Severity Calculator
  • (1/6) Guest post: Bad Pharma? (S. Senn)
  • (1/9) RCTs, skeptics, and evidence-based policy
  • (1/10) James M. Buchanan
  • (1/11) Aris Spanos: James M. Buchanan: a scholar, teacher and friend
  • (1/12) Error Statistics Blog: Table of Contents
  • (1/15) Ontology & Methodology: Second call for Abstracts, Papers
  • (1/18) New Kvetch/PhilStock
  • (1/19) Saturday Night Brainstorming and Task Forces: (2013) TFSI on NHST
  • (1/22) New PhilStock
  • (1/23) P-values as posterior odds?
  • (1/26) Coming up: December U-Phil Contributions….
  • (1/27) U-Phil: S. Fletcher & N.Jinn
  • (1/30) U-Phil: J. A. Miller: Blogging the SLP

February 2013

  • (2/2) U-Phil: Ton o’ Bricks
  • (2/4) January Palindrome Winner
  • (2/6) Mark Chang (now) gets it right about circularity
  • (2/8) From Gelman’s blog: philosophy and the practice of Bayesian statistics
  • (2/9) New kvetch: Filly Fury
  • (2/10) U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof
  • (2/11) U-Phil: Mayo’s response to Hennig and Gandenberger
  • (2/13) Statistics as a Counter to Heavyweights…who wrote this?
  • (2/16) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/20) Fisher: from ‘Two New Properties of Mathematical Likelihood’
  • (2/23) Stephen Senn: Also Smith and Jones
  • (2/26) PhilStock: DO < $70
  • (2/26) Statistically speaking…

March 2013

  • (3/1) capitalizing on chance
  • (3/4) Big Data or Pig Data?
  • (3/7) Stephen Senn: Casting Stones
  • (3/10) Blog Contents 2013 (Jan & Feb)
  • (3/11) S. Stanley Young: Scientific Integrity and Transparency
  • (3/13) Risk-Based Security: Knives and Axes
  • (3/15) Normal Deviate: Double Misunderstandings About p-values
  • (3/17) Update on Higgs data analysis: statistical flukes (1)
  • (3/21) Telling the public why the Higgs particle matters
  • (3/23) Is NASA suspending public education and outreach?
  • (3/27) Higgs analysis and statistical flukes (part 2)
  • (3/31) possible progress on the comedy hour circuit?

April 2013

  • (4/1) Flawed Science and Stapel: Priming for a Backlash?
  • (4/4) Guest Post. Kent Staley: On the Five Sigma Standard in Particle Physics
  • (4/6) Who is allowed to cheat? I.J. Good and that after dinner comedy hour….
  • (4/10) Statistical flukes (3): triggering the switch to throw out 99.99% of the data
  • (4/11) O & M Conference (upcoming) and a bit more on triggering from a participant…..
  • (4/14) Does statistics have an ontology? Does it need one? (draft 2)
  • (4/19) Stephen Senn: When relevance is irrelevant
  • (4/22) Majority say no to inflight cell phone use, knives, toy bats, bow and arrows, according to survey
  • (4/23) PhilStock: Applectomy? (rejected post)
  • (4/25) Blog Contents 2013 (March)
  • (4/27) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill, comedy hour)
  • (4/29) What should philosophers of science do? (falsification, Higgs, statistics, Marilyn)

May 2013

  • (5/3) Schedule for Ontology & Methodology, 2013
  • (5/6) Professorships in Scandal?
  • (5/9) If it’s called the “The High Quality Research Act,” then ….
  • (5/13) ‘No-Shame’ Psychics Keep Their Predictions Vague: New Rejected post
  • (5/14) “A sense of security regarding the future of statistical science…” Anon review of Error and Inference
  • (5/18) Gandenberger on Ontology and Methodology (May 4) Conference: Virginia Tech
  • (5/19) Mayo: Meanderings on the Onto-Methodology Conference
  • (5/22) Mayo’s slides from the Onto-Meth conference
  • (5/24) Gelman sides w/ Neyman over Fisher in relation to a famous blow-up
  • (5/26) Schachtman: High, Higher, Highest Quality Research Act
  • (5/27) A.Birnbaum: Statistical Methods in Scientific Inference
  • (5/29) K. Staley: review of Error & Inference

June 2013

  • (6/1) Winner of May Palindrome Contest
  • (6/1) Some statistical dirty laundry
  • (6/5) Do CIs Avoid Fallacies of Tests? Reforming the Reformers (Reblog 5/17/12):
  • (6/6) PhilStock: Topsy-Turvy Game
  • (6/6) Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)
  • (6/8) Richard Gill: “Integrity or fraud… or just questionable research practices?”
  • (6/11) Mayo: comment on the repressed memory research
  • (6/14) P-values can’t be trusted except when used to argue that p-values can’t be trusted!
  • (6/19) PhilStock: The Great Taper Caper
  • (6/19) Stanley Young: better p-values through randomization in microarrays
  • (6/22) What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri
  • (6/26) Why I am not a “dualist” in the sense of Sander Greenland
  • (6/29) Palindrome “contest” contest
  • (6/30) Blog Contents: mid-year

July 2013

  • (7/3) Phil/Stat/Law: 50 Shades of gray between error and fraud
  • (7/6) Bad news bears: ‘Bayesian bear’ rejoinder–reblog mashup
  • (7/10) PhilStatLaw: Reference Manual on Scientific Evidence (3d ed) on Statistical Significance (Schachtman)
  • (7/11) Is Particle Physics Bad Science? (memory lane)
  • (7/13) Professor of Philosophy Resigns over Sexual Misconduct (rejected post)
  • (7/14) Stephen Senn: Indefinite irrelevance
  • (7/17) Phil/Stat/Law: What Bayesian prior should a jury have? (Schachtman)
  • (7/19) Msc Kvetch: A question on the Martin-Zimmerman case we do not hear
  • (7/20) Guest Post: Larry Laudan. Why Presuming Innocence is Not a Bayesian Prior
  • (7/23) Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs
  • (7/26) New Version: On the Birnbaum argument for the SLP: Slides for JSM talk

August 2013

  • (8/1) Blogging (flogging?) the SLP: Response to Reply- Xi’an Robert
  • (8/5) At the JSM: 2013 International Year of Statistics
  • (8/6) What did Nate Silver just say? Blogging the JSM
  • (8/9) 11th bullet, multiple choice question, and last thoughts on the JSM
  • (8/11) E.S. Pearson: “Ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot”
  • (8/13) Blogging E.S. Pearson’s Statistical Philosophy
  • (8/15) A. Spanos: Egon Pearson’s Neglected Contributions to Statistics
  • (8/17) Gandenberger: How to Do Philosophy That Matters (guest post)
  • (8/21) Blog contents: July, 2013
  • (8/22) PhilStock: Flash Freeze
  • (8/22) A critical look at “critical thinking”: deduction and induction
  • (8/28) Is being lonely unnatural for slim particles? A statistical argument
  • (8/31) Overheard at the comedy hour at the Bayesian retreat-2 years on

September 2013

  • (9/2) Is Bayesian Inference a Religion?
  • (9/3) Gelman’s response to my comment on Jaynes
  • (9/5) Stephen Senn: Open Season (guest post)
  • (9/7) First blog: “Did you hear the one about the frequentist…”? and “Frequentists in Exile”
  • (9/10) Peircean Induction and the Error-Correcting Thesis (Part I)
  • (9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis
  • (9/12) (Part 3) Peircean Induction and the Error-Correcting Thesis
  • (9/14) “When Bayesian Inference Shatters” Owhadi, Scovel, and Sullivan (guest post)
  • (9/18) PhilStock: Bad news is good news on Wall St.
  • (9/18) How to hire a fraudster chauffeur
  • (9/22) Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”
  • (9/23) Barnard’s Birthday: background, likelihood principle, intentions
  • (9/24) Gelman est effectivement une erreur statistician
  • (9/26) Blog Contents: August 2013
  • (9/29) Highly probable vs highly probed: Bayesian/ error statistical differences

October 2013

  • (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
  • (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
  • (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
  • (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?”
  • (10/19) Blog Contents: September 2013
  • (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
  • (10/25) Bayesian confirmation theory: example from last post…
  • (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)
  • (10/31) WHIPPING BOYS AND WITCH HUNTERS

November 2013

  • (11/2) Oxford Gaol: Statistical Bogeymen
  • (11/4) Forthcoming paper on the strong likelihood principle
  • (11/9) Null Effects and Replication
  • (11/9) Beware of questionable front page articles warning you to beware of questionable front page articles (iii)
  • (11/13) T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)
  • (11/16) PhilStock: No-pain bull
  • (11/16) S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)
  • (11/18) Lucien Le Cam: “The Bayesians hold the Magic”
  • (11/20) Erich Lehmann: Statistician and Poet
  • (11/23) Probability that it is a statistical fluke [i]
  • (11/27) “The probability that it be a statistical fluke” [iia]
  • (11/30) Saturday night comedy at the “Bayesian Boy” diary (rejected post*)

December 2013

  • (12/3) Stephen Senn: Dawid’s Selection Paradox (guest post)
  • (12/7) FDA’s New Pharmacovigilance
  • (12/9) Why ecologists might want to read more philosophy of science (UPDATED)
  • (12/11) Blog Contents for Oct and Nov 2013
  • (12/14) The error statistician has a complex, messy, subtle, ingenious piece-meal approach
  • (12/15) Surprising Facts about Surprising Facts
  • (12/19) A. Spanos lecture on “Frequentist Hypothesis Testing”
  • (12/24) U-Phil: Deconstructions [of J. Berger]: Irony & Bad Faith 3
  • (12/25) “Bad Arguments” (a book by Ali Almossawi)
  • (12/26) Mascots of Bayesneon statistics (rejected post)
  • (12/27) Deconstructing Larry Wasserman
  • (12/28) More on deconstructing Larry Wasserman (Aris Spanos)
  • (12/28) Wasserman on Wasserman: Update! December 28, 2013
  • (12/31) Midnight With Birnbaum (Happy New Year)

January 2014

  • (1/2) Winner of the December 2013 Palindrome Book Contest (Rejected Post)
  • (1/3) Error Statistics Philosophy: 2013
  • (1/4) Your 2014 wishing well. …
  • (1/7) “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)
  • (1/11) Two Severities? (PhilSci and PhilStat)
  • (1/14) Statistical Science meets Philosophy of Science: blog beginnings
  • (1/16) Objective/subjective, dirty hands and all that: Gelman/Wasserman blogolog (ii)
  • (1/18) Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]
  • (1/22) Phil6334: “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos (Virginia Tech) UPDATE: JAN 21
  • (1/24) Phil 6334: Slides from Day #1: Four Waves in Philosophy of Statistics
  • (1/25) U-Phil (Phil 6334) How should “prior information” enter in statistical inference?
  • (1/27) Winner of the January 2014 palindrome contest (rejected post)
  • (1/29) BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics
  • (1/31) Phil 6334: Day #2 Slides

February 2014

  • (2/1) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs B-boosts)
  • (2/3) PhilStock: Bad news is bad news on Wall St. (rejected post)
  • (2/5) “Probabilism as an Obstacle to Statistical Fraud-Busting” (draft iii)
  • (2/9) Phil6334: Day #3: Feb 6, 2014
  • (2/10) Is it true that all epistemic principles can only be defended circularly? A Popperian puzzle
  • (2/12) Phil6334: Popper self-test
  • (2/13) Phil 6334 Statistical Snow Sculpture
  • (2/14) January Blog Table of Contents
  • (2/15) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/18) Aris Spanos: The Enduring Legacy of R. A. Fisher
  • (2/20) R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
  • (2/21) STEPHEN SENN: Fisher’s alternative to the alternative
  • (2/22) Sir Harold Jeffreys’ (tail-area) one-liner: Sat night comedy [draft ii]
  • (2/24) Phil6334: February 20, 2014 (Spanos): Day #5
  • (2/26) Winner of the February 2014 palindrome contest (rejected post)
  • (2/26) Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)

March 2014

  • (3/1) Cosma Shalizi gets tenure (at last!) (metastat announcement)
  • (3/2) Significance tests and frequentist principles of evidence: Phil6334 Day #6
  • (3/3) Capitalizing on Chance (ii)
  • (3/4) Power, power everywhere–(it) may not be what you think! [illustration]
  • (3/8) Msc kvetch: You are fully dressed (even under your clothes)?
  • (3/8) Fallacy of Rejection and the Fallacy of Nouvelle Cuisine
  • (3/11) Phil6334 Day #7: Selection effects, the Higgs and 5 sigma, Power
  • (3/12) Get empowered to detect power howlers
  • (3/15) New SEV calculator (guest app: Durvasula)
  • (3/17) Stephen Senn: “Delta Force: To what extent is clinical relevance relevant?” (Guest Post)
  • (3/19) Power taboos: Statue of Liberty, Senn, Neyman, Carnap, Severity
  • (3/22) Fallacies of statistics & statistics journalism, and how to avoid them: Summary & Slides Day #8 (Phil 6334)
  • (3/25) The Unexpected Way Philosophy Majors Are Changing The World Of Business
  • (3/26) Phil6334:Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)
  • (3/28) Severe osteometric probing of skeletal remains: John Byrd
  • (3/29) Winner of the March 2014 palindrome contest (rejected post)
  • (3/30) Phil6334: March 26, philosophy of misspecification testing (Day #9 slides)

April 2014

  • (4/1) Skeptical and enthusiastic Bayesian priors for beliefs about insane asylum renovations at Dept of Homeland Security: I’m skeptical and unenthusiastic
  • (4/3) Self-referential blogpost (conditionally accepted*)
  • (4/5) Who is allowed to cheat? I.J. Good and that after dinner comedy hour. . ..
  • (4/6) Phil6334: Duhem’s Problem, highly probable vs highly probed; Day #9 Slides
  • (4/8) “Out Damned Pseudoscience: Non-significant results are the new ‘Significant’ results!” (update)
  • (4/12) “Murder or Coincidence?” Statistical Error in Court: Richard Gill (TEDx video)
  • (4/14) Phil6334: Notes on Bayesian Inference: Day #11 Slides
  • (4/16) A. Spanos: Jerzy Neyman and his Enduring Legacy
  • (4/17) Duality: Confidence intervals and the severity of tests
  • (4/19) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill)
  • (4/21) Phil 6334: Foundations of statistics and its consequences: Day#12
  • (4/23) Phil 6334 Visitor: S. Stanley Young, “Statistics and Scientific Integrity”
  • (4/26) Reliability and Reproducibility: Fraudulent p-values through multiple testing (and other biases): S. Stanley Young (Phil 6334: Day #13)
  • (4/30) Able Stats Elba: 3 Palindrome nominees for April! (rejected post)

May 2014

  • (5/1) Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle
  • (5/3) You can only become coherent by ‘converting’ non-Bayesianly
  • (5/6) Winner of April Palindrome contest: Lori Wike
  • (5/7) A. Spanos: Talking back to the critics using error statistics (Phil6334)
  • (5/10) Who ya gonna call for statistical Fraudbusting? R.A. Fisher, P-values, and error statistics (again)
  • (5/15) Scientism and Statisticism: a conference* (i)
  • (5/17) Deconstructing Andrew Gelman: “A Bayesian wants everybody else to be a non-Bayesian.”
  • (5/20) The Science Wars & the Statistics Wars: More from the Scientism workshop
  • (5/25) Blog Table of Contents: March and April 2014
  • (5/27) Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976
  • (5/31) What have we learned from the Anil Potti training and test data frameworks? Part 1 (draft 2)

June 2014

  • (6/5) Stephen Senn: Blood Simple? The complicated and controversial world of bioequivalence (guest post)
  • (6/9) “The medical press must become irrelevant to publication of clinical trials.”
  • (6/11) A. Spanos: “Recurring controversies about P values and confidence intervals revisited”
  • (6/14) “Statistical Science and Philosophy of Science: where should they meet?”
  • (6/21) Big Bayes Stories? (draft ii)
  • (6/25) Blog Contents: May 2014
  • (6/28) Sir David Hendry Gets Lifetime Achievement Award
  • (6/30) Some ironies in the ‘replication crisis’ in social psychology (4th and final installment)

July 2014

  • (7/7) Winner of June Palindrome Contest: Lori Wike
  • (7/8) Higgs Discovery 2 years on (1: “Is particle physics bad science?”)
  • (7/10) Higgs Discovery 2 years on (2: Higgs analysis and statistical flukes)
  • (7/14) “P-values overstate the evidence against the null”: legit or fallacious? (revised)
  • (7/23) Continued:”P-values overstate the evidence against the null”: legit or fallacious?
  • (7/26) S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)
  • (7/31) Roger Berger on Stephen Senn’s “Blood Simple” with a response by Senn (Guest Posts)

August 2014

  • (08/03) Blogging Boston JSM2014?
  • (08/05) Neyman, Power, and Severity
  • (08/06) What did Nate Silver just say? Blogging the JSM 2013
  • (08/09) Winner of July Palindrome: Manan Shah
  • (08/09) Blog Contents: June and July 2014
  • (08/11) Egon Pearson’s Heresy
  • (08/17) Are P Values Error Probabilities? Or, “It’s the methods, stupid!” (2nd install)
  • (08/23) Has Philosophical Superficiality Harmed Science?
  • (08/29) BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)

[i](Table of Contents compiled by N. Jinn & J. Miller)*

*I thank Jean Miller for her assiduous work on the blog, and all contributors and readers for helping “frequentists in exile” to feel (and truly become) less exiled–wherever they may be!

Categories: blog contents, Metablog, Statistics | Leave a comment

3 in blog years: Sept 3 is 3rd anniversary of errorstatistics.com

Where did you hear this?  “Join me, if you will, for a little deep-water drilling, as I cast about on my isle of Elba.” Remember this and this? And this philosophical treatise on “moving blog day”? Oy, did I really write all this stuff?

http://errorstatistics.blogspot.com/2011/09/overheard-at-comedy-hour-at-bayesian_03.html

cake baked by blog staff for 3 year anniversary of errorstatistics.com

I still see this as my rag-tag amateur blog. I never learned html and don’t have time to now. But the blog enterprise was more jocund and easy-going then–just an experiment, really, and a place to discuss our RMM papers. (And, of course, a home for error statistical philosophers-in-exile).

A blog table of contents for all three years will appear tomorrow.

Anyway, 2 representatives from Elba flew into NYC and  baked this cake in my never-used Chef’s oven (based on the cover/table of contents of EGEK 1996). We’ll be celebrating at A Different Place tonight[i]–so if you’re in the neighborhood, stop by after 8pm for an Elba Grease (on me).

Do you want a free signed copy of EGEK? Say why in 25 words or less (to error@vt.edu), and the Fund for E.R.R.O.R.* will send them to the top 3 submissions (by 9/10/14).**

Acknowledgments: I want to thank the many commentators for their frequent insights and for keeping things interesting and lively. Among the regulars, and semi-regulars (but with impact) off the top of my head, and in no order: Senn, Yanofsky, Byrd, Gelman, Schachtman, Kepler, McKinney, S. Young, Matloff, O’Rourke, Gandenberger, Wasserman, E. Berk, Spanos, Glymour, Rohde, Greenland, Omaclaren, someone named Mark, assorted guests, original guests, and anons, and mysterious visitors, related twitterers (who would rather tweet from afar). I’m sure I’ve left some people out. Thanks to students and participants in the spring 2014 seminar with Aris Spanos (slides and lecture notes are still up).

I’m especially grateful to my regular guest bloggers: Stephen Senn and Aris Spanos, and to those who were subjected to deconstructions and to U-Phils in years past. (I may return to that some time.) Other guest posters for 2014 will be acknowledged in the year round up.

I thank blog compilers, Jean Miller and Nicole Jinn, and give special thanks for the tireless efforts of Jean Miller who has slogged through html, or whatever it is, when necessary, has scanned and put up dozens of articles to make them easy for readers to access, taken slow ferries back and forth to the island of Elba, and fixed gazillions of glitches on a daily basis. Last, but not least, to the palindromists who have been winning lots of books recently (1 day left for August submissions).

*Experimental Reasoning, Reliability, Objectivity and Rationality.

** Accompany submissions with an e-mail address and regular address. All submissions remain private. Elba judges’ decisions are final. Void in any places where prohibited by laws, be they laws of likelihood or Napoleonic laws-in-exile. But seriously, we’re giving away 3 books.

[i]email for directions.

Categories: Announcement, Statistics | 12 Comments

BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)

1. An Assumed Law of Statistical Evidence (law of likelihood)

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data x are better evidence for hypothesis H1 than for H0 if x are more probable under H1 than under H0.

Ian Hacking (1965) called this the logic of support: x supports hypothesis H1 more than H0 if H1 is more likely, given x, than is H0:

Pr(x; H1) > Pr(x; H0).

[With likelihoods, the data x are fixed, the hypotheses vary.]*

Or,

x is evidence for H1 over H0 if the likelihood ratio LR (H1 over H0 ) is greater than 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support, others leave it qualitative.)

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)

2. Barnard (British Journal for the Philosophy of Science)

But this “law” will immediately be seen to fail on our minimal severity requirement. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis H1 much better “supported” than H0 even when H0 is true. Or, as Barnard puts it, “there always is such a rival hypothesis, viz. that things just had to turn out the way they actually did” (1972, p. 129). H0: the coin is fair, gets a small likelihood, (.5)^k, given k tosses of a coin, while H1: the probability of heads is 1 on just those tosses that yield a head (and 0 on the rest), renders the sequence of k outcomes maximally likely. This is an example of Barnard’s “things just had to turn out as they did”. Or, to use an example with P-values: a statistically significant difference, being improbable under the null H0, will afford high likelihood to any number of explanations that fit the data well.
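Barnard’s coin example can be put into a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the tailored H1 is taken to assign probability 1 to each outcome that actually occurred.

```python
def likelihood_fair(outcomes):
    """Likelihood of the observed sequence under H0: Pr(heads) = 0.5 on every toss."""
    return 0.5 ** len(outcomes)

def likelihood_tailored(outcomes):
    """Likelihood under the after-the-fact H1: Pr(heads) is 1 on the tosses that
    came up heads and 0 on the rest, so every observed outcome has probability 1."""
    return 1.0

def likelihood_ratio(outcomes):
    """LR of the tailored H1 over the fair-coin H0 for the observed sequence."""
    return likelihood_tailored(outcomes) / likelihood_fair(outcomes)

# Any sequence of k tosses gives LR = 2**k in favor of the tailored rival.
for sequence in ("HTHHT", "TTTTT", "HTHTHTHTHT"):
    print(sequence, likelihood_ratio(list(sequence)))
```

Whatever the tosses turn out to be, the ratio is 2^k, so the “support” for the rival grows with k even though the coin may be perfectly fair.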

3. Breaking the law (of likelihood) by going to the “second,” error statistical level:

How does it fail our severity requirement? First look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that purports to provide evidence for a claim. She goes to a higher level or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic d(X). To put it informally, she asks:

What’s the probability the method would yield an LR disfavoring H0 compared to some alternative H1  even if H0 is true?
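One way to make this question concrete, offered as a sketch under assumptions that are mine and not part of the post: take k coin tosses, let H0 say the coin is fair, and let the alternative H1 be picked after the fact as the best-fitting value of Pr(heads), namely the observed proportion of heads. The function names and the choice k = 10 are purely illustrative. The probability that the LR exceeds a threshold when H0 is true can then be summed exactly over the binomial distribution.

```python
from math import comb

def max_likelihood_ratio(heads, k):
    """LR of the best-fitting H1 (Pr(heads) = heads/k) over H0 (Pr(heads) = 0.5)."""
    p_hat = heads / k
    likelihood_h1 = p_hat ** heads * (1.0 - p_hat) ** (k - heads)
    likelihood_h0 = 0.5 ** k
    return likelihood_h1 / likelihood_h0

def prob_lr_exceeds(k, threshold):
    """Pr(LR > threshold; H0), an exact sum over the binomial(k, 0.5) outcomes."""
    return sum(
        comb(k, heads) * 0.5 ** k
        for heads in range(k + 1)
        if max_likelihood_ratio(heads, k) > threshold
    )

# How often does the data-chosen rival come out "better supported" under a fair coin?
for threshold in (1.0, 2.0, 8.0):
    print(threshold, prob_lr_exceeds(k=10, threshold=threshold))
```

With a rival selected to fit the data, the LR can never fall below 1, so the method disfavors H0 with high probability even when H0 is true; raising the threshold lowers that probability, which is exactly the kind of error probability the second-level question asks about.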

Continue reading

Categories: highly probable vs highly probed, law of likelihood, Likelihood Principle, Statistics | 72 Comments

Has Philosophical Superficiality Harmed Science?

I have been asked what I thought of some criticisms of the scientific relevance of philosophy of science, as discussed in the following snippet from a recent Scientific American blog. My title elicits the appropriate degree of ambiguity, I think. 

Quantum Gravity Expert Says “Philosophical Superficiality” Has Harmed Physics

By John Horgan | August 21, 2014

“I interviewed Rovelli by phone in the early 1990s when I was writing a story for Scientific American about loop quantum gravity, a quantum-mechanical version of gravity proposed by Rovelli, Lee Smolin and Abhay Ashtekar[i]

Horgan: What’s your opinion of the recent philosophy-bashing by Stephen Hawking, Lawrence Krauss and Neil deGrasse Tyson?

Rovelli: Seriously: I think they are stupid in this.   I have admiration for them in other things, but here they have gone really wrong.  Look: Einstein, Heisenberg, Newton, Bohr…. and many many others of the greatest scientists of all times, much greater than the names you mention, of course, read philosophy, learned from philosophy, and could have never done the great science they did without the input they got from philosophy, as they claimed repeatedly. You see: the scientists that talk philosophy down are simply superficial: they have a philosophy (usually some ill-digested mixture of Popper and Kuhn) and think that this is the “true” philosophy, and do not realize that this has limitations.

Here is an example: theoretical physics has not done great in the last decades. Why? Well, one of the reasons, I think, is that it got trapped in a wrong philosophy: the idea that you can make progress by guessing new theory and disregarding the qualitative content of previous theories.  This is the physics of the “why not?”  Why not studying this theory, or the other? Why not another dimension, another field, another universe?  Science has never advanced in this manner in the past.  Science does not advance by guessing. It advances by new data or by a deep investigation of the content and the apparent contradictions of previous empirically successful theories.  Quite remarkably, the best piece of physics done by the three people you mention is Hawking’s black-hole radiation, which is exactly this.  But most of current theoretical physics is not of this sort.  Why?  Largely because of the philosophical superficiality of the current bunch of scientists.”

I find it intriguing that Rovelli suggests that “Science does not advance by guessing. It advances by new data or by a deep investigation of the content and the apparent contradictions of previous empirically successful theories.” I think this is an interesting and subtle claim with which I agree. Continue reading

Categories: StatSci meets PhilSci, strong likelihood principle | 33 Comments

Are P Values Error Probabilities? or, “It’s the methods, stupid!” (2nd install)

Despite the fact that Fisherians and Neyman-Pearsonians alike regard observed significance levels, or P values, as error probabilities, we occasionally hear allegations (typically from those who are neither Fisherian nor N-P theorists) that P values are actually not error probabilities. The denials tend to go hand in hand with allegations that P values exaggerate evidence against a null hypothesis—a problem whose cure invariably invokes measures that are at odds with both Fisherian and N-P tests. The Berger and Sellke (1987) article from a recent post is a good example of this. When leading figures put forward a statement that looks to be straightforwardly statistical, others tend to simply repeat it without inquiring whether the allegation actually mixes in issues of interpretation and statistical philosophy. So I wanted to go back and look at their arguments. I will post this in installments.

1. Some assertions from Fisher, N-P, and Bayesian camps

Here are some assertions from Fisherian, Neyman-Pearsonian and Bayesian camps: (I make no attempt at uniformity in writing the “P-value”, but retain the quotes as written.)

a) From the Fisherian camp (Cox and Hinkley):

“For given observations y we calculate t = t_obs = t(y), say, and the level of significance p_obs by

p_obs = Pr(T > t_obs; H0).

….Hence p_obs is the probability that we would mistakenly declare there to be evidence against H0, were we to regard the data under analysis as being just decisive against H0.” (Cox and Hinkley 1974, 66).

Thus p_obs would be the Type I error probability associated with the test.
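To see the error-probability reading at work, here is a small sketch. It assumes, purely for illustration, that the test statistic T is standard Normal under H0; the function names, the cutoff 1.96, and the simulation settings are hypothetical and not taken from Cox and Hinkley.

```python
import random
from statistics import NormalDist

def p_obs(t_obs):
    """Observed significance level Pr(T > t_obs; H0), with T ~ N(0, 1) under H0."""
    return 1.0 - NormalDist().cdf(t_obs)

def error_rate_under_h0(t_obs, trials=100_000, seed=1):
    """Relative frequency with which T exceeds t_obs in data generated under H0."""
    rng = random.Random(seed)
    exceed = sum(rng.gauss(0.0, 1.0) > t_obs for _ in range(trials))
    return exceed / trials

t = 1.96
print("computed level:", p_obs(t))
print("simulated rate of mistaken declarations:", error_rate_under_h0(t))
```

If results as large as t_obs are treated as just decisive against H0, the simulated rate of declaring evidence against a true H0 settles near the computed significance level.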

b) From the Neyman-Pearson (N-P) camp (Lehmann and Romano):

“[I]t is good practice to determine not only whether the hypothesis is accepted or rejected at the given significance level, but also to determine the smallest significance level…at which the hypothesis would be rejected for the given observation. This number, the so-called p-value gives an idea of how strongly the data contradict the hypothesis. It also enables others to reach a verdict based on the significance level of their choice.” (Lehmann and Romano 2005, 63-4) 
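The “smallest significance level” characterization can be checked numerically. The sketch below again assumes a one-sided test whose statistic is standard Normal under H0; the grid search, the function names, and the value 2.0 are illustrative choices of mine, not Lehmann and Romano’s.

```python
from statistics import NormalDist

def rejects(t_obs, alpha):
    """Level-alpha test: reject H0 when t_obs exceeds the upper-alpha Normal cutoff."""
    return t_obs > NormalDist().inv_cdf(1.0 - alpha)

def smallest_rejecting_level(t_obs, grid_size=10_000):
    """Scan alpha = 1/grid_size, 2/grid_size, ... and return the first level that rejects."""
    for i in range(1, grid_size):
        alpha = i / grid_size
        if rejects(t_obs, alpha):
            return alpha
    return 1.0

t = 2.0
print("tail area:", 1.0 - NormalDist().cdf(t))
print("smallest rejecting level on the grid:", smallest_rejecting_level(t))
```

Up to the resolution of the grid, the smallest level at which the observation leads to rejection agrees with the tail area beyond the observed statistic, so a reader can compare it with the significance level of their own choice.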

Very similar quotations are easily found, and are regarded as uncontroversial—even by Bayesians whose contributions stood at the foot of Berger and Sellke’s argument that P values exaggerate the evidence against the null. Continue reading

Categories: frequentist/Bayesian, J. Berger, P-values, Statistics | 31 Comments

Egon Pearson’s Heresy

E.S. Pearson: 11 Aug 1895-12 June 1980.

Today is Egon Pearson’s birthday: 11 August 1895-12 June, 1980.
E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter E. Pearson wrote to Birnbaum (1974), he talks about N-P theory admitting of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.)

When Erich Lehmann, in his review of my “Error and the Growth of Experimental Knowledge” (EGEK 1996), called Pearson “the hero of Mayo’s story,” it was because I found in E.S.P.’s work, if only in brief discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of N-P statistics. Granted, these “evidential” attitudes and practices have never been explicitly codified to guide the interpretation of N-P tests. If they had been, I would not be on about providing an inferential philosophy all these years.[i] Nevertheless, “Pearson and Pearson” statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect. One of the few sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It begins like this: Continue reading

Categories: phil/history of stat, Philosophy of Statistics, Statistics | 2 Comments

Blog Contents: June and July 2014

Blog Contents: June and July 2014*

(6/5) Stephen Senn: Blood Simple? The complicated and controversial world of bioequivalence (guest post)

(6/9) “The medical press must become irrelevant to publication of clinical trials.”

(6/11) A. Spanos: “Recurring controversies about P values and confidence intervals revisited”

(6/14) “Statistical Science and Philosophy of Science: where should they meet?”

(6/21) Big Bayes Stories? (draft ii)

(6/25) Blog Contents: May 2014

(6/28) Sir David Hendry Gets Lifetime Achievement Award

(6/30) Some ironies in the ‘replication crisis’ in social psychology (4th and final installment) Continue reading

Categories: blog contents | Leave a comment

Winner of July Palindrome: Manan Shah

Winner of July 2014 Contest:

Manan Shah

Palindrome: 

Trap May Elba, Dr. of Fanatic. I fed naan, deli-oiled naan, deficit an affordable yam part.

The requirements: 

In addition to using Elba, a candidate for a winning palindrome must have used fanatic. An optional second word was: part. An acceptable palindrome with both words would best an acceptable palindrome with just fanatic.

Bio:

Manan Shah is a mathematician and owner of Think. Plan. Do. LLC. (www.ThinkPlanDoLLC.com). He also maintains the “Math Misery?” blog at www.mathmisery.com. He holds a PhD in Mathematics from Florida State University.

Continue reading

Categories: Palindrome, Rejected Posts | Leave a comment

What did Nate Silver just say? Blogging the JSM 2013

Memory Lane: August 6, 2013. My initial post on JSM13 (8/5/13) was here.

Nate Silver gave his ASA Presidential talk to a packed audience (with questions tweeted[i]). Here are some quick thoughts—based on scribbled notes (from last night). Silver gave a list of 10 points that went something like this (turns out there were 11):

1. statistics are not just numbers

2. context is needed to interpret data

3. correlation is not causation

4. averages are the most useful tool

5. human intuitions about numbers tend to be flawed and biased

6. people misunderstand probability

7. we should be explicit about our biases and (in this sense) should be Bayesian?

8. complexity is not the same as not understanding

9. being in the in crowd gets in the way of objectivity

10. making predictions improves accountability Continue reading

Categories: Statistics, StatSci meets PhilSci | 3 Comments

Neyman, Power, and Severity

Jerzy Neyman: April 16, 1894-August 5, 1981. This reblogs posts under “The Will to Understand Power” & “Neyman’s Nursery” here & here.

Way back when, although I’d never met him, I sent my doctoral dissertation, Philosophy of Statistics, to one person only: Professor Ronald Giere. (And he would read it, too!) I knew from his publications that he was a leading defender of frequentist statistical methods in philosophy of science, and that he’d worked for at time with Birnbaum in NYC.

Some 15 years ago, Giere decided to quit philosophy of statistics (while remaining in philosophy of science): I think it had to do with a certain form of statistical exile (in philosophy). He asked me if I wanted his papers, a mass of work on statistics and statistical foundations gathered over many years. Could I make a home for them? I said yes. Then came his caveat: there would be a lot of them.

As it happened, we were building a new house at the time, Thebes, and I designed a special room on the top floor that could house a dozen or so file cabinets. (I painted it pale rose, with white lacquered book shelves up to the ceiling.) Then, for more than 9 months (same as my son!), I waited . . . Several boxes finally arrived, containing hundreds of files, each meticulously labeled with titles and dates. More than that, the labels were hand-typed! I thought, If Ron knew what a slob I was, he likely would not have entrusted me with these treasures. (Perhaps he knew of no one else who would actually want them!)

Categories: Neyman, phil/history of stat, power, Statistics | 4 Comments

Blogging Boston JSM2014?

I’m not there. (Several people have asked, I guess because I blogged JSM13.) If you hear of talks (or anecdotes) of interest to errorstatistics.com, please comment here (or on Twitter: @learnfromerror).

Categories: Announcement | 7 Comments

Roger Berger on Stephen Senn’s “Blood Simple” with a response by Senn (Guest posts)

Roger L. Berger

School Director & Professor
School of Mathematical & Natural Science
Arizona State University

Comment on S. Senn’s post: “Blood Simple? The complicated and controversial world of bioequivalence”(*)

First, I do agree with Senn’s statement that “the FDA requires conventional placebo-controlled trials of a new treatment to be tested at the 5% level two-sided but since they would never accept a treatment that was worse than placebo the regulator’s risk is 2.5% not 5%.” The FDA procedure essentially defines a one-sided test with Type I error probability (size) of .025. Why it is not just called this, I do not know. And if the regulators believe .025 is the appropriate Type I error probability, then perhaps it should be used in other situations, e.g., bioequivalence testing, as well.
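
As a rough numerical check of the arithmetic in the quoted statement, here is a minimal sketch. It assumes a z-test whose statistic is standard normal under the null; the function name `one_sided_risk` is hypothetical and comes from neither Senn's post nor Berger and Hsu (1996). It computes the null probability that a two-sided test at a given level rejects in the favorable direction only:

```python
from statistics import NormalDist


def one_sided_risk(two_sided_alpha: float) -> float:
    """Null probability that a two-sided z-test rejects in the favorable tail.

    Assumption (for illustration only): the test statistic is standard
    normal under the null, and only rejections in the upper tail would
    lead the regulator to accept the treatment.
    """
    std_normal = NormalDist()
    # Critical value of the two-sided test, e.g. about 1.96 for alpha = 0.05.
    critical = std_normal.inv_cdf(1 - two_sided_alpha / 2)
    # Probability of exceeding the critical value in the upper tail alone.
    return 1 - std_normal.cdf(critical)


print(round(one_sided_risk(0.05), 4))
```

Under these assumptions the upper-tail rejection probability is half the two-sided level, which is the sense in which the two-sided 5% procedure acts as a one-sided test of size .025.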

Senn refers to a paper by Hsu and me (Berger and Hsu (1996)), and then attempts to characterize what we said. Unfortunately, I believe he has mischaracterized.

Categories: bioequivalence, frequentist/Bayesian, PhilPharma, Statistics | 22 Comments

S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)

Stephen Senn
Head, Methodology and Statistics Group
Competence Center for Methodology and Statistics (CCMS)
Luxembourg

Responder despondency: myths of personalized medicine

The road to drug development destruction is paved with good intentions. The 2013 FDA report, Paving the Way for Personalized Medicine, has an encouraging and enthusiastic foreword from Commissioner Hamburg and plenty of extremely interesting examples stretching back decades. Given what the report shows can be achieved on occasion, given the enthusiasm of the FDA and its commissioner, given the amazing progress in genetics emerging from the labs, a golden future of personalized medicine surely awaits us. It would be churlish to spoil the party by sounding a note of caution but I have never shirked being churlish and that is exactly what I am going to do.

Categories: evidence-based policy, Statistics, Stephen Senn | 49 Comments

Continued: “P-values overstate the evidence against the null”: legit or fallacious?

continued…

Categories: Bayesian/frequentist, CIs and tests, fallacy of rejection, highly probable vs highly probed, P-values, Statistics | 39 Comments

“P-values overstate the evidence against the null”: legit or fallacious? (revised)

0. July 20, 2014: Some of the comments to this post reveal that using the word “fallacy” in my original title might have encouraged running together the current issue with the fallacy of transposing the conditional. Please see a newly added Section 7.

Categories: Bayesian/frequentist, CIs and tests, fallacy of rejection, highly probable vs highly probed, P-values, Statistics | 71 Comments
