Posts Tagged With: Neyman

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”


Memory lane: Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize—only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.)  Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset. (Update Aug. 2015 [ii])

The first one came to me in autumn 2008 while I was giving a series of seminars on philosophy of statistics at the LSE. Modeled on a disappointing (to me) performance of The Woman in Black, “A Funny Thing Happened at the [1959] Savage Forum” relates Savage’s horror at George Barnard’s announcement of having rejected the Likelihood Principle!

[Photo: George Barnard, 1979]

The current piece also features George Barnard. It recalls our first meeting in London in 1986. I’d sent him a draft of my paper on E.S. Pearson’s statistical philosophy, “Why Pearson Rejected the Neyman-Pearson Theory of Statistics” (later adapted as chapter 11 of EGEK) to see whether I’d gotten Pearson right. Since Tuesday (Aug 11) is Pearson’s birthday, I’m reblogging this. Barnard had traveled quite a ways, from Colchester, I think. It was June and hot, and we were up on some kind of a semi-enclosed rooftop. Barnard was sitting across from me looking rather bemused.

The curtain opens with Barnard and Mayo on the roof, lit by a spot mid-stage. He’s drinking (hot) tea; she, a Diet Coke. The dialogue (as I recall it from the time[i]):

 Barnard: I read your paper. I think it is quite good.  Did you know that it was I who told Fisher that Neyman-Pearson statistics had turned his significance tests into little more than acceptance procedures? Continue reading

Categories: Barnard, phil/history of stat, Statistics

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”

Memory lane: Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize—only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.)  Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

The first one came to me in autumn 2008 while I was giving a series of seminars on philosophy of statistics at the LSE. Modeled on a disappointing (to me) performance of The Woman in Black, “A Funny Thing Happened at the [1959] Savage Forum” relates Savage’s horror at George Barnard’s announcement of having rejected the Likelihood Principle!

The current piece also features George Barnard and since Monday (9/23) is Barnard’s birthday, I’m digging it out of “rejected posts” to reblog it. It recalls our first meeting in London in 1986. I’d sent him a draft of my paper “Why Pearson Rejected the Neyman-Pearson Theory of Statistics” (later adapted as chapter 11 of EGEK) to see whether I’d gotten Pearson right. He’d traveled quite a ways, from Colchester, I think. It was June and hot, and we were up on some kind of a semi-enclosed rooftop. Barnard was sitting across from me looking rather bemused.

The curtain opens with Barnard and Mayo on the roof, lit by a spot mid-stage. He’s drinking (hot) tea; she, a Diet Coke. The dialogue (as I recall it from the time[i]):

 Barnard: I read your paper. I think it is quite good.  Did you know that it was I who told Fisher that Neyman-Pearson statistics had turned his significance tests into little more than acceptance procedures?

Mayo:  Thank you so much for reading my paper.  I recall a reference to you in Pearson’s response to Fisher, but I didn’t know the full extent.

Barnard: I was the one who told Fisher that Neyman was largely to blame. He shouldn’t be too hard on Egon.  His statistical philosophy, you are aware, was different from Neyman’s.

Mayo:  That’s interesting.  I did quote Pearson, at the end of his response to Fisher, as saying that inductive behavior was “Neyman’s field, not mine”.  I didn’t know your role in his laying the blame on Neyman!

Fade to black. The lights go up on Fisher, stage left, flashing back some 30 years earlier . . .

Fisher: Now, acceptance procedures are of great importance in the modern world.  When a large concern like the Royal Navy receives material from an engineering firm it is, I suppose, subjected to sufficiently careful inspection and testing to reduce the frequency of the acceptance of faulty or defective consignments. . . . I am casting no contempt on acceptance procedures, and I am thankful, whenever I travel by air, that the high level of precision and reliability required can really be achieved by such means.  But the logical differences between such an operation and the work of scientific discovery by physical or biological experimentation seem to me so wide that the analogy between them is not helpful . . . . [Advocates of behavioristic statistics are like]

Russians [who] are made familiar with the ideal that research in pure science can and should be geared to technological performance, in the comprehensive organized effort of a five-year plan for the nation. . . .

In the U.S. also the great importance of organized technology has I think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed rather at, let us say, speeding production, or saving money. (Fisher 1955, 69-70)

Fade to black.  The lights go up on Egon Pearson stage right (who looks like he does in my sketch [frontispiece] from EGEK 1996, a bit like a young C. S. Peirce):

Pearson: There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. . . . Indeed, to dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot . . . . To the best of my ability I was searching for a way of expressing in mathematical terms what appeared to me to be the requirements of the scientist in applying statistical tests to his data.  (Pearson 1955, 204)

Fade to black. The spotlight returns to Barnard and Mayo, but brighter. It looks as if it’s gotten hotter.  Barnard wipes his brow with a white handkerchief.  Mayo drinks her Diet Coke.

Barnard (ever so slightly angry): You have made one blunder in your paper. Fisher would never have made that remark about Russia.

There is a tense silence.

Mayo: But—it was a quote.

End of Act 1.

Given this was pre-internet, we couldn’t go to the source then and there, so we agreed to search for the paper in the library. Well, you get the idea. Maybe I could call the piece “Stat on a Hot Tin Roof.”

If you go see it, don’t say I didn’t warn you.

I’ve gotten various new speculations over the years as to why he had this reaction to the mention of Russia (see the discussions in earlier posts featuring this play). Feel free to share yours. Some new (to me) information on Barnard is in George Box’s recent autobiography.


[i] We had also discussed this many years later, in 1999.

 

Categories: Barnard, phil/history of stat, rejected post, Statistics

Neyman, Power, and Severity

Jerzy Neyman: April 16, 1894 – August 5, 1981. This reblogs posts under “The Will to Understand Power” & “Neyman’s Nursery” here & here.

Way back when, although I’d never met him, I sent my doctoral dissertation, Philosophy of Statistics, to one person only: Professor Ronald Giere. (And he would read it, too!) I knew from his publications that he was a leading defender of frequentist statistical methods in philosophy of science, and that he’d worked for a time with Birnbaum in NYC.

Some ten or fifteen years ago, Giere decided to quit philosophy of statistics (while remaining in philosophy of science): I think it had to do with a certain form of statistical exile (in philosophy). He asked me if I wanted his papers—a mass of work on statistics and statistical foundations gathered over many years. Could I make a home for them? I said yes. Then came his caveat: there would be a lot of them.

As it happened, we were building a new house at the time, Thebes, and I designed a special room on the top floor that could house a dozen or so file cabinets. (I painted it pale rose, with white lacquered book shelves up to the ceiling.) Then, for more than 9 months (same as my son!), I waited . . . Several boxes finally arrived, containing hundreds of files—each meticulously labeled with titles and dates.  More than that, the labels were hand-typed!  I thought, If Ron knew what a slob I was, he likely would not have entrusted me with these treasures. (Perhaps he knew of no one else who would  actually want them!) Continue reading

Categories: Neyman, phil/history of stat, power, Statistics | Tags: , , ,

Was Janina Hosiasson pulling Harold Jeffreys’ leg?

[Photo: Janina Hosiasson, 1899–1942]

The very fact that Jerzy Neyman considers she might have been playing a “mischievous joke” on Harold Jeffreys (concerning probability) is enough to intrigue and impress me (with Hosiasson!). I’ve long been curious about what really happened. Eleonore Stump, a leading medieval philosopher and friend (and one-time colleague), and I pledged to travel to Vilnius to research Hosiasson. I first heard her name from Neyman’s dedication of Lectures and Conferences in Mathematical Statistics and Probability: “To the memory of: Janina Hosiasson, murdered by the Gestapo” along with around 9 other “colleagues and friends lost during World War II.” (He doesn’t mention her husband Lindenbaum, shot alongside her*.)  Hosiasson is responsible for Hempel’s Raven Paradox, and I definitely think we should be calling it Hosiasson’s (Raven) Paradox, if only to restore some of the lost credit for her contributions to Carnapian confirmation theory[i].


But what about this mischievous joke she might have pulled off with Harold Jeffreys? Or did Jeffreys misunderstand what she intended to say about this howler, or?  Since it’s a weekend and all of the U.S. monuments and parks are shut down, you might read this snippet and share your speculations…. The following is from Neyman 1952:

“Example 6.—The inclusion of the present example is occasioned by certain statements of Harold Jeffreys (1939, 300) which suggest that, in spite of my insistence on the phrase, “probability that an object A will possess the property B,” and in spite of the five foregoing examples, the definition of probability given above may be misunderstood.

Jeffreys is an important proponent of the subjective theory of probability designed to measure the “degree of reasonable belief.” His ideas on the subject are quite radical. He claims (1939, 303) that no consistent theory of probability is possible without the basic notion of degrees of reasonable belief. His further contention is that proponents of theories of probabilities alternative to his own forget their definitions “before the ink is dry.” In Jeffreys’ opinion, they use the notion of reasonable belief without ever noticing that they are using it and, by so doing, contradict the principles which they have laid down at the outset.

The necessity of any given axiom in a mathematical theory is something which is subject to proof. … However, Dr. Jeffreys’ contention that the notion of degrees of reasonable belief and his Axiom 1 are necessary for the development of the theory of probability is not backed by any attempt at proof. Instead, he considers definitions of probability alternative to his own and attempts to show by example that, if these definitions are adhered to, the results of their application would be totally unreasonable and unacceptable to anyone. Some of the examples are striking. On page 300, Jeffreys refers to an article of mine in which probability is defined exactly as it is in the present volume. Jeffreys writes:

The first definition is sometimes called the “classical” one, and is stated in much modern work, notably that of J. Neyman.

However, Jeffreys does not quote the definition that I use but chooses to reword it as follows:

If there are n possible alternatives, for m of which p is true, then the probability of p is defined to be m/n.

He goes on to say:

The first definition appears at the beginning of De Moivre’s book (Doctrine of Chances, 1738). It often gives a definite value to a probability; the trouble is that the value is one that its user immediately rejects. Thus suppose that we are considering two boxes, one containing one white and one black ball, and the other one white and two black. A box is to be selected at random and then a ball at random from that box. What is the probability that the ball will be white? There are five balls, two of which are white. Therefore, according to the definition, the probability is 2/5. But most statistical writers, including, I think, most of those that professedly accept the definition, would give (1/2)•(1/2) + (1/2)•(1/3) = 5/12. This follows at once on the present theory, the terms representing two applications of the product rule to give the probability of drawing each of the two white balls. These are then added by the addition rule. But the proposition cannot be expressed as the disjunction of five alternatives out of twelve. My attention was called to this point by Miss J. Hosiasson.

The solution, 2/5, suggested by Jeffreys as the result of an allegedly strict application of my definition of probability is obviously wrong. The mistake seems to be due to Jeffreys’ apparently harmless rewording of the definition. If we adhere to the original wording (p. 4) and, in particular, to the phrase “probability of an object A having the property B,” then, prior to attempting a solution, we would probably ask ourselves the questions: “What are the ‘objects A’ in this particular case?” and “What is the ‘property B,’ the probability of which it is desired to compute?” Once these questions have been asked, the answer to them usually follows and determines the solution.

In the particular example of Dr. Jeffreys, the objects A are obviously not balls, but pairs of random selections, the first of a box and the second of a ball. If we like to state the problem without dangerous abbreviations, the probability sought is that of a pair of selections ending with a white ball. All the conditions of there being two boxes, the first with two balls only and the second with three, etc., must be interpreted as picturesque descriptions of the F.P.S. of pairs of selections. The elements of this set fall into four categories, conveniently described by pairs of symbols (1,w), (1,b), (2,w), (2,b), so that, for example, (2,w) stands for a pair of selections in which the second box was selected in the first instance, and then this was followed by the selection of the white ball. Denote by n1,w, n1,b, n2,w, and n2,b the (unknown) numbers of the elements of the F.P.S. belonging to each of the above categories, and by n their sum. Then the probability sought is” (Neyman 1952, 10–11).

Then there are the detailed computations from which Neyman gets the right answer (entered 10/9/13):

P{w|pair of selections} = (n1,w + n2,w)/n.

The conditions of the problem imply

P{1|pair of selections} = (n1,w + n1,b)/n = 1/2,

P{2|pair of selections} = (n2,w + n2,b)/n = 1/2,

P{w| pair of selections beginning with box No. 1} = n1,w/(n1,w + n1,b) = 1/2,

P{w| pair of selections beginning with box No. 2} = n2,w/(n2,w + n2,b) = 1/3.

It follows

n1,w = 1/2(n1,w + n1,b) = n/4,

n2,w = 1/3(n2,w + n2,b)  = n/6,

P{w|pair of selections} = 5/12.

“The method of computing probability used here is a direct enumeration of elements of the F.P.S. For this reason it is called the “direct method.” As we can see from this particular example, the direct method is occasionally cumbersome and the correct solution is more easily reached through the application of certain theorems basic in the theory of probability. These theorems, the addition theorem and the multiplication theorem, are very easy to apply, with the result that students frequently manage to learn the machinery of application without understanding the theorems. To check whether or not a student does understand the theorems, it is advisable to ask him to solve problems by the direct method. If he cannot, then he does not understand what he is doing.

Checks of this kind were part of the regular program of instruction in Warsaw where Miss Hosiasson was one of my assistants. Miss Hosiasson was a very talented lady who has written several interesting contributions to the theory of probability. One of these papers deals specifically with various misunderstandings which, under the high sounding name of paradoxes, still litter the scientific books and journals. Most of these paradoxes originate from lack of precision in stating the conditions of the problems studied. In these circumstances, it is most unlikely that Miss Hosiasson could fail in the application of the direct method to a simple problem like the one described by Dr. Jeffreys. On the other hand, I can well imagine Miss Hosiasson making a somewhat mischievous joke.

Some of the paradoxes solved by Miss Hosiasson are quite amusing…” (Neyman 1952, 10–13)

What think you? I will offer a first speculation in a comment.

The entire book, Neyman (1952), may be found here (in plain text here).
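If you want to check the two answers yourself, here is a minimal sketch of the “direct method” in Python (my own illustrative code, not Neyman’s or Jeffreys’): it builds the F.P.S. of equally likely pairs (box selected, ball selected) on a common denominator of 12 and simply counts the pairs ending in a white ball.

```python
from fractions import Fraction

# A sketch of the "direct method" for Jeffreys' two-box example (illustrative
# code only). Box 1 holds one white and one black ball; box 2 holds one white
# and two black. A box is chosen with probability 1/2, then a ball uniformly
# from that box. The "objects A" are pairs (box, ball), not individual balls;
# replicating each pair to a common denominator of 12 gives an F.P.S. of
# equally likely elements that can simply be counted.
boxes = {1: ["w", "b"], 2: ["w", "b", "b"]}

fps = []                                  # the fundamental probability set
for box, balls in boxes.items():
    copies = 6 // len(balls)              # 3 copies per ball in box 1, 2 in box 2
    for ball in balls:
        fps.extend([(box, ball)] * copies)

assert len(fps) == 12
p_white = Fraction(sum(ball == "w" for _, ball in fps), len(fps))
print(p_white)                            # 5/12 -- not the naive 2/5
```

Counting the five balls instead of the twelve equally likely pairs is exactly what yields the naive 2/5.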

*June 2017: I read somewhere today that her husband was killed in 1941, so before she was, but all the references I know of are sketchy.

[i] Of course there are many good, recent sources on the philosophy and history of Carnap, some of which mention her, but they obviously do not touch on this matter. I read that Hosiasson was trying to build a Carnapian-style inductive logic by setting out axioms (which, to my knowledge, Carnap never did). That was what some of my fledgling graduate-school attempts had tried, but the axioms always seemed to admit counterexamples (if non-trivial). So much for the purely syntactic approach. But I wish I’d known of her attempts back then, and especially her treatment of paradoxes of confirmation. (I’m sometimes tempted to give a logic for severity, but I fight the temptation.)

REFERENCES

Hosiasson, J. (1931), “Why Do We Prefer Probabilities Relative to Many Data?”, Mind 40 (157): 23–36.

Hosiasson-Lindenbaum, J. (1940), “On Confirmation”, Journal of Symbolic Logic 5 (4): 133–148.

Hosiasson, J. (1941), “Induction et analogie: Comparaison de leur fondement” [Induction and Analogy: A Comparison of Their Foundation], Mind 50 (200): 351–365.

Hosiasson-Lindenbaum, J. (1948), “Theoretical Aspects of the Advancement of Knowledge”, Synthese 7 (4/5): 253–261.

Jeffreys, H. (1939), Theory of Probability (1st ed.), Oxford: The Clarendon Press.

Neyman, J. (1952), Lectures and Conferences in Mathematical Statistics and Probability, Graduate School, U.S. Dept. of Agriculture.

Categories: Hosiasson, phil/history of stat, Statistics

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”


Memory lane: Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize—only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.)  Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

The first one came to me in autumn 2008 while I was giving a series of seminars on philosophy of statistics at the LSE. Modeled on a disappointing (to me) performance of The Woman in Black, “A Funny Thing Happened at the [1959] Savage Forum” relates Savage’s horror at George Barnard’s announcement of having rejected the Likelihood Principle!

The current piece taking shape also features George Barnard and since tomorrow (9/23) is his birthday, I’m digging it out of “rejected posts”. It recalls our first meeting in London in 1986. I’d sent him a draft of my paper “Why Pearson Rejected the Neyman-Pearson Theory of Statistics” (later adapted as chapter 11 of EGEK) to see whether I’d gotten Pearson right. He’d traveled quite a ways, from Colchester, I think. It was June and hot, and we were up on some kind of a semi-enclosed rooftop. Barnard was sitting across from me looking rather bemused.

The curtain opens with Barnard and Mayo on the roof, lit by a spot mid-stage. He’s drinking (hot) tea; she, a Diet Coke. The dialogue (as I recall it from the time[i]):

 Barnard: I read your paper. I think it is quite good.  Did you know that it was I who told Fisher that Neyman-Pearson statistics had turned his significance tests into little more than acceptance procedures?

Mayo:  Thank you so much for reading my paper.  I recall a reference to you in Pearson’s response to Fisher, but I didn’t know the full extent.

Barnard: I was the one who told Fisher that Neyman was largely to blame. He shouldn’t be too hard on Egon.  His statistical philosophy, you are aware, was different from Neyman’s. Continue reading

Categories: Barnard, phil/history of stat, rejected post, Statistics

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”? (Rejected Post Feb 20)

Dear Reader: Not having been at this very long, I don’t know if it’s common for bloggers to collect a pile of rejected posts that one thinks better of before posting. Well, here’s one that belongs up in a “rejected post” page (and will be tucked away soon enough), but since we have so recently posted the Fisher-Neyman-Pearson “triad”, the blog-elders of Elba have twisted my elbow (repeatedly) to share this post, from back in the fall of 2011, London. Sincerely, D. Mayo

Egon Pearson on a Gate (by D. Mayo)

Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize—only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.)  Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

Continue reading

Categories: Statistics

Neyman’s Nursery (NN2): Power and Severity [Continuation of Oct. 22 Post]:

Let me pick up where I left off in “Neyman’s Nursery,” [built to house Giere’s statistical papers-in-exile]. The main goal of the discussion is to get us to exercise correctly our “will to understand power”, if only little by little. One of the two surprising papers I came across the night our house was hit by lightning has the tantalizing title “The Problem of Inductive Inference” (Neyman 1955). It reveals a use of statistical tests strikingly different from the long-run behavior construal most associated with Neyman. Surprising too, Neyman is talking to none other than the logical positivist philosopher of confirmation, Rudolf Carnap:

I am concerned with the term “degree of confirmation” introduced by Carnap.  …We have seen that the application of the locally best one-sided test to the data … failed to reject the hypothesis [that the n observations come from a source in which the null hypothesis is true]. The question is: does this result “confirm” the hypothesis that H0 is true of the particular data set? (Neyman 1955, pp. 40–41).

Neyman continues:

The answer … depends very much on the exact meaning given to the words “confirmation,” “confidence,” etc.  If one uses these words to describe one’s intuitive feeling of confidence in the hypothesis tested H0, then…. the attitude described is dangerous.… [T]he chance of detecting the presence [of discrepancy from the null], when only [n] observations are available, is extremely slim, even if [the discrepancy is present].  Therefore, the failure of the test to reject H0 cannot be reasonably considered as anything like a confirmation of H0.  The situation would have been radically different if the power function [corresponding to a discrepancy of interest] were, for example, greater than 0.95. (ibid.)

The general conclusion is that it is a little rash to base one’s intuitive confidence in a given hypothesis on the fact that a test failed to reject this hypothesis. A more cautious attitude would be to form one’s intuitive opinion only after studying the power function of the test applied.

Neyman alludes to a one-sided test of the mean of a Normal distribution with n iid samples and known standard deviation; call it test T+. (Whether Greek symbols will appear where they should, I cannot say; it’s being worked on back at Elba).

H0: µ ≤ µ0 against H1: µ > µ0.

The test statistic d(X) is the standardized sample mean.

The test rule: Infer a (positive) discrepancy from µ0 iff d(x0) > cα, where cα corresponds to a difference statistically significant at the α level.

In Carnap’s example the test could not reject the null hypothesis, i.e., d(x0) ≤ cα, but (to paraphrase Neyman) the problem is that the chance of detecting the presence of discrepancy δ from the null, with so few observations, is extremely slim, even if [δ is present].

We are back to our old friend: interpreting negative results!

“One may be confident in the absence of that discrepancy only if the power to detect it were high.”

The power of the test T+ to detect discrepancy δ:

(1)  P(d(X) > cα; µ =  µ0 + δ)

It is interesting to hear Neyman talk this way since it is at odds with the more behavioristic construal he usually championed.  He sounds like a Cohen-style power analyst!  Still, power is calculated relative to an outcome just missing the cutoff  cα.  This is, in effect, the worst case of a negative (non significant) result, and if the actual outcome corresponds to a larger p-value, that should be taken into account in interpreting the results.  It is more informative, therefore, to look at the probability of getting a worse fit (with the null hypothesis) than you did:

(2)  P(d(X) > d(x0); µ = µ0 + δ)

In this example, this gives a measure of the severity (or degree of corroboration) for the inference µ < µ0 + δ.

Although (1) may be low, (2) may be high (For numbers, see Mayo and Spanos 2006).
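To see the contrast in numbers, here is a minimal numerical sketch (my own illustrative figures, not those in Mayo and Spanos 2006), assuming the known-σ Normal model for test T+ above and using scipy for the Normal distribution:

```python
from scipy.stats import norm

# Test T+ of H0: mu <= mu0 vs H1: mu > mu0 with n iid N(mu, sigma^2)
# observations, sigma known; d(X) = sqrt(n) * (xbar - mu0) / sigma.
# All numbers below are illustrative assumptions, not from the post.
mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.025
c_alpha = norm.ppf(1 - alpha)                 # ~1.96, the cutoff at level alpha

def power(delta):
    """(1)  P(d(X) > c_alpha; mu = mu0 + delta)."""
    return 1 - norm.cdf(c_alpha - (n ** 0.5) * delta / sigma)

def severity(xbar, delta):
    """(2)  P(d(X) > d(x0); mu = mu0 + delta), for the inference mu < mu0 + delta."""
    d_x0 = (n ** 0.5) * (xbar - mu0) / sigma
    return 1 - norm.cdf(d_x0 - (n ** 0.5) * delta / sigma)

delta = 0.3            # discrepancy of interest
xbar = 0.02            # a clearly non-significant result: d(x0) = 0.1, far below c_alpha

print(round(power(delta), 2))           # 0.32 -- (1) is low
print(round(severity(xbar, delta), 2))  # 0.92 -- (2) is high
```

When d(x0) sits right at the cutoff cα, (2) reduces to (1), which is why power is the worst-case version of the attained measure for a negative result.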

Spanos and I (Mayo and Spanos 2006) couldn’t find a term in the literature defined precisely this way–the way I’d defined it in Mayo (1996) and before.  We were thinking at first of calling it “attained power” but then came across what some have called “observed power” which is very different (and very strange).  Those measures are just like ordinary power but calculated assuming the value of the mean equals the observed mean!  (Why  anyone would want to do this and then apply power analytic reasoning is unclear.  I’ll come back to this in my next post.)  Anyway, we refer to it as the Severity Interpretation of “Acceptance” (SIA) in Mayo and Spanos 2006.

The claim in (2) could also be made out viewing the p-value as a random variable, calculating its distribution for various alternatives (Cox 2006, 25). This reasoning yields a core frequentist principle of evidence (FEV) (Mayo and Cox 2010, 256):

FEV:[1] A moderate p-value is evidence of the absence of a discrepancy d from H0 only if there is a high probability the test would have given a worse fit with H0 (i.e., a smaller p-value) were a discrepancy d to exist.

It is important to see that it is only in the case of a negative result that severity for various inferences is in the same direction as power. In the case of significant results, d(x) in excess of the cutoff, the opposite concern arises—namely, the test is too sensitive. So severity is always relative to the particular inference being entertained: speaking of the “severity of a test” simpliciter is an incomplete statement in this account. These assessments enable sidestepping classic fallacies of tests that are either too sensitive or not sensitive enough.[2]
________________________________________

[1] The full version of our frequentist principle of evidence FEV corresponds to the interpretation of a small p-value:

x is evidence of a discrepancy d from H0 iff, if H0 is a correct description of the mechanism generating x, then, with high probability, a less discordant result would have occurred.

[2] Severity (SEV) may be seen as a meta-statistical principle that follows the same logic as FEV reasoning within the formal statistical analysis.

By making a SEV assessment relevant to the inference under consideration, we obtain a measure where high (low) values always correspond to good (poor) evidential warrant.
It didn’t have to be done this way, but I decided it was best, even though it means appropriately swapping out the claim H for which one wants to assess SEV.

NOTE: There are 5 Neyman’s Nursery posts (NN1-NN5). NN3 is here. Search this blog for the others.

REFERENCES:

Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences, 2nd ed., Hillsdale, NJ: Erlbaum.

Cohen, J. (1992), “A Power Primer,” Psychological Bulletin 112 (1): 155–159.

Mayo, D. and Spanos, A. (2006), “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction,” British Journal for the Philosophy of Science, 57: 323–357.

Mayo, D. and Cox, D. (2010), “Frequentist Statistics as a Theory of Inductive Inference,” in Mayo and Spanos (2010), pp. 247–275.

Mayo, D. and Spanos, A. (eds.) (2010), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge: Cambridge University Press.

Neyman, J. (1955), “The Problem of Inductive Inference,” Communications on Pure and Applied Mathematics, VIII, 13–46.

Categories: Neyman's Nursery, Statistics
