Posts Tagged With: power

Power and Severity with nonsignificant results: more power puzzles? (ii)

Posted on March 14, 2026 by Mayo

The concept of a test’s power, originating in Neyman-Pearson’s early work, by and large, is a pre-data concept for purposes of specifying a test (notably, determining worthwhile sample size), and choosing between tests. In some papers, however, Neyman lists a third goal for power: to interpret test results post data much in the spirit of what is often called “power analysis”. This is to determine the discrepancy from a null hypothesis that may be ruled out, given nonsignificant results. One example is in a paper “The Problem of Inductive Inference” (Neyman 1955)–already a surprising title for behaviorist Neyman. The reason I’m bringing this up is that it has direct bearing on some of today’s most puzzling (and problematic) post-data uses of power. Interestingly, in that 1955 paper, Neyman is talking to none other than the logical positivist philosopher of confirmation, Rudolf Carnap:

I am concerned with the term “degree of confirmation” introduced by Carnap. …We have seen that the application of the locally best one-sided test to the data … failed to reject the hypothesis [that the n observations come from a source in which the null hypothesis is true]. The question is: does this result “confirm” the hypothesis that H₀ is true of the particular data set? (Neyman, pp 40-41).

Neyman continues: Continue reading →

Categories: Neyman's Nursery, power analysis | Tags: negative result, Neyman, power, severe testing | Leave a comment

Stephen Senn: Fisher’s Alternative to the Alternative

Posted on February 19, 2015 by Mayo

As part of the week of recognizing R.A. Fisher (February 17, 1890 – July 29, 1962), I reblog Senn from 3 years ago.

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading →

Categories: Fisher, Statistics, Stephen Senn | Tags: power, Ronald Fisher, Savage, Stephen Senn | 60 Comments

2015 Saturday Night Brainstorming and Task Forces: (4th draft)

Posted on January 31, 2015 by Mayo

TFSI workgroup

Saturday Night Brainstorming: The TFSI on NHST–part reblog from here and here, with a substantial 2015 update!

Each year leaders of the movement to “reform” statistical methodology in psychology, social science, and other areas of applied statistics get together around this time for a brainstorming session. They review the latest from the Task Force on Statistical Inference (TFSI), propose new regulations they would like to see adopted, not just by the APA publication manual any more, but all science journals! Since it’s Saturday night, let’s listen in on part of an (imaginary) brainstorming session of the New Reformers.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Frustrated that the TFSI has still not banned null hypothesis significance testing (NHST)–a fallacious version of statistical significance tests that dares to violate Fisher’s first rule: It’s illicit to move directly from statistical to substantive effects–the New Reformers have created, and very successfully published in, new meta-level research paradigms designed expressly to study (statistically!) a central question: have the carrots and sticks of reward and punishment been successful in decreasing the use of NHST, and promoting instead use of confidence intervals, power calculations, and meta-analysis of effect sizes? Or not?

Most recently, the group has helped successfully launch a variety of “replication and reproducibility projects”. Having discovered how much the reward structure encourages bad statistics and gaming the system, they have cleverly pushed to change the reward structure: Failed replications (from a group chosen by a crowd-sourced band of replicationistas) would not be hidden in those dusty old file drawers, but would be guaranteed to be published without that long, drawn out process of peer review. Do these failed replications indicate the original study was a false positive? Or that the replication attempt is a false negative? It’s hard to say.

This year, as is typical, there is a new member who is pitching in to contribute what he hopes are novel ideas for reforming statistical practice. In addition, for the first time, there is a science reporter blogging the meeting for her next freelance “bad statistics” piece for a high impact science journal. Notice that this committee seems only to grow: no one has dropped off in the 3 years I’ve followed them.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pawl: This meeting will come to order. I am pleased to welcome our new member, Dr. Ian Nydes, adding to the medical strength we have recently built with epidemiologist S.C. In addition, we have a science writer with us today, Jenina Oozo. To familiarize everyone, we begin with a review of old business, and gradually turn to new business.

Franz: It’s so darn frustrating after all these years to see researchers still using NHST methods; some of the newer modeling techniques routinely build on numerous applications of those pesky tests.

Jake: And the premier publication outlets in the social sciences still haven’t mandated the severe reforms sorely needed. Hopefully the new blood, Dr. Ian Nydes, can help us go beyond resurrecting the failed attempts of the past. Continue reading →

Categories: Comedy, reforming the reformers, science communication, Statistical fraudbusting, statistical tests, Statistics | Tags: criticism of frequentists methods, NHST, power, reformers, significance tests, Sir Karl Popper, test ban | 19 Comments

Neyman, Power, and Severity

Posted on August 5, 2014 by Mayo

NEYMAN: April 16, 1894 – August 5, 1981

Jerzy Neyman: April 16, 1894-August 5, 1981. This reblogs posts under “The Will to Understand Power” & “Neyman’s Nursery” here & here.

Way back when, although I’d never met him, I sent my doctoral dissertation, Philosophy of Statistics, to one person only: Professor Ronald Giere. (And he would read it, too!) I knew from his publications that he was a leading defender of frequentist statistical methods in philosophy of science, and that he’d worked for a time with Birnbaum in NYC.

Some ~~ten~~ 15 years ago, Giere decided to quit philosophy of statistics (while remaining in philosophy of science): I think it had to do with a certain form of statistical exile (in philosophy). He asked me if I wanted his papers—a mass of work on statistics and statistical foundations gathered over many years. Could I make a home for them? I said yes. Then came his caveat: there would be a lot of them.

As it happened, we were building a new house at the time, Thebes, and I designed a special room on the top floor that could house a dozen or so file cabinets. (I painted it pale rose, with white lacquered book shelves up to the ceiling.) Then, for more than 9 months (same as my son!), I waited . . . Several boxes finally arrived, containing hundreds of files—each meticulously labeled with titles and dates. More than that, the labels were hand-typed! I thought, If Ron knew what a slob I was, he likely would not have entrusted me with these treasures. (Perhaps he knew of no one else who would actually want them!) Continue reading →

Categories: Neyman, phil/history of stat, power, Statistics | Tags: negative result, Neyman, power, severe testing | 5 Comments

STEPHEN SENN: Fisher’s alternative to the alternative

Posted on February 21, 2014 by Mayo

Reblogging 2 years ago:

By: Stephen Senn

This year [2012] marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading →

Categories: Fisher, Statistics, Stephen Senn | Tags: power, Ronald Fisher, Savage, Stephen Senn | 31 Comments

Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)

Posted on June 6, 2013 by Mayo

Having reblogged the 5/17/12 post on “reforming the reformers” yesterday, I thought I should reblog its follow-up: 6/2/12.

Consider again our one-sided Normal test T+, with null H₀: μ ≤ μ₀ vs μ > μ₀ and μ₀ = 0, α = .025, and σ = 1, but let n = 25. So M is statistically significant only if it exceeds .392. Suppose M (the sample mean) just misses significance, say

M₀ = .39.
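
As a quick check on the arithmetic (a sketch only: the function name `cutoff` is mine, not anything from the posts), the .392 figure is μ₀ plus the upper-α Normal quantile times σ/√n:

```python
from math import sqrt
from statistics import NormalDist


def cutoff(mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Smallest sample mean that test T+ counts as significant at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.96 when alpha = .025
    return mu0 + z_alpha * sigma / sqrt(n)


# Values stated in the text: mu0 = 0, sigma = 1, n = 25, alpha = .025.
m_star = cutoff(mu0=0.0, sigma=1.0, n=25, alpha=0.025)
m_obs = 0.39  # the observed sample mean from the text

print(round(m_star, 3))  # 0.392
print(m_obs > m_star)    # False: the result is not statistically significant
```

With these values the observed mean of .39 falls just short of the cutoff, which is the sense in which it "just misses" significance.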

The flip side of a fallacy of rejection (discussed before) is a fallacy of acceptance, or the fallacy of misinterpreting statistically insignificant results. To avoid the age-old fallacy of taking a statistically insignificant result as evidence of zero (0) discrepancy from the null hypothesis μ =μ₀, we wish to identify discrepancies that can and cannot be ruled out. For our test T+, we reason from insignificant results to inferential claims of the form:

μ < μ₀ + γ

Fisher continually emphasized that failure to reject was not evidence for the null. Neyman, we saw, in chastising Carnap, argued for the following kind of power analysis:

Neymanian Power Analysis (Detectable Discrepancy Size DDS): If data x are not statistically significantly different from H₀, and the power to detect discrepancy γ is high (low), then x constitutes good (poor) evidence that the actual effect is < γ. (See 11/9/11 post).
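
To see what "high (low) power to detect γ" amounts to for this T+, here is a minimal sketch. It assumes the same μ₀ = 0, σ = 1, n = 25 and α = .025 as above; the helper `power` and the particular γ values are illustrative choices of mine.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 0.0, 1.0, 25, 0.025       # test T+ as specified in the text
SE = SIGMA / sqrt(N)                              # standard error of the sample mean
CUTOFF = MU0 + NormalDist().inv_cdf(1 - ALPHA) * SE


def power(gamma: float) -> float:
    """P(M > CUTOFF; mu = MU0 + gamma): probability that T+ detects discrepancy gamma."""
    return 1 - NormalDist(MU0 + gamma, SE).cdf(CUTOFF)


# Illustrative discrepancies (not taken from the posts).
for gamma in (0.2, 0.4, 0.6, 0.8):
    print(f"gamma = {gamma:.1f}  power = {power(gamma):.3f}")
```

On these assumptions the power against γ = 0.2 is below .2, so a nonsignificant result is poor evidence that the discrepancy is less than 0.2, while the power against γ = 0.8 is above .95, so the same result is good evidence that the discrepancy is less than 0.8.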

By taking into account the actual x₀, a more nuanced post-data reasoning may be obtained.

“In the Neyman-Pearson theory, sensitivity is assessed by means of the power—the probability of reaching a preset level of significance under the assumption that various alternative hypotheses are true. In the approach described here, sensitivity is assessed by means of the distribution of the random variable P, considered under the assumption of various alternatives.” (Cox and Mayo 2010, p. 291)

This may be captured in:

FEV(ii): A moderate p-value is evidence of the absence of a discrepancy d from H₀ only if there is a high probability the test would have given a worse fit with H₀ (i.e., a smaller p value) were a discrepancy d to exist. (Mayo and Cox 2005, 2010, 256).

This is equivalently captured in the Rule of Acceptance (Mayo (EGEK) 1996) and in the severity interpretation for acceptance, SIA (Mayo and Spanos 2006, p. 337):

SIA: (a): If there is a very high probability that [the observed difference] would have been larger than it is, were μ > μ₁, then μ < μ₁ passes the test with high severity,…
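
For the observed M₀ = .39, the SIA assessment can be sketched numerically. I am assuming the probability is evaluated at the boundary μ = μ₁ (values of μ above μ₁ only make it larger), and the function name `severity` and the μ₁ values tried are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N = 1.0, 25           # as in test T+ above
SE = SIGMA / sqrt(N)         # standard error of the sample mean
M_OBS = 0.39                 # observed sample mean from the text


def severity(mu1: float, m_obs: float = M_OBS) -> float:
    """P(M > m_obs; mu = mu1): how probable a larger difference would be if mu were mu1."""
    return 1 - NormalDist(mu1, SE).cdf(m_obs)


# Illustrative upper bounds mu1 for the claim "mu < mu1" (assumed values).
for mu1 in (0.2, 0.39, 0.6, 0.8):
    print(f"mu < {mu1:<4}  severity = {severity(mu1):.3f}")
```

Under these assumptions the claim μ < .39 passes with severity of only .5, while μ < .8 passes with severity above .95; the claim μ < .2 is poorly warranted by this nonsignificant result.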

But even taking tests and CIs just as we find them, we see that CIs do not avoid the fallacy of acceptance: they do not block erroneous construals of negative results adequately. Continue reading →

Categories: CIs and tests, Error Statistics, reformers, Statistics | Tags: confidence intervals, criticism of frequentist methods, fallacy of acceptance, fallacy of rejection, P-value, power, R. Carnap, reformers | Leave a comment

Saturday Night Brainstorming and Task Forces: (2013) TFSI on NHST

Posted on January 19, 2013 by Mayo

Saturday Night Brainstorming: The TFSI on NHST–reblogging with a 2013 update. Please see most recent 2015 update.

Each year leaders of the movement to reform statistical methodology in psychology, social science and other areas of applied statistics get together around this time for a brainstorming session. They review the latest from the Task Force on Statistical Inference (TFSI), propose new regulations they would like the APA publication manual to adopt, and strategize about how to institutionalize improvements to statistical methodology.

While frustrated that the TFSI has still not banned null hypothesis significance testing (NHST), since attempts going back to at least 1996, the reformers have created, and very successfully published in, new meta-level research paradigms designed expressly to study (statistically!) a central question: have the carrots and sticks of reward and punishment been successful in decreasing the use of NHST, and promoting instead use of confidence intervals, power calculations, and meta-analysis of effect sizes? Or not?

This year there are a couple of new members who are pitching in to contribute what they hope are novel ideas for reforming statistical practice. Since it’s Saturday night, let’s listen in on part of an (imaginary) brainstorming session of the New Reformers. This is a 2013 update of an earlier blogpost. Continue reading →

Categories: Comedy, reformers, statistical tests, Statistics | Tags: criticism of frequentists methods, NHST, power, reformers, significance tests, Sir Karl Popper, test ban | 8 Comments

Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)

Posted on June 2, 2012 by Mayo

*The title is to be sung to the tune of “Anything You Can Do I Can Do Better” from one of my favorite plays, Annie Get Your Gun (‘you’ being replaced by ‘test’).

This post may be seen to continue the discussion in May 17 post on Reforming the Reformers.

Consider again our one-sided Normal test T+, with null H₀: μ ≤ μ₀ vs μ > μ₀ and μ₀ = 0, α = .025, and σ = 1, but let n = 25. So M is statistically significant only if it exceeds .392. Suppose M just misses significance, say

M₀ = .39.

The flip side of a fallacy of rejection (discussed before) is a fallacy of acceptance, or the fallacy of misinterpreting statistically insignificant results. To avoid the age-old fallacy of taking a statistically insignificant result as evidence of zero (0) discrepancy from the null hypothesis μ = μ₀, we wish to identify discrepancies that can and cannot be ruled out. For our test T+, we reason from insignificant results to inferential claims of the form:

μ < μ₀ + γ

Fisher continually emphasized that failure to reject was not evidence for the null. Neyman, we saw, in chastising Carnap, argued for the following kind of power analysis:

Neymanian Power Analysis (Detectable Discrepancy Size DDS): If data x are not statistically significantly different from H₀, and the power to detect discrepancy γ is high (low), then x constitutes good (poor) evidence that the actual effect is no greater than γ. (See 11/9/11 post.)

By taking into account the actual x₀, a more nuanced post-data reasoning may be obtained.

“In the Neyman-Pearson theory, sensitivity is assessed by means of the power—the probability of reaching a preset level of significance under the assumption that various alternative hypotheses are true. In the approach described here, sensitivity is assessed by means of the distribution of the random variable P, considered under the assumption of various alternatives.” (Cox and Mayo 2010, p. 291)

Continue reading →

Categories: Reformers: Prionvac, Statistics | Tags: confidence intervals, criticism of frequentist methods, fallacy of acceptance, fallacy of rejection, P-value, power, R. Carnap, reformers | 8 Comments

Saturday Night Brainstorming & Task Forces: The TFSI on NHST

Posted on May 12, 2012 by Mayo

Each year leaders of the movement to reform statistical methodology in psychology and related social sciences get together for a brainstorming session. They review the latest from the Task Force on Statistical Inference (TFSI), propose new regulations they would like the APA publication manual to adopt, and strategize about how to institutionalize improvements to statistical methodology. See my discussion of the New Reformers in the blogposts of Sept 26, Oct. 3 and 4, 2011[i]

Since it’s Saturday night, let’s listen in on part of an (imaginary) brainstorming session of the New Reformers, somewhere near an airport in a major metropolitan area.[ii] Please see 2015 update here. Continue reading →

Categories: Statistics | Tags: criticism of frequentists methods, NHST, power, reformers, significance tests, Sir Karl Popper, test ban | 7 Comments

Guest Blogger. STEPHEN SENN: Fisher’s alternative to the alternative

Posted on February 12, 2012 by Mayo

By: Stephen Senn

This year marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests. Continue reading →

Categories: Statistics | Tags: power, Ronald Fisher, Savage, Statistical hypothesis testing, Stephen Senn | 4 Comments

R.A.FISHER: Statistical Methods and Scientific Inference

Posted on February 11, 2012 by Mayo

In honor of R.A. Fisher’s birthday this week (Feb 17), in a year that will mark 50 years since his death, we will post the “Triad” exchange between Fisher, Pearson and Neyman, and other guest contributions*

by Sir Ronald Fisher (1955)

SUMMARY

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

“Repeated sampling from the same population”,
Errors of the “second kind”,
“Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

TO CONTINUE READING R. A. FISHER’S PAPER, CLICK HERE.

*If you wish to contribute something in connection to Fisher, send to error@vt.edu

Categories: Statistics | Tags: Abraham Wald, Egon Pearson, fallacies, Jerzy Neyman, power, Ronald Fisher, Statistical hypothesis testing | 6 Comments

Logic Takes a Bit of a Hit!: (NN4) Continuing: Shpower ("observed" power) vs Power:

Posted on November 15, 2011 by Mayo

Logic takes a bit of a hit—student driver behind me. Anyway, managed to get to JFK, and meant to explain a bit more clearly the first “shpower” post.
I’m not saying shpower is illegitimate in its own right, or that it could not have uses, only that finding that the logic for power analytic reasoning does not hold for shpower is no skin off the nose of power analytic reasoning. Continue reading →

Categories: Neyman's Nursery, Statistics | Tags: Arbunkle, Hoenig and Heisey, power, shpower | Leave a comment

Neyman’s Nursery (NN3): SHPOWER vs POWER

Posted on November 12, 2011 by Mayo

EGEK weighs 1 pound

Before leaving base again, I have a rule to check on weight gain since the start of my last trip. I put this off til the last minute, especially when, like this time, I know I’ve overeaten while traveling. The most accurate of the 4 scales I generally use (one is at my doctor’s) is actually in Neyman’s Nursery upstairs. To my surprise, none of these scales showed any discernible increase over when I left. At least one of the 4 scales would surely have registered a weight gain of 1 pound or more, had I gained it, and yet none of them do; that is an indication I’ve not gained a pound or more. I check that each scale reliably indicates 1 pound, because I know that is the weight of the book EGEK (you can even see this on the scale shown), and they each show exactly one pound when EGEK is weighed. Having evidence I’ve gained less than 1 pound, there is even less grounds for supposing I’ve gained as much as 5 pounds, right? Continue reading →

Categories: Neyman's Nursery, Statistics | Tags: power, power analytic reasoning, shpower | Leave a comment

Neyman’s Nursery (NN2): Power and Severity [Continuation of Oct. 22 Post]:

Posted on November 9, 2011 by Mayo

Let me pick up where I left off in “Neyman’s Nursery,” [built to house Giere’s statistical papers-in-exile]. The main goal of the discussion is to get us to exercise correctly our “will to understand power”, if only little by little. One of the two surprising papers I came across the night our house was hit by lightning has the tantalizing title “The Problem of Inductive Inference” (Neyman 1955). It reveals a use of statistical tests strikingly different from the long-run behavior construal most associated with Neyman. Surprising too, Neyman is talking to none other than the logical positivist philosopher of confirmation, Rudolf Carnap:

I am concerned with the term “degree of confirmation” introduced by Carnap. …We have seen that the application of the locally best one-sided test to the data … failed to reject the hypothesis [that the n observations come from a source in which the null hypothesis is true]. The question is: does this result “confirm” the hypothesis that H₀ is true of the particular data set? (Neyman, pp 40-41).

Neyman continues:

The answer … depends very much on the exact meaning given to the words “confirmation,” “confidence,” etc. If one uses these words to describe one’s intuitive feeling of confidence in the hypothesis tested H₀, then…. the attitude described is dangerous.… [T]he chance of detecting the presence [of discrepancy from the null], when only [n] observations are available, is extremely slim, even if [the discrepancy is present]. Therefore, the failure of the test to reject H₀ cannot be reasonably considered as anything like a confirmation of H₀. The situation would have been radically different if the power function [corresponding to a discrepancy of interest] were, for example, greater than 0.95. (ibid.)

The general conclusion is that it is a little rash to base one’s intuitive confidence in a given hypothesis on the fact that a test failed to reject this hypothesis. A more cautious attitude would be to form one’s intuitive opinion only after studying the power function of the test applied.

Neyman alludes to a one-sided test of the mean of a Normal distribution with n iid samples, and known standard deviation, call it test T+. (Whether Greek symbols will appear where they should, I cannot say; it’s being worked on back at Elba).

H₀: µ ≤ µ₀ against H₁: µ > µ₀.

The test statistic d(X) is the standardized sample mean.

The test rule: Infer a (positive) discrepancy from µ₀ iff d(x₀) > cα, where cα corresponds to a difference statistically significant at the α level.

In Carnap’s example the test could not reject the null hypothesis, i.e., d(x₀) ≤ cα, but (to paraphrase Neyman) the problem is that the chance of detecting the presence of discrepancy δ from the null, with so few observations, is extremely slim, even if [δ is present].

We are back to our old friend: interpreting negative results!

“One may be confident in the absence of that discrepancy only if the power to detect it were high.”

The power of the test T+ to detect discrepancy δ:

(1) P(d(X) > cα; µ = µ₀ + δ)
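
Since d(X) is the standardized sample mean, (1) reduces to 1 − Φ(cα − δ√n/σ). A minimal sketch, assuming σ is known and using illustrative values (δ = 0.2, σ = 1, α = .025) that are not taken from Neyman's paper, shows how slim the chance of detection is when n is small. The names `power_of_T` and `min_n` are mine.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal


def power_of_T(delta: float, sigma: float, n: int, alpha: float) -> float:
    """Formula (1): P(d(X) > c_alpha; mu = mu0 + delta) for the one-sided test T+."""
    c_alpha = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(c_alpha - delta * sqrt(n) / sigma)


def min_n(delta: float, sigma: float, alpha: float, target: float) -> int:
    """Smallest n at which the power against delta reaches the target."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(target)
    return ceil((z_sum * sigma / delta) ** 2)


# Hypothetical discrepancy of interest and test settings.
DELTA, SIGMA, ALPHA = 0.2, 1.0, 0.025
for n in (10, 25, 100, 400):
    print(f"n = {n:<4} power = {power_of_T(DELTA, SIGMA, n, ALPHA):.3f}")
print("n needed for power 0.95:", min_n(DELTA, SIGMA, ALPHA, 0.95))
```

With these made-up settings the power at n = 10 is under .1, and it takes a few hundred observations before the power against δ exceeds the 0.95 Neyman mentions.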

It is interesting to hear Neyman talk this way since it is at odds with the more behavioristic construal he usually championed. He sounds like a Cohen-style power analyst! Still, power is calculated relative to an outcome just missing the cutoff cα. This is, in effect, the worst case of a negative (non significant) result, and if the actual outcome corresponds to a larger p-value, that should be taken into account in interpreting the results. It is more informative, therefore, to look at the probability of getting a worse fit (with the null hypothesis) than you did:

(2) P(d(X) > d(x₀); µ = µ₀ + δ)

In this example, this gives a measure of the severity (or degree of corroboration) for the inference µ < µ₀ + δ.

Although (1) may be low, (2) may be high (For numbers, see Mayo and Spanos 2006).
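
To make the contrast concrete without borrowing the figures in Mayo and Spanos (2006), here is a sketch with made-up numbers: α = .025, a discrepancy worth δ√n/σ = 1 in standardized units, and an observed d(x₀) = 0. The names `power_1` and `severity_2` are mine.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard Normal
C_ALPHA = Z.inv_cdf(1 - 0.025)   # cutoff for alpha = .025
SHIFT = 1.0                      # hypothetical standardized discrepancy, delta*sqrt(n)/sigma
D_OBS = 0.0                      # hypothetical observed d(x0), well below the cutoff


def power_1(shift: float) -> float:
    """Formula (1): P(d(X) > c_alpha; mu = mu0 + delta)."""
    return 1 - Z.cdf(C_ALPHA - shift)


def severity_2(d_obs: float, shift: float) -> float:
    """Formula (2): P(d(X) > d(x0); mu = mu0 + delta)."""
    return 1 - Z.cdf(d_obs - shift)


print(f"(1) power    = {power_1(SHIFT):.3f}")
print(f"(2) severity = {severity_2(D_OBS, SHIFT):.3f}")
```

Here (1) comes out below .2 while (2) is above .8. The two coincide only when d(x₀) sits exactly at the cutoff cα, which is why (1) describes the worst case of a negative result.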

Spanos and I (Mayo and Spanos 2006) couldn’t find a term in the literature defined precisely this way–the way I’d defined it in Mayo (1996) and before. We were thinking at first of calling it “attained power” but then came across what some have called “observed power” which is very different (and very strange). Those measures are just like ordinary power but calculated assuming the value of the mean equals the observed mean! (Why anyone would want to do this and then apply power analytic reasoning is unclear. I’ll come back to this in my next post.) Anyway, we refer to it as the Severity Interpretation of “Acceptance” (SIA) in Mayo and Spanos 2006.

The claim in (2) could also be made out viewing the p-value as a random variable, calculating its distribution for various alternatives (Cox 2006, 25). This reasoning yields a core frequentist principle of evidence (FEV) in Mayo and Cox (2010, 256):

FEV:¹ A moderate p-value is evidence of the absence of a discrepancy d from H₀ only if there is a high probability the test would have given a worse fit with H₀ (i.e., smaller p value) were a discrepancy d to exist.

It is important to see that it is only in the case of a negative result that severity for various inferences is in the same direction as power. In the case of significant results, d(x) in excess of the cutoff, the opposite concern arises—namely, the test is too sensitive. So severity is always relative to the particular inference being entertained: speaking of the “severity of a test” simpliciter is an incomplete statement in this account. These assessments enable sidestepping classic fallacies of tests that are either too sensitive or not sensitive enough.²
________________________________________

The full version of our frequentist principle of evidence FEV corresponds to the interpretation of a small p-value:

x is evidence of a discrepancy d from H₀ iff, if H₀ is a correct description of the mechanism generating x, then, with high probability a less discordant result would have occurred.

Severity (SEV) may be seen as a meta-statistical principle that follows the same logic as FEV reasoning within the formal statistical analysis.

By making a SEV assessment relevant to the inference under consideration, we obtain a measure where high (low) values always correspond to good (poor) evidential warrant.
It didn’t have to be done this way, but I decided it was best, even though it means appropriately swapping out the claim H for which one wants to assess SEV.

NOTE: There are 5 Neyman’s Nursery posts (NN1-NN5). NN3 is here. Search this blog for the others.

REFERENCES:

Cohen, J. (1992) A Power Primer.
Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Hillsdale, NJ: Erlbaum.

Mayo, D. and Spanos, A. (2006), “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction,” British Journal for the Philosophy of Science, 57: 323-357.

Mayo, D. and Cox, D. (2010), “Frequentist Statistics as a Theory of Inductive Inference,” in D. Mayo and A. Spanos (2010), pp. 247-275.

Mayo, D. and Spanos, A. (eds.) (2010), Error and Inference, Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, CUP.

Neyman, J. (1955), “The Problem of Inductive Inference,” Communications on Pure and Applied Mathematics, VIII, 13-46.

Categories: Neyman's Nursery, Statistics | Tags: negative result, Neyman, power, severe testing | Leave a comment

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.