power

‘Low power’ and an all too standard error (continuation of “don’t turn power on its head”)

Posted on June 7, 2026 by Mayo

“In my opinion, a great deal of confusion about statistics can be traced to the fact that the point estimate is seen as being the be all and end all, the expression of uncertainty being forgotten….to provide a point estimate without also providing a standard error is, indeed, an all too standard error.”

Stephen Senn: “Error point: the importance of knowing how much you don’t know”

In my previous blogpost, (“How not to turn power on its head”), I argued, in relation to a one-sided test of mean μ (e.g., H₀: µ ≤ 0 vs H₁: µ > 0 with known SE):

If POW(μ′) is high (e.g., over .5), then a just significant result is poor evidence that μ > μ′; while if POW(μ′) is low (e.g., less than .2), it is good evidence that μ > μ′ where μ′ is a value greater than 0 (provided assumptions for these claims hold approximately).

By a “just statistically significant result” I mean one that just makes it to the threshold for statistical significance, write it as M* (my last post used D*). The reasoning is essentially this: Because it’s very improbable to obtain as low a P-value as we did, were μ as small as μ′—that is, because POW(μ′) is low—the result indicates we are in a world where μ is greater than μ′. This is exactly the reasoning that allows us to infer μ > 0 with a statistically significant result. Indeed, the power of the test against μ₀ is α. It is supposed that the statistical assumptions needed for the error probabilities to apply hold adequately.

Why then do we often hear that low power is associated with “exaggerated” or “inflated” effects? As we reasoned in the previous post, low power against μ′ strengthens the inference that μ exceeds μ′. Can the same feature—low power—also be associated with overestimation? The answer is, yes it can, but only one of the claims corresponds to a correct application of statistical significance tests.

More specifically, the overestimation charge stems from supposing the observed result M* is taken as a (point) estimate of the population mean (i.e., estimating μ = M*, without providing the SE)–an unkosher (but not so uncommon) move–and then considering a value μ′ against which the test has low power. Since M* is the just-significant cutoff, clearly M* will exceed μ′ (at least in a good test). So if the true population mean takes a value against which the test has low power, and M* is taken as a point estimate of μ, the result will be to “overestimate” the population mean. While the true value is unknown, this if-then claim is correct. Likewise, if the power to detect the true μ is high, the observed M*, will underestimate μ–if M* is used as a point estimate.

To clarify these points, it helps to contrast two different questions that are often run together:

Does the observed (just) statistically significant result M* warrant inferring μ > μ′ (when POW(μ′) is low)?
Does the observed (just) statistically significant result M* exceed μ′ (when POW(μ′) is low)?

The answer to both questions is yes. The very fact invoked to show that M* exceeds μ′—yielding a “yes” answer to #2–namely, that a result at least as large as M* would be improbable were μ = μ′—is precisely what warrants inferring that μ > μ′–yielding a “yes” answer to #1.

However common it may be to equate the observed statistically significant result M* with the population mean, that is not a warranted inference from a significance test. For one thing, significance test inferences are inequalities, not point claims or point estimates. A statistically significant result warrants inferring μ > μ₀ and, more generally, warrants inferring μ > μ′ for values μ′ against which the test has sufficiently low power–although it is not typically put that way. It would more typically be put in terms of the p-value reached in relation to a discrepancy from H₀. (We would get a p-value function over different discrepancies.) What is the p-value were we testing H₀: µ ≤ M*, and observed our just significant result M*? Answer: .5. Thus, to take M* as warranting µ > M*, would be to follow a method that is wrong 50% of the time. (See mountains out of molehill fallacy in SIST.)

There is, of course, a relation between tests and estimation. Rejecting H₀ (at level α) is equivalent to inferring that μ exceeds the corresponding lower confidence bound (at level 1- α, for the 1-sided case). Obtaining this lower bound requires subtracting a number of SEs (e.g., 1.5, 1.65, 1.96, 2) from M*.

Observe that POW(μ₀) = α and POW(M*) = .5. We can relate the consequence of μ′ moving farther below M*:

As μ′ moves farther below M*	Consequence
M* − μ′ increases	Greater overestimation if the observed M* is used to estimate μ
POW(μ′) decreases	The probability of obtaining M ≥ M* under μ = μ′ decreases
P-value for μ = μ′ decreases	Stronger evidence that μ > μ′

Thus, as power against μ′ decreases, the amount by which M* exceeds μ′ increases, but so too does the evidence that μ exceeds μ′. The very circumstance that yields greater overestimation when M* is used to estimate μ yields stronger evidence that μ exceeds μ′.

One final point. If a testing procedure is selectively reporting only statistically significant results, then the original error probabilities no longer apply–whether to the test or equivalent CI estimation.

Share your queries and thoughts in the comments to this post.

For a related post see “Do underpowered tests exaggerate population effects?”

See also the discussion on pp. 359-361 of Mayo (2018, CUP): Statistical Inference as Severe Testing: How to get beyond the statistics wars? (SIST). The relevant excerpt can be found here.

Categories: power, reforming the reformers | Leave a comment

How not to turn power on its head

Posted on May 11, 2026 by Mayo

In giving some informal remarks about power at a seminar a couple of weeks ago, I proposed that the tendency to turn the notion of power on its head might be avoided by imagining we need to define a test’s error probabilities in terms of its power alone. We can refer to the power against the null hypothesis, rather than alluding to a type 1 error probability, for example. What do I mean by turning power on its head? I mean, at least here, supposing that a test provides poor evidence of discrepancies that the test has low power to detect. Continue reading →

Categories: power | 3 Comments

Continuing the blizzard of 26 power puzzles

Posted on March 3, 2026 by Mayo

The mayor of NYC offered $30 an hour to help shovel the ~ 30 inches of snow that fell last Sunday and Monday. From what I hear, it was a very effective program. Here’s a little power puzzle to very easily shovel through [1]

Suppose you are reading about a result x that is just statistically significant at level α (i.e., P-value = α) in a one-sided test T+ of the mean of a Normal distribution with n iid samples, and (for simplicity) known σ: H₀: µ ≤ 0 against H₁: µ > 0. I have heard some people say:

A. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is poor evidence of a discrepancy (from the null) corresponding to µ’. (i.e., there’s poor evidence that µ > µ’ ). I am keeping symbols as simple as possible. *See point on language in notes.

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is warranted, or at least not problematic.

I have heard other people say:

B. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is good evidence of a discrepancy (from the null) corresponding to µ’ (i.e., there’s good evidence that µ > µ’).

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is unwarranted.

Which is correct, from the perspective of the (error statistical) philosophy, within which power and associated tests are defined? Continue reading →

Categories: blizzard of 26 power puzzles, power, reforming the reformers | 1 Comment

A Blizzard of Power Puzzles Replicate in Meta-Research

Posted on March 1, 2026 by Mayo

I often say that the most misunderstood concept in error statistics is power. One week ago, stuck in the blizzard of 2026 in NYC —exciting, if also a bit unnerving, with airports closed for two and a half days and no certainty of when I might fly out—I began collecting the many power howlers I’ve discussed in the past, because some of them are being replicated in todays meta-research about replication failure! Apparently, mistakes about statistical concepts replicate quite reliably—even when statistically significant effects do not. Others I find in medical reports of clinical trials of treatments I’m trying to evaluate in real life! Here’s one variant: A statistically significant result in a clinical trial with fairly high (e.g., .8) power to detect an impressive improvement δ’ is taken as good evidence of its impressive improvement δ’. Often the high power of .8 is even used as a (posterior) probability of the hypothesis of improvement being δ’. [0] If these do not immediately strike you as fallacious, compare:

If the house is fully ablaze, then very probably the fire alarm goes off.
If the fire alarm goes off, then very probably the house is fully ablaze.

The first bullet is saying the fire alarm has high power to detect the house being fully ablaze. It does not mean the converse in the second bullet. Continue reading →

Categories: blizzard of 26, power, SIST, statistical significance tests | Tags: misunderstanding power, power analysts, Ziliac & McCloskey | 11 Comments

Leisurely Cruise February 2026: power, shpower, positive predictive value

Posted on February 12, 2026 by Mayo

2025-6 Leisurely Cruise

The following is the February stop of our leisurely cruise (meeting 6 from my 2020 Seminar at the LSE). There was a guest speaker, Professor David Hand. Slides and videos are below. Ship StatInfasSt may head back to port or continue for an additional stop or two, if there is interest. Although I often say on this blog that the classical notion of power, as defined by Neyman and Pearson, is one of the most misunderstood notions in stat foundations. I did not know, in writing SIST, just how ingrained those misconceptions would become. I’ll write more on this in my next post. (The following is from SIST pp. 354-356, the pages are provided below)

Shpower and Retrospective Power Analysis

It’s unusual to hear books condemn an approach in a hush-hush sort of way without explaining what’s so bad about it. This is the case with something called post hoc power analysis, practiced by some who live on the outskirts of Power Peninsula. Psst, don’t go there. We hear “there’s a sinister side to statistical power, … I’m referring to post hoc power” (Cumming 2012, pp. 340-1), also called observed power and retrospective (retro) power. I will be calling it shpower analysis. It distorts the logic of ordinary power analysis (from insignificant results). The “post hoc” part comes in because it’s based on the observed results. The trouble is that ordinary power analysis is also post-data. The criticisms are often wrongly taken to reject both. Continue reading →

Categories: 2025-2026 Leisurely Cruise, power | Leave a comment

Are We Listening? Part II of “Sennsible significance” Commentary on Senn’s Guest Post

Posted on July 9, 2025 by Mayo

This is Part II of my commentary on Stephen Senn’s guest post, Be Careful What You Wish For. In this follow-up, I take up two topics:

(1) A terminological point raised in the comments to Part I, and
(2) A broader concern about how a popular reform movement reinforces precisely the mistaken construal Senn warns against.

But first, a question—are we listening? Because what underlies what Senn is saying is subtle, and yet what’s at stake is quite important for today’s statistical controversies. It’s not just a matter of which of four common construals is most apt for the population effect we wish to have high power to detect.[1] As I hear Senn, he’s also flagging a misunderstanding that allows some statistical reformers to (wrongly) dictate what statistical significance testers “wish” for in the first place. Continue reading →

Categories: clinical relevance, power, reforming the reformers, S. Senn | 5 Comments

“Sennsible significance” Commentary on Senn’s Guest Post (Part I)

Posted on June 17, 2025 by Mayo

Have the points in Stephen Senn’s guest post fully come across? Responding to comments from diverse directions has given Senn a lot of work, for which I’m very grateful. But I say we should not leave off the topic just yet. I don’t think the core of Senn’s argument has gotten the attention it deserves. So, we’re not done yet.[0]

I will write my commentary in two parts, so please return for Part II. In Part I, I’ll attempt to give an overarching version of Senn’s warning (“Be careful what you wish for”) and his main recommendation. He will tell me if he disagrees. All quotes are from his post. In Senn’s opening paragraph:

…Even if a hypothesis is rejected and the effect is assumed genuine, it does not mean it is important…many a distinguished commentator on clinical trials has confused the difference you would be happy to find with the difference you would not like to miss. The former is smaller than the latter. For reasons I have explained in this blog [reblogged here], you should use the latter for determining the sample size as part of a conventional power calculation.

Continue reading →

Categories: clinical relevance, power, S. Senn | 6 Comments

Stephen Senn (guest post): “Relevant significance? Be careful what you wish for”

Posted on May 21, 2025 by Mayo

Stephen Senn

Consultant Statistician
Edinburgh

Relevant significance?

Be careful what you wish for

Despised and Rejected

Scarcely a good word can be had for statistical significance these days. We are admonished (as if we did not know) that just because a null hypothesis has been ‘rejected’ by some statistical test, it does not mean it is not true and thus it does not follow that significance implies a genuine effect of treatment. Continue reading →

Categories: clinical relevance, power, S. Senn | 47 Comments

(Guest Post) Stephen Senn: “Delta Force: To what extent is clinical relevance relevant?” (reblog)

Posted on May 16, 2025 by Mayo

Senn

Errorstatistics.com has been extremely fortunate to have contributions by leading medical statistician, Stephen Senn, over many years. Recently, he provided me with a new post that I’m about to put up, but as it builds on an earlier post, I’ll reblog that one first. Following his new post, I’ll share some reflections on the issue.

Stephen Senn
Consultant Statistician
Edinburgh, Scotland

Delta Force
To what extent is clinical relevance relevant?

Inspiration
This note has been inspired by a Twitter exchange with respected scientist and famous blogger David Colquhoun. He queried whether a treatment that had 2/3 of an effect that would be described as clinically relevant could be useful. I was surprised at the question, since I would regard it as being pretty obvious that it could but, on reflection, I realise that things that may seem obvious to some who have worked in drug development may not be obvious to others, and if they are not obvious to others are either in need of a defence or wrong. I don’t think I am wrong and this note is to explain my thinking on the subject. Continue reading →

Categories: power, Statistics, Stephen Senn | 2 Comments

A statistically significant result indicates H’ (μ > μ’) when POW(μ’) is low (not the other way round)–but don’t ignore the standard error

Posted on May 9, 2022 by Mayo

1. New monsters. One of the bizarre facts of life in the statistics wars is that a method from one school may be criticized on grounds that it conflicts with a conception that is the reverse of what that school intends. How is that even to be deciphered? That was the difficult task I set for myself in writing Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2008) [SIST 2018]. I thought I was done, but new monsters keep appearing. In some cases, rather than see how the notion of severity gets us beyond fallacies, misconstruals are taken to criticize severity! So, for example, in the last couple of posts, here and here, I deciphered some of the better known power howlers (discussed in SIST Ex 5 Tour II) I’m linking to all of this tour (in proofs). Continue reading →

Categories: power, reforming the reformers, SIST, Statistical Inference as Severe Testing | 16 Comments

Do “underpowered” tests “exaggerate” population effects? (iv)

Posted on May 2, 2022 by Mayo

You will often hear that if you reach a just statistically significant result “and the discovery study is underpowered, the observed effects are expected to be inflated” (Ioannidis 2008, p. 64), or “exaggerated” (Gelman and Carlin 2014). This connects to what I’m referring to as the second set of concerns about statistical significance tests, power and magnitude errors. Here, the problem does not revolve around erroneously interpreting power as a posterior probability, as we saw in the fallacy in this post. But there are other points of conflict with the error statistical tester, and much that cries out for clarification — else you will misunderstand the consequences of some of today’s reforms.. Continue reading →

Categories: power, reforming the reformers, SIST, Statistical Inference as Severe Testing | 17 Comments

Join me in reforming the “reformers” of statistical significance tests

Posted on April 26, 2022 by Mayo

The most surprising discovery about today’s statistics wars is that some who set out shingles as “statistical reformers” themselves are guilty of misdefining some of the basic concepts of error statistical tests—notably power. (See my recent post on power howlers.) A major purpose of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP) is to clarify basic notions to get beyond what I call “chestnuts” and “howlers” of tests. The only way that disputing tribes can get beyond the statistics wars is by (at least) understanding correctly the central concepts. But these misunderstandings are more common than ever, so I’m asking readers to help. Why are they more common (than before the “new reformers” of the last decade)? I suspect that at least one reason is the popularity of Bayesian variants on tests: if one is looking to find posterior probabilities of hypotheses, then error statistical ingredients may tend to look as if that’s what they supply. Continue reading →

Categories: power, SIST, statistical significance tests | Tags: misunderstanding power, power analysts, Ziliac & McCloskey | 2 Comments

S. Senn: The Power of Negative Thinking (guest post)

Posted on February 12, 2021 by Mayo

Stephen Senn
Consultant Statistician
Edinburgh, Scotland

Sepsis sceptic

During an exchange on Twitter, Lawrence Lynn drew my attention to a paper by Laffey and Kavanagh[1]. This makes an interesting, useful and very depressing assessment of the situation as regards clinical trials in critical care. The authors make various claims that RCTs in this field are not useful as currently conducted. I don’t agree with the authors’ logic here although, perhaps, surprisingly, I consider that their conclusion might be true. I propose to discuss this here. Continue reading →

Categories: power, randomization | 5 Comments

Bonus meeting: Graduate Research Seminar: Current Controversies in Phil Stat: LSE PH 500: 25 June 2020

Posted on June 21, 2020 by Mayo

Ship StatInfasSt

We’re holding a bonus, 6th, meeting of the graduate research seminar PH500 for the Philosophy, Logic & Scientific Method Department at the LSE:

(Remote 10am-12 EST, 15:00 – 17:00 London time; Thursday, June 25)

VI. (June 25) BONUS: Power, shpower, severity, positive predictive value (diagnostic model) & a Continuation of The Statistics Wars and Their Casualties

There will also be a guest speaker: Professor David Hand (Imperial College, London). Here is Professor Hand’s presentation (click on “present” to hear sound)

The main readings are on the blog page for the seminar.

Categories: Graduate Seminar PH500 LSE, power | Leave a comment

(Full) Excerpt. Excursion 5 Tour II: How Not to Corrupt Power (Power Taboos, Retro Power, and Shpower)

Posted on June 7, 2019 by Mayo

returned from London…

The concept of a test’s power is still being corrupted in the myriad ways discussed in 5.5, 5.6. I’m excerpting all of Tour II of Excursion 5, as I did with Tour I (of Statistical Inference as Severe Testing:How to Get Beyond the Statistics Wars 2018, CUP)*. Originally the two Tours comprised just one, but in finalizing corrections, I decided the two together was too long of a slog, and I split it up. Because it was done at the last minute, some of the terms in Tour II rely on their introductions in Tour I. Here’s how it starts:

5.5 Power Taboos, Retrospective Power, and Shpower

Let’s visit some of the more populous tribes who take issue with power – by which we mean ordinary power – at least its post-data uses. Power Peninsula is often avoided due to various “keep out” warnings and prohibitions, or researchers come during planning, never to return. Why do some people consider it a waste of time, if not totally taboo, to compute power once we know the data? A degree of blame must go to N-P, who emphasized the planning role of power, and only occasionally mentioned its use in determining what gets “confirmed” post-data. After all, it’s good to plan how large a boat we need for a philosophical excursion to the Lands of Overlapping Statistical Tribes, but once we’ve made it, it doesn’t matter that the boat was rather small. Or so the critic of post-data power avers. A crucial disanalogy is that with statistics, we don’t know that we’ve “made it there,” when we arrive at a statistically significant result. The statistical significance alarm goes off, but you are not able to see the underlying discrepancy that generated the alarm you hear. The problem is to make the leap from the perceived alarm to an aspect of a process, deep below the visible ocean, responsible for its having been triggered. Then it is of considerable relevance to exploit information on the capability of your test procedure to result in alarms going off (perhaps with different decibels of loudness), due to varying values of the parameter of interest. There are also objections to power analysis with insignificant results. Continue reading →

Categories: fallacy of non-significance, power, Statistical Inference as Severe Testing | Leave a comment

How to avoid making mountains out of molehills (using power and severity)

Posted on December 12, 2017 by Mayo

In preparation for a new post that takes up some of the recent battles on reforming or replacing p-values, I reblog an older post on power, one of the most misunderstood and abused notions in statistics. (I add a few “notes on howlers”.) The power of a test T in relation to a discrepancy from a test hypothesis H₀ is the probability T will lead to rejecting H₀ when that discrepancy is present. Power is sometimes misappropriated to mean something only distantly related to the probability a test leads to rejection; but I’m getting ahead of myself. This post is on a classic fallacy of rejection. Continue reading →

Categories: CIs and tests, Error Statistics, power | 8 Comments

Frequentstein’s Bride: What’s wrong with using (1 – β)/α as a measure of evidence against the null?

Posted on May 23, 2017 by Mayo

ONE YEAR AGO: …and growing more relevant all the time. Rather than leak any of my new book*, I reblog some earlier posts, even if they’re a bit scruffy. This was first blogged here (with a slightly different title). It’s married to posts on “the P-values overstate the evidence against the null fallacy”, such as this, and is wedded to this one on “How to Tell What’s True About Power if You’re Practicing within the Frequentist Tribe”.

In their “Comment: A Simple Alternative to p-values,” (on the ASA P-value document), Benjamin and Berger (2016) recommend researchers report a pre-data Rejection Ratio:

It is the probability of rejection when the alternative hypothesis is true, divided by the probability of rejection when the null hypothesis is true, i.e., the ratio of the power of the experiment to the Type I error of the experiment. The rejection ratio has a straightforward interpretation as quantifying the strength of evidence about the alternative hypothesis relative to the null hypothesis conveyed by the experimental result being statistically significant. (Benjamin and Berger 2016, p. 1)

Continue reading →

Categories: Bayesian/frequentist, fallacy of rejection, J. Berger, power, S. Senn | 17 Comments

How to tell what’s true about power if you’re practicing within the error-statistical tribe

Posted on May 8, 2017 by Mayo

This is a modified reblog of an earlier post, since I keep seeing papers that confuse this.

I have heard some people say:

A. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is poor evidence of a discrepancy (from the null) corresponding to µ’. (i.e., there’s poor evidence that µ > µ’ ).*See point on language in notes.

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is warranted, or at least not problematic.

I have heard other people say:

B. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is good evidence of a discrepancy (from the null) corresponding to µ’ (i.e., there’s good evidence that µ > µ’).

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is unwarranted.

Which is correct, from the perspective of the (error statistical) philosophy, within which power and associated tests are defined? Continue reading →

Categories: power, reforming the reformers | 17 Comments

“Nonsignificance Plus High Power Does Not Imply Support for the Null Over the Alternative.”

Posted on July 21, 2016 by Mayo

Seeing the world through overly rosy glasses

Taboos about power nearly always stem from misuse of power analysis. Sander Greenland (2012) has a paper called “Nonsignificance Plus High Power Does Not Imply Support for the Null Over the Alternative.” I’m not saying Greenland errs; the error would be made by anyone who interprets power analysis in a manner giving rise to Greenland’s objection. So what’s (ordinary) power analysis?

(I) Listen to Jacob Cohen (1988) introduce Power Analysis

“PROVING THE NULL HYPOTHESIS. Research reports in the literature are frequently flawed by conclusions that state or imply that the null hypothesis is true. For example, following the finding that the difference between two sample means is not statistically significant, instead of properly concluding from this failure to reject the null hypothesis that the data do not warrant the conclusion that the population means differ, the writer concludes, at least implicitly, that there is no difference. The latter conclusion is always strictly invalid, and is functionally invalid as well unless power is high. The high frequency of occurrence of this invalid interpretation can be laid squarely at the doorstep of the general neglect of attention to statistical power in the training of behavioral scientists. Continue reading →

Categories: Cohen, Greenland, power, Statistics | 46 Comments

Frequentstein: What’s wrong with (1 – β)/α as a measure of evidence against the null? (ii)

Posted on May 22, 2016 by Mayo

In their “Comment: A Simple Alternative to p-values,” (on the ASA P-value document), Benjamin and Berger (2016) recommend researchers report a pre-data Rejection Ratio:

It is the probability of rejection when the alternative hypothesis is true, divided by the probability of rejection when the null hypothesis is true, i.e., the ratio of the power of the experiment to the Type I error of the experiment. The rejection ratio has a straightforward interpretation as quantifying the strength of evidence about the alternative hypothesis relative to the null hypothesis conveyed by the experimental result being statistically significant. (Benjamin and Berger 2016, p. 1)

The recommendation is much more fully fleshed out in a 2016 paper by Bayarri, Benjamin, Berger, and Sellke (BBBS 2016): Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses. Their recommendation is:

…that researchers should report the ‘pre-experimental rejection ratio’ when presenting their experimental design and researchers should report the ‘post-experimental rejection ratio’ (or Bayes factor) when presenting their experimental results. (BBBS 2016, p. 3)….

The (pre-experimental) ‘rejection ratio’ R_pre , the ratio of statistical power to significance threshold (i.e., the ratio of the probability of rejecting under H₁ and H₀ respectively), is shown to capture the strength of evidence in the experiment for H₁over H₀. (ibid., p. 2)

But in fact it does no such thing! [See my post from the FUSION conference here.] J. Berger, and his co-authors, will tell you the rejection ratio (and a variety of other measures created over the years) are entirely frequentist because they are created out of frequentist error statistical measures. But a creation built on frequentist measures doesn’t mean the resulting animal captures frequentist error statistical reasoning. It might be a kind of Frequentstein monster! [1] Continue reading →

Categories: J. Berger, power, reforming the reformers, S. Senn, Statistical power, Statistics | 36 Comments

power

Stephen Senn

Consultant Statistician Edinburgh

Relevant significance?

Be careful what you wish for

Despised and Rejected

Sepsis sceptic

The Statistics Wars & Their Casualties

Blog links (references)

Reviews of Statistical Inference as Severe Testing (SIST)

Interviews & Debates on PhilStat (2020)

Interviews on PhilStat (2019)

LSE PH500 Research Seminar (May 21-June 25, 2020): Controversies in Phil Stat

Summer Seminar 2019 (article)

Top Posts & Pages

Conferences & Workshops

RMM Special Topic

Mayo & Spanos, Error Statistics

Follow Blog via Email

My Websites

Recent Posts: PhilStatWars

LOG IN/OUT

Archives

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.

Consultant Statistician
Edinburgh