Bayesian/frequentist

Taking errors seriously in forecasting elections

1200x-1

.

Science isn’t about predicting one-off events like election results, but that doesn’t mean the way to make election forecasts scientific (which they should be) is to build “theories of voting.” A number of people have sent me articles on statistical aspects of the recent U.S. election, but I don’t have much to say and I like to keep my blog non-political. I won’t violate this rule in making a couple of comments on Faye Flam’s Nov. 11 article: “Why Science Couldn’t Predict a Trump Presidency”[i].

For many people, Donald Trump’s surprise election victory was a jolt to very idea that humans are rational creatures. It tore away the comfort of believing that science has rendered our world predictable. The upset led two New York Times reporters to question whether data science could be trusted in medicine and business. A Guardian columnist declared that big data works for physics but breaks down in the realm of human behavior. Continue reading

Categories: Bayesian/frequentist, evidence-based policy | 15 Comments

For Statistical Transparency: Reveal Multiplicity and/or Just Falsify the Test (Remark on Gelman and Colleagues)

images-31

.

Gelman and Loken (2014) recognize that even without explicit cherry picking there is often enough leeway in the “forking paths” between data and inference so that by artful choices you may be led to one inference, even though it also could have gone another way. In good sciences, measurement procedures should interlink with well-corroborated theories and offer a triangulation of checks– often missing in the types of experiments Gelman and Loken are on about. Stating a hypothesis in advance, far from protecting from the verification biases, can be the engine that enables data to be “constructed”to reach the desired end [1].

[E]ven in settings where a single analysis has been carried out on the given data, the issue of multiple comparisons emerges because different choices about combining variables, inclusion and exclusion of cases…..and many other steps in the analysis could well have occurred with different data (Gelman and Loken 2014, p. 464).

An idea growing out of this recognition is to imagine the results of applying the same statistical procedure, but with different choices at key discretionary junctures–giving rise to a multiverse analysis, rather than a single data set (Steegen, Tuerlinckx, Gelman, and Vanpaemel 2016). One lists the different choices thought to be plausible at each stage of data processing. The multiverse displays “which constellation of choices corresponds to which statistical results” (p. 797). The result of this exercise can, at times, mimic the delineation of possibilities in multiple testing and multiple modeling strategies. Continue reading

Categories: Bayesian/frequentist, Error Statistics, Gelman, P-values, preregistration, reproducibility, Statistics | 9 Comments

A new front in the statistics wars? Peaceful negotiation in the face of so-called ‘methodological terrorism’

images-30I haven’t been blogging that much lately, as I’m tethered to the task of finishing revisions on a book (on the philosophy of statistical inference!) But I noticed two interesting blogposts, one by Jeff Leek, another by Andrew Gelman, and even a related petition on Twitter, reflecting a newish front in the statistics wars: When it comes to improving scientific integrity, do we need more carrots or more sticks? 

Leek’s post, from yesterday, called “Statistical Vitriol” (29 Sep 2016), calls for de-escalation of the consequences of statistical mistakes:

Over the last few months there has been a lot of vitriol around statistical ideas. First there were data parasites and then there were methodological terrorists. These epithets came from established scientists who have relatively little statistical training. There was the predictable backlash to these folks from their counterparties, typically statisticians or statistically trained folks who care about open source.
Continue reading

Categories: Anil Potti, fraud, Gelman, pseudoscience, Statistics | 15 Comments

Peircean Induction and the Error-Correcting Thesis (Part I)

C. S. Peirce: 10 Sept, 1839-19 April, 1914

C. S. Peirce: 10 Sept, 1839-19 April, 1914

Today is C.S. Peirce’s birthday. He’s one of my all time heroes. You should read him: he’s a treasure chest on essentially any topic, and he anticipated several major ideas in statistics (e.g., randomization, confidence intervals) as well as in logic. I’ll reblog the first portion of a (2005) paper of mine. Links to Parts 2 and 3 are at the end. It’s written for a very general philosophical audience; the statistical parts are pretty informal. Happy birthday Peirce.

Peircean Induction and the Error-Correcting Thesis
Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319

Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:

Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)

Continue reading

Categories: Bayesian/frequentist, C.S. Peirce, Error Statistics, Statistics | 18 Comments

TragiComedy hour: P-values vs posterior probabilities vs diagnostic error rates

Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?

Critic: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!

Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!

Raucous laughter ensues!

(Hah, hah… “So funny, I forgot to laugh! Or, I’m crying and laughing at the same time!) Continue reading

Categories: Bayesian/frequentist, Comedy, significance tests, Statistics | 8 Comments

Er, about those “other statistical approaches”: Hold off until a balanced critique is in?

street-chalk-art-optical-illusion-6

.

I could have told them that the degree of accordance enabling the “6 principles” on p-values was unlikely to be replicated when it came to most of the “other approaches” with which some would supplement or replace significance tests– notably Bayesian updating, Bayes factors, or likelihood ratios (confidence intervals are dual to hypotheses tests). [My commentary is here.] So now they may be advising a “hold off” or “go slow” approach until some consilience is achieved. Is that it? I don’t know. I was tweeted an article about the background chatter taking place behind the scenes; I wasn’t one of people interviewed for this. Here are some excerpts, I may add more later after it has had time to sink in. (check back later)

“Reaching for Best Practices in Statistics: Proceed with Caution Until a Balanced Critique Is In”

J. Hossiason

“[A]ll of the other approaches*, as well as most statistical tools, may suffer from many of the same problems as the p-values do. What level of likelihood ratio in favor of the research hypothesis will be acceptable to the journal? Should scientific discoveries be based on whether posterior odds pass a specific threshold (P3)? Does either measure the size of an effect (P5)?…How can we decide about the sample size needed for a clinical trial—however analyzed—if we do not set a specific bright-line decision rule? 95% confidence intervals or credence intervals…offer no protection against selection when only those that do not cover 0, are selected into the abstract (P4). (Benjamini, ASA commentary, pp. 3-4)

What’s sauce for the goose is sauce for the gander right?  Many statisticians seconded George Cobb who urged “the board to set aside time at least once every year to consider the potential value of similar statements” to the recent ASA p-value report. Disappointingly, a preliminary survey of leaders in statistics, many from the original p-value group, aired striking disagreements on best and worst practices with respect to these other approaches. The Executive Board is contemplating a variety of recommendations, minimally, Continue reading

Categories: Bayesian/frequentist, Statistics | 84 Comments

“P-values overstate the evidence against the null”: legit or fallacious?

The allegation that P-values overstate the evidence against the null hypothesis continues to be taken as gospel in discussions of significance tests. All such discussions, however, assume a notion of “evidence” that’s at odds with significance tests–generally likelihood ratios, or Bayesian posterior probabilities (conventional or of the “I’m selecting hypotheses from an urn of nulls” variety). I’m reblogging the bulk of an earlier post as background for a new post to appear tomorrow.  It’s not that a single small P-value provides good evidence of a discrepancy (even assuming the model, and no biasing selection effects); Fisher and others warned against over-interpreting an “isolated” small P-value long ago.  The problem is that the current formulation of the “P-values overstate the evidence” meme is attached to a sleight of hand (on meanings) that is introducing brand new misinterpretations into an already confused literature! 

 

Categories: Bayesian/frequentist, fallacy of rejection, highly probable vs highly probed, P-values | 3 Comments

“On the Brittleness of Bayesian Inference,” Owhadi, Scovel, and Sullivan (PUBLISHED)

a0a82d0b0dc678502499eaa33d4f4c79

.

The record number of hits on this blog goes to “When Bayesian Inference shatters,” where Houman Owhadi presents a “Plain Jane” explanation of results now published in “On the Brittleness of Bayesian Inference”. A follow-up was 1 year ago. Here’s how their paper begins:

 

 

owhadi

.

Houman Owhadi
Professor of Applied and Computational Mathematics and Control and Dynamical Systems, Computing + Mathematical Sciences,
California Institute of Technology, USA+

Clintpic

.

Clint Scovel
Senior Scientist,

Computing + Mathematical Sciences,
California Institute of Technology, USA
TimSullivan

.

Tim Sullivan

Warwick Zeeman Lecturer,
Assistant Professor,
Mathematics Institute,
University of Warwick, UK

 

 

“On the Brittleness of Bayesian Inference”

ABSTRACT: With the advent of high-performance computing, Bayesian methods are becoming increasingly popular tools for the quantification of uncertainty throughout science and industry. Since these methods can impact the making of sometimes critical decisions in increasingly complicated contexts, the sensitivity of their posterior conclusions with respect to the underlying models and prior beliefs is a pressing question to which there currently exist positive and negative answers. We report new results suggesting that, although Bayesian methods are robust when the number of possible outcomes is finite or when only a finite number of marginals of the data-generating distribution are unknown, they could be generically brittle when applied to continuous systems (and their discretizations) with finite information on the data-generating distribution. If closeness is defined in terms of the total variation (TV) metric or the matching of a finite system of generalized moments, then (1) two practitioners who use arbitrarily close models and observe the same (possibly arbitrarily large amount of) data may reach opposite conclusions; and (2) any given prior and model can be slightly perturbed to achieve any desired posterior conclusion. The mechanism causing brittleness/robustness suggests that learning and robustness are antagonistic requirements, which raises the possibility of a missing stability condition when using Bayesian inference in a continuous world under finite information.

© 2015, Society for Industrial and Applied Mathematics
Permalink: http://dx.doi.org/10.1137/130938633 Continue reading

Categories: Bayesian/frequentist, Statistics | 16 Comments

Gelman on ‘Gathering of philosophers and physicists unaware of modern reconciliation of Bayes and Popper’

 I’m reblogging Gelman’s post today: “Gathering of philosophers and physicists unaware of modern reconciliation of Bayes and Popper”. I concur with Gelman’s arguments against all Bayesian “inductive support” philosophies, and welcome the Gelman and Shalizi (2013) ‘meeting of the minds’ between an error statistical philosophy and Bayesian falsification (which I regard as a kind of error statistical Bayesianism). Just how radical a challenge these developments pose to other stripes of Bayesianism has yet to be explored. My comment on them is here.

Screen Shot 2015-12-16 at 11.17.09 PM

“Gathering of philosophers and physicists unaware of modern reconciliation of Bayes and Popper” by Andrew Gelman

Hiro Minato points us to a news article by physicist Natalie Wolchover entitled “A Fight for the Soul of Science.”

I have no problem with most of the article, which is a report about controversies within physics regarding the purported untestability of physics models such as string theory (as for example discussed by my Columbia colleague Peter Woit). Wolchover writes:

Whether the fault lies with theorists for getting carried away, or with nature, for burying its best secrets, the conclusion is the same: Theory has detached itself from experiment. The objects of theoretical speculation are now too far away, too small, too energetic or too far in the past to reach or rule out with our earthly instruments. . . .

Over three mild winter days, scholars grappled with the meaning of theory, confirmation and truth; how science works; and whether, in this day and age, philosophy should guide research in physics or the other way around. . . .

To social and behavioral scientists, this is all an old old story. Concepts such as personality, political ideology, and social roles are undeniably important but only indirectly related to any measurements. In social science we’ve forever been in the unavoidable position of theorizing without sharp confirmation or falsification, and, indeed, unfalsifiable theories such as Freudian psychology and rational choice theory have been central to our understanding of much of the social world.

But then somewhere along the way the discussion goes astray: Continue reading

Categories: Bayesian/frequentist, Error Statistics, Gelman, Shalizi, Statistics | 20 Comments

Return to the Comedy Hour: P-values vs posterior probabilities (1)

Comedy Hour

Comedy Hour

Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?

JB [Jim Berger]: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!(1)

Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!

Raucous laughter ensues!

(Hah, hah,…. I feel I’m back in high school: “So funny, I forgot to laugh!)

The frequentist tester should retort:

Frequentist Significance Tester: But you assumed 50% of the null hypotheses are true, and  computed P(H0|x) (imagining P(H0)= .5)—and then assumed my p-value should agree with the number you get, if it is not to be misleading!

Yet, our significance tester is not heard from as they move on to the next joke…. Continue reading

Categories: Statistics, Comedy, significance tests, Bayesian/frequentist, PBP | 27 Comments

S. McKinney: On Efron’s “Frequentist Accuracy of Bayesian Estimates” (Guest Post)

SMWorkPhoto_IMG_2432

.

Steven McKinney, Ph.D.
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

                    

On Bradley Efron’s: “Frequentist Accuracy of Bayesian Estimates”

Bradley Efron has produced another fine set of results, yielding a valuable estimate of variability for a Bayesian estimate derived from a Markov Chain Monte Carlo algorithm, in his latest paper “Frequentist accuracy of Bayesian estimates” (J. R. Statist. Soc. B (2015) 77, Part 3, pp. 617–646). I give a general overview of Efron’s brilliance via his Introduction discussion (his words “in double quotes”).

“1. Introduction

The past two decades have witnessed a greatly increased use of Bayesian techniques in statistical applications. Objective Bayes methods, based on neutral or uniformative priors of the type pioneered by Jeffreys, dominate these applications, carried forward on a wave of popularity for Markov chain Monte Carlo (MCMC) algorithms. Good references include Ghosh (2011), Berger (2006) and Kass and Wasserman (1996).”

A nice concise summary, one that should bring joy to anyone interested in Bayesian methods after all the Bayesian-bashing of the middle 20th century. Efron himself has crafted many beautiful results in the Empirical Bayes arena. He has reviewed important differences between Bayesian and frequentist outcomes that point to some as-yet unsettled issues in statistical theory and philosophy such as his scales of evidence work. Continue reading

Categories: Bayesian/frequentist, objective Bayesians, Statistics | 44 Comments

Statistical “reforms” without philosophy are blind (v update)

following-leader-off-cliff

.

Is it possible, today, to have a fair-minded engagement with debates over statistical foundations? I’m not sure, but I know it is becoming of pressing importance to try. Increasingly, people are getting serious about methodological reforms—some are quite welcome, others are quite radical. Too rarely do the reformers bring out the philosophical presuppositions of the criticisms and proposed improvements. Today’s (radical?) reform movements are typically launched from criticisms of statistical significance tests and P-values, so I focus on them. Regular readers know how often the P-value (that most unpopular girl in the class) has made her appearance on this blog. Here, I tried to quickly jot down some queries. (Look for later installments and links.) What are some key questions we need to ask to tell what’s true about today’s criticisms of P-values? 

I. To get at philosophical underpinnings, the single most import question is this:

(1) Do the debaters distinguish different views of the nature of statistical inference and the roles of probability in learning from data? Continue reading

Categories: Bayesian/frequentist, Error Statistics, P-values, significance tests, Statistics, strong likelihood principle | 193 Comments

Oy Faye! What are the odds of not conflating simple conditional probability and likelihood with Bayesian success stories?

Unknown

Faye Flam

ONE YEAR AGO, the NYT “Science Times” (9/29/14) published Fay Flam’s article, first blogged here.

Congratulations to Faye Flam for finally getting her article published at the Science Times at the New York Times, “The odds, continually updated” after months of reworking and editing, interviewing and reinterviewing. I’m grateful that one remark from me remained. Seriously I am. A few comments: The Monty Hall example is simple probability not statistics, and finding that fisherman who floated on his boots at best used likelihoods. I might note, too, that critiquing that ultra-silly example about ovulation and voting–a study so bad they actually had to pull it at CNN due to reader complaints[i]–scarcely required more than noticing the researchers didn’t even know the women were ovulating[ii]. Experimental design is an old area of statistics developed by frequentists; on the other hand, these ovulation researchers really believe their theory (and can point to a huge literature)….. Anyway, I should stop kvetching and thank Faye and the NYT for doing the article at all[iii]. Here are some excerpts:

30BAYES-master675

silly pic that accompanied the NYT article

…….When people think of statistics, they may imagine lists of numbers — batting averages or life-insurance tables. But the current debate is about how scientists turn data into knowledge, evidence and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference. In 2012, for example, a team at the biotech company Amgen announced that they’d analyzed 53 cancer studies and found it could not replicate 47 of them.

Similar follow-up analyses have cast doubt on so many findings in fields such as neuroscience and social science that researchers talk about a “replication crisis”

Continue reading

Categories: Bayesian/frequentist, Statistics | Leave a comment

(Part 2) Peircean Induction and the Error-Correcting Thesis

C. S. Peirce 9/10/1839 – 4/19/1914

C. S. Peirce
9/10/1839 – 4/19/1914

Continuation of “Peircean Induction and the Error-Correcting Thesis”

Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319

Part 1 is here.

There are two other points of confusion in critical discussions of the SCT, that we may note here:

I. The SCT and the Requirements of Randomization and Predesignation

The concern with “the trustworthiness of the proceeding” for Peirce like the concern with error probabilities (e.g., significance levels) for error statisticians generally, is directly tied to their view that inductive method should closely link inferences to the methods of data collection as well as to how the hypothesis came to be formulated or chosen for testing.

This account of the rationale of induction is distinguished from others in that it has as its consequences two rules of inductive inference which are very frequently violated (1.95) namely, that the sample be (approximately) random and that the property being tested not be determined by the particular sample x— i.e., predesignation.

The picture of Peircean induction that one finds in critics of the SCT disregards these crucial requirements for induction: Neither enumerative induction nor H-D testing, as ordinarily conceived, requires such rules. Statistical significance testing, however, clearly does. Continue reading

Categories: Bayesian/frequentist, C.S. Peirce, Error Statistics, Statistics | Leave a comment

Peircean Induction and the Error-Correcting Thesis (Part I)

C. S. Peirce: 10 Sept, 1839-19 April, 1914

C. S. Peirce: 10 Sept, 1839-19 April, 1914

Yesterday was C.S. Peirce’s birthday. He’s one of my all time heroes. You should read him: he’s a treasure chest on essentially any topic. I only recently discovered a passage where Popper calls Peirce one of the greatest philosophical thinkers ever (I don’t have it handy). If Popper had taken a few more pages from Peirce, he would have seen how to solve many of the problems in his work on scientific inference, probability, and severe testing. I’ll blog the main sections of a (2005) paper of mine over the next few days. It’s written for a very general philosophical audience; the statistical parts are pretty informal. I first posted it in 2013Happy (slightly belated) Birthday Peirce.

Peircean Induction and the Error-Correcting Thesis
Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319

Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:

Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)

Inductive methods—understood as methods of experimental testing—are justified to the extent that they are error-correcting methods. We may call this Peirce’s error-correcting or self-correcting thesis (SCT):

Self-Correcting Thesis SCT: methods for inductive inference in science are error correcting; the justification for inductive methods of experimental testing in science is that they are self-correcting. Continue reading

Categories: Bayesian/frequentist, C.S. Peirce, Error Statistics, Statistics | Leave a comment

Can You change Your Bayesian prior? (ii)

images-1

.

This is one of the questions high on the “To Do” list I’ve been keeping for this blog.  The question grew out of discussions of “updating and downdating” in relation to papers by Stephen Senn (2011) and Andrew Gelman (2011) in Rationality, Markets, and Morals.[i]

“As an exercise in mathematics [computing a posterior based on the client’s prior probabilities] is not superior to showing the client the data, eliciting a posterior distribution and then calculating the prior distribution; as an exercise in inference Bayesian updating does not appear to have greater claims than ‘downdating’.” (Senn, 2011, p. 59)

“If you could really express your uncertainty as a prior distribution, then you could just as well observe data and directly write your subjective posterior distribution, and there would be no need for statistical analysis at all.” (Gelman, 2011, p. 77)

But if uncertainty is not expressible as a prior, then a major lynchpin for Bayesian updating seems questionable. If you can go from the posterior to the prior, on the other hand, perhaps it can also lead you to come back and change it.

Is it legitimate to change one’s prior based on the data?

I don’t mean update it, but reject the one you had and replace it with another. My question may yield different answers depending on the particular Bayesian view. I am prepared to restrict the entire question of changing priors to Bayesian “probabilisms”, meaning the inference takes the form of updating priors to yield posteriors, or to report a comparative Bayes factor. Interpretations can vary. In many Bayesian accounts the prior probability distribution is a way of introducing prior beliefs into the analysis (as with subjective Bayesians) or, conversely, to avoid introducing prior beliefs (as with reference or conventional priors). Empirical Bayesians employ frequentist priors based on similar studies or well established theory. There are many other variants.

images

.

S. SENN: According to Senn, one test of whether an approach is Bayesian is that while Continue reading

Categories: Bayesian/frequentist, Gelman, S. Senn, Statistics | 111 Comments

From our “Philosophy of Statistics” session: APS 2015 convention

aps_2015_logo_cropped-1

.

“The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference,” at the 2015 American Psychological Society (APS) Annual Convention in NYC, May 23, 2015:

 

D. Mayo: “Error Statistical Control: Forfeit at your Peril” 

 

S. Senn: “‘Repligate’: reproducibility in statistical studies. What does it mean and in what sense does it matter?”

 

A. Gelman: “The statistical crisis in science” (this is not his exact presentation, but he focussed on some of these slides)

 

For more details see this post.

Categories: Bayesian/frequentist, Error Statistics, P-values, reforming the reformers, reproducibility, S. Senn, Statistics | 10 Comments

Philosophy of Statistics Comes to the Big Apple! APS 2015 Annual Convention — NYC

Start Spreading the News…..

aps_2015_logo_cropped-1

..

 The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference,
2015 APS Annual Convention
Saturday, May 23  
2:00 PM- 3:50 PM in Wilder

(Marriott Marquis 1535 B’way)

 

 

gelman5

.

Andrew Gelman

Professor of Statistics & Political Science
Columbia University

SENN FEB

.

Stephen Senn

Head of Competence Center
for Methodology and Statistics (CCMS)

Luxembourg Institute of Health

 .
.

Slide1

D. Mayo headshot

D.G. Mayo, Philosopher


morey

.

Richard Morey, Session Chair & Discussant

Senior Lecturer
School of Psychology
Cardiff University
Categories: Announcement, Bayesian/frequentist, Statistics | 8 Comments

Joan Clarke, Turing, I.J. Good, and “that after-dinner comedy hour…”

I finally saw The Imitation Game about Alan Turing and code-breaking at Bletchley Park during WWII. This short clip of Joan Clarke, who was engaged to Turing, includes my late colleague I.J. Good at the end (he’s not second as the clip lists him). Good used to talk a great deal about Bletchley Park and his code-breaking feats while asleep there (see note[a]), but I never imagined Turing’s code-breaking machine (which, by the way, was called the Bombe and not Christopher as in the movie) was so clunky. The movie itself has two tiny scenes including Good. Below I reblog: “Who is Allowed to Cheat?”—one of the topics he and I debated over the years. Links to the full “Savage Forum” (1962) may be found at the end (creaky, but better than nothing.)

[a]”Some sensitive or important Enigma messages were enciphered twice, once in a special variation cipher and again in the normal cipher. …Good dreamed one night that the process had been reversed: normal cipher first, special cipher second. When he woke up he tried his theory on an unbroken message – and promptly broke it.” This, and further examples may be found in this obituary

[b] Pictures comparing the movie cast and the real people may be found here. Continue reading

Categories: Bayesian/frequentist, optional stopping, Statistics, strong likelihood principle | 6 Comments

What’s wrong with taking (1 – β)/α, as a likelihood ratio comparing H0 and H1?

mayo_thumbnail_rings

.

Here’s a quick note on something that I often find in discussions on tests, even though it treats “power”, which is a capacity-of-test notion, as if it were a fit-with-data notion…..

1. Take a one-sided Normal test T+: with n iid samples:

H0: µ ≤  0 against H1: µ >  0

σ = 10,  n = 100,  σ/√n =σx= 1,  α = .025.

So the test would reject H0 iff Z > c.025 =1.96. (1.96. is the “cut-off”.)

~~~~~~~~~~~~~~

  1. Simple rules for alternatives against which T+ has high power:
  • If we add σx (here 1) to the cut-off (here, 1.96) we are at an alternative value for µ that test T+ has .84 power to detect.
  • If we add 3σto the cut-off we are at an alternative value for µ that test T+ has ~ .999 power to detect. This value, which we can write as µ.999 = 4.96

Let the observed outcome just reach the cut-off to reject the null,z= 1.96.

If we were to form a “likelihood ratio” of μ = 4.96 compared to μ0 = 0 using

[Power(T+, 4.96)]/α,

it would be 40.  (.999/.025).

It is absurd to say the alternative 4.96 is supported 40 times as much as the null, understanding support as likelihood or comparative likelihood. (The data 1.96 are even closer to 0 than to 4.96). The same point can be made with less extreme cases.) What is commonly done next is to assign priors of .5 to the two hypotheses, yielding

Pr(H0 |z0) = 1/ (1 + 40) = .024, so Pr(H1 |z0) = .976.

Such an inference is highly unwarranted and would almost always be wrong. Continue reading

Categories: Bayesian/frequentist, law of likelihood, Statistical power, statistical tests, Statistics, Stephen Senn | 87 Comments

Blog at WordPress.com.