The Meaning of My Title: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars


Excerpts from the Preface:

The Statistics Wars: 

Today’s “statistics wars” are fascinating: They are at once ancient and up to the minute. They reflect disagreements on one of the deepest, oldest, philosophical questions: How do humans learn about the world despite threats of error due to incomplete and variable data? At the same time, they are the engine behind current controversies surrounding high-profile failures of replication in the social and biological sciences. How should the integrity of science be restored? Experts do not agree. This book pulls back the curtain on why.

Methods of statistical inference become relevant primarily when effects are neither totally swamped by noise, nor so clear cut that formal assessment of errors is relatively unimportant. Should probability enter to capture degrees of belief about claims? To measure variability? Or to ensure we won’t reach mistaken interpretations of data too often in the long run of experience? Modern statistical methods grew out of attempts to systematize doing all of these. The field has been marked by disagreements between competing tribes of frequentists and Bayesians that have been so contentious–likened in some quarters to religious and political debates–that everyone wants to believe we are long past them. We now enjoy unifications and reconciliations between rival schools, it will be said, and practitioners are eclectic, prepared to use whatever method “works.” The truth is, long-standing battles still simmer below the surface in questions about scientific trustworthiness and the relationships between Big Data-driven models and theory. The reconciliations and unifications have been revealed to have serious problems, and there’s little agreement on which to use or how to interpret them. As for eclecticism, it’s often not clear what is even meant by “works.” The presumption that all we need is an agreement on numbers–never mind if they’re measuring different things–leads to pandemonium. Let’s brush the dust off the pivotal debates, walk into the museums where we can see and hear such founders as Fisher, Neyman, Pearson, Savage and many others. This is to simultaneously zero in on the arguments between metaresearchers–those doing research on research–charged with statistical reforms.

Statistical Inference as Severe Testing:

Why are some arguing that, in today's world of high-powered computer searches, statistical findings are mostly false? The problem is that high-powered methods can make it easy to uncover impressive-looking findings even if they are false: spurious correlations and other errors have not been severely probed. We set sail with a simple tool: If little or nothing has been done to rule out flaws in inferring a claim, then it has not passed a severe test. In the severe testing view, probability arises in scientific contexts to assess and control how capable methods are at uncovering and avoiding erroneous interpretations of data. That's what it means to view statistical inference as severe testing. A claim is severely tested to the extent it has been subjected to and passes a test that probably would have found flaws, were they present. You may be surprised to learn that many methods advocated by experts do not stand up to severe scrutiny, and are even in tension with successful strategies for blocking or accounting for cherry-picking and selective reporting!

The severe testing perspective substantiates, using modern statistics, the idea Karl Popper promoted but never cashed out. The goal of highly well-tested claims differs sufficiently from that of highly probable ones that you can have your cake and eat it too: retaining both for different contexts. Claims may be "probable" (in whatever sense you choose) but terribly tested by these data. In saying we may view statistical inference as severe testing, I'm not saying statistical inference is always about formal statistical testing. The testing metaphor grows out of the idea that before we have evidence for a claim, it must have passed an analysis that could have found it flawed. The probability that a method commits an erroneous interpretation of data is an error probability. Statistical methods based on error probabilities I call error statistics. The value of error probabilities, I argue, lies not merely in controlling error in the long run, but in what they teach us about the source of the data in front of us. The concept of severe testing is sufficiently general to apply to any of the methods now in use, whether for exploration, estimation, or prediction.
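To make the severity idea concrete, here is a minimal sketch of a severity calculation for a one-sided test of a normal mean with known standard deviation; the function and the numbers are my own hypothetical illustration, not an excerpt from the book.

```python
from math import sqrt
from statistics import NormalDist

def severity_mu_greater(mu1, xbar, sigma, n):
    """Severity for the claim 'mu > mu1', given observed mean xbar in a
    one-sided test of H0: mu <= mu0 (normal model, known sigma): the
    probability of a result less impressive than xbar were mu only mu1,
    i.e. SEV(mu > mu1) = P(Xbar <= xbar; mu = mu1)."""
    return NormalDist().cdf((xbar - mu1) / (sigma / sqrt(n)))

# Hypothetical numbers: n = 100, sigma = 10, observed mean 152 (mu0 = 150).
print(severity_mu_greater(150.5, xbar=152, sigma=10, n=100))  # ~0.93: passes severely
print(severity_mu_greater(152.0, xbar=152, sigma=10, n=100))  # 0.50: poorly tested
```

The same observed result thus warrants the modest claim μ > 150.5 with high severity, while giving poor grounds for μ > 152; that link between a method's error probabilities and the particular inference at hand is what the severe testing view exploits.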

Getting Beyond the Statistics Wars:

Thomas Kuhn's remark that only in the face of crisis "do scientists behave like philosophers" (1970) holds some truth in the current statistical crisis in science. Leaders of today's programs to restore scientific integrity have their own preconceptions about the nature of evidence and inference, and about "what we really want" in learning from data. Philosophy of science can also alleviate such conceptual discomforts. Fortunately, you needn't accept the severe testing view in order to employ it as a tool for bringing into focus the crux of all these issues. It's a tool for excavation, and for keeping us afloat in the marshes and quicksand that often mark today's controversies. Nevertheless, important consequences will follow once this tool is used. First, there will be a reformulation of existing tools (tests, confidence intervals and others) so as to avoid misinterpretations and abuses. The debates on statistical inference generally concern inference after a statistical model and data statements are in place, when in fact the most interesting work involves the local inferences needed to get to that point. A primary asset of error statistical methods is their contributions to designing, collecting, modeling, and learning from data. The severe testing view provides the much-needed link between a test's error probabilities and what's required for a warranted inference in the case at hand. Second, instead of rehearsing the same criticisms over and over again, challengers on all sides should now begin by grappling with the arguments we trace within. Kneejerk assumptions about the superiority of one or another method will not do. Although we'll be excavating the actual history, it's the methods themselves that matter; they're too important to be limited by what someone 50, 60 or 90 years ago thought, or to what today's discussants think they thought…

[…]

Join me, then, on a series of 6 excursions and 16 tours, during which we will visit 3 leading museums of statistical science and philosophy of science, and engage with a host of tribes marked by family quarrels, peace treaties, and shifting alliances.[i]

[i] A bit of travel trivia for those who not only read to the end of prefaces, but check their footnotes… [Oops, you'll have to check the book itself when it's out. In the meantime, inform me of typos/errors and other queries: error@vt.edu or the comments.]

*Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, July 2018)

Categories: Announcement, SIST | Leave a comment

Getting Up to Speed on Principles of Statistics


“If a statistical analysis is clearly shown to be effective … it gains nothing from being … principled,” according to Terry Speed in an interesting IMS article (2016) that Harry Crane tweeted about a couple of days ago [i]. Crane objects that you need principles to determine if it is effective, else it “seems that a method is effective (a la Speed) if it gives the answer you want/expect.” I suspected that what Speed was objecting to was an appeal to “principles of inference” of the type to which Neyman objected in my recent post. This turns out to be correct. Here are some excerpts from Speed’s article (emphasis is mine):

Not long ago I helped some people with the statistical analysis of their data. The approach I suggested worked reasonably well, somewhat better than the previously published approaches for dealing with that kind of data, and they were happy. But when they came to write it up, they wanted to describe our approach as principled, and I strongly objected. Why? Who doesn’t like to be seen as principled? I have several reasons for disliking this adjective, and not wanting to see it used to describe anything I do. My principal reason for feeling this way is that such statements carry the implication, typically implicit, but at times explicit, that any other approach to the task is unprincipled. I’ve had to grin and bear this slur on my integrity many times in the writings of Bayesians.

Not atypical is the following statement about probability theory in an article about Bayesian inference: that it “furnishes us with a principled and consistent framework for meaningful reasoning in the presence of uncertainty.” Not a Bayesian? Then your reasoning is likely to be unprincipled, inconsistent, and meaningless. Calling something one does “principled” makes me think of Hamlet’s mother Queen Gertrude’s comment, “The lady doth protest too much, methinks.” If a statistical analysis is clearly shown to be effective at answering the questions of interest, it gains nothing from being described as principled. And if it hasn’t been shown so, fine words are a poor substitute. In the write-up mentioned at the beginning of this piece, we compared different analyses, and so had no need to tell the reader that we were principled: our approach was shown to be effective. Of course there is the possibility that multiple approaches to the same problem are principled, and they just adhere to different principles. Indeed, one of the ironies in the fact that my collaborators want to describe our approach to the analysis of their data as principled, is that a Bayesian approach is one of the alternatives. And as we have seen, all Bayesian analyses are principled.

… I have another reason for feeling ambivalent about principles in statistics. Many years ago, people spent time debating philosophies of statistical inference; some still do. I got absorbed in it for a period in the 1970s. At that time, there was much discussion about the Sufficiency Principle and the Conditionality Principle (each coming in strong and weak versions), the Ancillarity Principle, not to mention the Weak Repeated Sampling Principle, the Invariance (Equivariance) Principle, and others, and the famous Likelihood Principle. There were examples, counter-examples, and theorems of the form “Principle A & Principle B implies Principle C”.

You might think that with so many principles of statistical inference, we’d always know what to do with the next set of data that walks in the door. But this is not so. The principles just mentioned all take as their starting point a statistical model, sometimes from a very restricted class of parametric models. Principles telling you how to get from the data and questions of interest to that point were prominent by their absence, and still are. Probability theory is little help to Bayesians when it comes to deciding on an initial probability model. Perhaps this is reasonable, as there is a difference between the philosophy of statistical inference and the art of making statistical inferences in a given context. We have lots of principles to guide us for dealing with the easy part of our analysis, but none for the hard part. While the younger me spent time on all those Principles over 40 years ago, I wouldn’t teach them today, or even recommend the discussions as worth reading.

Showing effectiveness, of course, relies on "principled"–sound and warranted–reasons for taking the analysis to satisfy given goals (e.g., that flaws and fallacies have been probed and avoided/reported, and that the problems of interest are solved to a reasonable extent). That's what is involved in showing it to be effective as compared to different analyses. But I agree that '70s-style principles get the order wrong. Not only do they start with an assumed model; even within a statistical model they presuppose a priori standpoints about what "we really want". Instead, any principles should be examined according to how well they promote our ability to carry out and check analyses on grounds of effectiveness.

Nowadays, even most Bayesians recommend (default, non-subjective) methods that violate principles once thought to be sacrosanct. However, some of these old-style principles still simmer below the surface in many of today's statistical debates and proposed reforms, so there is still a need to shine a spotlight on their unexamined assumptions. We should explain why some principles are even at odds with effective methods for solving given problems, and show the circularity of one of the most famous "theorems" of all.[ii] It's too bad that a certain old-fashioned conception of the philosophy of statistical inference gives "principles" a bad name [iii]. We needn't be allergic to developing principles in the sense of effective strategies for successful inquiries. Speed seems to agree. Here's the end of his article:

But I do think there is a demand for the principles of what I’ll call initial data analysis, an encapsulation of the knowledge and experience of statisticians in dealing with the early part of an analysis, before we fix on a model or class of models. I am often asked by non-statisticians engaged in data science how they can learn applied statistics, and I don’t have a long list of places to send them. Whether what they need can be expressed in principles is not clear, but I think it’s worth trying. My first step in this direction was taken 16 years ago, when I posted two “Hints & Prejudices” on our microarray analysis web site, namely “Always log” and “Avoid assuming normality”. I am not against principles, but I like to remember Oscar Wilde’s aphorism: “Lean on principles, one day they’ll end up giving way.” (p. 17)
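As a side note on "Always log": here is a minimal sketch, with made-up lognormal data (my illustration, not Speed's), of why the advice earns its keep for intensity-like measurements, where effects are multiplicative and the noise is skewed on the raw scale.

```python
import math
import random

random.seed(1)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical multiplicative measurements, as with microarray intensities:
# a twofold "treatment" effect sitting on top of lognormal noise.
control = [math.exp(random.gauss(5.0, 0.4)) for _ in range(1000)]
treated = [2.0 * math.exp(random.gauss(5.0, 0.4)) for _ in range(1000)]

# On the raw scale the effect is a ratio and the noise is right-skewed;
# on the log2 scale the same effect is an additive shift of about 1.0
# and the noise is roughly symmetric, much closer to normal.
log2_shift = mean([math.log2(y) for y in treated]) - mean([math.log2(x) for x in control])
print(round(log2_shift, 2))  # ~1.0, i.e. a twofold change
```

On the log scale a plain difference of means recovers the effect, while the raw scale invites exactly the normality assumption Speed's second hint warns against.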

[i] http://bulletin.imstat.org/wp-content/uploads/Bulletin45_6.pdf

[ii] Search this blog for quite a lot on the Likelihood Principle–it was one of the main topics in the first few years of this blog. Here's a link to my article in Statistical Science.

[iii] I try to dispel this image of philosophy of statistics in my forthcoming Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP).

 

Categories: Likelihood Principle, Philosophy of Statistics | 4 Comments

3 YEARS AGO (May 2015): Monthly Memory Lane

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: May 2015. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1]. Posts that are part of a “unit” or a group count as one, as in the case of 5/16, 5/19 and 5/24.

May 2015

  • 05/04 Spurious Correlations: Death by getting tangled in bedsheets and the consumption of cheese! (Aris Spanos)
  • 05/08 What really defies common sense (Msc kvetch on rejected posts)
  • 05/09 Stephen Senn: Double Jeopardy?: Judge Jeffreys Upholds the Law (sequel to the pathetic P-value)
  • 05/16 “Error statistical modeling and inference: Where methodology meets ontology” A. Spanos and D. Mayo
  • 05/19 Workshop on Replication in the Sciences: Society for Philosophy and Psychology: (2nd part of double header)
  • 05/24 From our “Philosophy of Statistics” session: APS 2015 convention
  • 05/27 “Intentions” is the new code word for “error probabilities”: Allan Birnbaum’s Birthday
  • 05/30 3 YEARS AGO (MAY 2012): Saturday Night Memory Lane

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

I regret being away from blogging of late (yes, the last bit of proofing on the book): I shall return soon! Send me stuff of yours to post, or items of interest, in the meantime.

 

Categories: 3-year memory lane | 1 Comment

Neyman vs the ‘Inferential’ Probabilists continued (a)


Today is Jerzy Neyman's Birthday (April 16, 1894 – August 5, 1981). I am posting a brief excerpt and a link to a paper of his that I hadn't posted before: Neyman, J. (1962), 'Two Breakthroughs in the Theory of Statistical Decision Making' [i]. It's chock full of ideas and arguments, but the one that interests me at the moment is Neyman's conception of "his breakthrough", in relation to a certain concept of "inference". "In the present paper," he tells us, "the term 'inferential theory'…will be used to describe the attempts to solve the Bayes' problem with a reference to confidence, beliefs, etc., through some supplementation…either a substitute a priori distribution [exemplified by the so-called principle of insufficient reason] or a new measure of uncertainty" such as Fisher's fiducial probability. Now Neyman always distinguishes his error statistical performance conception from Bayesian and fiducial probabilisms [ii]. The surprising twist here is semantical, and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite "breakthrough" (or did I miss it?). [iii] I'll explain in later stages of this post & in comments…(so please check back); I don't want to miss the start of the birthday party in honor of Neyman, and it's already 8:30 p.m. in Berkeley!

Note: In this article, "attacks" on various statistical "fronts" refers to ways of attacking problems in one or another statistical research program. HAPPY BIRTHDAY NEYMAN!

 

 

Installment (a), 4/17. What doesn't Neyman like about Birnbaum's advocacy of a Principle of Sufficiency S (p. 25)? He doesn't like that it is advanced as a normative principle (e.g., about when evidence is or ought to be deemed equivalent) rather than as a criterion that does something for you, such as controlling errors. (Presumably it is relevant to a type of context, say parametric inference within a model.) S is put forward as a kind of principle of rationality, rather than one with a rationale in solving some statistical problem.

"The principle of sufficiency (S): If E is a specified experiment, with outcomes x; if t = t(x) is any sufficient statistic; and if E' is the experiment, derived from E, in which any outcome x of E is represented only by the corresponding value t = t(x) of the sufficient statistic; then for each x, Ev(E, x) = Ev(E', t) where t = t(x)… (S) may be described informally as asserting the 'irrelevance of observations independent of a sufficient statistic'."

Ev(E, x) is a metalogical symbol referring to the evidence from experiment E with result x. The very idea that there is such a thing as an evidence function is never explained, but to Birnbaum "inferential theory" required such things. (At least that's how he started out.) The view is very philosophical, and it inherits much from logical positivism and logics of induction. The principle S, and also other principles of Birnbaum's, have a normative character: Birnbaum considers them "compellingly appropriate".
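For concreteness, here is a minimal sketch (my illustration, not Birnbaum's or Neyman's) of what S asserts in the Bernoulli case, where the number of successes t = Σx is sufficient: two outcome sequences with the same t generate the same likelihood function, and S says an evidence function must treat them alike.

```python
from math import isclose, prod

def bernoulli_likelihood(xs, p):
    # Likelihood of the specific 0/1 sequence xs at parameter value p.
    return prod(p if x == 1 else (1 - p) for x in xs)

# Two hypothetical sequences sharing the sufficient statistic t = 3:
x1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
x2 = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

# The likelihoods agree at every p (both are p^3 (1-p)^7): the data enter
# only through t, so S requires Ev(E, x1) = Ev(E, x2).
for p in (0.1, 0.3, 0.5, 0.9):
    assert isclose(bernoulli_likelihood(x1, p), bernoulli_likelihood(x2, p))
```

Neyman's complaint, as the next paragraph brings out, is not with this equivalence, which follows from known theorems about sufficiency, but with elevating it to a freestanding normative principle.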

"The principles of Birnbaum appear as a kind of substitutes for known theorems," Neyman says. For example, various authors proved theorems to the general effect that the use of sufficient statistics will minimize the frequency of errors. But if you just start with the rationale (minimizing the frequency of errors, say), you wouldn't need these "principles" from on high, as it were. That's what Neyman seems to be saying in his criticism of them in this paper. Do you agree? He has the same gripe concerning Cornfield's conception of a default-type Bayesian account akin to Jeffreys'. Why?

[i] I thank @omaclaran for reminding me of this paper on Twitter recently.

[ii] Or so I argue in my forthcoming Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, 2018, CUP. (Expected this summer.)

[iii] Do you think Neyman is using “breakthrough” here in reference to Savage’s description of Birnbaum’s “proof” of the (strong) Likelihood Principle? Or is it the other way round? Or neither? Please weigh in.

REFERENCES

Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘, Revue De l’Institut International De Statistique / Review of the International Statistical Institute, 30(1), 11-27.

Categories: Bayesian/frequentist, Error Statistics, Neyman, Statistics | Leave a comment

3 YEARS AGO (APRIL 2015): MEMORY LANE

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: April 2015. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others of general relevance to philosophy of statistics (in months where I’ve blogged a lot)[2].  Posts that are part of a “unit” or a group count as one.

April 2015

  • 04/01 Are scientists really ready for ‘retraction offsets’ to advance ‘aggregate reproducibility’? (let alone ‘precautionary withdrawals’)
  • 04/04 Joan Clarke, Turing, I.J. Good, and “that after-dinner comedy hour…”
  • 04/08 Heads I win, tails you lose? Meehl and many Popperians get this wrong (about severe tests)!
  • 04/13 Philosophy of Statistics Comes to the Big Apple! APS 2015 Annual Convention — NYC
  • 04/16 A. Spanos: Jerzy Neyman and his Enduring Legacy
  • 04/18 Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen
  • 04/22 NEYMAN: “Note on an Article by Sir Ronald Fisher” (3 uses for power, Fisher’s fiducial argument)
  • 04/24 “Statistical Concepts in Their Relation to Reality” by E.S. Pearson
  • 04/27 3 YEARS AGO (APRIL 2012): MEMORY LANE
  • 04/30 96% Error in “Expert” Testimony Based on Probability of Hair Matches: It’s all Junk!

 

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016, March 30, 2017 – a very convenient way to allow data-dependent choices (note why it's legit in selecting blog posts, on severity grounds).

 


Categories: 3-year memory lane | Leave a comment

New Warning: Proceed With Caution Until the “Alt Stat Approaches” are Evaluated

I predicted that the degree of agreement behind the ASA's "6 principles" on p-values, partial as it was, was unlikely to be replicated when it came to most of the "other approaches" with which some would supplement or replace significance tests–notably Bayesian updating, Bayes factors, or likelihood ratios (confidence intervals are dual to hypothesis tests). [My commentary is here.] So now they may be advising a "hold off" or "go slow" approach until some consilience is achieved. Is that it? There's word that the ASA will hold a meeting where the other approaches are put through their paces. I don't know when. I was tweeted an article about the background chatter taking place behind the scenes; I wasn't one of the people interviewed for this. Here are some excerpts; I may add more later after it has had time to sink in.
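As a gloss on the parenthetical duality claim, here is a minimal sketch with hypothetical numbers (mine, purely for illustration): a 95% confidence interval collects exactly those null values that a 5% two-sided test would not reject.

```python
from math import sqrt
from statistics import NormalDist

def retained(mu0, xbar, sigma, n, alpha=0.05):
    # Two-sided z-test of H0: mu = mu0; True means "not rejected at level alpha".
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(xbar - mu0) / (sigma / sqrt(n)) <= z_crit

# Hypothetical data: observed mean 10, sigma 2, n = 25.
xbar, sigma, n = 10.0, 2.0, 25
half_width = NormalDist().inv_cdf(0.975) * sigma / sqrt(n)
lo, hi = xbar - half_width, xbar + half_width  # 95% CI, about (9.216, 10.784)

# Null values inside the CI are retained by the test; those outside are rejected.
assert all(retained(mu0, xbar, sigma, n) for mu0 in (lo + 0.01, xbar, hi - 0.01))
assert not any(retained(mu0, xbar, sigma, n) for mu0 in (lo - 0.01, hi + 0.01))
```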

“Restoring Credibility in Statistical Science: Proceed with Caution Until a Balanced Critique Is In”

J. Hossiason Continue reading

Categories: Announcement | 2 Comments

February Palindrome Winner: Lucas Friesen

Winner of the February 2018 Palindrome Contest (prize: a choice from a dozen books)


Lucas Friesen: a graduate student in Measurement, Evaluation, and Research Methodology at the University of British Columbia

Palindrome:

Ares, send a mere vest set? Bagel-bag madness.

Able! Elbas! Send AM: “Gable-Gab test severe. Madness era.”

The requirement: A palindrome using "madness*" (+ Elba, of course). (Statistical, philosophical, and scientific themes are awarded more points.) *Sorry, the editor got ahead of herself in an earlier post, listing March's word.
Book choice: This is horribly difficult, but I think I have to go with the allure of the unknown: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars.

Continue reading

Categories: Palindrome | Leave a comment

J. Pearl: Challenging the Hegemony of Randomized Controlled Trials: Comments on Deaton and Cartwright


Judea Pearl

Judea Pearl* wrote to me to invite readers of Error Statistics Philosophy to comment on a recent post of his (from his Causal Analysis blog here) pertaining to a guest post by Stephen Senn ("Being a statistician means never having to say you are certain"). He has added a special addendum for us.[i]

Challenging the Hegemony of Randomized Controlled Trials: Comments on Deaton and Cartwright

Judea Pearl

I was asked to comment on a recent article by Angus Deaton and Nancy Cartwright (D&C), which touches on the foundations of causal inference. The article is titled: “Understanding and misunderstanding randomized controlled trials,” and can be viewed here: https://goo.gl/x6s4Uy

My comments are a mixture of a welcome and a puzzle; I welcome D&C’s stand on the status of randomized trials, and I am puzzled by how they choose to articulate the alternatives. Continue reading

Categories: RCTs | 26 Comments

3 YEARS AGO (MARCH 2015): MEMORY LANE

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: March 2015. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others of general relevance to philosophy of statistics (in months where I’ve blogged a lot)[2].  Posts that are part of a “unit” or a group count as one.

March 2015

  • 03/01 “Probabilism as an Obstacle to Statistical Fraud-Busting”
  • 03/05 A puzzle about the latest test ban (or ‘don’t ask, don’t tell’)
  • 03/12 All She Wrote (so far): Error Statistics Philosophy: 3.5 years on
  • 03/16 Stephen Senn: The pathetic P-value (Guest Post)
  • 03/21 Objectivity in Statistics: “Arguments From Discretion and 3 Reactions”
  • 03/24 3 YEARS AGO (MARCH 2012): MEMORY LANE
  • 03/28 Your (very own) personalized genomic prediction varies depending on who else was around?

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016, March 30, 2017 – a very convenient way to allow data-dependent choices (note why it's legit in selecting blog posts, on severity grounds).

 


Categories: 3-year memory lane | Leave a comment

Cover/Itinerary of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars

SNEAK PREVIEW: Here’s the cover of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars:

It should be out in July 2018. The "Itinerary", generally known as the Table of Contents, is below. I forgot to mention that this is not the actual pagination; I don't have the page proofs yet. These are the pages of the draft I submitted. It should be around 50 pages shorter in the actual page proofs, maybe 380 pages.

 

Itinerary

Continue reading

Categories: Announcement | 9 Comments

Deconstructing the Fisher-Neyman conflict wearing fiducial glasses (continued)

Fisher / Neyman

This continues my previous post, "Can't take the fiducial out of Fisher…", in recognition of Fisher's birthday, February 17. I supply a few more intriguing articles you may find enlightening to read and/or reread on a Saturday night.

Move up 20 years to the famous 1955/56 exchange between Fisher and Neyman. Fisher clearly connects Neyman’s adoption of a behavioristic-performance formulation to his denying the soundness of fiducial inference. When “Neyman denies the existence of inductive reasoning, he is merely expressing a verbal preference. For him ‘reasoning’ means what ‘deductive reasoning’ means to others.” (Fisher 1955, p. 74). Continue reading

Categories: fiducial probability, Fisher, Neyman, Statistics | 4 Comments

Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]

R.A. Fisher: February 17, 1890 – July 29, 1962

Continuing with posts in recognition of R.A. Fisher's birthday, I post one from a couple of years ago on a topic that had previously not been discussed on this blog: Fisher's fiducial probability.

[Neyman and Pearson] "began an influential collaboration initially designed primarily, it would seem, to clarify Fisher's writing. This led to their theory of testing hypotheses and to Neyman's development of confidence intervals, aiming to clarify Fisher's idea of fiducial intervals" (D.R. Cox, 2006, p. 195).

The entire episode of fiducial probability is fraught with minefields. Many say it was Fisher's biggest blunder; others suggest it still hasn't been understood. The majority of discussions omit the side trip to the Fiducial Forest altogether, finding the surrounding brambles too thorny to penetrate. Besides, a fascinating narrative about the Fisher-Neyman-Pearson divide has managed to bloom and grow while steering clear of fiducial probability–never mind that it remained a centerpiece of Fisher's statistical philosophy. I now think that this is a mistake. It was thought, following Lehmann (1993) and others, that we could take the fiducial out of Fisher and still understand the core of the Neyman-Pearson vs Fisher (or Neyman vs Fisher) disagreements. We can't. Quite aside from the intrinsic interest in correcting the "he said/he said" of these statisticians, the issue is intimately bound up with the current (flawed) consensus view of frequentist error statistics.

So what’s fiducial inference? I follow Cox (2006), adapting for the case of the lower limit: Continue reading
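Since the excerpt breaks off here, a bare-bones sketch of the general flavor may help orient readers; this is my gloss on the standard normal-mean case with known σ, not Cox's exact treatment.

```latex
% Pivot: \sqrt{n}(\bar{X}-\mu)/\sigma \sim N(0,1), hence, pre-data,
\[
  P\!\left(\mu > \bar{X} - z_{1-\alpha}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha .
\]
% Fisher's fiducial step treats this, post-data, as a probability for \mu itself:
\[
  \mathrm{fid}\!\left(\mu > \bar{x} - z_{1-\alpha}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha ,
\]
% in effect assigning \mu the distribution N(\bar{x}, \sigma^2/n) once \bar{x} is fixed.
```

Whether such statements remain valid once the random variable X̄ is replaced by its observed value x̄ is precisely the step Neyman disputes (see point (3) of his 1956 summary further down this page).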

Categories: fiducial probability, Fisher, Statistics | Leave a comment

R.A. Fisher: "Statistical Methods and Scientific Induction"

I continue a week of Fisherian posts in honor of his birthday (Feb 17). This is his contribution to the "Triad"–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. They are each very short and bear rereading.

17 February 1890 — 29 July 1962

“Statistical Methods and Scientific Induction”

by Sir Ronald Fisher (1955)

SUMMARY

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to "decisions" in Wald's sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

  1. “Repeated sampling from the same population”,
  2. Errors of the “second kind”,
  3. “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

 

Note on an Article by Sir Ronald Fisher

by Jerzy Neyman (1956)

Neyman

Summary

(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation.  (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible.  (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values.  The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight.  (4)  The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.

 

E.S. Pearson

“Statistical Concepts in Their Relation to Reality”.

by E.S. Pearson (1955)

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955, "Statistical Methods and Scientific Induction"), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”. There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!…

To continue reading, “Statistical Concepts in Their Relation to Reality” click HERE

Categories: E.S. Pearson, fiducial probability, Fisher, Neyman, phil/history of stat | 3 Comments

R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)


In recognition of R.A. Fisher’s birthday on February 17….

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however, did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on "curve fitting", questioning Karl Pearson's method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Spanos, Statistics | 3 Comments

Guest Blog: STEPHEN SENN: ‘Fisher’s alternative to the alternative’

“You May Believe You Are a Bayesian But You Are Probably Wrong”


As part of the week of recognizing R.A. Fisher (February 17, 1890 – July 29, 1962), I reblog a guest post by Stephen Senn from 2012/2017. The comments from 2017 lead to a troubling issue that I will bring up in the comments today.

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests. Continue reading

Categories: Fisher, S. Senn, Statistics | 1 Comment

Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher's birthday. I'll post some Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It's as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!

Two New Properties of Mathematical Likelihood

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading
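To spell out the link Fisher is pointing to, here is a compressed sketch in modern notation (my paraphrase for orientation, not Fisher's own derivation):

```latex
% Sufficiency and the Neyman-Pearson lemma: an illustrative paraphrase.
% If t(x) is sufficient for \theta, the likelihood factorizes as
\[
  L(\theta; x) = g\big(t(x), \theta\big)\, h(x).
\]
% The most powerful test of H_0: \theta = \theta_0 vs H_1: \theta = \theta_1
% rejects when the likelihood ratio is large, and
\[
  \frac{L(\theta_1; x)}{L(\theta_0; x)}
    = \frac{g(t(x), \theta_1)}{g(t(x), \theta_0)},
\]
% since h(x) cancels: the optimal rejection region depends on the data
% only through the sufficient statistic, which is Fisher's point above.
```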

Categories: Fisher, phil/history of stat, Statistics | Tags: , , , | 1 Comment

3 YEARS AGO (FEBRUARY 2015): MEMORY LANE

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: February 2015 [1]. Here are some items for your Saturday night reading and rereading. Three are in preparation for Fisher's birthday next week (Feb 17). One is a Saturday night comedy where Jeffreys appears to substitute for Jay Leno. The 2/25 entry lets you go back 6 years, where there's more on Fisher, a bit of statistical theatre (of the absurd), misspecification tests, and a guest post (by Schachtman) on that infamous Matrixx court case (wherein the Supreme Court is thought to have weighed in on statistical significance tests). The comments are often the most interesting parts of these old posts.

February 2015

  • 02/05 Stephen Senn: Is Pooling Fooling? (Guest Post)
  • 02/10 What’s wrong with taking (1 – β)/α, as a likelihood ratio comparing H0 and H1?
  • 02/13 Induction, Popper and Pseudoscience
  • 02/16 Continuing the discussion on truncation, Bayesian convergence and testing of priors
  • 02/16 R. A. Fisher: ‘Two New Properties of Mathematical Likelihood’: Just before breaking up (with N-P)
  • 02/17 R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)

  • 02/19 Stephen Senn: Fisher’s Alternative to the Alternative
  • 02/21 Sir Harold Jeffreys’ (tail area) one-liner: Saturday night comedy (b)
  • 02/25 3 YEARS AGO: (FEBRUARY 2012) MEMORY LANE


  • 02/27 Big Data is the New Phrenology?

 

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

 


Categories: 3-year memory lane | Leave a comment

S. Senn: Evidence Based or Person-centred? A Statistical debate (Guest Post)


Stephen Senn
Head of Competence Center
for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Evidence Based or Person-centred? A statistical debate

It was hearing Stephen Mumford and Rani Lill Anjum (RLA) speaking in January 2017 at the Epistemology of Causal Inference in Pharmacology conference in Munich, organised by Jürgen Landes, Barbara Osmani and Roland Poellinger, that inspired me to buy their book, Causation: A Very Short Introduction [1]. Although I do not agree with all that is said in it and also could not pretend to understand all it says, I can recommend it highly as an interesting introduction to issues in causality, some of which will be familiar to statisticians but some not at all.

Since I have a long-standing interest in researching into ways of delivering personalised medicine, I was interested to see a reference on Twitter to a piece by RLA, Evidence based or person centered? An ontological debate, in which she claims that the choice between evidence based or person-centred medicine is ultimately ontological [2]. I don't dispute that thinking about health care delivery in ontological terms might be interesting. However, I do dispute that there is any meaningful choice between evidence based medicine (EBM) and person centred healthcare (PCH). To suggest so is to commit a category mistake by suggesting that means are alternatives to ends.

In fact, EBM will be essential to delivering effective PCH, as I shall now explain. Continue reading

Categories: personalized medicine, RCTs, S. Senn | 7 Comments

3 YEARS AGO (JANUARY 2015): MEMORY LANE

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: January 2015. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green 2-3 others of general relevance to philosophy of statistics (in months where I’ve blogged a lot)[2].  Posts that are part of a “unit” or a group count as one.

 

January 2015

  • 01/02 Blog Contents: Oct.- Dec. 2014
  • 01/03 No headache power (for Deirdre)
  • 01/04 Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)
  • 01/07 “When Bayesian Inference Shatters” Owhadi, Scovel, and Sullivan (reblog)
  • 01/08 On the Brittleness of Bayesian Inference–An Update: Owhadi and Scovel (guest post).
  • 01/12 “Only those samples which fit the model best in cross validation were included” (whistleblower) “I suspect that we likely disagree with what constitutes validation” (Potti and Nevins)
  • 01/16 Winners of the December 2014 Palindrome Contest: TWO!
  • 01/18 Power Analysis and Non-Replicability: If bad statistics is prevalent in your field, does it follow you can’t be guilty of scientific fraud?
  • 01/21 Some statistical dirty laundry.
  • 01/24 What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri
  • 01/26 Trial on Anil Potti’s (clinical) Trial Scandal Postponed Because Lawyers Get the Sniffles (updated)
  • 01/27 3 YEARS AGO: (JANUARY 2012) MEMORY LANE
  • 01/31 Saturday Night Brainstorming and Task Forces: (4th draft)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016, March 30, 2017 – a very convenient way to allow data-dependent choices (note why it's legit in selecting blog posts, on severity grounds).


Categories: 3-year memory lane | Leave a comment

S. Senn: Being a statistician means never having to say you are certain (Guest Post)


Stephen Senn
Head of Competence Center
for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Being a statistician means never having to say you are certain

A recent discussion of randomised controlled trials [1] by Angus Deaton and Nancy Cartwright (D&C) contains much interesting analysis but also, in my opinion, does not escape rehashing some of the invalid criticisms of randomisation with which the literature seems to be littered. The paper has two major sections. The latter, which deals with generalisation of results, or what is sometimes called external validity, I like much more than the former, which deals with internal validity. It is the former I propose to discuss.

Continue reading

Categories: Error Statistics, RCTs, Statistics | 26 Comments
