Mementos for “It’s the Methods, Stupid!” Excursion 3 Tour II (3.4-3.6)

Some snapshots from Excursion 3 Tour II.

Excursion 3 Tour II: It’s The Methods, Stupid

Tour II disentangles a jungle of conceptual issues at the heart of today’s statistics wars. The first stop (3.4) unearths the basis for a number of howlers and chestnuts thought to be licensed by Fisherian or N-P tests.* In each exhibit, we study the basis for the joke. Together, they show: the need for an adequate test statistic, the difference between implicationary assumptions (i-assumptions) and actual assumptions, and the fact that tail areas serve to raise, and not lower, the bar for rejecting a null hypothesis. (Additional howlers occur in Excursion 3 Tour III.)

recommended: medium to heavy shovel 

Stop (3.5) pulls back the curtain on the view that Fisher and N-P tests form an incompatible hybrid. Incompatibilist tribes retain caricatures of F & N-P tests, and rob each of notions they need (e.g., power and alternatives for F, P-values & post-data error probabilities for N-P). Those who allege that Fisherian P-values are not error probabilities often mean simply that Fisher wanted an evidential, not a performance, interpretation. This is a philosophical, not a mathematical, claim. N-P and Fisher tended to use P-values in both ways. It’s time to get beyond incompatibilism. “Even if we couldn’t point to quotes and applications that break out of the strict ‘evidential versus behavioral’ split, we should be the ones to interpret the methods for inference, and supply the statistical philosophy that directs their right use.” (p. 181)

strongly recommended: light to medium shovel, thick-skinned jacket

In (3.6) we slip into the jungle. Critics argue that P-values are for evidence, unlike error probabilities, but then aver P-values aren’t good measures of evidence either, since they disagree with probabilist measures: likelihood ratios, Bayes Factors or posteriors. A famous peace-treaty between Fisher, Jeffreys & Neyman promises a unification. A bit of magic ensues! The meaning of error probability changes into a type of Bayesian posterior probability. It’s then possible to say ordinary frequentist error probabilities (e.g., type I & II error probabilities) aren’t error probabilities. We get beyond this marshy swamp by introducing subscripts 1 and 2. Whatever you think of the two concepts, they are very different. This recognition suffices to get you out of quicksand.

required: easily removed shoes, stiff walking stick (review Souvenir M on day of departure)

*Several of these may be found by searching for “Saturday night comedy” on this blog. In SIST, however, I trace out the basis for the jokes.

selected key terms and ideas 

Howlers and chestnuts of statistical tests
armchair science
Jeffreys tail area criticism
Limb sawing logic
Two machines with different positions
Weak conditionality principle (WCP)
Conditioning (see WCP)
Likelihood principle
Long run performance vs probabilism
Alphas and p’s
Fisher as behaviorist
Hypothetical long-runs
Freudian metaphor for significance tests
Pearson, on cases where there’s no repetition
Armour-piercing naval shell
Error probability 1 and error probability 2
Incompatibilist philosophy (F and N-P must remain separate)
Test statistic requirements (p. 159)

Please send me your list of key terms in the comments; reports of typos would also be appreciated.

These are Tour Guide Mementos from Excursion 3 Tour II of Mayo (2018, CUP): Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars.

To see an excerpt from Excursion 3 Tour II (and “where you are” in the journey), see my last post.

For all excerpts and mementos (on this blog) from SIST (to Nov.30), see this post.

 

 

 

Categories: Statistical Inference as Severe Testing | Leave a comment

It’s the Methods, Stupid: Excerpt from Excursion 3 Tour II (Mayo 2018, CUP)

Tour II It’s the Methods, Stupid

There is perhaps in current literature a tendency to speak of the Neyman–Pearson contributions as some static system, rather than as part of the historical process of development of thought on statistical theory which is and will always go on. (Pearson 1962, 276)

This goes for Fisherian contributions as well. Unlike museums, we won’t remain static. The lesson from Tour I of this Excursion is that Fisherian and Neyman–Pearsonian tests may be seen as offering clusters of methods appropriate for different contexts within the large taxonomy of statistical inquiries. There is an overarching pattern:

Just as with the use of measuring instruments, applied to the specific case, we employ the performance features to make inferences about aspects of the particular thing that is measured, aspects that the measuring tool is appropriately capable of revealing. (Mayo and Cox 2006, p. 84)

This information is used to ascertain what claims have, and have not, passed severely, post-data. Any such proposed inferential use of error probabilities gives considerable fodder for criticism from various tribes of Fisherians, Neyman–Pearsonians, and Bayesians. We can hear them now:

How can we reply? To begin, we need to uncover how the charges originate in traditional philosophies long associated with error statistical tools. That’s the focus of Tour II.

Only then do we have a shot at decoupling traditional philosophies from those tools in order to use them appropriately today. This is especially so when the traditional foundations stand on such wobbly grounds, grounds largely rejected by founders of the tools. There is a philosophical disagreement between Fisher and Neyman, but it differs importantly from the ones that you’re presented with and which are widely accepted and repeated in scholarly and popular treatises on significance tests. Neo-Fisherians and N-P theorists, keeping to their tribes, forfeit notions that would improve their methods (e.g., for Fisherians: explicit alternatives, with corresponding notions of sensitivity, and distinguishing statistical and substantive hypotheses; for N-P theorists, making error probabilities relevant for inference in the case at hand).

The spadework on this tour will be almost entirely conceptual: we won’t be arguing for or against any one view. We begin in Section 3.4 by unearthing the basis for some classic counterintuitive inferences thought to be licensed by either Fisherian or N-P tests. That many are humorous doesn’t mean disentangling their puzzles is straightforward; a medium to heavy shovel is recommended. We can switch to a light to medium shovel in Section 3.5: excavations of the evidential versus behavioral divide between Fisher and N-P turn out to be mostly built on sand. As David Cox observes, Fisher is often more performance-oriented in practice, but not in theory, while the reverse is true for Neyman and Pearson. At times, Neyman exaggerates the behavioristic conception just to accentuate how much Fisher’s tests need reining in. Likewise, Fisher can be spotted running away from his earlier behavioristic positions just to derogate the new N-P movement, whose popularity threatened to eclipse the statistics program that was, after all, his baby. Taking the polemics of Fisher and Neyman at face value, many are unaware how much they are based on personality and professional disputes. Hearing the actual voices of Fisher, Neyman, and Pearson (F and N-P), you don’t have to accept the gospel of “what the founders really thought.” Still, there’s an entrenched history and philosophy of F and N-P: A thick-skinned jacket is recommended. On our third stop (Section 3.6) we witness a bit of magic. The very concept of an error probability gets redefined and, hey presto!, a reconciliation between Jeffreys, Fisher, and Neyman is forged. Wear easily removed shoes and take a stiff walking stick. The Unificationist tribes tend to live near underground springs and lakeshore bounds; in the heady magic, visitors have been known to accidentally fall into a pool of quicksand.

3.4 Some Howlers and Chestnuts of Statistical Tests

The well-known definition of a statistician as someone whose aim in life is to be wrong in exactly 5 per cent of everything they do misses its target. (Sir David Cox 2006a, p. 197)

Showing that a method’s stipulations could countenance absurd or counterintuitive results is a perfectly legitimate mode of criticism. I reserve the term “howler” for common criticisms based on logical fallacies or conceptual misunderstandings. Other cases are better seen as chestnuts – puzzles that the founders of statistical tests never cleared up explicitly. Whether you choose to see my “howler” as a “chestnut” is up to you. Under each exhibit is the purported basis for the joke……

TO KEEP READING, SEE Mayo (2018, CUP): Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars.

Where are you in the journey?  

 

 

Categories: Error Statistics, Statistical Inference as Severe Testing | 4 Comments

Memento & Quiz (on SEV): Excursion 3, Tour I

As you enjoy the weekend discussion & concert in the Captain’s Central Limit Library & Lounge, your Tour Guide has prepared a brief overview of Excursion 3 Tour I, and a short (semi-severe) quiz on severity, based on exhibit (i).*

 

We move from Popper through a gallery on “Data Analysis in the 1919 Eclipse tests of the General Theory of Relativity (GTR)” (3.1) which leads to the main gallery on the origin of statistical tests (3.2) by way of a look at where the main members of our statistical cast are in 1919: Fisher, Neyman and Pearson. From the GTR episode, we identify the key elements of a statistical test–the steps in E.S. Pearson’s opening description of tests in 3.2. The classical testing notions–type I and II errors, power, consistent tests–are shown to grow out of requiring probative tests. The typical (behavioristic) formulation of N-P tests came later. The severe tester breaks out of the behavioristic prison. A first look at the severity construal of N-P tests is in Exhibit (i). Viewing statistical inference as severe testing shows how to do all that N-P tests do (and more) while a member of the Fisherian Tribe (3.3). We consider the frequentist principle of evidence FEV and the divergent interpretations that are called for by Cox’s taxonomy of null hypotheses. The last member of the taxonomy–substantively based null hypotheses–returns us to the opening episode of GTR. Continue reading

Categories: Severity, Statistical Inference as Severe Testing | 8 Comments

First Look at N-P Methods as Severe Tests: Water plant accident [Exhibit (i) from Excursion 3]

Excursion 3 Exhibit (i)

Exhibit (i) N-P Methods as Severe Tests: First Look (Water Plant Accident)

There’s been an accident at a water plant where our ship is docked, and the cooling system had to be repaired. It is meant to ensure that the mean temperature of discharged water stays below the temperature that threatens the ecosystem, perhaps not much beyond 150 degrees Fahrenheit. There were 100 water measurements taken at randomly selected times and the sample mean x̄ computed, each with a known standard deviation σ = 10. When the cooling system is effective, each measurement is like observing X ~ N(150, 10²). Because of this variability, we expect different 100-fold water samples to lead to different values of X̄, but we can deduce its distribution. If each X ~ N(μ = 150, 10²) then X̄ is also Normal with μ = 150, but the standard deviation of X̄ is only σ/√n = 10/√100 = 1. So X̄ ~ N(μ = 150, 1). Continue reading
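The arithmetic behind the sampling distribution of the sample mean can be checked with a short simulation. This is only a sketch under the exhibit's stated setup (mean 150, σ = 10, n = 100); the function names, the number of repetitions and the seed are illustrative choices, not anything taken from the book.

```python
import math
import random
import statistics

MU_0 = 150.0   # mean temperature when the cooling system is effective (from the exhibit)
SIGMA = 10.0   # known standard deviation of a single measurement (from the exhibit)
N = 100        # number of measurements in one sample (from the exhibit)


def standard_error(sigma, n):
    """Standard deviation of the mean of n independent measurements."""
    return sigma / math.sqrt(n)


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of their means.

    `reps` and `seed` are illustrative assumptions, not part of the exhibit.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


se = standard_error(SIGMA, N)
means = simulate_sample_means(MU_0, SIGMA, N, reps=2000)

print("theoretical standard error:", se)
print("simulated mean of sample means:", statistics.fmean(means))
print("simulated sd of sample means:", statistics.stdev(means))
```

The simulated spread of the 100-fold sample means should sit close to σ/√n, far tighter than the spread of the individual measurements.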

Categories: Error Statistics, Severity, Statistical Inference as Severe Testing | 44 Comments

Neyman-Pearson Tests: An Episode in Anglo-Polish Collaboration: Excerpt from Excursion 3 (3.2)

Neyman & Pearson

3.2 N-P Tests: An Episode in Anglo-Polish Collaboration*

We proceed by setting up a specific hypothesis to test, H0 in Neyman’s and my terminology, the null hypothesis in R. A. Fisher’s . . . in choosing the test, we take into account alternatives to H0 which we believe possible or at any rate consider it most important to be on the look out for . . . Three steps in constructing the test may be defined:

Step 1. We must first specify the set of results . . .

Step 2. We then divide this set by a system of ordered boundaries . . .such that as we pass across one boundary and proceed to the next, we come to a class of results which makes us more and more inclined, on the information available, to reject the hypothesis tested in favour of alternatives which differ from it by increasing amounts.

Step 3. We then, if possible, associate with each contour level the chance that, if H0 is true, a result will occur in random sampling lying beyond that level . . .

In our first papers [in 1928] we suggested that the likelihood ratio criterion, λ, was a very useful one . . . Thus Step 2 preceded Step 3. In later papers [1933–1938] we started with a fixed value for the chance, ε, of Step 3 . . . However, although the mathematical procedure may put Step 3 before 2, we cannot put this into operation before we have decided, under Step 2, on the guiding principle to be used in choosing the contour system. That is why I have numbered the steps in this order. (Egon Pearson 1947, p. 173)

In addition to Pearson’s 1947 paper, the museum follows his account in “The Neyman–Pearson Story: 1926–34” (Pearson 1970). The subtitle is “Historical Sidelights on an Episode in Anglo-Polish Collaboration”!

We meet Jerzy Neyman at the point he’s sent to have his work sized up by Karl Pearson at University College in 1925/26. Neyman wasn’t that impressed: Continue reading

Categories: E.S. Pearson, Neyman, Statistical Inference as Severe Testing, statistical tests, Statistics | 1 Comment

Where Are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3

Excursion 3 Statistical Tests and Scientific Inference

Tour I Ingenious and Severe Tests

[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation – in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where] . . . it was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories. (Popper 1962, p. 36)

Mayo 2018, CUP

The 1919 eclipse experiments opened Popper’s eyes to what made Einstein’s theory so different from other revolutionary theories of the day: Einstein was prepared to subject his theory to risky tests.[1] Einstein was eager to galvanize scientists to test his theory of gravity, knowing the solar eclipse was coming up on May 29, 1919. Leading the expedition to test GTR was a perfect opportunity for Sir Arthur Eddington, a devout follower of Einstein as well as a devout Quaker and conscientious objector. Fearing “a scandal if one of its young stars went to jail as a conscientious objector,” officials at Cambridge argued that Eddington couldn’t very well be allowed to go off to war when the country needed him to prepare the journey to test Einstein’s predicted light deflection (Kaku 2005, p. 113). Continue reading

Categories: SIST, Statistical Inference as Severe Testing | 1 Comment

Stephen Senn: On the level. Why block structure matters and its relevance to Lord’s paradox (Guest Post)

Stephen Senn
Consultant Statistician
Edinburgh

Introduction

In a previous post I considered Lord’s paradox from the perspective of the ‘Rothamsted School’ and its approach to the analysis of experiments. I now illustrate this in some detail giving an example.

What I shall do

I have simulated data from an experiment in which two diets have been compared in 20 student halls of residence, each diet having been applied to 10 halls. I shall assume that the halls have been randomly allocated the diet and that in each hall 10 students have been randomly chosen to have their weights recorded at the beginning of the academic year and again at the end. Continue reading
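The layout of such an experiment (diets randomised to halls, students sampled within halls) can be sketched as follows. This is not Senn's simulation: every numeric value (mean weight, hall-level and student-level standard deviations) is a hypothetical placeholder, and no diet effect is built in. The sketch only shows the block structure, in which the hall is the unit of allocation and the student is the unit of measurement.

```python
import random


def simulate_halls(n_halls=20, students_per_hall=10, seed=0):
    """Simulate baseline and final weights for students nested within halls.

    All numeric parameters below are hypothetical illustration values.
    """
    rng = random.Random(seed)

    # Randomly allocate each diet to half of the halls.
    diets = ["A"] * (n_halls // 2) + ["B"] * (n_halls // 2)
    rng.shuffle(diets)

    rows = []
    for hall, diet in enumerate(diets):
        # A hall-level effect shared by every student in the hall (assumed sd = 2 kg).
        hall_effect = rng.gauss(0.0, 2.0)
        for student in range(students_per_hall):
            # Student-level variation around the hall mean (assumed sd = 8 kg).
            baseline = rng.gauss(70.0 + hall_effect, 8.0)
            # Change over the year with no diet effect (assumed sd = 3 kg).
            final = baseline + rng.gauss(0.0, 3.0)
            rows.append(
                {"hall": hall, "diet": diet, "student": student,
                 "baseline": baseline, "final": final}
            )
    return rows


rows = simulate_halls()
print(len(rows), "student records across", len({r["hall"] for r in rows}), "halls")
```

Because the diet varies only between halls, any analysis of these records has to respect that there are 20 allocated units, not 200.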

Categories: Lord's paradox, Statistical Inference as Severe Testing, Stephen Senn | 34 Comments

SIST* Posts: Excerpts & Mementos (to Nov 30, 2018)

Surveying SIST Posts so far

SIST* BLOG POSTS (up to Nov 30, 2018)

Excerpts

  • 05/19: The Meaning of My Title: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars
  • 09/08: Excursion 1 Tour I: Beyond Probabilism and Performance: Severity Requirement (1.1)
  • 09/11: Excursion 1 Tour I (2nd stop): Probabilism, Performance, and Probativeness (1.2)
  • 09/15: Excursion 1 Tour I (3rd stop): The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon (1.3)
  • 09/29: Excursion 2: Taboos of Induction and Falsification: Tour I (first stop)
  • 10/10: Excursion 2 Tour II (3rd stop): Falsification, Pseudoscience, Induction (2.3)
  • 11/30: Where are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3

Mementos, Keepsakes and Souvenirs

  • 10/29: Tour Guide Mementos (Excursion 1 Tour II of How to Get Beyond the Statistics Wars)
  • 11/8: Souvenir C: A Severe Tester’s Translation Guide (Excursion 1 Tour II)
  • 10/5: “It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based” (Keepsake by Fisher, 2.1)
  • 11/14: Tour Guide Mementos and Quiz 2.1 (Excursion 2 Tour I Induction and Confirmation)
  • 11/17: Mementos for Excursion 2 Tour II Falsification, Pseudoscience, Induction

*Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Mayo, CUP 2018)

Categories: SIST, Statistical Inference as Severe Testing | 3 Comments

Mementos for Excursion 2 Tour II: Falsification, Pseudoscience, Induction (2.3-2.7)

Excursion 2 Tour II: Falsification, Pseudoscience, Induction*

Outline of Tour. Tour II visits Popper, falsification, corroboration, Duhem’s problem (what to blame in the case of anomalies) and the demarcation of science and pseudoscience (2.3). While Popper comes up short on each, the reader is led to improve on Popper’s notions (live exhibit (v)). Central ingredients for our journey are put in place via souvenirs: a framework of models and problems, and a post-Popperian language to speak about inductive inference. Defining a severe test, for Popperians, is linked to when data supply novel evidence for a hypothesis: family feuds about defining novelty are discussed (2.4). We move into Fisherian significance tests and the crucial requirements he set (often overlooked): isolated significant results are poor evidence of a genuine effect, and statistical significance doesn’t warrant substantive, e.g., causal inference (2.5). Applying our new demarcation criterion to a plausible effect (males are more likely than females to feel threatened by their partner’s success), we argue that a real revolution in psychology will need to be more revolutionary than at present. Whole inquiries might have to be falsified, their measurement schemes questioned (2.6). The Tour’s pieces are synthesized in (2.7), where a guest lecturer explains how to solve the problem of induction now, having redefined induction as severe testing.

Mementos from 2.3 Continue reading

Categories: Popper, Statistical Inference as Severe Testing, Statistics | 5 Comments

Tour Guide Mementos and QUIZ 2.1 (Excursion 2 Tour I: Induction and Confirmation)

Excursion 2 Tour I: Induction and Confirmation (Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars)

Tour Blurb. The roots of rival statistical accounts go back to the logical Problem of Induction. (2.1) The logical problem of induction is a matter of finding an argument to justify a type of argument (enumerative induction), so it is important to be clear on arguments, their soundness versus their validity. These are key concepts of fundamental importance to our journey. Given that any attempt to solve the logical problem of induction leads to circularity, philosophers turned instead to building logics that seemed to capture our intuitions about induction. This led to confirmation theory and some projects in today’s formal epistemology. There’s an analogy between contrasting views in philosophy and statistics: Carnapian confirmation is to Bayesian statistics, as Popperian falsification is to frequentist error statistics. Logics of confirmation take the form of probabilisms, either in the form of raising the probability of a hypothesis, or arriving at a posterior probability. (2.2) The contrast between these types of probabilisms, and the problems each is found to have in confirmation theory are directly relevant to the types of probabilisms in statistics. Notably, Harold Jeffreys’ non-subjective Bayesianism, and current spin-offs, share features with Carnapian inductive logics. We examine the problem of irrelevant conjunctions: that if x confirms H, it confirms (H & J) for any J. This also leads to what’s called the tacking paradox.

Quiz on 2.1 Soundness vs Validity in Deductive Logic. Let ~C be the denial of claim C. For each of the following arguments, indicate whether it is valid and sound, valid but unsound, or invalid. Continue reading
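Validity, unlike soundness, can be checked mechanically: an argument is valid when no assignment of truth values makes all the premises true and the conclusion false. The sketch below does this by brute-force truth table. The two argument forms tested (modus tollens and affirming the consequent) are standard textbook examples chosen for illustration, not the quiz's own items, and the helper names are hypothetical. Soundness additionally requires that the premises actually be true, which no truth table can settle.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars):
    """Return True if no truth assignment makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # found a counterexample row
    return True


# Modus tollens: if H then C; ~C; therefore ~H.
modus_tollens = is_valid(
    premises=[lambda h, c: implies(h, c), lambda h, c: not c],
    conclusion=lambda h, c: not h,
    n_vars=2,
)

# Affirming the consequent: if H then C; C; therefore H.
affirming_consequent = is_valid(
    premises=[lambda h, c: implies(h, c), lambda h, c: c],
    conclusion=lambda h, c: h,
    n_vars=2,
)

print("modus tollens valid:", modus_tollens)
print("affirming the consequent valid:", affirming_consequent)
```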

Categories: induction, SIST, Statistical Inference as Severe Testing, Statistics | 10 Comments

Stephen Senn: Rothamsted Statistics meets Lord’s Paradox (Guest Post)

Stephen Senn
Consultant Statistician
Edinburgh

The Rothamsted School

I never worked at Rothamsted but during the eight years I was at University College London (1995-2003) I frequently shared a train journey to London from Harpenden (the village in which Rothamsted is situated) with John Nelder, as a result of which we became friends and I acquired an interest in the software package Genstat®.

That in turn got me interested in John Nelder’s approach to analysis of variance, which is a powerful formalisation of ideas present in the work of others associated with Rothamsted. Nelder’s important predecessors in this respect include, at least, RA Fisher (of course) and Frank Yates and others such as David Finney and Frank Anscombe. John died in 2010 and I regard Rosemary Bailey, who has done deep and powerful work on randomisation and the representation of experiments through Hasse diagrams, as being the greatest living proponent of the Rothamsted School. Another key figure is Roger Payne who turned many of John’s ideas into code in Genstat®. Continue reading

Categories: Error Statistics | 11 Comments

Souvenir C: A Severe Tester’s Translation Guide (Excursion 1 Tour II)

I will continue to post mementos and, at times, short excerpts following the pace of one “Tour” a week, in sync with some book clubs reading Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST or Statinfast 2018, CUP), e.g., Lakens. This puts us at Excursion 2 Tour I, but first, here’s a quick Souvenir (Souvenir C) from Excursion 1 Tour II:

Souvenir C: A Severe Tester’s Translation Guide

Just as in ordinary museum shops, our souvenir literature often probes treasures that you didn’t get to visit at all. Here’s an example of that, and you’ll need it going forward. There’s a confusion about what’s being done when the significance tester considers the set of all of the outcomes leading to a d(x) greater than or equal to 1.96, i.e., {x: d(x) ≥ 1.96}, or just d(x) ≥ 1.96. This is generally viewed as throwing away the particular x, and lumping all these outcomes together. What’s really happening, according to the severe tester, is quite different. What’s actually being signified is that we are interested in the method, not just the particular outcome. Those who embrace the LP make it very plain that data-dependent selections and stopping rules drop out. To get them to drop in, we signal an interest in what the test procedure would have yielded. This is a counterfactual and is altogether essential in expressing the properties of the method, in particular, the probability it would have yielded some nominally significant outcome or other. Continue reading
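The probability attached to the set {x: d(x) ≥ 1.96} is a property of the method, and it can be computed directly. The sketch below assumes, since the excerpt does not spell it out, that d(x) is the standardized distance of a sample mean from the null value, so that d(X) is standard Normal under the null hypothesis. The function names and the example numbers are illustrative.

```python
import math


def d(sample_mean, mu_0, sigma, n):
    """Standardized test statistic (assumed form): distance from mu_0 in standard-error units."""
    return (sample_mean - mu_0) / (sigma / math.sqrt(n))


def upper_tail(z):
    """P(Z >= z) for a standard Normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Probability, computed under the null, that the test would yield d(X) >= 1.96.
tail_at_cutoff = upper_tail(1.96)

# An illustrative observed result: sample mean 152 with mu_0 = 150, sigma = 10, n = 100.
observed_d = d(152.0, 150.0, 10.0, 100)

print("P(d(X) >= 1.96; H0) =", tail_at_cutoff)
print("observed d(x) =", observed_d, "with tail area", upper_tail(observed_d))
```

The tail area at 1.96 is roughly 0.025. It refers to what the procedure would have yielded across outcomes, which is why reporting it is not a matter of discarding the particular x.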

Categories: Statistical Inference as Severe Testing | 7 Comments

The Replication Crises and its Constructive Role in the Philosophy of Statistics-PSA2018

Below are my slides from a session on replication at the recent Philosophy of Science Association meetings in Seattle.

 

Categories: Error Statistics | Leave a comment

Tour Guide Mementos (Excursion 1 Tour II of How to Get Beyond the Statistics Wars)

Stat Museum

Excursion 1 Tour II: Error Probing Tools vs. Logics of Evidence 

Blurb. Core battles revolve around the relevance of a method’s error probabilities. What’s distinctive about the severe testing account is that it uses error probabilities evidentially: to assess how severely a claim has passed a test. Error control is necessary but not sufficient for severity. Logics of induction focus on the relationships between given data and hypotheses–so outcomes other than the one observed drop out. This is captured in the Likelihood Principle (LP). Tour II takes us to the crux of central wars in relation to the Law of Likelihood (LL) and Bayesian probabilism. (1.4) Hypotheses deliberately designed to accord with the data can result in minimal severity. The likelihoodist wishes to oust them via degrees of belief captured in prior probabilities. To the severe tester, such gambits directly alter the evidence by leading to inseverity. (1.5) Stopping rules: If a tester tries and tries again until significance is reached–optional stopping–significance will be attained erroneously with high probability. According to the LP, the stopping rule doesn’t alter evidence. The irrelevance of optional stopping is an asset for holders of the LP; it’s the opposite for a severe tester. The warring sides talk past each other. Continue reading
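The effect of optional stopping on the error probability can be seen in a small simulation. This is a sketch under assumed conditions: observations are standard Normal with a true mean of zero (so the null is true), a two-sided 1.96 cutoff is applied after every new observation, and sampling stops at the first nominally significant result or at a maximum sample size. The function names, the cap of 100 observations, the number of repetitions and the seed are all illustrative choices.

```python
import math
import random


def rejects_with_optional_stopping(max_n, rng, z_crit=1.96):
    """Sample one observation at a time under a true null; stop at the first |z| >= z_crit."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        z = total / math.sqrt(n)  # standardized mean of the first n observations
        if abs(z) >= z_crit:
            return True
    return False


def rejection_rate(max_n, reps=2000, seed=1):
    """Proportion of simulated trials that reach nominal significance at some point."""
    rng = random.Random(seed)
    hits = sum(rejects_with_optional_stopping(max_n, rng) for _ in range(reps))
    return hits / reps


fixed = rejection_rate(max_n=1)      # a single look at the data
peeking = rejection_rate(max_n=100)  # a look after every observation, up to 100

print("rejection rate with one look:", fixed)
print("rejection rate with up to 100 looks:", peeking)
```

With one look the rate stays near the nominal 0.05; with repeated looks it climbs well above it, even though each individual look uses the same cutoff.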

Categories: SIST, Statistical Inference as Severe Testing | 1 Comment

A small amendment to Nuzzo’s tips for communicating p-values

I’ve been asked if I agree with Regina Nuzzo’s recent note on p-values [i]. I don’t want to be nit-picky, but one very small addition to Nuzzo’s helpful tips for communicating statistical significance can make it a great deal more helpful. Here’s my friendly amendment. She writes: Continue reading

Categories: P-values, science communication | 2 Comments

severe testing or severe sabotage? Christian Robert and the book slasher.

severe testing or severe sabotage? [not a book review]

 

I came across this anomaly on Christian Robert’s blog

Last week, I received this new book of Deborah Mayo, which I was looking forward reading and annotating!, but thrice alas, the book had been sabotaged: except for the preface and acknowledgements, the entire book is printed upside down [a minor issue since the entire book is concerned] and with some part of the text cut on each side [a few letters each time but enough to make reading a chore!]. I am thus waiting for a tested copy of the book to start reading it in earnest!

How bizarre, my book has been slashed with a knife, cruelly stabbing the page, letting words bleed out helter skelter. Some part of the text cut on each side? It wasn’t words with “Bayesian” in them was it? The only anomalous volume I’ve seen has a slightly crooked cover. Do you think it is the Book Slasher out for Halloween, or something more sinister? It’s a bit like serving the Michelin restaurant reviewer by dropping his meal on the floor, or accidentally causing a knife wound. I hope they remedy this quickly. (Talk about Neyman and quality control).

Readers: Feel free to use the comments to share your particular tale of woe in acquiring the book.

Categories: Statistical Inference as Severe Testing | 4 Comments

Tour Guide Mementos (Excursion 1, Tour I of How to Get Beyond the Statistics Wars)

Tour guides in your travels jot down Mementos and Keepsakes from each Tour[i] of my new book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP 2018). Their scribblings, which may at times include details, at other times just a word or two, may be modified through the Tour, and in response to questions from travelers (so please check back). Since these are just mementos, they should not be seen as replacements for the more careful notions given in the journey (i.e., book) itself. Still, you’re apt to flesh out your notes in greater detail, so please share yours (along with errors you’re bound to spot), and we’ll create Meta-Mementos. Continue reading

Categories: Error Statistics, Statistical Inference as Severe Testing | 8 Comments

Philosophy of Statistics & the Replication Crisis in Science: A philosophical intro to my book (slides)

a road through the jungle

In my talk yesterday at the Philosophy Department at Virginia Tech, I introduced my new book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Cambridge 2018). I began with my preface (explaining the meaning of my title), and turned to the Statistics Wars, largely from Excursion 1 of the book. After the sum-up at the end, I snuck in an example from the replication crisis in psychology. Here are the slides.

 

Categories: Error Statistics | Leave a comment

Excursion 2 Tour II (3rd stop): Falsification, Pseudoscience, Induction (2.3)

StatSci/PhilSci Museum

Where you are in the Journey*  We’ll move from the philosophical ground floor to connecting themes from other levels, from Popperian falsification to significance tests, and from Popper’s demarcation to current-day problems of pseudoscience and irreplication. An excerpt from our Museum Guide gives a broad-brush sketch of the first few sections of Tour II:

Karl Popper had a brilliant way to “solve” the problem of induction: Hume was right that enumerative induction is unjustified, but science is a matter of deductive falsification. Science was to be demarcated from pseudoscience according to whether its theories were testable and falsifiable. A hypothesis is deemed severely tested if it survives a stringent attempt to falsify it. Popper’s critics denied he could sustain this and still be a deductivist …

Popperian falsification is often seen as akin to Fisher’s view that “every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis” (1935a, p. 16). Though scientists often appeal to Popper, some critics of significance tests argue that they are used in decidedly non-Popperian ways. Tour II explores this controversy.

While Popper didn’t make good on his most winning slogans, he gives us many seminal launching-off points for improved accounts of falsification, corroboration, science versus pseudoscience, and the role of novel evidence and predesignation. These will let you revisit some thorny issues in today’s statistical crisis in science. Continue reading

Categories: Statistical Inference as Severe Testing | 11 Comments

“It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based”

My new book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, you might have discovered, includes Souvenirs throughout (A-Z). But there are some highlights within sections that might be missed in the excerpts I’m posting. One such “keepsake” is a quote from Fisher at the very end of Section 2.1.

These are some of the first clues we’ll be collecting on a wide difference between statistical inference as a deductive logic of probability, and an inductive testing account sought by the error statistician. When it comes to inductive learning, we want our inferences to go beyond the data: we want lift-off. To my knowledge, Fisher is the only other writer on statistical inference, aside from Peirce, to emphasize this distinction.

In deductive reasoning all knowledge obtainable is already latent in the postulates. Rigour is needed to prevent the successive inferences growing less and less accurate as we proceed. The conclusions are never more accurate than the data. In inductive reasoning we are performing part of the process by which new knowledge is created. The conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based. (Fisher 1935b, p. 54)

How do you understand this remark of Fisher’s? (Please share your thoughts in the comments.) My interpretation, and its relation to the “lift-off” needed to warrant inductive inferences, is discussed in an earlier section, 1.2, posted here. Here’s part of that.

Continue reading

Categories: induction, keepsakes from Stat Wars, Statistical Inference as Severe Testing | 7 Comments
