Error Statistics

David R. Cox Foundations of Statistics Award

Link to announcement on ASA website.

First Winner

Nancy Reid
University of Toronto

For contributions to the foundations of statistics that significantly advanced the frontiers of statistics and for insight that transformed understanding of parametric statistical inference, Nancy Reid is the inaugural recipient of the David R. Cox Foundations of Statistics Award, presented by the American Statistical Association (ASA). Reid will formally receive the award and deliver a lecture at the Joint Statistical Meetings in Toronto in August.

Reid, University Professor of Statistical Sciences at the University of Toronto, co-authored with David Cox an influential 1987 J. Roy. Statist. Soc. B discussion paper entitled “Parameter orthogonality and approximate conditional inference.” With this paper, and subsequent work with Cox and others, Reid has made major contributions to higher-order inference and various aspects of conditioning.

In addition to her work in foundational areas of statistics, Reid has successfully pursued numerous other lines of research, contributing to experimental design, nonparametric statistics, robust statistics, and comparisons and contradictions between Bayesian and frequentist inference.

The David R. Cox Foundations of Statistics Award was established in 2022 through an endowment created by Deborah G. Mayo, Professor Emerita of Philosophy at Virginia Tech. The ASA presents the award in odd-numbered years. The recipient receives a $2,000 honorarium and is invited to give a lecture at the Joint Statistical Meetings. See the announcement on the ASA website.

Anyone interested in supporting the ASA’s effort to increase the size of the award through donations that may be matched by the donor and friends of David Cox is encouraged to contact Ron Wasserstein, Executive Director, ron@amstat.org.

About the Award

This award honors the contributions of David R. Cox to the foundations of statistical inference, experimental design, and data analysis. It was established in 2022 to promote research and teaching that illuminates conceptual, theoretical, philosophical, and historical perspectives on statistical science and to advance understanding of comparative approaches to the interpretation and communication of statistical data.

The honoree will receive a $2,000 honorarium and up to $1,000 toward travel expenses to present a lecture at JSM.

Selection Criteria

The award will be for a paper, monograph, book, or cumulative research. Anyone who has made noteworthy contributions to statistics in the spirit of Cox’s contributions as outlined above may be nominated.

Award Recipient Responsibilities

The award recipient is responsible for providing a current photograph and general personal information the year the award is presented. The American Statistical Association uses this information to publicize the award and prepare the check and certificate.

Nominations

Nominations are due by December 1 and require the following:

  • Nomination letter
  • Candidate’s CV
  • Two letters of support, not to exceed two pages each

Questions

Please contact the committee chair.

Note: The David Cox Foundations of Statistics Award will be given biennially to start, but will be given annually when funding is sufficient. Interested in contributing to this award? Contact ASA Director of Development Amanda Malloy.

Categories: Error Statistics

The Statistics Wars and Their Casualties: Videos & Slides from Sessions 3 & 4

Below are the videos and slides from the seven talks in Sessions 3 and 4 of our workshop The Statistics Wars and Their Casualties, held on December 1 & 8, 2022. Session 3 speakers were Daniele Fanelli (London School of Economics and Political Science), Stephan Guttinger (University of Exeter), and David Hand (Imperial College London). Session 4 speakers were Jon Williamson (University of Kent), Margherita Harris (London School of Economics and Political Science), Aris Spanos (Virginia Tech), and Uri Simonsohn (Esade Ramon Llull University). Abstracts can be found here. In addition to the talks, you’ll find (1) a Recap of recaps at the beginning of Session 3 that summarizes Sessions 1 & 2, and (2) Mayo’s (5-minute) introduction to the final discussion, “Where do we go from here (Part ii)”, at the end of Session 4.

The videos & slides from Sessions 1 & 2 can be found on this post.

Readers are welcome to use the comments section on the PhilStatWars.com workshop blog post here to make constructive comments or to ask questions of the speakers. If you’re asking a question, indicate to which speaker(s) it is directed. We will leave it to speakers to respond. Thank you! Continue reading

Categories: Error Statistics

Where should stat activists go from here? (part (i))


From what standpoint should we approach the statistics wars? That’s the question from which I launched my presentation at the Statistics Wars and Their Casualties workshop (phil-stat-wars.com). In my view, we should approach them not from the standpoint of technical disputes, but from the non-technical standpoint of the skeptical consumer of statistics (see my slides here). What should we do now about the controversies and conundrums growing out of the statistics wars? We should not leave off the discussions of our workshop without at least sketching a future program for answering this question. We still have two more sessions, December 1 and 8, but I want to prepare us for the final discussions, which should look beyond a single workshop. (The slides and videos from the presenters in Sessions 1 and 2 can be found here.)

I will consider three interrelated responsibilities and tasks that we can undertake as statistical activist citizens. In doing so, I will refer to presentations from the workshop, limiting myself to Session 1. (I will add more examples in part (ii) of this post.) Continue reading

Categories: Error Statistics, significance tests, stat wars and their casualties

My Slides from the workshop: The statistics wars and their casualties


I will be writing some reflections on our two workshop sessions on this blog soon, but for now, here are just the slides I used on Thursday, 22 September. If you wish to ask a question of any of the speakers, use the blogpost at phil-stat-wars.com. The slides from the other speakers will also be up there on Monday.

Deborah G. Mayo’s slides from the workshop The Statistics Wars and Their Casualties, Session 1, September 22, 2022.

Categories: Error Statistics

22-23 September final schedule for workshop: The statistics wars and their casualties ONLINE

The Statistics Wars
and Their Casualties

Final Schedule for September 22 & 23 (Workshop Sessions 1 & 2) Continue reading

Categories: Error Statistics

22-23 Workshop Schedule: The Statistics Wars and Their Casualties: ONLINE

You can still register: https://phil-stat-wars.com/2022/09/19/22-23-september-workshop-schedule-the-statistics-wars-and-their-casualties/ Continue reading

Categories: Error Statistics

Behavioral vs Evidential Interpretations of N-P tests: E.S. Pearson’s Statistical Philosophy: Belated Birthday Wish

E.S. Pearson

This is a belated birthday post for E.S. Pearson (11 August 1895 – 12 June 1980), one of my statistical heroes. It’s basically a post from 2012 concerning an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. Yes, I know I’ve been neglecting this blog of late, because I’m busy planning our workshop, The Statistics Wars and Their Casualties (22-23 September, online). See phil-stat-wars.com. I will reblog some favorite Pearson posts in the next few days.

HAPPY BELATED BIRTHDAY EGON!

Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long-run error properties is of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson.

Cases of Type A and Type B

“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)

Pearson considers the rationale that might be given to N-P tests in two types of cases, A and B:

“(A) At one extreme we have the case where repeated decisions must be made on results obtained from some routine procedure…

(B) At the other is the situation where statistical tools are applied to an isolated investigation of considerable importance…?” (ibid., 170)

In cases of type A, long-run results are clearly of interest, while in cases of type B, repetition is impossible and may be irrelevant:

“In other and, no doubt, more numerous cases there is no repetition of the same type of trial or experiment, but all the same we can and many of us do use the same test rules to guide our decision, following the analysis of an isolated set of numerical data. Why do we do this? What are the springs of decision? Is it because the formulation of the case in terms of hypothetical repetition helps to that clarity of view needed for sound judgment?

Or is it because we are content that the application of a rule, now in this investigation, now in that, should result in a long-run frequency of errors in judgment which we control at a low figure?” (Ibid., 173)

Although Pearson leaves this tantalizing question unanswered, claiming, “On this I should not care to dogmatize”, in studying how Pearson treats cases of type B, it is evident that in his view, “the formulation of the case in terms of hypothetical repetition helps to that clarity of view needed for sound judgment” in learning about the particular case at hand.

“Whereas when tackling problem A it is easy to convince the practical man of the value of a probability construct related to frequency of occurrence, in problem B the argument that ‘if we were to repeatedly do so and so, such and such result would follow in the long run’ is at once met by the commonsense answer that we never should carry out a precisely similar trial again.

Nevertheless, it is clear that the scientist with a knowledge of statistical method behind him can make his contribution to a round-table discussion…” (Ibid., 171).

Pearson gives the following example of a case of type B (from his wartime work), where he claims no repetition is intended:

“Example of type B. Two types of heavy armour-piercing naval shell of the same caliber are under consideration; they may be of different design or made by different firms…. Twelve shells of one kind and eight of the other have been fired; two of the former and five of the latter failed to perforate the plate….”(Pearson 1947, 171) 

“Starting from the basis that individual shells will never be identical in armour-piercing qualities, however good the control of production, he has to consider how much of the difference between (i) two failures out of twelve and (ii) five failures out of eight is likely to be due to this inevitable variability…” (Ibid.)

We’re interested in considering what other outcomes could have occurred, and how readily, in order to learn what variability alone is capable of producing. As a noteworthy aside, Pearson shows that treating the observed difference (between the two proportions) in one way yields an observed significance level of 0.052; treating it differently (along Barnard’s lines), he gets 0.025 as the (upper) significance level. But in scientific cases, Pearson insists, the difference in error probabilities makes no real difference to substantive judgments in interpreting the results. Only in an unthinking, automatic, routine use of tests would it matter:

“Were the action taken to be decided automatically by the side of the 5% level on which the observation point fell, it is clear that the method of analysis used would here be of vital importance. But no responsible statistician, faced with an investigation of this character, would follow an automatic probability rule.” (ibid., 192)

The two analyses correspond to the tests effectively asking different questions, and if we recognize this, says Pearson, different meanings may be appropriately attached.
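As a minimal sketch (my reconstruction in Python, not code from Pearson’s paper), the conditional, Fisher-style treatment of the 2×2 shell table can be reproduced in a few lines; it yields the 0.052 level quoted above. Pearson’s alternative treatment along Barnard’s lines, giving 0.025, is not reconstructed here.

```python
# A minimal sketch (my reconstruction, not from Pearson 1947): the conditional
# treatment of the 2x2 shell table. Kind 1: 12 fired, 2 failures;
# Kind 2: 8 fired, 5 failures.
from math import comb

n_1, fails_1 = 12, 2   # kind-1 shells fired, failures
n_2, fails_2 = 8, 5    # kind-2 shells fired, failures
K = fails_1 + fails_2  # total failures = 7
N = n_1 + n_2          # total shells fired = 20

# Condition on the margins: given 7 failures among 20 shells, the chance that
# kind 1 accounts for 2 or fewer of them, i.e. P(X <= 2) for
# X ~ Hypergeometric(N=20, K=7, n=12).
p = sum(comb(K, k) * comb(N - K, n_1 - k)
        for k in range(fails_1 + 1)) / comb(N, n_1)
print(f"one-sided significance level = {p:.3f}")  # 0.052
```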

Three Steps in the Original Construction of Tests

After setting up the test (or null) hypothesis, and the alternative hypotheses against which “we wish the test to have maximum discriminating power” (Pearson 1947, 173), Pearson defines three steps in specifying tests:

“Step 1. We must specify the experimental probability set, the set of results which could follow on repeated application of the random process used in the collection of the data…

Step 2. We then divide this set [of possible results] by a system of ordered boundaries…such that as we pass across one boundary and proceed to the next, we come to a class of results which makes us more and more inclined, on the information available, to reject the hypothesis tested in favour of alternatives which differ from it by increasing amounts”.

“Step 3. We then, if possible[i], associate with each contour level the chance that, if [the null] is true, a result will occur in random sampling lying beyond that level” (ibid.).

Pearson warns that:

“Although the mathematical procedure may put Step 3 before 2, we cannot put this into operation before we have decided, under Step 2, on the guiding principle to be used in choosing the contour system. That is why I have numbered the steps in this order.” (Ibid. 173).
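To fix ideas, here is a toy sketch (my illustration, not Pearson’s example) of the three steps for a one-sided binomial test of H0: p = 0.5 against alternatives p > 0.5, with n = 20 trials:

```python
# A minimal sketch (toy example, not Pearson's) of the three steps for a
# one-sided binomial test: H0: p = 0.5 vs alternatives p > 0.5, n = 20 trials.
from math import comb

n, p0 = 20, 0.5

# Step 1: the experimental probability set -- every possible success count.
outcomes = list(range(n + 1))

# Step 2: order the set by increasing inclination to reject H0 in favour of
# p > 0.5; here, larger success counts are more discordant with H0.
# Step 3: attach to each contour the chance, under H0, of a result at or
# beyond it (the upper-tail probability).
def tail(c: int) -> float:
    """P(X >= c | H0), with X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(c, n + 1))

for c in (13, 14, 15, 16):
    print(f"x >= {c}: chance under H0 = {tail(c):.4f}")
# x >= 15 is the first contour with chance below 0.05 (about 0.0207)
```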

Strict behavioristic formulations jump from step 1 to step 3, after which one may calculate how the test has in effect accomplished step 2.  However, the resulting test, while having adequate error probabilities, may have an inadequate distance measure and may even be irrelevant to the hypothesis of interest. This is one reason critics can construct howlers that appear to be licensed by N-P methods, and which make their way from time to time into this blog.

So step 3 remains crucial, even for cases of type [B]. There are two reasons: pre-data planning—that’s familiar enough—and, second, post-data scrutiny. Post-data, step 3 enables determining the capability of the test to have detected various discrepancies, departures, and errors, on which a critical scrutiny of the inferences is based. More specifically, the error probabilities are used to determine how well or poorly corroborated, or how severely tested, various claims are, post-data.

If we can readily bring about statistically significantly higher rates of success with the first type of armour-piercing naval shell than with the second (in the above example), we have evidence the first is superior. Or, as Pearson modestly puts it: the results “raise considerable doubts as to whether the performance of the [second] type of shell was as good as that of the [first]….” (Ibid., 192)[ii]

Still, while error rates of procedures may be used to determine how severely claims have or have not passed, they do not automatically do so—hence, again, opening the door to potential howlers that neither Egon nor Jerzy, for that matter, would have countenanced.

Neyman Was the More Behavioristic of the Two

Pearson was (rightly) considered to have rejected the more behaviorist leanings of Neyman.

Here’s a snippet from an unpublished letter he wrote to Birnbaum (1974) about the idea that the N-P theory admits of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

In Pearson’s (1955) response to Fisher (blogged here):

“To dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot….!” (Pearson 1955, 204)

“To the best of my ability I was searching for a way of expressing in mathematical terms what appeared to me to be the requirements of the scientist in applying statistical tests to his data. After contact was made with Neyman in 1926, the development of a joint mathematical theory proceeded much more surely; it was not till after the main lines of this theory had taken shape with its necessary formalization in terms of critical regions, the class of admissible hypotheses, the two sources of error, the power function, etc., that the fact that there was a remarkable parallelism of ideas in the field of acceptance sampling became apparent. Abraham Wald’s contributions to decision theory of ten to fifteen years later were perhaps strongly influenced by acceptance sampling problems, but that is another story.” (ibid., 204-5).

“It may be readily agreed that in the first Neyman and Pearson paper of 1928, more space might have been given to discussing how the scientific worker’s attitude of mind could be related to the formal structure of the mathematical probability theory….Nevertheless it should be clear from the first paragraph of this paper that we were not speaking of the final acceptance or rejection of a scientific hypothesis on the basis of statistical analysis…. Indeed, from the start we shared Professor Fisher’s view that in scientific enquiry, a statistical test is ‘a means of learning’…” (Ibid., 206)

“Professor Fisher’s final criticism concerns the use of the term ‘inductive behavior’; this is Professor Neyman’s field rather than mine.” (Ibid., 207)

These points on Pearson are discussed in more depth in my book Statistical Inference as Severe Testing (SIST): How to Get Beyond the Statistics Wars (CUP 2018). You can read and download the entire book for free during the month of August 2022 at the following link:

https://www.cambridge.org/core/books/statistical-inference-as-severe-testing/D9DF409EF568090F3F60407FF2B973B2

 

References:

Pearson, E. S. (1947), “The Choice of Statistical Tests Illustrated on the Interpretation of Data Classed in a 2×2 Table,” Biometrika 34(1/2): 139-167.

Pearson, E. S. (1955), “Statistical Concepts in Their Relation to Reality,” Journal of the Royal Statistical Society, Series B (Methodological), 17(2): 204-207.

Neyman, J. and Pearson, E. S. (1928), “On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference, Part I,” Biometrika 20A: 175-240.


[i] In some cases only an upper limit to this error probability may be found.

[ii] Pearson inadvertently switches from number of failures to number of successes in the conclusion of this paper.

Categories: E.S. Pearson, Error Statistics

The Statistics Wars and Their Casualties Workshop-Now Online

The Statistics Wars
and Their Casualties 

22-23 September 2022
15:00-18:00 London Time*

ONLINE 

To register for the workshop, please fill out the registration form here.

*These will be Sessions 1 & 2; two more online sessions (3 & 4) will follow at 15:00-18:00 London Time on December 1 & 8.

Yoav Benjamini (Tel Aviv University), Alexander Bird (University of Cambridge), Mark Burgman (Imperial College London), Daniele Fanelli (London School of Economics and Political Science), Roman Frigg (London School of Economics and Political Science), Stephan Guttinger (University of Exeter), David Hand (Imperial College London), Margherita Harris (London School of Economics and Political Science), Christian Hennig (University of Bologna), Daniël Lakens (Eindhoven University of Technology), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland), Jon Williamson (University of Kent) Continue reading

Categories: Announcement, Error Statistics

10 years after the July 4 statistical discovery of the Higgs & the value of negative results

Higgs

Today marks a decade since the discovery on July 4, 2012 of evidence for a Higgs particle based on a “5 sigma observed effect”. CERN celebrated with a scientific symposium (webcast here). The observed effect refers to the number of excess events of a given type that are “observed” in comparison to the number that would be expected from background alone—which they can simulate in particle detectors. Because the 5-sigma standard refers to a benchmark from frequentist significance testing, the discovery was immediately imbued with controversies that, at bottom, concerned statistical philosophy. Continue reading
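For reference, under a one-sided normal approximation the “5 sigma” benchmark corresponds to a tail probability of roughly 1 in 3.5 million. A minimal sketch (my calculation, not CERN’s):

```python
# A minimal sketch (my illustration): the upper-tail probability that
# "5 sigma" corresponds to under a one-sided normal approximation.
from math import erfc, sqrt

p = 0.5 * erfc(5 / sqrt(2))  # P(Z >= 5) for standard normal Z
print(f"p = {p:.2e}")        # ~2.87e-07, about 1 in 3.5 million
```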

Categories: Error Statistics

Dissent

 

Continue reading

Categories: Error Statistics

D. Mayo & D. Hand: “Statistical significance and its critics: practicing damaging science, or damaging scientific practice?”


Prof. Deborah Mayo, Emerita
Department of Philosophy
Virginia Tech


Prof. David Hand
Department of Mathematics
Imperial College London

Statistical significance and its critics: practicing damaging science, or damaging scientific practice?  (Synthese)

[pdf of full paper.] Continue reading

Categories: Error Statistics

Insevere Tests of Severe Testing (iv)


One does not have evidence for a claim if little or nothing has been done to rule out ways the claim may be false. The claim may be said to “pass” the test, but it’s one that utterly lacks stringency or severity. On the basis of this very simple principle, I build a notion of evidence that applies to any error-prone inference. In this account, data x are evidence for a claim C only if (and only to the extent that) C has passed a severe test with x.[1] How to apply this simple idea, however, and how to use it to solve central problems of induction and statistical inference, requires careful consideration of how it is to be fleshed out. (See this post on strong vs weak severity.) Continue reading
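For a sense of how the severity requirement is made quantitative, here is a minimal sketch (my illustration, with assumed toy numbers) for the one-sided Normal test T+ of H0: μ ≤ 0 vs H1: μ > 0:

```python
# A minimal sketch (toy numbers assumed) of the severity idea for test T+:
# H0: mu <= 0 vs H1: mu > 0, with X ~ N(mu, sigma^2), sigma known.
# After observing xbar, SEV(mu > mu1) is the probability the test would have
# yielded a result LESS accordant with "mu > mu1" (a smaller xbar) were mu = mu1.
from math import erf, sqrt

sigma, n, xbar = 1.0, 25, 0.4  # assumed values; standard error = 0.2
se = sigma / sqrt(n)

def Phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def severity(mu1: float) -> float:
    """SEV(mu > mu1) = P(Xbar <= xbar_obs; mu = mu1)."""
    return Phi((xbar - mu1) / se)

for mu1 in (0.0, 0.2, 0.4):
    print(f"SEV(mu > {mu1}) = {severity(mu1):.3f}")
# SEV(mu > 0.0) = 0.977, SEV(mu > 0.2) = 0.841, SEV(mu > 0.4) = 0.500:
# the larger the discrepancy claimed, the less severely it has passed.
```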

Categories: Error Statistics

No fooling: The Statistics Wars and Their Casualties Workshop is Postponed to 22-23 September, 2022

The Statistics Wars
and Their Casualties

Postponed to
22-23 September 2022

 

London School of Economics (CPNSS)

Yoav Benjamini (Tel Aviv University), Alexander Bird (University of Cambridge), Mark Burgman (Imperial College London), Daniele Fanelli (London School of Economics and Political Science), Roman Frigg (London School of Economics and Political Science), Stephan Guttinger (University of Exeter), David Hand (Imperial College London), Margherita Harris (London School of Economics and Political Science), Christian Hennig (University of Bologna), Katrin Hohl (City University London), Daniël Lakens (Eindhoven University of Technology), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland), Jon Williamson (University of Kent) Continue reading

Categories: Error Statistics

Sir David Cox

July 15, 1924-January 18, 2022

 

Categories: Error Statistics

Yudi Pawitan: Behavioral aspects in the statistical significance war-game (Guest Post)


 

Yudi Pawitan
Professor
Department of Medical Epidemiology and Biostatistics
Karolinska Institutet, Stockholm

 

Behavioral aspects in the statistical significance war-game

I remember with fondness the good old days when the only ‘statistical war’-game was fought between the Bayesian and the frequentist. It was simpler – except when the likelihood principle is thrown in, always guaranteed to confound the frequentist – and the participants were for the most part collegial. Moreover, there was a feeling that it was a philosophical debate. Even though the Bayesian-frequentist war is not fully settled, we can see areas of consensus, for example in objective Bayesianism or in conditional inference. However, on the P-value and statistical significance front, the war looks less simple, as it is about statistical praxis; it is no longer Bayesian vs frequentist, with no consensus in sight and with wide implications affecting the day-to-day use of statistics. Typically, a persistent controversy between otherwise sensible and knowledgeable people – thus excluding anti-vaxxers and conspiracy theorists – might indicate we are missing some common perspectives or perhaps the big picture. In complex issues there can be genuinely distinct aspects about which different players disagree and, at some point, agree to disagree. I am not sure we have reached that point yet, with each side still working to persuade the other of the faults of its position. For now, I can only concur with Mayo’s (2021) appeal that at least the umpires – journal editors – recognize (a) the issue at hand and (b) that genuine debates are still ongoing, so it is not yet time to take sides. Continue reading

Categories: Error Statistics

I’ll be speaking at the Philo of Sci Association (PSA): Philosophy IN Science: Can Philosophers of Science Contribute to Science?


Philosophy in Science: Can Philosophers of Science Contribute to Science?
     on November 13, 2-4 pm

 

This session revolves around the intriguing question: Can Philosophers of Science Contribute to Science? They’re calling it philosophy “in” science–when philosophical ministrations actually intervene in a science itself. This is the session I’ll be speaking in. I hope you will come to our session if you’re there–the meeting is hybrid, but you can’t see our session through a remote link. But I’d like to hear what you think about this question–in the comments to this post. Continue reading

Categories: Error Statistics

Philo of Sci Assoc (PSA) Session: Current Debates on Statistical Modeling and Inference

 


The Philosophy of Science Association (PSA) is holding its biennial meeting (one year late)–live/hybrid/remote*–in November, 2021, and I plan to be there (first in-person meeting since Feb 2020). Some of the members from the 2019 Summer Seminar that I ran with Aris Spanos are in a Symposium:

Current Debates on Statistical Modeling and Inference
on November 13, 9 am-12:15 pm

Here are the members and talks (Link to session/abstracts):

  • Aris Spanos (Virginia Tech): Self-Correction and Statistical Misspecification (co-author Deborah Mayo, Virginia Tech)
  • Ruobin Gong (Rutgers): Measuring Severity in Statistical Inference
  • Riet van Bork (University of Amsterdam): Psychometric Models: Statistics and Interpretation (co-author Jan-Willem Romeijn, University of Groningen)
  • Marcello di Bello (Lehman College CUNY): Is Algorithmic Fairness Possible?
  • Elay Shech (Auburn University): Statistical Modeling, Mis-specification Testing, and Exploration (co-author Mike Tamir, Berkeley)

Continue reading

Categories: Error Statistics

Workshop-New Date!

The Statistics Wars
and Their Casualties

New Date!

4-5 April 2022

London School of Economics (CPNSS)

Yoav Benjamini (Tel Aviv University), Alexander Bird (University of Cambridge), Mark Burgman (Imperial College London), Daniele Fanelli (London School of Economics and Political Science), Roman Frigg (London School of Economics and Political Science), Stephan Guttinger (University of Exeter), David Hand (Imperial College London), Margherita Harris (London School of Economics and Political Science), Christian Hennig (University of Bologna), Katrin Hohl (City University London), Daniël Lakens (Eindhoven University of Technology), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland), Jon Williamson (University of Kent) Continue reading

Categories: Error Statistics

Performance or Probativeness? E.S. Pearson’s Statistical Philosophy: Belated Birthday Wish

E.S. Pearson

This is a belated birthday post for E.S. Pearson (11 August 1895 – 12 June 1980). It’s basically a post from 2012 concerning an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. Yes, I know I’ve been neglecting this blog of late, but this topic will appear in a new guise in a post I’m writing now, to appear tomorrow.

HAPPY BELATED BIRTHDAY EGON!

Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long-run error properties is of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson. Continue reading

Categories: E.S. Pearson, Error Statistics

June 24: “Have Covid-19 lockdowns led to an increase in domestic violence? Drawing inferences from police administrative data” (Katrin Hohl)

The tenth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

24 June 2021

TIME: 15:00-16:45 (London); 10:00-11:45 (New York, EST)

For information about the Phil Stat Wars forum and how to join, click on this link.

Katrin Hohl

“Have Covid-19 lockdowns led to an increase in domestic violence? Drawing inferences from police administrative data” 

Katrin Hohl Continue reading

Categories: Error Statistics
