Error Statistics

Higgs discovery two years on (2: Higgs analysis and statistical flukes)

I’m reblogging a few of the Higgs posts, with some updated remarks, on this two-year anniversary of the discovery. (The first was in my last post.) The following was originally “Higgs Analysis and Statistical Flukes: part 2” (from March, 2013).[1]

Some people say to me: “This kind of reasoning is fine for a ‘sexy science’ like high energy physics (HEP),” as if their statistical inferences were radically different. But I maintain that this is the mode by which data are used in “uncertain” reasoning across the entire landscape of science and day-to-day learning (at least, when we’re trying to find things out).[2] Even with high level theories, the particular problems of learning from data are tackled piecemeal, in local inferences that afford error control. Granted, this statistical philosophy differs importantly from those that view the task as assigning comparative (or absolute) degrees-of-support/belief/plausibility to propositions, models, or theories.

Categories: Higgs, highly probable vs highly probed, P-values, Severity, Statistics | 13 Comments

“Statistical Science and Philosophy of Science: where should they meet?”


Four score years ago (!) we held the conference “Statistical Science and Philosophy of Science: Where Do (Should) They Meet?” at the London School of Economics, Center for the Philosophy of Natural and Social Science, CPNSS, where I’m visiting professor.[1] Many of the discussions on this blog grew out of contributions from the conference, and conversations initiated soon after. The conference site is here; my paper on the general question is here.[2]

My main contribution was “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” (SS & POS 2). It begins like this:

1. Comedy Hour at the Bayesian Retreat[3]

Overheard at the comedy hour at the Bayesian retreat: Did you hear the one about the frequentist…

Categories: Error Statistics, Philosophy of Statistics, Severity, Statistics, StatSci meets PhilSci | 23 Comments

A. Spanos: “Recurring controversies about P values and confidence intervals revisited”


Aris Spanos
Wilson E. Schmidt Professor of Economics
Department of Economics, Virginia Tech

Recurring controversies about P values and confidence intervals revisited*
Ecological Society of America (ESA) ECOLOGY
Forum—P Values and Model Selection (pp. 609-654)
Volume 95, Issue 3 (March 2014): pp. 645-651

INTRODUCTION

The use, abuse, interpretations and reinterpretations of the notion of a P value has been a hot topic of controversy since the 1950s in statistics and several applied fields, including psychology, sociology, ecology, medicine, and economics.

The initial controversy between Fisher’s significance testing and the Neyman and Pearson (N-P; 1933) hypothesis testing concerned the extent to which the pre-data Type I error probability α can address the arbitrariness and potential abuse of Fisher’s post-data threshold for the P value.

Categories: CIs and tests, Error Statistics, Fisher, P-values, power, Statistics | 32 Comments

Who ya gonna call for statistical Fraudbusting? R.A. Fisher, P-values, and error statistics (again)

If there’s somethin’ strange in your neighborhood. Who ya gonna call? (Fisherian Fraudbusters!)*

*[adapted from R. Parker's "Ghostbusters"]

When you need to warrant serious accusations of bad statistics, if not fraud, where do scientists turn? Answer: To the frequentist error statistical reasoning and to p-value scrutiny, first articulated by R.A. Fisher[i]. The latest accusations of big time fraud in social psychology concern the case of Jens Förster. As Richard Gill notes:

The methodology here is not new. It goes back to Fisher (founder of modern statistics) in the 30’s. Many statistics textbooks give as an illustration Fisher’s re-analysis (one could even say: meta-analysis) of Mendel’s data on peas. The tests of goodness of fit were, again and again, too good. There are two ingredients here: (1) the use of the left-tail probability as p-value instead of the right-tail probability. (2) combination of results from a number of independent experiments using a trick invented by Fisher for the purpose, and well known to all statisticians. (Richard D. Gill)
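To make the two ingredients in this quote concrete, here is a minimal Python sketch. It assumes that the “trick” referred to is Fisher’s combined probability test, in which minus twice the sum of the log p-values is referred to a chi-square distribution with 2k degrees of freedom for k independent experiments. The left-tail p-values are made-up numbers for illustration only (they are not Mendel’s or Förster’s data), and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form that holds only when df is a positive even integer,
    which is always the case for Fisher's combination (df = 2k).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine independent p-values: returns (statistic, combined p-value)."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return statistic, chi2_sf_even_df(statistic, df)


# Hypothetical LEFT-tail p-values, P(chi-square <= observed), from five
# independent goodness-of-fit tests. Small values mean "fit too good".
left_tail_p = [0.10, 0.08, 0.15, 0.05, 0.12]

stat, combined = fisher_combine(left_tail_p)
print(f"Fisher statistic = {stat:.2f} on {2 * len(left_tail_p)} df")
print(f"combined left-tail p-value = {combined:.5f}")
```

No single made-up p-value here is striking on its own; the point of the combination is that a run of moderately small left-tail values can add up to a small combined p-value, which is the “too good, again and again” pattern.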


Categories: Error Statistics, Fisher, significance tests, Statistical fraudbusting, Statistics | 42 Comments

A. Spanos: Talking back to the critics using error statistics (Phil6334)


Aris Spanos’ overview of error statistical responses to familiar criticisms of statistical tests. Related reading is Mayo and Spanos (2011).

Categories: Error Statistics, frequentist/Bayesian, Phil6334, reforming the reformers, statistical tests, Statistics | Leave a comment

Phil 6334: Foundations of statistics and its consequences: Day #12

We interspersed key issues from the reading for this session (from Howson and Urbach) with portions of my presentation at the Boston Colloquium (Feb, 2014): Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge. (Slides below)*.

Someone sent us a recording (mp3) of the panel discussion from that Colloquium (there’s a lot on “big data” and its politics) including: Mayo, Xiao-Li Meng (Harvard), Kent Staley (St. Louis), and Mark van der Laan (Berkeley).


*There’s a prelude here to our visitor on April 24: Professor Stanley Young from the National Institute of Statistical Sciences.

 

Categories: Bayesian/frequentist, Error Statistics, Phil6334 | 43 Comments

Fallacies of statistics & statistics journalism, and how to avoid them: Summary & Slides Day #8 (Phil 6334)


We spent the first half of Thursday’s seminar discussing the Fisher, Neyman, and E. Pearson “triad”[i]. So, since it’s Saturday night, join me in rereading for the nth time these three very short articles. The key issues were: error of the second kind, behavioristic vs evidential interpretations, and Fisher’s mysterious fiducial intervals. Although we often hear exaggerated accounts of the differences in the Fisherian vs Neyman-Pearson (N-P) methodology, in fact, N-P were simply providing Fisher’s tests with a logical ground (even though other foundations for tests are still possible), and Fisher welcomed this gladly. Notably, with the single null hypothesis, N-P showed that it was possible to have tests where the probability of rejecting the null when true exceeded the probability of rejecting it when false. Hacking called such tests “worse than useless”, and N-P developed a theory of testing that avoids such problems. Statistical journalists who report on the alleged “inconsistent hybrid” (a term popularized by Gigerenzer) should recognize the extent to which the apparent disagreements on method reflect professional squabbling between Fisher and Neyman after 1935 [A recent example is a Nature article by R. Nuzzo in ii below]. The two types of tests are best seen as asking different questions in different contexts. They both follow error-statistical reasoning.
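A “worse than useless” test is easy to exhibit numerically. The sketch below is only an illustration, not an example taken from the triad papers: it assumes a standardized Normal test statistic and a deliberately perverse rejection region (reject when the statistic falls close to its null value). The region has size α under the null, yet the probability of rejecting drops below α as soon as the null is false. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal, mean 0 and standard deviation 1


def central_region_cutoff(alpha):
    """Cutoff c such that P(|Z| < c) = alpha when Z is standard Normal."""
    return STD_NORMAL.inv_cdf((1.0 + alpha) / 2.0)


def rejection_probability(delta, cutoff):
    """P(|Z + delta| < cutoff): chance of rejecting the null when the
    standardized mean is shifted by delta (delta = 0 means the null is true)."""
    return STD_NORMAL.cdf(cutoff - delta) - STD_NORMAL.cdf(-cutoff - delta)


alpha = 0.05
c = central_region_cutoff(alpha)

# Rejection probability under the null (shift 0) and under several alternatives.
for delta in (0.0, 0.5, 1.0, 2.0):
    print(f"shift {delta:.1f}: P(reject) = {rejection_probability(delta, c):.4f}")
```

Without an alternative in view, nothing in the null distribution alone rules out such a region; bringing the alternative in and attending to power is what does.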

Categories: phil/history of stat, Phil6334, science communication, Severity, significance tests, Statistics | 35 Comments

New SEV calculator (guest app: Durvasula)

Karthik Durvasula, a blog follower[i], sent me a highly apt severity app that he created: https://karthikdurvasula.shinyapps.io/Severity_Calculator/
I have his permission to post it or use it for pedagogical purposes, so since it’s Saturday night, go ahead and have some fun with it. Durvasula had the great idea of using it to illustrate howlers. Also, I would add, to discover them.
It follows many of the elements of the Excel Sev Program discussed recently, but it’s easier to use.* (I’ll add some notes about the particular claim (i.e., discrepancy) for which SEV is being computed later on).
*If others want to tweak or improve it, he might pass on the source code (write to me on this).
[i] I might note that Durvasula was the winner of the January palindrome contest.
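
For readers who want a feel for the kind of computation a severity calculator performs, here is a minimal Python sketch. It is not Durvasula’s source code. It assumes the one-sided Normal test with known σ (H0: μ ≤ μ0 vs. H1: μ > μ0) and a statistically significant observed mean, in which case the severity for the claim μ > μ1 is the probability of a sample mean no larger than the one observed, computed under μ = μ1. The function name and all the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def severity_mu_greater(xbar, mu1, sigma, n):
    """Severity for the claim mu > mu1 after observing sample mean xbar.

    Assumes a Normal model with known sigma and sample size n:
    SEV(mu > mu1) = P(Xbar <= xbar; mu = mu1) = Phi((xbar - mu1) / (sigma / sqrt(n))).
    """
    standard_error = sigma / math.sqrt(n)
    return NormalDist().cdf((xbar - mu1) / standard_error)


# Hypothetical setup: mu0 = 0, sigma = 1, n = 100, observed mean 0.2
# (two standard errors above mu0, so the null is rejected at the usual levels).
xbar, sigma, n = 0.2, 1.0, 100

# Severity falls as the claimed discrepancy mu1 grows.
for mu1 in (0.0, 0.1, 0.2, 0.3):
    sev = severity_mu_greater(xbar, mu1, sigma, n)
    print(f"SEV(mu > {mu1:.1f}) = {sev:.3f}")
```

Running it over a range of discrepancies shows why a bare rejection does not license inferring a large effect: the same data warrant μ > 0 with high severity and μ > 0.3 with low severity.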
Categories: Severity, Statistics | 12 Comments

Cosma Shalizi gets tenure (at last!) (metastat announcement)

News Flash! Congratulations to Cosma Shalizi who announced yesterday that he’d been granted tenure (Statistics, Carnegie Mellon). Cosma is a leading error statistician, a creative polymath and long-time blogger (at Three-Toed Sloth). Shalizi wrote an early book review of EGEK (Mayo 1996)* that people still send me from time to time, in case I hadn’t seen it! You can find it on this blog from 2 years ago (posted by Jean Miller). A discussion of a meeting of the minds between Shalizi and Andrew Gelman is here.

*Error and the Growth of Experimental Knowledge.

Categories: Announcement, Error Statistics, Statistics | Leave a comment

Phil6334: Popper self-test

Those reading Popper[i] with us might be interested in an (undergraduate) item I came across: Popper Self-Test Questions. It includes multiple choice questions, quotes to ponder, and thumbnail definitions at the end[ii].
[i]Popper reading (for Feb 13, 2014) from Conjectures and Refutations
[ii]I might note the “No-Pain philosophy” (3 part) Popper posts from this blog: parts 1, 2, and 3.

Categories: Error Statistics | 1 Comment

Phil6334: “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech) UPDATE: JAN 21

FURTHER UPDATED: New course for Spring 2014: Thurs 3:30-6:15 (Randolph 209)

Syllabus (first installment): Phil 6334: Philosophy of Statistical Inference and Modeling

D. Mayo and A. Spanos

Contact: error@vt.edu

This new course, to be jointly taught by Professors D. Mayo (Philosophy) and A. Spanos (Economics), will provide an in-depth introduction to graduate-level research in philosophy of inductive-statistical inference and probabilistic methods of evidence (a branch of formal epistemology). We explore philosophical problems of confirmation and induction, the philosophy and history of frequentist and Bayesian approaches, and key foundational controversies surrounding tools of statistical data analytics, modeling and hypothesis testing in the natural and social sciences, and in evidence-based policy.

We now have some tentative topics and dates:

 


1. 1/23 Introduction to the Course: 4 waves of controversy in the philosophy of statistics
2. 1/30 How to tell what’s true about statistical inference: Probabilism, performance and probativeness
3. 2/6 Induction and Confirmation: Formal Epistemology
4. 2/13 Induction, falsification, severe tests: Popper and Beyond
5. 2/20 Statistical models and estimation: the Basics
6. 2/27 Fundamentals of significance tests and severe testing
7. 3/6 Five sigma and the Higgs Boson discovery: Is it “bad science”?
SPRING BREAK: Statistical Exercises While Sunning
8. 3/20 Fraudbusting and Scapegoating: Replicability and big data: are most scientific results false?
9. 3/27 How can we test the assumptions of statistical models?
All models are false; no methods are objective: Philosophical problems of misspecification testing: Spanos method
10. 4/3 Fundamentals of Statistical Testing: Family Feuds and 70 years of controversy
11. 4/10 Error Statistical Philosophy: Highly Probable vs Highly Probed
Some howlers of testing
12. 4/17 What ever happened to Bayesian Philosophical Foundations? Dutch books etc. Fundamentals of Bayesian statistics
13. 4/24 Bayesian-frequentist reconciliations, unifications, and O-Bayesians
14. 5/1 Overview: Answering the critics: Should statistical philosophy be divorced from methodology?
(15. TBA) Topic to be chosen (Resampling statistics and new journal policies? Likelihood principle)

Interested in attending? E.R.R.O.R.S.* can fund travel (presumably driving) and provide accommodation for Thurs. night in a conference lodge in Blacksburg for a few people through (or part of) the semester. If interested, write ASAP for details (with a brief description of your interest and background) to error@vt.edu. (Several people asked about long-distance hook-ups: We will try to provide some sessions by Skype, and will put each of the seminar items here; also check the Phil6334 page on this blog.)

A sample of questions we consider*:

  • What makes an inquiry scientific? objective? When are we warranted in generalizing from data?
  • What is the “traditional problem of induction”?  Is it really insoluble?  Does it matter in practice?
  • What is the role of probability in uncertain inference? (to assign degrees of confirmation or belief? to characterize the reliability of test procedures?) 3P’s: Probabilism, performance and probativeness
  • What is probability? Random variables? Estimates? What is the relevance of long-run error probabilities for inductive inference in science?
  • What did Popper really say about severe testing, induction, falsification? Is it time for a new definition of pseudoscience?
  • Confirmation and falsification: Carnap and Popper, paradoxes of confirmation; contemporary formal epistemology
  • What is the current state of play in the “statistical wars” e.g., between frequentists, likelihoodists, and (subjective vs. “non-subjective”) Bayesians?
  • How should one specify and interpret p-values, type I and II errors, confidence levels?  Can one tell the truth (and avoid fallacies) with statistics? Do the “reformers” themselves need reform?
  • Is it unscientific (ad hoc, degenerating) to use the same data both in constructing and testing hypotheses? When and why?
  • Is it possible to test assumptions of statistical models without circularity?
  • Is the new research on “replicability” well-founded, or an erroneous use of screening statistics for long-run performance?
  • Should randomized studies be the “gold standard” for “evidence-based” science and policy?
  • What’s the problem with big data: cherry-picking, data mining, multiple testing
  • The many faces of Bayesian statistics: Can there be uninformative prior probabilities? (No) Principles of indifference over the years
  • Statistical fraudbusting: psychology, economics, evidence-based policy
  • Applied controversies (selected): Higgs experiments, climate modeling, social psychology, econometric modeling, development economics

D. Mayo (books):

How to Tell What’s True About Statistical Inference (Cambridge, in progress).

Error and the Growth of Experimental Knowledge, Chicago: Chicago University Press, 1996. (Winner of 1998 Lakatos Prize).

Acceptable Evidence: Science and Values in Risk Management, co-edited with Rachelle Hollander, New York: Oxford University Press, 1994.

Aris Spanos (books):

Probability Theory and Statistical Inference, Cambridge, 1999.

Statistical Foundations of Econometric Modeling, Cambridge, 1986.

Joint (books): Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science, D. Mayo & A. Spanos (eds.), Cambridge: Cambridge University Press, 2010. [The book includes both papers and exchanges between Mayo and A. Chalmers, A. Musgrave, P. Achinstein, J. Worrall, C. Glymour, A. Spanos, and joint papers with Mayo and Sir David Cox].

Categories: Announcement, Error Statistics, Statistics | 5 Comments

Objective/subjective, dirty hands and all that: Gelman/ Wasserman blogolog (ii)

Andrew Gelman says that as a philosopher, I should appreciate his blog today in which he records his frustration: “Against aggressive definitions: No, I don’t think it helps to describe Bayes as ‘the analysis of subjective beliefs’…” Gelman writes:

I get frustrated with what might be called “aggressive definitions,” where people use a restrictive definition of something they don’t like. For example, Larry Wasserman writes (as reported by Deborah Mayo):

“I wish people were clearer about what Bayes is/is not and what
frequentist inference is/is not. Bayes is the analysis of subjective
beliefs but provides no frequency guarantees. Frequentist inference
is about making procedures that have frequency guarantees but makes no
pretense of representing anyone’s beliefs.”

I’ll accept Larry’s definition of frequentist inference. But as for his definition of Bayesian inference: No no no no no. The probabilities we use in our Bayesian inference are not subjective, or, they’re no more subjective than the logistic regressions and normal distributions and Poisson distributions and so forth that fill up all the textbooks on frequentist inference.

To quickly record some of my own frustrations:* First, I would disagree with Wasserman’s characterization of frequentist inference, but as is clear from Larry’s comments to (my reaction to him), I think he concurs that he was just giving a broad contrast. Please see Note [1] for a remark from my post: Comments on Wasserman’s “what is Bayesian/frequentist inference?” Also relevant is a Gelman post on the Bayesian name: [2].

Second, Gelman’s “no more subjective than…” evokes remarks I’ve made before. For example, in “What should philosophers of science do…” I wrote:

Arguments given for some very popular slogans (mostly by non-philosophers), are too readily taken on faith as canon by others, and are repeated as gospel. Examples are easily found: all models are false, no models are falsifiable, everything is subjective, or equally subjective and objective, and the only properly epistemological use of probability is to supply posterior probabilities for quantifying actual or rational degrees of belief. Then there is the cluster of “howlers” allegedly committed by frequentist error statistical methods repeated verbatim (discussed on this blog).

I’ve written a lot about objectivity on this blog, e.g., here, here and here (and in real life), but what’s the point if people just rehearse the “everything is a mixture…” line, without making deeply important distinctions? I really think that, next to the “all models are false” slogan, the most confusion has been engendered by the “no methods are objective” slogan. However much we may aim at objective constraints, it is often urged, we can never have “clean hands” free of the influence of beliefs and interests, and we invariably sully methods of inquiry by the entry of background beliefs and personal judgments in their specification and interpretation.

Categories: Bayesian/frequentist, Error Statistics, Gelman, Objectivity, Statistics | 41 Comments

Two Severities? (PhilSci and PhilStat)

The blog “It’s Chancy” (Corey Yanofsky) has a post today about “two severities” which warrants clarification. Two distinctions are being blurred: between formal and informal severity assessments, and between a statistical philosophy (something Corey says he’s interested in) and its relevance to philosophy of science (which he isn’t). I call the latter an error statistical philosophy of science. The former requires formal, semi-formal and informal severity assessments. Here’s his post:

In the comments to my first post on severity, Professor Mayo noted some apparent and some actual misstatements of her views. To avert misunderstandings, she directed readers to two of her articles, one of which opens by making this distinction:

“Error statistics refers to a standpoint regarding both (1) a general philosophy of science and the roles probability plays in inductive inference, and (2) a cluster of statistical tools, their interpretation, and their justification.”

In Mayo’s writings I see two interrelated notions of severity corresponding to the two items listed in the quote: (1) an informal severity notion that Mayo uses when discussing philosophy of science and specific scientific investigations, and (2) Mayo’s formalization of severity at the data analysis level.

One of my besetting flaws is a tendency to take a narrow conceptual focus to the detriment of the wider context. In the case of Severity, part one, I think I ended up making claims about severity that were wrong. I was narrowly focused on severity in sense (2) (in fact, on one specific equation within (2)) but used a mish-mash of ideas and terminology drawn from all of my readings of Mayo’s work. When read through a philosophy-of-science lens, the result is a distorted and misstated version of severity in sense (1).

As a philosopher of science, I’m a rank amateur; I’m not equipped to add anything to the conversation about severity as a philosophy of science. My topic is statistics, not philosophy, and so I want to warn readers against interpreting Severity, part one as a description of Mayo’s philosophy of science; it’s more of a wordy introduction to the formal definition of severity in sense (2). (It’s Chancy, Jan 11, 2014)

A needed clarification may be found in a post of mine which begins: 

Error statistics: (1) There is a “statistical philosophy” and a philosophy of science. (a) An error-statistical philosophy alludes to the methodological principles and foundations associated with frequentist error-statistical methods. (b) An error-statistical philosophy of science, on the other hand, involves using the error-statistical methods, formally or informally, to deal with problems of philosophy of science: to model scientific inference (actual or rational), to scrutinize principles of inference, and to address philosophical problems about evidence and inference (the problem of induction, underdetermination, warranting evidence, theory testing, etc.).

I assume the interest here* is on the former, (a). I have stated it in numerous ways, but the basic position is that inductive inference—i.e., data-transcending inference—calls for methods of controlling and evaluating error probabilities (even if only approximate). An inductive inference, in this conception, takes the form of inferring hypotheses or claims to the extent that they have been well tested. It also requires reporting claims that have not passed severely, or have passed with low severity. In the “severe testing” philosophy of induction, the quantitative assessment offered by error probabilities tells us not “how probable” but, rather, “how well probed” hypotheses are.  The local canonical hypotheses of formal tests and estimation methods need not be the ones we entertain post data; but they give us a place to start without having to go “the designer-clothes” route.

The post-data interpretations might be formal, semi-formal, or informal.

See also: Staley’s review of Error and Inference (Mayo and Spanos eds.)

Categories: Review of Error and Inference, Severity, StatSci meets PhilSci | 52 Comments

“Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)

New course for Spring 2014: Thursday 3:30-6:15

Phil 6334: Philosophy of Statistical Inference and Modeling

D. Mayo and A. Spanos

Contact: error@vt.edu

This new course, to be jointly taught by Professors D. Mayo (Philosophy) and A. Spanos (Economics), will provide an in-depth introduction to graduate-level research in philosophy of inductive-statistical inference and probabilistic methods of evidence (a branch of formal epistemology). We explore philosophical problems of confirmation and induction, the philosophy and history of frequentist and Bayesian approaches, and key foundational controversies surrounding tools of statistical data analytics, modeling and hypothesis testing in the natural and social sciences, and in evidence-based policy.


A sample of questions we consider*:

  • What makes an inquiry scientific? objective? When are we warranted in generalizing from data?
  • What is the “traditional problem of induction”?  Is it really insoluble?  Does it matter in practice?
  • What is the role of probability in uncertain inference? (to assign degrees of confirmation or belief? to characterize the reliability of test procedures?) 3P’s: Probabilism, performance and probativeness
  • What is probability? Random variables? Estimates? What is the relevance of long-run error probabilities for inductive inference in science?
  • What did Popper really say about severe testing, induction, falsification? Is it time for a new definition of pseudoscience?
  • Confirmation and falsification: Carnap and Popper, paradoxes of confirmation; contemporary formal epistemology
  • What is the current state of play in the “statistical wars” e.g., between frequentists, likelihoodists, and (subjective vs. “non-subjective”) Bayesians?
  • How should one specify and interpret p-values, type I and II errors, confidence levels?  Can one tell the truth (and avoid fallacies) with statistics? Do the “reformers” themselves need reform?
  • Is it unscientific (ad hoc, degenerating) to use the same data both in constructing and testing hypotheses? When and why?
  • Is it possible to test assumptions of statistical models without circularity?
  • Is the new research on “replicability” well-founded, or an erroneous use of screening statistics for long-run performance?
  • Should randomized studies be the “gold standard” for “evidence-based” science and policy?
  • What’s the problem with big data: cherry-picking, data mining, multiple testing
  • The many faces of Bayesian statistics: Can there be uninformative prior probabilities? (No) Principles of indifference over the years
  • Statistical fraudbusting: psychology, economics, evidence-based policy
  • Applied controversies (selected): Higgs experiments, climate modeling, social psychology, econometric modeling, development economics

Interested in attending? E.R.R.O.R.S.* can fund travel (presumably driving) and provide lodging for Thurs. night in a conference lodge in Blacksburg for a few people through (or part of)  the semester. Topics will be posted over the next week, but if you might be interested, write ASAP for details (with a brief description of your interest and background) to error@vt.edu. 

*This course will be a brand new version of a related seminar we’ve led in the past, so we don’t have the syllabus set yet. We’re going to try something different this time. I’ll be updating in subsequent installments to the blog.

Dates: January 23, 30; February 6, 13, 20, 27; March 6, [March 8-16 break], 20, 27; April 3, 10, 17, 24; May 1

D. Mayo (books):

How to Tell What’s True About Statistical Inference (Cambridge, in progress).

Error and the Growth of Experimental Knowledge, Chicago: Chicago University Press, 1996. (Winner of 1998 Lakatos Prize).

Acceptable Evidence: Science and Values in Risk Management, co-edited with Rachelle Hollander, New York: Oxford University Press, 1994.

Aris Spanos (books):

Probability Theory and Statistical Inference, Cambridge, 1999.

Statistical Foundations of Econometric Modeling, Cambridge, 1986.

Joint (books): Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science, D. Mayo & A. Spanos (eds.), Cambridge: Cambridge University Press, 2010. [The book includes both papers and exchanges between Mayo and A. Chalmers, A. Musgrave, P. Achinstein, J. Worrall, C. Glymour, A. Spanos, and joint papers with Mayo and Sir David Cox].

Categories: Announcement, Error Statistics, Statistics | 9 Comments

Your 2014 wishing well….

A reader asks how I would complete the following sentence:
I wish that new articles* written in 2014 would refrain from_______.  

Here are my quick answers, in no special order:
(a) rehearsing the howlers of significance tests and other frequentist statistical methods;

(b) misinterpreting p-values, ignoring discrepancy assessments (and thus committing fallacies of rejection and non-rejection);

(c) confusing an assessment of boosts in belief (or support) in claim H with assessing what (if anything) has been done to ensure/increase the severity of the tests H passes;

(d) declaring that “what we really want” are posterior probability assignments in statistical hypotheses without explaining what they would mean, and why we should want them;

(e) promoting the myth that frequentist tests (and estimates) form an inconsistent hybrid of incompatible philosophies (from Fisher and Neyman-Pearson);

(f) presupposing that a relevant assessment of the scientific credentials of research would be an estimate of the percentage of null hypotheses that are “true” (selected from an “urn of nulls”) given they are rejectable with a low p-value in an “up-down” use of tests;

(g) sidestepping the main sources of pseudoscience: insevere tests through interpretational and inferential latitude, and violations of statistical model assumptions.
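
For readers unfamiliar with the computation alluded to in item (f), here is a minimal Python sketch of the “urn of nulls” screening arithmetic. It only shows what is being computed; the point of item (f) is that this number should not be presupposed to be the relevant assessment of a piece of research. The inputs (the proportion of true nulls in the urn, a fixed significance level, and a single common power) are all assumptions, and the function name is hypothetical.

```python
def true_nulls_among_rejections(prior_true_null, alpha, power):
    """Proportion of rejected nulls that are 'true' in an urn-of-nulls model.

    prior_true_null: assumed fraction of nulls in the urn that are true
    alpha:           probability of rejecting a true null (up-down test)
    power:           assumed probability of rejecting a false null
    """
    false_alarms = prior_true_null * alpha
    true_detections = (1.0 - prior_true_null) * power
    return false_alarms / (false_alarms + true_detections)


# Purely illustrative inputs: 90% of nulls true, alpha = 0.05, power = 0.8.
share = true_nulls_among_rejections(0.9, 0.05, 0.8)
print(f"share of rejections that are true nulls: {share:.2f}")
```

The result is driven almost entirely by the assumed contents of the urn, which is one reason to ask what it has to do with how severely a particular hypothesis was probed.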

The “2014 wishing well” stands ready for your sentence completions.

*The question alluded to articles linked with philosophy & methodology of statistical science.

Categories: Error Statistics, science communication, Statistics | Leave a comment

Error Statistics Philosophy: 2013


I blog ergo I blog

Error Statistics Philosophy: 2013
Organized by Nicole Jinn & Jean Anne Miller* 

January 2013

(1/2) Severity as a ‘Metastatistical’ Assessment
(1/4) Severity Calculator
(1/6) Guest post: Bad Pharma? (S. Senn)
(1/9) RCTs, skeptics, and evidence-based policy
(1/10) James M. Buchanan
(1/11) Aris Spanos: James M. Buchanan: a scholar, teacher and friend
(1/12) Error Statistics Blog: Table of Contents
(1/15) Ontology & Methodology: Second call for Abstracts, Papers
(1/18) New Kvetch/PhilStock
(1/19) Saturday Night Brainstorming and Task Forces: (2013) TFSI on NHST
(1/22) New PhilStock
(1/23) P-values as posterior odds?
(1/26) Coming up: December U-Phil Contributions….
(1/27) U-Phil: S. Fletcher & N.Jinn
(1/30) U-Phil: J. A. Miller: Blogging the SLP

February 2013
(2/2) U-Phil: Ton o’ Bricks
(2/4) January Palindrome Winner
(2/6) Mark Chang (now) gets it right about circularity
(2/8) From Gelman’s blog: philosophy and the practice of Bayesian statistics
(2/9) New kvetch: Filly Fury
(2/10) U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof
(2/11) U-Phil: Mayo’s response to Hennig and Gandenberger
(2/13) Statistics as a Counter to Heavyweights…who wrote this?
(2/16) Fisher and Neyman after anger management?
(2/17) R. A. Fisher: how an outsider revolutionized statistics
(2/20) Fisher: from ‘Two New Properties of Mathematical Likelihood’
(2/23) Stephen Senn: Also Smith and Jones
(2/26) PhilStock: DO < $70
(2/26) Statistically speaking…

Categories: blog contents, Error Statistics, Statistics | Leave a comment

Wasserman on Wasserman: Update! December 28, 2013

Professor Larry Wasserman


I had invited Larry to give an update, and I’m delighted that he has! The discussion relates to the last post (by Spanos), which follows upon my deconstruction of Wasserman*. So, for your Saturday night reading pleasure, join me** in reviewing this and the past two blogs and the links within.

“Wasserman on Wasserman: Update! December 28, 2013”

My opinions have shifted a bit.

My reference to Franken’s joke suggested that the usual philosophical 
debates about the foundations of statistics were un-important, much 
like the debate about media bias. I was wrong on both counts.

First, I now think Franken was wrong. CNN and network news have a 
strong liberal bias, especially on economic issues. FOX has an 
obvious right wing, and anti-atheist bias. (At least FOX has some 
libertarians on the payroll.) And this does matter. Because people
believe what they see on TV and what they read in the NY times. Paul
Krugman’s socialist bullshit parading as economics has brainwashed
millions of Americans. So media bias is much more than who makes 
better hummus.

Similarly, the Bayes-Frequentist debate still matters. And people — including many statisticians — are still confused about the distinction. I thought the basic Bayes-Frequentist debate was behind us. A year and a half of blogging (as well as reading other blogs) convinced me I was wrong here too. And this still does matter. Continue reading

Categories: Error Statistics, frequentist/Bayesian, Statistics, Wasserman | 55 Comments

A. Spanos lecture on “Frequentist Hypothesis Testing”

Aris Spanos

I attended a lecture by Aris Spanos to his graduate econometrics class here at Va Tech last week[i]. This course, which Spanos teaches every fall, gives a superb illumination of the disparate pieces involved in statistical inference and modeling, and affords clear foundations for how they are linked together. His slides follow the intro section. Some examples with severity assessments are also included.

Frequentist Hypothesis Testing: A Coherent Approach

Aris Spanos

1    Inherent difficulties in learning statistical testing

Statistical testing is arguably the most important, but also the most difficult and confusing chapter of statistical inference for several reasons, including the following.

(i) The need to introduce numerous new notions, concepts and procedures before one can paint — even in broad brushes — a coherent picture of hypothesis testing.

(ii) The current textbook discussion of statistical testing is both highly confusing and confused. There are several sources of confusion.

  • (a) Testing is conceptually one of the most sophisticated sub-fields of any scientific discipline.
  • (b) Inadequate knowledge by textbook writers who often do not have the technical skills to read and understand the original sources, and have to rely on second-hand accounts of previous textbook writers that are often misleading or just outright erroneous. In most of these textbooks hypothesis testing is poorly explained as an idiot’s guide to combining off-the-shelf formulae with statistical tables like the Normal, the Student’s t, the chi-square, etc., where the underlying statistical model that gives rise to the testing procedure is hidden in the background.
  • (c) The misleading portrayal of Neyman-Pearson testing as essentially decision-theoretic in nature, when in fact the latter has much greater affinity with Bayesian than with frequentist inference.
  • (d) A deliberate attempt to distort and cannibalize frequentist testing by certain Bayesian drumbeaters who revel in (unfairly) maligning frequentist inference in their attempts to motivate their preferred view on statistical inference.

(iii) The discussion of frequentist testing is rather incomplete in so far as it has been beleaguered by serious foundational problems since the 1930s. As a result, different applied fields have generated their own secondary literatures attempting to address these problems, but often making things much worse! Indeed, in some fields like psychology it has reached the stage where one has to correct the ‘corrections’ of those chastising the initial correctors!

In an attempt to alleviate problem (i), the discussion that follows uses a sketchy historical development of frequentist testing. To ameliorate problem (ii), the discussion includes ‘red flag’ pointers (¥) designed to highlight important points that shed light on certain erroneous interpretations or misleading arguments. The discussion will pay special attention to (iii), addressing some of the key foundational problems.

[i] It is based on Ch. 14 of Spanos (1999) Probability Theory and Statistical Inference. Cambridge[ii].

[ii] You can win a free copy of this 700+ page text by creating a simple palindrome! http://errorstatistics.com/palindrome/march-contest/

Categories: Bayesian/frequentist, Error Statistics, Severity, significance tests, Statistics | 36 Comments

Surprising Facts about Surprising Facts

A paper of mine on “double-counting” and novel evidence just came out: “Some surprising facts about (the problem of) surprising facts” in Studies in History and Philosophy of Science (2013), http://dx.doi.org/10.1016/j.shpsa.2013.10.005

ABSTRACT: A common intuition about evidence is that if data x have been used to construct a hypothesis H, then x should not be used again in support of H. It is no surprise that x fits H, if H was deliberately constructed to accord with x. The question of when and why we should avoid such “double-counting” continues to be debated in philosophy and statistics. It arises as a prohibition against data mining, hunting for significance, tuning on the signal, and ad hoc hypotheses, and as a preference for predesignated hypotheses and “surprising” predictions. I have argued that it is the severity or probativeness of the test—or lack of it—that should determine whether a double-use of data is admissible. I examine a number of surprising ambiguities and unexpected facts that continue to bedevil this debate.

Categories: double-counting, Error Statistics, philosophy of science, Statistics | 36 Comments

The error statistician has a complex, messy, subtle, ingenious, piece-meal approach

A comment today by Stephen Senn leads me to post the last few sentences of my (2010) paper with David Cox, “Frequentist Statistics as a Theory of Inductive Inference”:

“A fundamental tenet of the conception of inductive learning most at home with the frequentist philosophy is that inductive inference requires building up incisive arguments and inferences by putting together several different piece-meal results; we have set out considerations to guide these pieces[i]. Although the complexity of the issues makes it more difficult to set out neatly, as, for example, one could by imagining that a single algorithm encompasses the whole of inductive inference, the payoff is an account that approaches the kind of arguments that scientists build up in order to obtain reliable knowledge and understanding of a field.” (273)[ii]

A reread for Saturday night?

[i] The pieces hang together by dint of the rationale growing out of a severity criterion (or something akin but using a different term.)

[ii] Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.), Cambridge: Cambridge University Press: 1-27. This paper appeared in The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, pp. 247-275.

Categories: Bayesian/frequentist, Error Statistics | 20 Comments
