Statistics

Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)

Phil 6334* Day #4: Mayo slides follow the comments below. (Make-up for Feb 13 snow day.) Popper reading is from Conjectures and Refutations.


As is typical in rereading any deep philosopher, I discover (or rediscover) different morsels of clues to understanding—whether fully intended by the philosopher or a byproduct of their other insights and a more contemporary reading. So it is with Popper. A couple of key ideas emerged from Monday’s (make-up) class and the seminar discussion (my slides are below):

  1. Unlike the “naïve” empiricists of the day, Popper recognized that observations are not just given unproblematically, but also require an interpretation, an interest, a point of view, a problem. What came first, a hypothesis or an observation? Another hypothesis, if only at a lower level, says Popper. He draws the contrast with Wittgenstein’s “verificationism”. In typical positivist style, the verificationist sees observations as the given “atoms,” and other knowledge is built up out of truth-functional operations on those atoms.[1] However, scientific generalizations beyond the given observations cannot be so deduced, hence the traditional philosophical problem of induction isn’t solvable. One is left trying to build a formal “inductive logic” (generally deductive affairs, ironically) that is thought to capture intuitions about scientific inference (a largely degenerating program). The formal probabilists, as well as philosophical Bayesians, may be seen as descendants of the logical positivists–instrumentalists, verificationists, operationalists (and the corresponding “isms”). So understanding Popper throws a lot of light on current-day philosophy of probability and statistics.
  2. The fact that observations must be interpreted opens the door to interpretations that prejudge the construal of data. With enough interpretive latitude, anything (or practically anything) that is observed can be interpreted as in sync with a general claim H. (Once you opened your eyes, you saw confirmations everywhere, as with a gestalt conversion, as Popper put it.) For Popper, positive instances of a general claim H, i.e., observations that agree with or “fit” H, do not even count as evidence for H if virtually any result could be interpreted as according with H.
    Note a modification of Popper here: instead of putting the “riskiness” on H itself, it is the method of assessment or testing that bears the burden of showing that something (ideally quite a lot) has been done in order to scrutinize the way the data were interpreted (to avoid “verification bias”). The scrutiny needs to ensure that it would be difficult (rather than easy) to get an accordance between data x and H (as strong as the one obtained) if H were false (or specifiably flawed); a small numerical sketch follows this list. Continue reading
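To make the severity requirement concrete, here is a minimal numerical sketch (mine, not Popper’s) for a one-sided Normal test; the sample size, means, and standard deviation are all hypothetical:

    # Hypothetical sketch: severity for inferring mu > mu1 after observing
    # sample mean x_bar, with X ~ N(mu, sigma^2), sigma known, n = 25.
    # SEV(mu > mu1) = Pr(X_bar <= x_bar; mu = mu1): high severity means an
    # accordance this strong would be hard to get if mu > mu1 were false.
    from scipy.stats import norm

    def severity(x_bar, mu1, sigma=1.0, n=25):
        se = sigma / n ** 0.5
        return norm.cdf((x_bar - mu1) / se)

    print(severity(0.4, mu1=0.2))  # ~0.84: mu > 0.2 passes fairly severely
    print(severity(0.4, mu1=0.4))  # 0.50: mu > 0.4 is poorly probed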
Categories: Phil 6334 class material, Popper, Statistics | 7 Comments

Winner of the February 2014 palindrome contest (rejected post)

Winner of February 2014 Palindrome Contest
Samuel Dickson

Palindrome:
Rot, Cadet A, I’ve droned! Elba, revile deviant, naïve, deliverable den or deviated actor.

The requirement was: a palindrome with “Elba” plus “deviate”, with an optional second word: “deviant”. A palindrome that uses both “deviate” and “deviant” tops an acceptable palindrome that uses only “deviate”.

Bio:
Sam Dickson is a regulatory statistician at U.S. Army Medical Research Institute of Infectious Diseases (USAMRIID) with experience in statistical consulting, specializing in design and analysis of biological and genetics/genomics studies.

Statement:
“It’s great to get a chance to exercise the mind with something other than statistics, though putting words together to make a palindrome is a puzzle very similar to designing an experiment that answers the right question. Thank you for hosting this contest!”

Choice of book:
Principles of Applied Statistics (D. R. Cox and C. A. Donnelly 2011, Cambridge: Cambridge University Press)

Congratulations, Sam! I hope that your opting to do two words (plus Elba) means we can go back to the tougher standard for palindromes, but I’d just as soon raise the level of competence for several months more (sticking to one word). 

Categories: Announcement, Palindrome, Rejected Posts, Statistics | Leave a comment

STEPHEN SENN: Fisher’s alternative to the alternative

Reblogged from 2 years ago:

By: Stephen Senn

This year [2012] marks the 50th anniversary of R. A. Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976, p. 473)

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976, and the lecture on which it was based was given in Detroit on 29 December 1971 (p. 441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre, but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published, and this throws light on many aspects of Fisher’s thought, including significance tests.

The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.
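[A toy sketch, with made-up numbers rather than Bliss’s data, of the phenomenon Snedecor flagged: an analysis on inverse sines and an analysis of the raw values need not agree.]

    # Toy illustration (hypothetical proportions, not Bliss's data): the same
    # two-group comparison on the raw scale vs. the inverse sine (arcsine-sqrt)
    # scale can yield different p-values.
    import numpy as np
    from scipy import stats

    a = np.array([0.05, 0.10, 0.08, 0.12, 0.50])   # group 1 proportions
    b = np.array([0.20, 0.25, 0.30, 0.22, 0.28])   # group 2 proportions

    raw = stats.ttest_ind(a, b)
    arc = stats.ttest_ind(np.arcsin(np.sqrt(a)), np.arcsin(np.sqrt(b)))
    print(raw.pvalue, arc.pvalue)                  # generally not equal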

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading

Categories: Fisher, Statistics, Stephen Senn | Tags: , , , | 31 Comments

R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Exactly 1 year ago: I find this to be an intriguing discussion–before some of the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

  The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.

If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading
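A small numerical sketch of this idea (my illustration, not Fisher’s): for H0: μ = 0 vs H1: μ = 1 with a single Normal observation, the one-sided (likelihood-ratio) region of size .05 is more powerful than an equal-size region placed elsewhere.

    # Sketch: size-.05 rejection regions for H0: mu=0 vs H1: mu=1, X ~ N(mu, 1).
    # The one-sided region {x > c} beats an equal-size two-sided region, as the
    # substitution argument above predicts.
    from scipy.stats import norm

    c = norm.ppf(0.95)                             # one-sided cutoff, ~1.645
    power_lr = 1 - norm.cdf(c - 1)                 # Pr(X > c; mu=1), ~0.26

    power_2s = (1 - norm.cdf(1.96 - 1)) + norm.cdf(-1.96 - 1)   # ~0.17
    print(power_lr, power_2s)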

Categories: Fisher, phil/history of stat, Statistics | Tags: , , , | 1 Comment

Aris Spanos: The Enduring Legacy of R. A. Fisher


More Fisher insights from A. Spanos, this from 2 years ago:

One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

Mθ(x) = {f(x;θ); θ ∈ Θ}, x ∈ R^n, Θ ⊂ R^m, m < n.  (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.
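To fix ideas, here is a minimal concrete instance of (1) (my illustration): the simple Normal model with unknown mean, with m = 1 and an IID sample of size n = 4.

    # A concrete instance of (1): M_theta(x) = {N(mu, 1): mu in R}, x in R^n.
    # The joint density f(x; theta) carries all the probabilistic assumptions.
    import numpy as np
    from scipy.stats import norm

    def f(x, mu):
        """Joint density of an IID N(mu, 1) sample at parameter value mu."""
        return float(np.prod(norm.pdf(x, loc=mu, scale=1.0)))

    x0 = np.array([0.3, -0.1, 0.8, 0.4])   # a "typical realization" of the model
    print(f(x0, 0.0), f(x0, 0.35))         # the likelihood at two values of mu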

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to the description of the distributional features of the data in hand using the histogram and the first few sample moments, implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn). As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Fisher was able to recast statistical inference by turning Karl Pearson’s approach, proceeding from data x0 in search of a frequency curve f(x;θ) to describe its histogram, on its head. He proposed to begin with a prespecified Mθ(x) (a ‘hypothetical infinite population’), and view x0 as a ‘typical’ realization thereof; see Spanos (1999).

In my mind, Fisher’s most enduring contribution is his devising a general way to ‘operationalize’ errors by embedding the material experiment into Mθ(x), and taming errors via probabilification, i.e. to define frequentist error probabilities in the context of a statistical model. These error probabilities are (a) deductively derived from the statistical model, and (b) provide a measure of the ‘effectiveness’ of the inference procedure: how often a certain method will give rise to correct inferences concerning the underlying ‘true’ Data Generating Mechanism (DGM). This cast aside the need for a prior. Both of these key elements, the statistical model and the error probabilities, have been refined and extended by Mayo’s error statistical approach (EGEK 1996). Learning from data is achieved when an inference is reached by an inductive procedure which, with high probability, will yield true conclusions from valid inductive premises (a statistical model); Mayo and Spanos (2011). Continue reading
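A short simulation sketch (mine, assuming a simple Normal model) of how error probabilities are deductively derived from Mθ(x) and describe how often the method errs:

    # Sketch: once M_theta(x) is specified, the test's error probabilities
    # follow from the model, and simulating the assumed DGM recovers them.
    import numpy as np

    rng = np.random.default_rng(1)
    n, cutoff, reps = 25, 1.645, 100_000      # one-sided z-test, alpha = .05

    x = rng.normal(0.0, 1.0, size=(reps, n))  # data generated under H0: mu = 0
    z = x.mean(axis=1) * np.sqrt(n)           # z = sqrt(n) * x_bar (sigma = 1)
    print((z > cutoff).mean())                # ~0.05: the type I error rate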

Categories: Fisher, phil/history of stat, Statistics | Tags: , , , , , , | 2 Comments

R. A. Fisher: how an outsider revolutionized statistics

Today is R.A. Fisher’s birthday and I’m reblogging the post by Aris Spanos which, as it happens, received the highest number of views of 2013.

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into modern, model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998):

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however, did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Spanos, Statistics | 6 Comments

Fisher and Neyman after anger management?


Monday is Fisher’s birthday, and to set the stage for some items to appear, I’m posing the anger management question from a post a year ago (please also see the comments from then). Here it is:


Would you agree if your (senior) colleague urged you to use his/her book rather than your own – even if you thought doing so would change for the positive the entire history of your field? My guess is that the answer is no (but see “add on”). For that matter, would you ever try to insist that your (junior) colleague use your book in teaching a course rather than his/her own notes or book? Again I guess no. But perhaps you’d be more tactful than were Fisher and Neyman.

It wasn’t just Fisher who seemed to need some anger management training; Erich Lehmann (in conversation and in 2011) points to a number of incidents wherein Neyman is the instigator of gratuitous ill-will. Their substantive statistical and philosophical disagreements, I now think, were minuscule in comparison to the huge animosity that developed over many years. Here’s how Neyman describes a vivid recollection he has of the 1935 book episode to Constance Reid (1998, 126). [i]

A couple of months “after Neyman criticized Fisher’s concept of the complex experiment,” Neyman vividly recollects Fisher stopping by his office at University College on his way to a meeting which was to decide on Neyman’s reappointment[ii]: Continue reading

Categories: phil/history of stat, Statistics | 9 Comments

Is it true that all epistemic principles can only be defended circularly? A Popperian puzzle

Current-day Popperians, the “critical rationalists”, espouse the following epistemic principle CR:[i]

(CR) It is reasonable to adopt or believe a claim or theory P which best survives serious criticism.

What justifies CR?  To merely declare it is a reasonable epistemic principle without giving evidence that following it advances any epistemic goals is entirely unsatisfactory, and decidedly un-Popperian in spirit.

Alan Musgrave (1999), leading critical rationalist, mounts a defense of CR that he openly concedes is circular, admitting, as he does, that such circular defenses could likewise be used to argue for principles he himself regards as ‘crazy’. However, he also gives a subtle and clever argument that it’s impossible to do better, that such a circular defense is the only kind possible. So since we’re reading Popper this week (some of us), and since an analogous argument arises in defending principles of statistical inference, try your hand at this conundrum. Continue reading

Categories: philosophy of science, Popper, Statistics | 2 Comments

Phil 6334: Day #3: Feb 6, 2014


Day #3: Spanos lecture notes 2, and reading/resources from Feb 6 seminar 

6334 Day 3 slides: Spanos-lecture-2

___

Crupi & Tentori (2010). Irrelevant Conjunction: Statement and Solution of a New Paradox, Phil Sci, 77, 1–13.

Hawthorne & Fitelson (2004). Re-Solving Irrelevant Conjunction with Probabilistic Independence, Phil Sci 71: 505–514.

Skyrms (1975) Choice and Chance 2nd ed. Chapter V and Carnap (pp. 206-211), Dickenson Pub. Co.

Mayo posts on the tacking paradox: Oct. 25, 2013: “Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*” &  Oct 25.

An update on this issue will appear shortly in a separate blogpost.

_

READING FOR NEXT WEEK
Selection (pp. 35-59) from: Popper (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. Basic Books.

Categories: Bayes' Theorem, Phil 6334 class material, Statistics | Leave a comment

“Probabilism as an Obstacle to Statistical Fraud-Busting” (draft iii)

Update: Feb. 21, 2014 (slides at end). Ever find, when you begin to “type” a paper to which you gave an off-the-cuff title months and months ago, that you scarcely know just what you meant or feel up to writing a paper with that (provocative) title? But then, pecking away at the outline of a possible paper crafted to fit the title, you discover it’s just the paper you’re up to writing right now? That’s how I feel about “Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?” (the impromptu title I gave for my paper for the Boston Colloquium for the Philosophy of Science):

The conference is called: “Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge.”  

 Here are some initial chicken-scratchings (draft (i)). Share comments, queries. (I still have 2 weeks to come up with something*.) Continue reading

Categories: P-values, significance tests, Statistical fraudbusting, Statistics | Leave a comment

Phil 6334: Day #2 Slides

 

Day #2, Part 1: D. Mayo:

Day #2, Part 2: A. Spanos:
Probability/Statistics Lecture Notes 1: Introduction to Probability and Statistical Inference

Day #1 slides are here.

Categories: Phil 6334 class material, Philosophy of Statistics, Statistics | 8 Comments

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2013–2014
54th Annual Program

Download the 54th Annual Program

REVISITING THE FOUNDATIONS OF STATISTICS IN THE ERA OF BIG DATA: SCALING UP TO MEET THE CHALLENGE

Cosponsored by the Department of Mathematics & Statistics at Boston University.
Friday, February 21, 2014
10 a.m. – 5:30 p.m.
Photonics Center, 9th Floor Colloquium Room (Rm 906)
8 St. Mary’s Street

10 a.m.–noon

  • Computational Challenges in Genomic Medicine
    Jill Mesirov, Computational Biology and Bioinformatics, Broad Institute
  • Selection, Significance, and Signification: Issues in High Energy Physics
    Kent Staley, Philosophy, Saint Louis University

1:30–5:30 p.m.

  • Multi-Resolution Inference: An Engineering (Engineered?) Foundation of Statistical Inference
    Xiao-Li Meng, Statistics, Harvard University
  • Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
    Deborah Mayo, Philosophy, Virginia Tech
  • Targeted Learning from Big Data
    Mark van der Laan, Biostatistics and Statistics, UC Berkeley

Panel Discussion

Boston Colloquium 2013-2014 (3)

Categories: Announcement, philosophy of science, Philosophy of Statistics, Statistical fraudbusting, Statistics | Leave a comment

Phil 6334: Slides from Day #1: Four Waves in Philosophy of Statistics

First installment: 6334 syllabus (Mayo and Spanos)
D. Mayo slides from Day #1: Jan 23, 2014

 

I will post seminar slides here (they will generally be ragtag affairs), links to the papers are in the syllabus.

Categories: Phil 6334 class material, Philosophy of Statistics, Statistics | 14 Comments

Phil6334: “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech) UPDATE: JAN 21

FURTHER UPDATED: New course for Spring 2014: Thurs 3:30-6:15 (Randolph 209)

First installment: 6334 syllabus. Phil 6334: Philosophy of Statistical Inference and Modeling


D. Mayo and A. Spanos

Contact: error@vt.edu

This new course, to be jointly taught by Professors D. Mayo (Philosophy) and A. Spanos (Economics), will provide an in-depth introduction to graduate-level research in philosophy of inductive-statistical inference and probabilistic methods of evidence (a branch of formal epistemology). We explore philosophical problems of confirmation and induction, the philosophy and history of frequentist and Bayesian approaches, and key foundational controversies surrounding tools of statistical data analytics, modeling and hypothesis testing in the natural and social sciences, and in evidence-based policy.

We now have some tentative topics and dates:

 


1. 1/23 Introduction to the Course: 4 waves of controversy in the philosophy of statistics
2. 1/30 How to tell what’s true about statistical inference: Probabilism, performance and probativeness
3. 2/6 Induction and Confirmation: Formal Epistemology
4. 2/13 Induction, falsification, severe tests: Popper and Beyond
5. 2/20 Statistical models and estimation: the Basics
6. 2/27 Fundamentals of significance tests and severe testing
7. 3/6 Five sigma and the Higgs Boson discovery: Is it “bad science”?
SPRING BREAK: Statistical Exercises While Sunning
 8. 3/20  Fraudbusting and Scapegoating: Replicability and big data: are most scientific results false?
9. 3/27 How can we test the assumptions of statistical models?
All models are false; no methods are objective: Philosophical problems of misspecification testing: Spanos method
10. 4/3 Fundamentals of Statistical Testing: Family Feuds and 70 years of controversy
11. 4/10 Error Statistical Philosophy: Highly Probable vs Highly Probed
Some howlers of testing
12. 4/17 Whatever happened to Bayesian Philosophical Foundations? Dutch books etc. Fundamentals of Bayesian statistics
13. 4/24 Bayesian-frequentist reconciliations, unifications, and O-Bayesians
14. 5/1 Overview: Answering the critics: Should statistical philosophy be divorced from methodology?
(15. TBA) Topic to be chosen (Resampling statistics and new journal policies? Likelihood principle)

Interested in attending? E.R.R.O.R.S.* can fund travel (presumably driving) and provide accommodation for Thurs. night in a conference lodge in Blacksburg for a few people through (or part of) the semester. If interested, write ASAP for details (with a brief description of your interest and background) to error@vt.edu. Several people asked about long-distance hook-ups: we will try to provide some sessions by Skype, and will put each of the seminar items here (also check the Phil6334 page on this blog).

A sample of questions we consider*:

  • What makes an inquiry scientific? objective? When are we warranted in generalizing from data?
  • What is the “traditional problem of induction”?  Is it really insoluble?  Does it matter in practice?
  • What is the role of probability in uncertain inference? (to assign degrees of confirmation or belief? to characterize the reliability of test procedures?) 3P’s: Probabilism, performance and probativeness
  • What is probability? Random variables? Estimates? What is the relevance of long-run error probabilities for inductive inference in science?
  • What did Popper really say about severe testing, induction, falsification? Is it time for a new definition of pseudoscience?
  • Confirmation and falsification: Carnap and Popper, paradoxes of confirmation; contemporary formal epistemology
  • What is the current state of play in the “statistical wars” e.g., between frequentists, likelihoodists, and (subjective vs. “non-subjective”) Bayesians?
  • How should one specify and interpret p-values, type I and II errors, confidence levels?  Can one tell the truth (and avoid fallacies) with statistics? Do the “reformers” themselves need reform?
  • Is it unscientific (ad hoc, degenerating) to use the same data both in constructing and testing hypotheses? When and why?
  • Is it possible to test assumptions of statistical models without circularity?
  • Is the new research on “replicability” well-founded, or an erroneous use of screening statistics for long-run performance?
  • Should randomized studies be the “gold standard” for “evidence-based” science and policy?
  • What’s the problem with big data: cherry-picking, data mining, multiple testing
  • The many faces of Bayesian statistics: Can there be uninformative prior probabilities? (No) Principles of indifference over the years
  • Statistical fraudbusting: psychology, economics, evidence-based policy
  • Applied controversies (selected): Higgs experiments, climate modeling, social psychology, econometric modeling, development economics

D. Mayo (books):

How to Tell What’s True About Statistical Inference (Cambridge, in progress).

Error and the Growth of Experimental Knowledge, Chicago: Chicago University Press, 1996. (Winner of 1998 Lakatos Prize).

Acceptable Evidence: Science and Values in Risk Management, co-edited with Rachelle Hollander, New York: Oxford University Press, 1994.

Aris Spanos (books):

Probability Theory and Statistical Inference, Cambridge, 1999.

Statistical Foundations of Econometric Modeling, Cambridge, 1986.

Joint (books): Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science, D. Mayo & A. Spanos (eds.), Cambridge: Cambridge University Press, 2010. [Intro, Background & Chapter 1. (The book includes both papers and exchanges between Mayo and A. Chalmers, A. Musgrave, P. Achinstein, J. Worrall, C. Glymour, A. Spanos, and joint papers with Mayo and Sir David Cox)].

Categories: Announcement, Error Statistics, Statistics | 5 Comments

Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike ….

It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler* in criticizing the use of p-values.

“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?

‘What’s unusual about that?’ you ask?

Well, what’s unusual, is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation.

We can view p-values in terms of rejecting H0, as in the joke: There’s a test statistic D such that H0 is rejected if its observed value d0 reaches or exceeds a cut-off d* where Pr(D > d*; H0) is small, say .025.
           Reject H0 if Pr(D > d0; H0) < .025.
The report might be “reject H0 at level .025”.
Example: H0: the mean light deflection effect is 0. So if we observe a 1.96 standard deviation difference (in one-sided Normal testing) we’d reject H0.

Now it’s true that if the observation were further into the rejection region, say 2, 3 or 4 standard deviations, it too would result in rejecting the null, and with an even smaller p-value. It’s also true that H0 “has not predicted” a 2, 3, 4, 5 etc. standard deviation difference in the sense that differences so large are “far from” or improbable under the null. But wait a minute. What if we’ve only observed a 1 standard deviation difference (p-value = .16)? It is unfair to count it against the null that 1.96, 2, 3, 4 etc. standard deviation differences would have diverged seriously from the null, when we’ve only observed the 1 standard deviation difference. Yet the p-value tells you to compute Pr(D > 1; H0), which includes these more extreme outcomes! This is “a remarkable procedure” indeed! [i]
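To pin down the numbers in this paragraph, a quick sketch (mine, not part of the original post) of the one-sided Normal tail areas:

    # One-sided tail-area p-values Pr(D > d0; H0) for the Normal statistic D.
    from scipy.stats import norm

    for d0 in (1.0, 1.96):
        print(d0, 1 - norm.cdf(d0))   # ~0.159 and ~0.025

    # D = 1 gives p ~ .16, short of the .025 cutoff: the test does not reject,
    # so the unobserved, more extreme outcomes never count against H0 here.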

So much for making out the howler. The only problem is that significance tests do not do this; that is, they do not reject with, say, D = 1 because larger D values might have occurred (but did not). D = 1 does not reach the cut-off, and does not lead to rejecting H0. Moreover, looking at the tail area makes it harder, not easier, to reject the null (although this isn’t the only function of the tail area), since it requires not merely that Pr(D = d0; H0) be small, but that Pr(D > d0; H0) be small. And this is well justified because when this probability is not small, you should not regard it as evidence of discrepancy from the null. Before getting to this …. Continue reading

Categories: Comedy, Fisher, Jeffreys, P-values, Statistics, Stephen Senn | 12 Comments

Objective/subjective, dirty hands and all that: Gelman/ Wasserman blogolog (ii)

Andrew Gelman says that as a philosopher, I should appreciate his blog today in which he records his frustration: “Against aggressive definitions: No, I don’t think it helps to describe Bayes as ‘the analysis of subjective beliefs’…” Gelman writes:

I get frustrated with what might be called “aggressive definitions,” where people use a restrictive definition of something they don’t like. For example, Larry Wasserman writes (as reported by Deborah Mayo):

“I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs.”

I’ll accept Larry’s definition of frequentist inference. But as for his definition of Bayesian inference: No no no no no. The probabilities we use in our Bayesian inference are not subjective, or, they’re no more subjective than the logistic regressions and normal distributions and Poisson distributions and so forth that fill up all the textbooks on frequentist inference.

To quickly record some of my own frustrations*: First, I would disagree with Wasserman’s characterization of frequentist inference, but as is clear from Larry’s comments on my reaction to him, I think he concurs that he was just giving a broad contrast. Please see Note [1] for a remark from my post: Comments on Wasserman’s “what is Bayesian/frequentist inference?” Also relevant is a Gelman post on the Bayesian name: [2].

Second, Gelman’s “no more subjective than…” evokes remarks I’ve made before. For example, in “What should philosophers of science do…” I wrote:

Arguments given for some very popular slogans (mostly by non-philosophers), are too readily taken on faith as canon by others, and are repeated as gospel. Examples are easily found: all models are false, no models are falsifiable, everything is subjective, or equally subjective and objective, and the only properly epistemological use of probability is to supply posterior probabilities for quantifying actual or rational degrees of belief. Then there is the cluster of “howlers” allegedly committed by frequentist error statistical methods repeated verbatim (discussed on this blog).

I’ve written a lot about objectivity on this blog, e.g., here, here and here (and in real life), but what’s the point if people just rehearse the “everything is a mixture…” line, without making deeply important distinctions? I really think that, next to the “all models are false” slogan, the most confusion has been engendered by the “no methods are objective” slogan. However much we may aim at objective constraints, it is often urged, we can never have “clean hands” free of the influence of beliefs and interests, and we invariably sully methods of inquiry by the entry of background beliefs and personal judgments in their specification and interpretation. Continue reading

Categories: Bayesian/frequentist, Error Statistics, Gelman, Objectivity, Statistics | 41 Comments

“Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)

New course for Spring 2014: Thursday 3:30-6:15

Phil 6334: Philosophy of Statistical Inference and Modeling

D. Mayo and A. Spanos

Contact: error@vt.edu

This new course, to be jointly taught by Professors D. Mayo (Philosophy) and A. Spanos (Economics), will provide an in-depth introduction to graduate-level research in philosophy of inductive-statistical inference and probabilistic methods of evidence (a branch of formal epistemology). We explore philosophical problems of confirmation and induction, the philosophy and history of frequentist and Bayesian approaches, and key foundational controversies surrounding tools of statistical data analytics, modeling and hypothesis testing in the natural and social sciences, and in evidence-based policy.


A sample of questions we consider*:

  • What makes an inquiry scientific? objective? When are we warranted in generalizing from data?
  • What is the “traditional problem of induction”?  Is it really insoluble?  Does it matter in practice?
  • What is the role of probability in uncertain inference? (to assign degrees of confirmation or belief? to characterize the reliability of test procedures?) 3P’s: Probabilism, performance and probativeness
  • What is probability? Random variables? Estimates? What is the relevance of long-run error probabilities for inductive inference in science?
  • What did Popper really say about severe testing, induction, falsification? Is it time for a new definition of pseudoscience?
  • Confirmation and falsification: Carnap and Popper, paradoxes of confirmation; contemporary formal epistemology
  • What is the current state of play in the “statistical wars” e.g., between frequentists, likelihoodists, and (subjective vs. “non-subjective”) Bayesians?
  • How should one specify and interpret p-values, type I and II errors, confidence levels?  Can one tell the truth (and avoid fallacies) with statistics? Do the “reformers” themselves need reform?
  • Is it unscientific (ad hoc, degenerating) to use the same data both in constructing and testing hypotheses? When and why?
  • Is it possible to test assumptions of statistical models without circularity?
  • Is the new research on “replicability” well-founded, or an erroneous use of screening statistics for long-run performance?
  • Should randomized studies be the “gold standard” for “evidence-based” science and policy?
  • What’s the problem with big data: cherry-picking, data mining, multiple testing
  • The many faces of Bayesian statistics: Can there be uninformative prior probabilities? (No) Principles of indifference over the years
  • Statistical fraudbusting: psychology, economics, evidence-based policy
  • Applied controversies (selected): Higgs experiments, climate modeling, social psychology, econometric modeling, development economics

Interested in attending? E.R.R.O.R.S.* can fund travel (presumably driving) and provide lodging for Thurs. night in a conference lodge in Blacksburg for a few people through (or part of) the semester. Topics will be posted over the next week, but if you might be interested, write ASAP for details (with a brief description of your interest and background) to error@vt.edu.

*This course will be a brand new version of a related seminar we’ve led in the past, so we don’t have the syllabus set yet. We’re going to try something different this time. I’ll be updating in subsequent installments to the blog.

Dates: January 23, 30; February 6, 13, 20, 27; March 6, [March 8-16 break], 20, 27; April 3,10, 17, 24; May 1

D. Mayo (books):

How to Tell What’s True About Statistical Inference (Cambridge, in progress).

Error and the Growth of Experimental Knowledge, Chicago: Chicago University Press, 1996. (Winner of 1998 Lakatos Prize).

Acceptable Evidence: Science and Values in Risk Management, co-edited with Rachelle Hollander, New York: Oxford University Press, 1994.

Aris Spanos (books):

Probability Theory and Statistical Inference, Cambridge, 1999.

Statistical Foundations of Econometric Modeling, Cambridge, 1986.

Joint (books): Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science, D. Mayo & A. Spanos (eds.), Cambridge: Cambridge University Press, 2010. [Intro, Background & Chapter 1. (The book includes both papers and exchanges between Mayo and A. Chalmers, A. Musgrave, P. Achinstein, J. Worrall, C. Glymour, A. Spanos, and joint papers with Mayo and Sir David Cox)].

Categories: Announcement, Error Statistics, Statistics | 9 Comments

Your 2014 wishing well….

A reader asks how I would complete the following sentence:
I wish that new articles* written in 2014 would refrain from _______.

Here are my quick answers, in no special order:
(a) rehearsing the howlers of significance tests and other frequentist statistical methods;

(b) misinterpreting p-values, ignoring discrepancy assessments (and thus committing fallacies of rejection and non-rejection);

(c) confusing an assessment of boosts in belief (or support) in claim H, with assessing what (if anything) has been done to ensure/increase the severity of the tests H passes;

(d) declaring that “what we really want” are posterior probability assignments in statistical hypotheses without explaining what they would mean, and why we should want them;

(e) promoting the myth that frequentist tests (and estimates) form an inconsistent hybrid of incompatible philosophies (from Fisher and Neyman-Pearson);

(f) presupposing that a relevant assessment of the scientific credentials of research would be an estimate of the percentage of null hypotheses that are “true” (selected from an “urn of nulls”) given they are rejectable with a low p-value in an “up-down” use of tests;

(g) sidestepping the main sources of pseudoscience: insevere tests through interpretational and inferential latitude, and violations of statistical model assumptions.

The “2014 wishing well” stands ready for your sentence completions.

*The question alluded to articles linked with philosophy & methodology of statistical science.

Categories: Error Statistics, science communication, Statistics | Leave a comment

Error Statistics Philosophy: 2013


I blog ergo I blog

Error Statistics Philosophy: 2013
Organized by Nicole Jinn & Jean Anne Miller* 

January 2013

(1/2) Severity as a ‘Metastatistical’ Assessment
(1/4) Severity Calculator
(1/6) Guest post: Bad Pharma? (S. Senn)
(1/9) RCTs, skeptics, and evidence-based policy
(1/10) James M. Buchanan
(1/11) Aris Spanos: James M. Buchanan: a scholar, teacher and friend
(1/12) Error Statistics Blog: Table of Contents
(1/15) Ontology & Methodology: Second call for Abstracts, Papers
(1/18) New Kvetch/PhilStock
(1/19) Saturday Night Brainstorming and Task Forces: (2013) TFSI on NHST
(1/22) New PhilStock
(1/23) P-values as posterior odds?
(1/26) Coming up: December U-Phil Contributions….
(1/27) U-Phil: S. Fletcher & N. Jinn
(1/30) U-Phil: J. A. Miller: Blogging the SLP

February 2013
(2/2) U-Phil: Ton o’ Bricks
(2/4) January Palindrome Winner
(2/6) Mark Chang (now) gets it right about circularity
(2/8) From Gelman’s blog: philosophy and the practice of Bayesian statistics
(2/9) New kvetch: Filly Fury
(2/10) U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof
(2/11) U-Phil: Mayo’s response to Hennig and Gandenberger
(2/13) Statistics as a Counter to Heavyweights…who wrote this?
(2/16) Fisher and Neyman after anger management?
(2/17) R. A. Fisher: how an outsider revolutionized statistics
(2/20) Fisher: from ‘Two New Properties of Mathematical Likelihood’
(2/23) Stephen Senn: Also Smith and Jones
(2/26) PhilStock: DO < $70
(2/26) Statistically speaking… Continue reading

Categories: blog contents, Error Statistics, Statistics | Leave a comment

Wasserman on Wasserman: Update! December 28, 2013

Professor Larry Wasserman

I had invited Larry to give an update, and I’m delighted that he has! The discussion relates to the last post (by Spanos), which follows upon my deconstruction of Wasserman*. So, for your Saturday night reading pleasure, join me** in reviewing this and the past two blogs and the links within.

“Wasserman on Wasserman: Update! December 28, 2013”

My opinions have shifted a bit.

My reference to Franken’s joke suggested that the usual philosophical debates about the foundations of statistics were un-important, much like the debate about media bias. I was wrong on both counts.

First, I now think Franken was wrong. CNN and network news have a strong liberal bias, especially on economic issues. FOX has an obvious right wing, and anti-atheist bias. (At least FOX has some libertarians on the payroll.) And this does matter. Because people believe what they see on TV and what they read in the NY times. Paul Krugman’s socialist bullshit parading as economics has brainwashed millions of Americans. So media bias is much more than who makes better hummus.

Similarly, the Bayes-Frequentist debate still matters. And people — including many statisticians — are still confused about the distinction. I thought the basic Bayes-Frequentist debate was behind us. A year and a half of blogging (as well as reading other blogs) convinced me I was wrong here too. And this still does matter. Continue reading

Categories: Error Statistics, frequentist/Bayesian, Statistics, Wasserman | 57 Comments
