Philosophy of Statistics

“Statistical Science and Philosophy of Science: where should they meet?”


Four score years ago (!) we held the conference “Statistical Science and Philosophy of Science: Where Do (Should) They Meet?” at the London School of Economics, Center for the Philosophy of Natural and Social Science (CPNSS), where I’m visiting professor.[1] Many of the discussions on this blog grew out of contributions from the conference, and conversations initiated soon after. The conference site is here; my paper on the general question is here.[2]

My main contribution was “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. It begins like this: 

1. Comedy Hour at the Bayesian Retreat[3]

 Overheard at the comedy hour at the Bayesian retreat: Did you hear the one about the frequentist… Continue reading

Categories: Error Statistics, Philosophy of Statistics, Severity, Statistics, StatSci meets PhilSci | 23 Comments

Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976


Today is Allan Birnbaum’s birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference” is in Breakthroughs in Statistics (Volume I, 1993). I’ve a hunch that Birnbaum would have liked my rejoinder to discussants of my forthcoming paper (Statistical Science): Bjornstad, Dawid, Evans, Fraser, Hannig, and Martin and Liu. I hadn’t realized until recently that all of this is up under “future papers” here [1]. You can find the rejoinder: STS1404-004RA0-2. That takes away some of the surprise of having it all come out at once (and in final form). For those unfamiliar with the argument, at the end of this entry are slides from a recent, entirely informal, talk that I never posted, as well as some links from this blog. Happy Birthday Birnbaum! Continue reading

Categories: Birnbaum, Birnbaum Brakes, Likelihood Principle, Statistics | Leave a comment

Deconstructing Andrew Gelman: “A Bayesian wants everybody else to be a non-Bayesian.”

At the start of our seminar, I said that “on weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post some of my ‘deconstructions’ of articles”. I began with Andrew Gelman‘s note “Ethics and the statistical use of prior information”[i], but never posted my deconstruction of it. So since it’s Saturday night, and the seminar is just ending, here it is, along with related links to Stat and ESP research (including me, Jack Good, Persi Diaconis and Pat Suppes). Please share comments, especially in relation to current-day ESP research. Continue reading

Categories: Background knowledge, Gelman, Phil6334, Statistics | 35 Comments

Severe osteometric probing of skeletal remains: John Byrd

John E. Byrd, Ph.D., D-ABFA

Central Identification Laboratory
JPAC

Guest, March 27, Phil 6334

“Statistical Considerations of the Histomorphometric Test Protocol for Determination of Human Origin of Skeletal Remains”

By:
John E. Byrd, Ph.D., D-ABFA
Maria-Teresa Tersigni-Tarrant, Ph.D.
Central Identification Laboratory
JPAC

Categories: Phil6334, Philosophy of Statistics, Statistics | 1 Comment

The Unexpected Way Philosophy Majors Are Changing The World Of Business

 

Philosopher

“Philosophy majors rule,” according to this recent article. We philosophers should be getting the word out. Admittedly, the type of people inclined to do well in philosophy are already likely to succeed in analytic areas. Coupled with the chutzpah of taking up an “outmoded and impractical” major like philosophy in the first place, innovative tendencies are not surprising. But can the study of philosophy also promote these capacities? I think it can and does; yet it could be far more effective than it is, if it were less hermetic and more engaged with problem-solving across the landscape of science, statistics, law, medicine, and evidence-based policy. Here’s the article: Continue reading

Categories: philosophy of science, Philosophy of Statistics, Statistics | 1 Comment

Significance tests and frequentist principles of evidence: Phil6334 Day #6

Slides (2 sets) from the Phil 6334 2/27/14 class (Day #6):

D. Mayo:
“Frequentist Statistics as a Theory of Inductive Inference”

A. Spanos
“Probability/Statistics Lecture Notes 4: Hypothesis Testing”

Categories: P-values, Phil 6334 class material, Philosophy of Statistics, Statistics | Tags: | Leave a comment

Phil 6334: February 20, 2014 (Spanos): Day #5

PHIL 6334 – “Probability/Statistics Lecture Notes 3 for 2/20/14: Estimation (Point and Interval)” (Prof. Spanos)*

*This is Day #5 on the Syllabus, as Day #4 had to be made up (Feb 24, 2014) due to snow. Slides for Day #4 will go up Feb. 26, 2014. (See the revised Syllabus Second Installment.)

Categories: Phil6334, Philosophy of Statistics, Spanos | 5 Comments

Phil 6334: Day #2 Slides

 

Day #2, Part 1: D. Mayo

Class, Part 2: A. Spanos:
Probability/Statistics Lecture Notes 1: Introduction to Probability and Statistical Inference

Day #1 slides are here.

Categories: Phil 6334 class material, Philosophy of Statistics, Statistics | 8 Comments

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2013–2014
54th Annual Program

Download the 54th Annual Program

REVISITING THE FOUNDATIONS OF STATISTICS IN THE ERA OF BIG DATA: SCALING UP TO MEET THE CHALLENGE

Cosponsored by the Department of Mathematics & Statistics at Boston University.
Friday, February 21, 2014
10 a.m. – 5:30 p.m.
Photonics Center, 9th Floor Colloquium Room (Rm 906)
8 St. Mary’s Street

10 a.m.–noon

  • Computational Challenges in Genomic Medicine
    Jill Mesirov, Computational Biology and Bioinformatics, Broad Institute
  • Selection, Significance, and Signification: Issues in High Energy Physics
    Kent Staley, Philosophy, Saint Louis University

1:30–5:30 p.m.

  • Multi-Resolution Inference: An Engineering (Engineered?) Foundation of Statistical Inference
    Xiao-Li Meng, Statistics, Harvard University
  • Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
    Deborah Mayo, Philosophy, Virginia Tech
  • Targeted Learning from Big Data
    Mark van der Laan, Biostatistics and Statistics, UC Berkeley

Panel Discussion


Categories: Announcement, philosophy of science, Philosophy of Statistics, Statistical fraudbusting, Statistics | Leave a comment

U-Phil (Phil 6334) How should “prior information” enter in statistical inference?

On weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post relevant “comedy hours”, invites to analyze short papers or blogs (“U-Phils”, as in “U-philosophize”), and some of my “deconstructions” of articles. To begin with a “U-Phil”, consider a note by Andrew Gelman: “Ethics and the statistical use of prior information,”[i].

I invite you to send (to error@vt.edu) informal analyses (“U-Phil”, ~500-750 words) by February 10 [iv]. Indicate if you want your remarks considered for possible posting on this blog.

Writing philosophy differs from other types of writing; some links to earlier U-Phils are here. Also relevant is this note: “So you want to do a philosophical analysis?”

U-Phil (2/10/14): In section 3 Gelman comments on some of David Cox’s remarks in a (highly informal and non-scripted) conversation we recorded:

“A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo,” published in Rationality, Markets and Morals [iii]. (Section 2 has some remarks on Larry Wasserman, by the way.)

Here’s the relevant portion of the conversation:

COX: Deborah, in some fields foundations do not seem very important, but we both think foundations of statistical inference are important; why do you think that is?

MAYO: I think because they ask about fundamental questions of evidence, inference, and probability. I don’t think that foundations of different fields are all alike; because in statistics we’re so intimately connected to the scientific interest in learning about the world, we invariably cross into philosophical questions about empirical knowledge and inductive inference.

COX: One aspect of it is that it forces us to say what it is that we really want to know when we analyze a situation statistically. Do we want to put in a lot of information external to the data, or as little as possible? It forces us to think about questions of that sort.

MAYO: But key questions, I think, are not so much a matter of putting in a lot or a little information. …What matters is the kind of information, and how to use it to learn. This gets to the question of how we manage to be so successful in learning about the world, despite knowledge gaps, uncertainties and errors. To me that’s one of the deepest questions and it’s the main one I care about. I don’t think a (deductive) Bayesian computation can adequately answer it…

COX: There’s a lot of talk about what used to be called inverse probability and is now called Bayesian theory. That represents at least two extremely different approaches. How do you see the two? Do you see them as part of a single whole? Or as very different? Continue reading

Categories: Background knowledge, Philosophy of Statistics, U-Phil | Tags: , | 2 Comments

Phil 6334: Slides from Day #1: Four Waves in Philosophy of Statistics

First installment 6334 syllabus (Mayo and Spanos)
D. Mayo slides from Day #1: Jan 23, 2014

 

I will post seminar slides here (they will generally be ragtag affairs); links to the papers are in the syllabus.

Categories: Phil 6334 class material, Philosophy of Statistics, Statistics | 14 Comments

More on deconstructing Larry Wasserman (Aris Spanos)

This follows up on yesterday’s deconstruction:


Aris Spanos (2012)[i] – Comments on: L. Wasserman, “Low Assumptions, High Dimensions” (2011)*

I’m happy to play devil’s advocate in commenting on Larry’s very interesting and provocative (in a good way) paper on ‘how recent developments in statistical modeling and inference have [a] changed the intended scope of data analysis, and [b] raised new foundational issues that rendered the ‘older’ foundational problems more or less irrelevant’.

The new intended scope, ‘low assumptions, high dimensions’, is delimited by three characteristics:

“1. The number of parameters is larger than the number of data points.

2. Data can be numbers, images, text, video, manifolds, geometric objects, etc.

3. The model is always wrong. We use models, and they lead to useful insights but the parameters in the model are not meaningful.” (p. 1)

In the discussion that follows I focus almost exclusively on the ‘low assumptions’ component of the new paradigm. The discussion by David F. Hendry (2011), “Empirical Economic Model Discovery and Theory Evaluation,” RMM, 2: 115-145,  is particularly relevant to some of the issues raised by the ‘high dimensions’ component in a way that complements the discussion that follows.

My immediate reaction to the demarcation based on 1-3 is that the new intended scope, although interesting in itself, excludes the overwhelming majority of scientific fields, where restriction 3 seems unduly limiting. In my own field of economics the substantive information comes primarily in the form of substantively specified mechanisms (structural models), accompanied by theory-restricted and substantively meaningful parameters.

In addition, I consider the assertion “the model is always wrong” an unhelpful truism when ‘wrong’ is used in the sense that “the model is not an exact picture of the ‘reality’ it aims to capture”. Worse, if ‘wrong’ refers to ‘the data in question could not have been generated by the assumed model’, then any inference based on such a model will be dubious at best! Continue reading

Categories: Philosophy of Statistics, Spanos, Statistics, U-Phil, Wasserman | Tags: , , , , | 5 Comments

Deconstructing Larry Wasserman

Larry Wasserman (“Normal Deviate”) has announced he will stop blogging (for now at least). That means we’re losing one of the wisest blog-voices on issues relevant to statistical foundations (among many other areas in statistics). Whether this lures him back or reaffirms his decision to stay away, I thought I’d reblog my (2012) “deconstruction” of him (in relation to a paper linked below).[i]

Deconstructing Larry Wasserman [i] by D. Mayo

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from Lies and the Lying Liars Who Tell Them: A Fair and Balanced Look at the Right), with which Larry Wasserman begins his paper “Low Assumptions, High Dimensions” (2011), his contribution to the Rationality, Markets and Morals (RMM) Special Topic: Statistical Science and Philosophy of Science:

Wasserman: There is a joke about media bias from the comedian Al Franken:
‘To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?’

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken meant if we follow up the quip with the next sentence in his text (which Wasserman omits): “The problem with al Qaeda is that they’re trying to kill us!” (p. 1). The rest of Franken’s opening chapter is not about al Qaeda but about bias in media. Conservatives, he says, decry what they claim is a liberal bias in mainstream media. Franken rejects their claim.

The mainstream media does not have a liberal bias. And for all their other biases . . . , the mainstream media . . . at least try to be fair. …There is, however, a right-wing media. . . . They are biased. And they have an agenda…The members of the right-wing media are not interested in conveying the truth… . They are an indispensable component of the right-wing machine that has taken over our country… .   We have to be vigilant.  And we have to be more than vigilant.  We have to fight back… . Let’s call them what they are: liars. Lying, lying, liars. (Franken, pp. 3-4)

When I read this in 2004 (when Bush was in office), I couldn’t have agreed more. How things change*. Now, of course, any argument that swerves from the politically correct is by definition unsound, irrelevant, and/or biased. [ii]

But what does this have to do with Bayesian-frequentist foundations? What is Wasserman, deep down, really trying to tell us by way of this analogy (if only subliminally)? Such are my ponderings—and thus this deconstruction. (I will invite your “U-Phils” at the end [a].) I will allude to passages from my contribution to RMM (2011), http://www.rmm-journal.de/htdocs/st01.html (in red).

A. What Is the Foundational Issue?

Wasserman: To me, the most pressing foundational question is: how do we reconcile the two most powerful needs in modern statistics: the need to make methods assumption free and the need to make methods work in high dimensions… . The Bayes-Frequentist debate is not irrelevant but it is not as central as it once was. (p. 201)

One may wonder why he calls this a foundational issue, as opposed to, say, a technical one. I will assume he means what he says and attempt to extract his meaning by looking through a foundational lens. Continue reading

Categories: Philosophy of Statistics, Statistics, U-Phil | Tags: , , , | 6 Comments

Surprising Facts about Surprising Facts


A paper of mine on “double-counting” and novel evidence just came out: “Some surprising facts about (the problem of) surprising facts” in Studies in History and Philosophy of Science (2013), http://dx.doi.org/10.1016/j.shpsa.2013.10.005

ABSTRACT: A common intuition about evidence is that if data x have been used to construct a hypothesis H, then x should not be used again in support of H. It is no surprise that x fits H, if H was deliberately constructed to accord with x. The question of when and why we should avoid such ‘‘double-counting’’ continues to be debated in philosophy and statistics. It arises as a prohibition against data mining, hunting for significance, tuning on the signal, and ad hoc hypotheses, and as a preference for predesignated hypotheses and ‘‘surprising’’ predictions. I have argued that it is the severity or probativeness of the test—or lack of it—that should determine whether a double-use of data is admissible. I examine a number of surprising ambiguities and unexpected facts that continue to bedevil this debate.

Categories: double-counting, Error Statistics, philosophy of science, Statistics | 36 Comments

Oxford Gaol: Statistical Bogeymen

Memory Lane: 2 years ago. Oxford Jail (also called Oxford Castle) is an entirely fitting place to be on (and around) Halloween! Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba! (I’m serious, it is now a boutique hotel.)  My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should I think be shelved, not that names should matter.

In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to mount non-question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended. But for (most) Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).

Criticisms then follow readily, taking the form of one or both:

  • Error probabilities do not supply posterior probabilities in hypotheses; interpreted as if they do (and some say we just can’t help it), they lead to inconsistencies.
  • Methods with good long-run error rates can give rise to counterintuitive inferences in particular cases.

I have proposed an alternative philosophy that replaces these tenets with different ones:

  • the role of probability in inference is to quantify how reliably or severely claims (or discrepancies from claims) have been tested;
  • the severity goal directs us to the relevant error probabilities, avoiding the oft-repeated statistical fallacies due to tests that are overly sensitive, as well as those insufficiently sensitive to particular errors;
  • control of long-run error probabilities, while necessary, is not sufficient for good tests or warranted inferences.

What is key on the statistics side of this alternative philosophy is that the probabilities refer to the distribution of a statistic d(x)—the so-called sampling distribution.  Hence such accounts are often called sampling theory accounts. Since the sampling distribution is the basis for error probabilities, another term might be error statistical. Continue reading

Categories: Philosophy of Statistics | Tags: , | Leave a comment

Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*

*Addition of note [2].

A long-running research program in philosophy is to seek a quantitative measure

C(h,x)

to capture intuitive ideas about “confirmation” and about “confirmational relevance”. The components of C(h,x) are allowed to be any statements; no reference to a probability model or to joint distributions is required. Then h is “confirmed” or supported by x if P(h|x) > P(h), disconfirmed (or undermined) if P(h|x) < P(h) (else x is confirmationally irrelevant to h). This is the generally accepted view of philosophers of confirmation (or Bayesian formal epistemologists) up to the present. There is generally a background “k” included, but to avoid a blinding mass of symbols I omit it. (We are rarely told how to get the probabilities anyway; but I’m going to leave that to one side, as it will not really matter here.)

A test of any purported philosophical confirmation theory is whether it elucidates or is even in sync with intuitive methodological principles about evidence or testing. One of the first problems that arises stems from asking…

Is Probability a good measure of confirmation?

A natural move then would be to identify the degree of confirmation of h by x with probability P(h|x), (which philosophers sometimes write as P(h,x)). Statement x affords hypothesis h higher confirmation than it does h’ iff P(h|x) > P(h’|x).

Some puzzles immediately arise. Hypothesis h can be confirmed by x while h’ is disconfirmed by x, and yet P(h|x) < P(h’|x). In other words, we can have P(h|x) > P(h) and P(h’|x) < P(h’), and yet P(h|x) < P(h’|x).

Popper (The Logic of Scientific Discovery, 1959, 390) gives this example (I quote from him, only changing symbols slightly):

Consider the next toss with a homogeneous die.

h: 6 will turn up

h’: 6 will not turn up

x: an even number will turn up.

P(h) = 1/6, P(h’) = 5/6, P(x) = 1/2

The probability of h is raised by information x, while h’ is undermined by x. (Its probability goes from 5/6 to 4/6.) If we identify probability with degree of confirmation, x confirms h and disconfirms h’ (i.e., P(h|x) > P(h) and P(h’|x) < P(h’)). Yet because P(h|x) < P(h’|x), h is less well confirmed given x than is h’. (This happens because P(h) is sufficiently low.) So P(h|x) cannot just be identified with the degree of confirmation that x affords h.

Note, these are not real statistical hypotheses but statements of events.

Obviously there needs to be a way to distinguish between some absolute confirmation for h, and a relative measure of how much it has increased due to x. From the start, Rudolf Carnap noted that “the verb ‘to confirm’ is ambiguous” but thought it had “the connotation of ‘making firmer’ even more often than that of ‘making firm’.” (Carnap, Logical Foundations of Probability (2nd ed.), xviii). x can increase the firmness of h, and yet C(h,x) < C(~h,x) (~h is more firm, given x, than is h). As with Carnap, it’s the ‘making firmer’ sense that is generally assumed in Bayesian confirmation theory.

But there are many different measures of making firmer (Popper, Carnap, Fitelson).  Referring to Popper’s example, we can report the ratio R: P(h|x)/P(h) = 2.

(In this case h’ = ~h).

Or we use the likelihood ratio LR: P(x|h)/P(x|~h) = (1/.4) = 2.5.
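These numbers are quick to verify mechanically. Here is a minimal sketch of my own (Python, exact fractions; the events are just the ones Popper defines above):

```python
from fractions import Fraction as F

# Popper's die example, computed with exact fractions.
outcomes = set(range(1, 7))                # one toss of a homogeneous die
h = {6}                                    # h: 6 will turn up
x = {2, 4, 6}                              # x: an even number will turn up
not_h = outcomes - h                       # h': 6 will not turn up

def pr(event):                             # P(event) under the uniform measure
    return F(len(event), len(outcomes))

def given(a, b):                           # P(a | b)
    return pr(a & b) / pr(b)

assert given(h, x) > pr(h)                 # 1/3 > 1/6: x confirms h
assert given(not_h, x) < pr(not_h)         # 2/3 < 5/6: x disconfirms h'
assert given(h, x) < given(not_h, x)       # yet h is less probable given x than h'

print(given(h, x) / pr(h))                 # ratio measure R = 2
print(given(x, h) / given(x, not_h))       # likelihood ratio LR = 5/2
```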

Many other ways of measuring the increase in confirmation x affords h could do as well. But what shall we say about numbers like 2 and 2.5? Do they mean the same thing in different contexts? What happens if we get beyond toy examples to scientific hypotheses, where ~h would allude to all possible theories not yet thought of? What’s P(x|~h) where ~h is “the catchall” hypothesis asserting “something else”? (See, for example, Mayo 1997.)

Perhaps this point won’t prevent confirmation logics from accomplishing the role of capturing and justifying intuitions about confirmation. So let’s consider the value of confirmation theories to that role. One of the early leaders of philosophical Bayesian confirmation, Peter Achinstein (2001), began to have doubts about the value of the philosopher’s a priori project. He even claims, rather provocatively, that “scientists do not and should not take … philosophical accounts of evidence seriously” (p. 9) because they give us formal syntactical (context-free) measures, whereas scientists look to empirical grounds for confirmation. Philosophical accounts, moreover, make it too easy to confirm. He rejects confirmation as increased firmness, denying it is either necessary or sufficient for evidence. As far as making it too easy to get confirmation, there is the classic problem: it appears we can get everything to confirm everything, so long as one thing is confirmed. This is a famous argument due to Glymour (1980).

Paradox of irrelevant conjunctions

We now switch to emphasizing that the hypotheses may be statistical hypotheses or substantive theories. Both for this reason and because I think they look better, I move away from Popper and Carnap’s lower case letters for hypotheses.

The problem of irrelevant conjunctions (the “tacking paradox”) is this: If x confirms H, then x also confirms (H & J), even if hypothesis J is just “tacked on” to H. As with most of these chestnuts, there is a long history (e.g., Earman 1992, Rosenkrantz 1977), but consider just a leading contemporary representative, Branden Fitelson. Fitelson has importantly emphasized how many different C functions there are for capturing “makes firm”.  Fitelson defines:

J is an irrelevant conjunct to H, with respect to x just in case P(x|H) = P(x|J & H).

For instance, x might be radioastronomic data in support of:

H: the deflection of light effect (due to gravity) is as stipulated in the General Theory of Relativity (GTR), 1.75” at the limb of the sun.

and the irrelevant conjunct:

J: the radioactivity of the Fukushima water being dumped in the Pacific Ocean is within acceptable levels.

(1) Bayesian (Confirmation) Conjunction: If x Bayesian-confirms H, then x Bayesian-confirms (H & J), where P(x|H & J) = P(x|H), for any J consistent with H.

The reasoning is as follows:

P(x|H)/P(x) > 1     (x Bayesian-confirms H)

P(x|H & J) = P(x|H)     (given)

So P(x|H & J)/P(x) > 1.

Therefore x Bayesian-confirms (H & J).

However, it is also plausible to hold:

(2) Entailment condition: If x confirms T, and T entails J, then x confirms J.

 In particular, if x confirms (H & J), then x confirms J.

(3) From (1) and (2), if x confirms H, then x confirms J, for any irrelevant J consistent with H.

(Assume neither H nor J has probability 0 or 1.)

It follows that if x confirms any H, then x confirms any J.
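The derivation can be checked mechanically. Below is a small sketch in a toy model of my own (not from any of the cited papers): H is “the die shows 6”, x is “the die shows an even number”, and the tacked-on J is an independent fair coin landing heads, which makes J an irrelevant conjunct in exactly Fitelson’s sense:

```python
from fractions import Fraction as F

# Toy model: a die toss paired with an independent coin flip (12 equiprobable outcomes).
omega = [(d, c) for d in range(1, 7) for c in "HT"]

def pr(event):                                    # event: predicate on outcomes
    return F(sum(1 for w in omega if event(w)), len(omega))

def given(a, b):                                  # P(a | b)
    return pr(lambda w: a(w) and b(w)) / pr(b)

H = lambda w: w[0] == 6                           # H: 6 turns up
J = lambda w: w[1] == "H"                         # J: the coin lands heads (tacked on)
HJ = lambda w: H(w) and J(w)                      # the conjunction (H & J)
x = lambda w: w[0] % 2 == 0                       # x: an even number turns up

assert given(x, HJ) == given(x, H)                # J is an irrelevant conjunct
assert given(x, H) / pr(x) > 1                    # x Bayesian-confirms H
assert given(x, HJ) / pr(x) > 1                   # so x Bayesian-confirms (H & J)
assert given(J, x) == pr(J)                       # yet x is irrelevant to J itself
```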

Branden Fitelson’s solution

Fitelson (2002) and Fitelson and Hawthorne (2004) offer this “solution”: allow that x confirms (H & J), but deny the entailment condition. So, in particular, x confirms the conjunction although x does not confirm the irrelevant conjunct. Moreover, Fitelson shows, even though (H & J) is confirmed by x, it gets less of a confirmation (firmness) boost than does H—so long as one doesn’t measure the confirmation boost using R: P(H|x)/P(H). If one does use R, then (H & J) is just as well confirmed as is H, which is disturbing.
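A quick check of both claims in the same toy model as the sketch above (the fractions come from direct counting over the 12 die-and-coin outcomes):

```python
from fractions import Fraction as F

# Ratio measure R = P(hyp|x)/P(hyp): tacking costs nothing.
R_H  = F(1, 3) / F(1, 6)      # P(H|x)/P(H)       = 2
R_HJ = F(1, 6) / F(1, 12)     # P(H&J|x)/P(H&J)   = 2, same boost as H: "disturbing"

# Likelihood-ratio measure P(x|hyp)/P(x|~hyp): the conjunction gets a smaller boost.
LR_H  = F(1, 1) / F(2, 5)     # P(x|H) = 1, P(x|~H) = 2/5          -> 5/2
LR_HJ = F(1, 1) / F(5, 11)    # P(x|H&J) = 1, P(x|~(H&J)) = 5/11   -> 11/5

assert R_H == R_HJ and LR_HJ < LR_H
```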

But even if we use the LR as our firmness boost, I would agree with Glymour that the solution scarcely solves the real problem. Paraphrasing him, we would not be assured by an account that tells us deflection of light data (x) confirms the conjunction of GTR (H) and “the radioactivity of the Fukushima water is within acceptable levels” (J), while assuring us that x does not confirm the Fukushima water having acceptable levels of radiation (p. 31).

The tacking paradox is to be expected if confirmation is taken as a variation on probabilistic affirming the consequent. Hypothetico-deductivists had the same problem, which is why Popper said we need to supplement each of the measures of confirmation boost with the condition of “severity”. However, he was unable to characterize severity adequately, and ultimately denied it could be formalized. He left it as an intuitive requirement that before applying any C-function, the confirming evidence must be the result of “a sincere (and ingenious) attempt to falsify the hypothesis” in question. I try to supply a more adequate account of severity (e.g., Mayo 1996, 2/3/12 post (no-pain philosophy III)).

How would the tacking method fare on the severity account? We’re not given the details we’d want for an error statistical appraisal, but let’s do the best with their stipulations. From our necessary condition, we have that taking x as evidence for (H & J) cannot be warranted if x counts as a highly insevere test of (H & J). The “test process” with tacking is something like this: having confirmed H, tack on any consistent but irrelevant J to obtain (H & J). (Sentence was amended on 10/21/13.)

A scrutiny of well-testedness may proceed by denying either condition for severity. To follow the confirmation theorists, let’s grant the fit requirement (since H fits or entails x). This does not constitute having done anything to detect the falsity of (H & J). The conjunction has been subjected to a radically non-risky test. (See also the 1/2/13 post, esp. 5.3.4 Tacking Paradox Scotched.)

What they call confirmation we call mere “fit”

In fact, all their measures of confirmation C, be it the ratio measure R: P(H|x)/P(H) or the (so-called[1]) likelihood ratio LR: P(x|H)/P(x|~H), or one of the others, count merely as “fit” or “accordance” measures to the error statistician. There is no problem allowing each to be relevant for different problems and different dimensions of evidence. What we need to add in each case are the associated error probabilities:

P([H & J] is Bayesian confirmed; ~(J&H)) = maximal, so x is “bad evidence, no test” (BENT) for the conjunction.

We read “;” as “under the assumption that”.


The following was added on 10-21-13: The above probability stems from taking the “fit measure” as a statistic, and assessing error probabilities by taking account of the test process, as in error statistics. The result is

SEV[(H & J), tacking test, x] is minimal 

I have still further problems with these inductive logic paradigms: an adequate philosophical account should answer questions and explicate principles about the methodology of scientific inference. Yet the Bayesian inductivist starts out assuming the intuition or principle, the task then being the homework problem of assigning priors and likelihoods that mesh with the principles. This often demands beating a Bayesian analysis into line, while still not getting at its genuine rationale. “The idea of putting probabilities over hypotheses delivered to philosophy a godsend, and an entire package of superficiality.” (Glymour 2010, 334). Perhaps philosophers are moving away from analytic reconstructions. Enough tears have been shed. But does an analogous problem crop up in Bayesian logic more generally?

I may update this post, and if I do I will alter the number following the title.

Oct. 20, 2013: I am updating this to reflect corrections pointed out by James Hawthorne, for which I’m very grateful. I will call this draft (ii).

Oct. 21, 2013 (updated in blue). I think another sentence might have accidentally got moved around.

Oct. 23, 2013. Given some issues that cropped up in the discussion (and the fact that certain symbols didn’t always come out right in the comments), I’m placing the point below in Note [2]:


[1] I say “so-called” because there’s no requirement of a proper statistical model here.

[2] Can P = C?

Spoze there’s a case where z confirms (h & h’) more than z confirms h’: C(h & h’, z) > C(h’, z).

Now h’ = ((~h & h’) or (h & h’)).
So,
(i) C(h & h’, z) > C((~h & h’) or (h & h’), z)

Since (~h & h’) and (h & h’) are mutually exclusive, we have from the special addition rule
(ii) P(h & h’, z) ≤ P((~h & h’) or (h & h’), z)

So if P = C, (i) and (ii) yield a contradiction.
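Step (ii) is just the fact that a conjunction can never be more probable, given z, than its conjuncts. A tiny numerical check, with a toy assignment of my own (a single die toss):

```python
from fractions import Fraction as F

# h: even number; h': number >= 4; z: the toss is not a 1.
outcomes = range(1, 7)
h  = lambda w: w % 2 == 0
hp = lambda w: w >= 4
z  = lambda w: w != 1

def given(a, cond):                      # P(a | cond) by direct counting
    zs = [w for w in outcomes if cond(w)]
    return F(sum(1 for w in zs if a(w)), len(zs))

# (h & h') entails h', so its probability given z cannot exceed that of h':
assert given(lambda w: h(w) and hp(w), z) <= given(hp, z)   # 2/5 <= 3/5
```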

REFERENCES

Achinstein, P. (2001). The Book of Evidence. Oxford: Oxford University Press.

Carnap, R. (1962). Logical Foundations of Probability. Chicago: University of Chicago Press.

Earman, J. (1992). Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press.

Fitelson, B.  (2002). Putting the Irrelevance Back Into the Problem of Irrelevant Conjunction. Philosophy of Science 69(4), 611–622.

Fitelson, B. & Hawthorne, J. (2004). Re-Solving Irrelevant Conjunction with Probabilistic Independence. Philosophy of Science 71: 505–514.

Glymour, C. (1980). Theory and Evidence. Princeton: Princeton University Press.

_____. (2010). Explanation and Truth. In D. G. Mayo & A. Spanos (Eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, 305–314. Cambridge: Cambridge University Press.

Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.

_____. (1997). “Duhem’s Problem, The Bayesian Way, and Error Statistics, or ‘What’s Belief got To Do With It?’” and “Response to Howson and Laudan,” Philosophy of Science 64(1): 222-244 and 323-333.

_____. (2010). Explanation and Testing Exchanges with Clark Glymour. In D. G. Mayo & A. Spanos (Eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, 305–314. Cambridge: Cambridge University Press.

Popper, K. (1959). The Logic of Scientific Discovery. New York: Basic Books.

Rosenkrantz, R. (1977). Inference, Method and Decision: Towards a Bayesian Philosophy of Science. Dordrecht, The Netherlands: D. Reidel.

Categories: confirmation theory, Philosophy of Statistics, Statistics | Tags: | 75 Comments

Highly probable vs highly probed: Bayesian/ error statistical differences

A reader asks: “Can you tell me about disagreements on numbers between a severity assessment within error statistics, and a Bayesian assessment of posterior probabilities?” Sure.

There are differences between Bayesian posterior probabilities and formal error statistical measures, as well as between the latter and a severity (SEV) assessment, which differs from the standard type 1 and 2 error probabilities, p-values, and confidence levels—despite the numerical relationships. Here are some random thoughts that will hopefully be relevant for both types of differences. (Please search this blog for specifics.)

1. The most noteworthy difference is that error statistical inference makes use of outcomes other than the one observed, even after the data are available: there’s no other way to ask things like, how often would you find 1 nominally statistically significant difference in a hunting expedition over k or more factors?  Or to distinguish optional stopping with sequential trials from fixed sample size experiments.  Here’s a quote I came across just yesterday:

“[S]topping ‘when the data looks good’ can be a serious error when combined with frequentist measures of evidence. For instance, if one used the stopping rule [above]…but analyzed the data as if a fixed sample had been taken, one could guarantee arbitrarily strong frequentist ‘significance’ against H0.” (Berger and Wolpert, 1988, 77).

The worry about being guaranteed to erroneously exclude the true parameter value here is an error statistical affliction that the Bayesian is spared (even though I don’t think they can be too happy about it, especially when HPD intervals are assured of excluding the true parameter value). See this post for an amusing note; Mayo and Kruse (2001) below; and, if interested, search the (strong) likelihood principle, and Birnbaum.
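The hunting arithmetic is simple (with k = 20 independent tests at the 0.05 level, the chance of at least one nominally significant result is 1 − 0.95^20 ≈ 0.64), and the optional stopping guarantee is easy to exhibit by simulation. Here is a rough sketch of my own (not Berger and Wolpert’s example): flip a fair coin, so that H0: p = 0.5 is true, and stop at the first nominally significant z-test:

```python
import random
from math import sqrt

def stops_significant(n_max, rng):
    """Flip a fair coin up to n_max times, testing H0: p = 0.5 after each flip
    at the nominal two-sided 0.05 level; return True at the first 'rejection'."""
    heads = 0
    for n in range(1, n_max + 1):
        heads += rng.random() < 0.5
        if n >= 10 and abs(heads - n / 2) / sqrt(n / 4) > 1.96:
            return True                  # "the data looks good" -- stop here
    return False

rng = random.Random(0)
trials = 2000
rate = sum(stops_significant(1000, rng) for _ in range(trials)) / trials
print(f"Rejection rate under a true H0: {rate:.2f}")
# Far above the nominal 0.05 -- and it creeps toward 1 as n_max grows,
# which is the frequentist guarantee Berger and Wolpert describe.
```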

2. Highly probable vs. highly probed. SEV doesn’t obey the probability calculus: for any test T and outcome x, the severity for both H and ~H might be horribly low. Moreover, an error statistical analysis is not in the business of probabilifying hypotheses but evaluating and controlling the capabilities of methods to discern inferential flaws (problems with linking statistical and scientific claims, problems of interpreting statistical tests and estimates, and problems of underlying model assumptions). This is the basis for applying what may be called the Severity principle. Continue reading

Categories: Bayesian/frequentist, Error Statistics, P-values, Philosophy of Statistics, Statistics, Stephen Senn, strong likelihood principle | 40 Comments

Barnard’s Birthday: background, likelihood principle, intentions

G.A. Barnard: 23 Sept. 1915 – 9 Aug. 2002

Reblog (from a year ago): G.A. Barnard’s birthday is today, so here’s a snippet of his discussion with Savage (1962) (link below [i]) that connects to some earlier issues: stopping rules, the likelihood principle, and background information here and here (at least of one type). (A few other Barnard links on this blog are below*.) Happy Birthday George!

Barnard: I have been made to think further about this issue of the stopping rule since I first suggested that the stopping rule was irrelevant (Barnard 1947a,b). This conclusion does not follow only from the subjective theory of probability; it seems to me that the stopping rule is irrelevant in certain circumstances. Since 1947 I have had the great benefit of a long correspondence—not many letters because they were not very frequent, but it went on over a long time—with Professor Bartlett, as a result of which I am considerably clearer than I was before. My feeling is that, as I indicated [on p. 42], we meet with two sorts of situation in applying statistics to data. One is where we want to have a single hypothesis with which to confront the data. Do they agree with this hypothesis or do they not? Now in that situation you cannot apply Bayes’s theorem because you have not got any alternatives to think about and specify—not yet. I do not say they are not specifiable—they are not specified yet. And in that situation it seems to me the stopping rule is relevant.

In particular, suppose somebody sets out to demonstrate the existence of extrasensory perception and says ‘I am going to go on until I get a one in ten thousand significance level’. Knowing that this is what he is setting out to do would lead you to adopt a different test criterion. What you would look at would not be the ratio of successes obtained, but how long it took him to obtain it. And you would have a very simple test of significance which said if it took you so long to achieve this increase in the score above the chance fraction, this is not at all strong evidence for E.S.P., it is very weak evidence. And the reversing of the choice of test criteria would I think overcome the difficulty.

This is the answer to the point Professor Savage makes; he says why use one method when you have vague knowledge, when you would use a quite different method when you have precise knowledge. It seems to me the answer is that you would use one method when you have precisely determined alternatives, with which you want to compare a given hypothesis, and you use another method when you do not have these alternatives.

Savage: May I digress to say publicly that I learned the stopping-rule principle from Professor Barnard, in conversation in the summer of 1952. Frankly I then thought it a scandal that anyone in the profession could advance an idea so patently wrong, even as today I can scarcely believe that some people resist an idea so patently right. I am particularly surprised to hear Professor Barnard say today that the stopping rule is irrelevant in certain circumstances only, for the argument he first gave in favour of the principle seems quite unaffected by the distinctions just discussed. The argument then was this: The design of a sequential experiment is, in the last analysis, what the experimenter actually intended to do. His intention is locked up inside his head and cannot be known to those who have to judge the experiment. Never having been comfortable with that argument, I am not advancing it myself. But if Professor Barnard still accepts it, how can he conclude that the stopping-rule principle is only sometimes valid? (emphasis added) Continue reading

Categories: Background knowledge, Likelihood Principle, phil/history of stat, Philosophy of Statistics | Leave a comment

Gandenberger: How to Do Philosophy That Matters (guest post)

Greg Gandenberger
Philosopher of Science
University of Pittsburgh
gandenberger.org

“Genuine philosophical problems are always rooted in urgent problems outside philosophy, and they die if these roots decay.”
Karl Popper (1963, 72)

My concern in this post is how we philosophers can use our skills to do work that matters to people both inside and outside of philosophy.

Philosophers are highly skilled at conceptual analysis, in which one takes an interesting but unclear concept and attempts to state precisely when it applies and when it doesn’t.

What is the point of this activity? In many cases, this question has no satisfactory answer. Conceptual analysis becomes an end in itself, and philosophical debates become fruitless arguments about words. The pleasure we philosophers take in such arguments hardly warrants scarce government and university resources. It does provide good training in critical thinking, but so do many other activities that are also immediately useful, such as doing science and programming computers.

Conceptual analysis does not have to be pointless. It is often prompted by a real-world problem. In Plato’s Euthyphro, for instance, the character Euthyphro thought that piety required him to prosecute his father for murder. His family thought on the contrary that for a son to prosecute his own father was the height of impiety. In this situation, the question “what is piety?” took on great urgency. It also had great urgency for Socrates, who was awaiting trial for corrupting the youth of Athens.

In general, conceptual analysis often begins as a response to some question about how we ought to regulate our beliefs or actions. It can be a fruitful activity as long as the questions that prompted it are kept in view. It tends to degenerate into merely verbal disputes when it becomes an end in itself.

The kind of goal-oriented view of conceptual analysis I aim to articulate and promote is not teleosemantics: it is a view about how philosophy should be done rather than a theory of meaning. It is consistent with Carnap’s notion of explication (one of the desiderata of which is fruitfulness) (Carnap 1963, 5), but in practice Carnapian explication seems to devolve into idle word games just as easily as conceptual analysis. Our overriding goal should not be fidelity to intuitions, precision, or systematicity, but usefulness.

How I Became Suspicious of Conceptual Analysis

When I began working on proofs of the Likelihood Principle, I assumed that following my intuitions about the concept of “evidential equivalence” would lead to insights about how science should be done. Birnbaum’s proof showed me that my intuitions entail the Likelihood Principle, which frequentist methods violate. Voila! Scientists shouldn’t use frequentist methods. All that remained to be done was to fortify Birnbaum’s proof, as I do in “A New Proof of the Likelihood Principle,” by defending it against objections and buttressing it with an alternative proof. [Editor: For a number of related materials on this blog see Mayo’s JSM presentation, and note [i].]

After working on this topic for some time, I realized that I was making simplistic assumptions about the relationship between conceptual intuitions and methodological norms. At most, a proof of the Likelihood Principle can show you that frequentist methods run contrary to your intuitions about evidential equivalence. Even if those intuitions are true, it does not follow immediately that scientists should not use frequentist methods. The ultimate aim of science, presumably, is not to respect evidential equivalence but (roughly) to learn about the world and make it better. The demand that scientists use methods that respect evidential equivalence is warranted only insofar as it is conducive to achieving those ends. Birnbaum’s proof says nothing about that issue.

  • In general, a conceptual analysis–even of a normatively freighted term like “evidence”–is never enough by itself to justify a normative claim. The questions that ultimately matter are not about “what we mean” when we use particular words and phrases, but rather about what our aims are and how we can best achieve them.

How to Do Conceptual Analysis Teleologically

This is not to say that my work on the Likelihood Principle, or conceptual analysis in general, is without value. But by itself it is nothing more than a kind of careful lexicography. This kind of work is potentially useful for clarifying normative claims with the aim of assessing and possibly implementing them. To do work that matters, philosophers engaged in conceptual analysis need to take enough interest in the assessment and implementation stages to do their conceptual analysis with the relevant normative claims in mind.

So what does this kind of teleological (goal-oriented) conceptual analysis look like?

It can involve personally following through on the process of assessing and implementing the relevant norms. For example, philosophers at Carnegie Mellon University working on causation have not only provided a kind of analysis of the concept of causation but also developed algorithms for causal discovery, proved theorems about those algorithms, and applied those algorithms to contemporary scientific problems (see e.g. Spirtes et al. 2000).

I have great respect for this work. But doing conceptual analysis does not have to mean going so far outside the traditional bounds of philosophy. A perfect example is James Woodward’s related work on causal explanation, which he describes as follows (2003, 7-8, original emphasis):

My project…makes recommendations about what one ought to mean by various causal and explanatory claims, rather than just attempting to describe how we use those claims. It recognizes that causal and explanatory claims sometimes are confused, unclear, and ambiguous and suggests how those limitations might be addressed…. we introduce concepts…and characterize them in certain ways…because we want to do things with them…. Concepts can be well or badly designed for such purposes, and we can evaluate them accordingly.

Woodward keeps his eye on what the notion of causation is for, namely distinguishing between relationships that do and relationships that do not remain invariant under interventions. This distinction is enormously important because only relationships that remain invariant under interventions provide “handles” we can use to change the world.

Here are some lessons about teleological conceptual analysis that we can take from Woodward’s work. (I’m sure this list could be expanded.)

  1. Teleological conceptual analysis puts us in charge. In his wonderful presidential address at the 2012 meeting of the Philosophy of Science Association, Woodward ended a litany of metaphysical arguments against regarding mental events as causes by asking “Who’s in charge here?” There is no ideal form of Causation to which we must answer. We are free to decide to use “causation” and related words in the ways that best serve our interests.
  2. Teleological conceptual analysis can be revisionary. If ordinary usage is not optimal, we can change it.
  3. The product of a teleological conceptual analysis need not be unique. Some philosophers reject Woodward’s account because they regard causation as a process rather than as a relationship among variables. But why do we need to choose? There could just be two different notions of causation. Woodward’s account captures one notion that is very important in science and everyday life. If it captures all of the causal notions that are important, then so much the better. But this kind of comprehensiveness is not essential.
  4. Teleological conceptual analysis can be non-reductive. Woodward characterizes causal relations as (roughly) correlation relations that are invariant under certain kinds of interventions. But the notion of an intervention is itself causal. Woodward’s account is not circular because it characterizes what it means for a causal relationship to hold between two variables in terms of different causal processes involving different sets of variables. But it is non-reductive in the sense that it does not allow us to replace causal claims with equivalent non-causal claims (as, e.g., counterfactual, regularity, probabilistic, and process theories purport to do). This fact is a problem if one’s primary concern is to reduce one’s ultimate metaphysical commitments, but it is not necessarily a problem if one’s primary concern is to improve our ability to assess and use causal claims.

Conclusion

Philosophers rarely succeed in capturing all of our intuitions about an important informal concept. Even if they did succeed, they would have more work to do in justifying any norms that invoke that concept. Conceptual analysis can be a first step toward doing philosophy that matters, but it needs to be undertaken with the relevant normative claims in mind.

Question: What are your best examples of philosophy that matters? What can we learn from them?


Citations

  • Birnbaum, Allan. “On the Foundations of Statistical Inference.” Journal of the American Statistical Association 57.298 (1962): 269-306.
  • Carnap, Rudolf. Logical Foundations of Probability. U of Chicago Press, 1963.
  • Gandenberger, Greg. “A New Proof of the Likelihood Principle.” The British Journal for the Philosophy of Science (forthcoming).
  • Plato. Euthyphro. http://classics.mit.edu/Plato/euthyfro.html.
  • Popper, Karl. Conjectures and Refutations. London: Routledge & Kegan Paul, 1963.
  • Spirtes, Peter, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. Vol. 81. The MIT Press, 2000.
  • Woodward, James. Making Things Happen: A Theory of Causal Explanation. Oxford University Press, 2003.

[i] Earlier posts are here and here. Some U-Phils are here, here, and here. For some amusing notes (e.g., Don’t Birnbaumize that experiment my friend, and Midnight with Birnbaum).

Some related papers:

  • Cox D. R. and Mayo. D. G. (2010). “Objectivity and Conditionality in Frequentist Inference” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D Mayo and A. Spanos eds.), Cambridge: Cambridge University Press: 276-304.
Categories: Birnbaum Brakes, Likelihood Principle, StatSci meets PhilSci | 9 Comments

E.S. Pearson: “Ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot”

E.S. Pearson on a gate, Mayo sketch

Today is Egon Pearson’s birthday (11 August 1895 – 12 June 1980), and here you see my scruffy sketch of him, at the start of my book, “Error and the Growth of Experimental Knowledge” (EGEK 1996). As Erich Lehmann put it in his EGEK review, Pearson is “the hero of Mayo’s story” because I found in his work, if only in brief discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of the Neyman-Pearson theory of statistics. “Pearson and Pearson” statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect. One of the few sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It begins like this:

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data.  We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done.  If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955 “Scientific Methods and Scientific Induction” ), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”.  There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!…
Indeed, to dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot…!

To continue reading, “Statistical Concepts in Their Relation to Reality” click HERE.

See also Aris Spanos: “Egon Pearson’s Neglected Contributions to Statistics“.

Happy Birthday E.S. Pearson!

Categories: phil/history of stat, Philosophy of Statistics, Statistics | Tags: , | 4 Comments
