Here are the slides from my discussion of Nancy Reid today at BFF4: The Fourth Bayesian, Fiducial, and Frequentist Workshop: May 1-3, 2017 (hosted by Harvard University)

# confirmation theory

## “Fusion-Confusion?” My Discussion of Nancy Reid: “BFF Four- Are we Converging?”

## Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs B-boosts)

Since we’ll be discussing Bayesian confirmation measures in next week’s seminar (the relevant blogpost being here), let’s listen in to one of the comedy hours at the Bayesian retreat as reblogged from May 5, 2012.

The problem was, the epistemic probability in H was so low that H couldn’t be believed! Instead we believe its denial H’! So, she will infer hypotheses that are simply unbelievable!

So it appears the error statistical testing account fails to serve as an account of knowledge or evidence (i.e., an epistemic account). However severely I might wish to say that a hypothesis *H* has passed a test, this Bayesian critic assigns a sufficiently low prior probability to *H* so as to yield a low posterior probability in *H*[i]. But this is no argument about why this counts in favor of, rather than against, their particular Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis *H*.

To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.” This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected: for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true. This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also there are just two outcomes, say s and ~s, and no degrees of discrepancy from H.

## Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)

Our favorite high school student, Isaac, gets a better shot at showing his college readiness using one of the comparative measures of support or confirmation discussed last week. Their assessment thus seems more in sync with the severe tester, but they are not purporting that z is evidence for inferring (or even believing) an H to which z affords a high B-boost*. Their measures identify a third category that reflects the degree to which H would predict z (where the comparison might be predicting without z, or under ~H or the like). At least if we give it an empirical, rather than a purely logical, reading. Since it’s Saturday night let’s listen in to one of the comedy hours at the Bayesian retreat as reblogged from May 5, 2012.


## Bayesian confirmation theory: example from last post…

Before leaving the “irrelevant conjunct” business of the last post, I am setting out that Popper example for future reference, so we don’t have to wade through 50+ comments, if we want to come back to it. Alexandre converted the example to more familiar set theory language, and Corey made a nice picture. Thanks! I now think I’ve gotten to the bottom of the issue*.

Popper (on probability vs confirmation as a B-boost):

“Consider now the conjecture that there are three statements, h, h’, z, such that (i) h and h’ are independent of z (or undermined by z) while (ii) z supports their conjunction hh’. Obviously we should have to say in such a case that z confirms hh’ to a higher degree than it confirms either h or h’; in symbols,

(4.1) C(h,z) < C(hh’, z) > C(h’, z)

But this would be incompatible with the view that C(h,z) is a probability, i.e., with

(4.2) C(h,z) = P(h|z)

Since for probabilities we have the generally valid formula

(4.3) P(h|z) > P(hh’|z) < P(h’|z)…..” (Popper, LSD, 396-7)

“Take coloured counters, called ‘a’,’b’,…, with four exclusive and equally probable properties, blue, green, red, and yellow.

h: ‘b is blue or green’;

h’: ‘b is blue or red’

z: ‘b is blue or yellow’.

Then all our conditions are satisfied. h and h’ are independent of z. (That z supports hh’ is obvious: z follows from hh’, and its presence raises the probability of hh’ to twice the value it has in the absence of z.)” (LSD 398) (The conjunction of h and h’ yields ‘b is blue’.)[i]

Let me provide a simple example in terms of sets; hopefully this can make your point clear:

Let O = {a,b,c,d} be the universe set endowed with a probability measure P such that:

P({a}) = P({b})=P({c}) = P({d}) = 1/4.

Define the subsets X = {a,b}; Y = {a,c} and Z = {a,d}. Hence, we have:

P(X) = P(Y) = P(Z) = 1/2

and

P(X /\ Y) = P(X /\ Z) = P(Y /\ Z) = P({a}) = 1/4,

where the symbol /\ stands for the intersection. Then, the conditional probabilities are

P(X|Z) = P(X /\ Z)/P(Z) = 1/2 = P(X),

P(Y|Z) = P(Y /\ Z)/P(Z) = 1/2 = P(Y)

and

P(X /\ Y |Z) = P(X /\ Y /\ Z)/P(Z) = P({a})/ P(Z) = 1/2.

It means that X and Y are both independent of Z, but (X /\ Y) is not.

Assume that: C(w,q) = P(w|q)/P(w) is our confirmation measure, then

C(X,Z) = 1, that is, Z does not support X

C(Y,Z) = 1, that is, Z does not support Y

C(X /\ Y, Z) = 2, that is, Z does support X /\ Y

In Deborah Mayo’s words:

C(X,Z) is less than C(X /\ Y, Z) that is greater than C(Y, Z).
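For anyone who wants to check the arithmetic, here is a minimal Python sketch of the set example above. The helper names (`prob`, `cond`, `boost`) are illustrative labels only, and `boost` is simply the ratio measure C(w,q) = P(w|q)/P(w) assumed above.

```python
from fractions import Fraction

# Universe O = {a, b, c, d}, each atom with probability 1/4, as in the example.
P_ATOM = {atom: Fraction(1, 4) for atom in "abcd"}

def prob(event):
    """Probability of a set of atoms."""
    return sum((P_ATOM[a] for a in event), Fraction(0))

def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(event & given) / prob(given)

def boost(w, q):
    """Ratio measure C(w, q) = P(w | q) / P(w)."""
    return cond(w, q) / prob(w)

X, Y, Z = {"a", "b"}, {"a", "c"}, {"a", "d"}

print(boost(X, Z))      # C(X, Z) = 1: Z does not support X
print(boost(Y, Z))      # C(Y, Z) = 1: Z does not support Y
print(boost(X & Y, Z))  # C(X /\ Y, Z) = 2: Z does support X /\ Y
```

Exact fractions are used so that the equalities with 1 and 2 are not blurred by floating-point rounding.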

*Once I write it up I’ll share it. I am grateful for insights arising from the last discussion. I will post at least one related point over the weekend.

(Continued discussion should go here I think, the other post is too crowded.)

[i] From Note (2) of the previous post:

[2] Can P = C? (i.e., can “confirmation”, defined as a B-boost, equal probability P?)

Spoze there’s a case where z confirms hh’ more than z confirms h’: C(hh’,z) > C(h’,z)

Now h’ = (~hh’ or hh’)

So,

(i) C(hh’,z) > C(~hh’ or hh’,z)

Since ~hh’ and hh’ are mutually exclusive, we have from special addition rule

(ii) P(hh’,z) < P(~hh’ or hh’,z)

So if P = C, (i) and (ii) yield a contradiction.

## Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*

A long-running research program in philosophy is to seek a quantitative measure

C(*h*, **x**)

to capture intuitive ideas about “confirmation” and about “confirmational relevance”. The components of C(*h*, **x**) are allowed to be any statements; no reference to a probability model or to joint distributions is required. Then *h* is “confirmed” or supported by **x** if P(*h*|**x**) > P(*h*), disconfirmed (or undermined) if P(*h*|**x**) < P(*h*), (else **x** is confirmationally irrelevant to *h*). This is the generally accepted view of philosophers of confirmation (or Bayesian formal epistemologists) up to the present. There is generally a background “k” included, but to avoid a blinding mass of symbols I omit it. (We are rarely told how to get the probabilities anyway; but I’m going to leave that to one side, as it will not really matter here.)

A test of any purported philosophical confirmation theory is whether it elucidates or is even in sync with intuitive methodological principles about evidence or testing. One of the first problems that arises stems from asking…

**Is Probability a good measure of confirmation?**

A natural move then would be to identify the degree of confirmation of *h* by **x** with probability P(*h*|**x**), (which philosophers sometimes write as P(*h*,**x**)). Statement **x** affords hypothesis *h* higher confirmation than it does *h*’ iff P(*h*|**x**) > P(*h*’|**x**).

Some puzzles immediately arise. Hypothesis *h* can be confirmed by **x**, while *h*’ is disconfirmed by **x**, and yet P(*h*|**x**) < P(*h*’|**x**). In other words, we can have P(*h*|**x**) > P(*h*) and P(*h*’|**x**) < P(*h*’) and yet P(*h*|**x**) < P(*h*’|**x**).

Popper (*The Logic of Scientific Discovery,* 1959, 390) gives this example (I quote from him, only changing symbols slightly):

Consider the next toss with a homogeneous die.

*h*: 6 will turn up

*h*’: 6 will not turn up

**x**: an even number will turn up.

P(*h*) = 1/6, P(*h*’) = 5/6, P(**x**) = 1/2

The probability of *h* is raised by information **x**, while *h*’ is undermined by **x**. (Its probability goes from 5/6 to 4/6.) If we identify probability with degree of confirmation, **x** confirms *h* and disconfirms *h*’ (i.e., P(*h*|**x**) > P(*h*) and P(*h*’|**x**) < P(*h*’)). Yet because P(*h*|**x**) < P(*h*’|**x**), *h* is less well confirmed given **x** than is *h*’. (This happens because P(*h*) is sufficiently low.) So P(*h*|**x**) cannot just be identified with the degree of confirmation that **x** affords *h*.

Note, these are not real statistical hypotheses but statements of events.
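The numbers in Popper’s die example can be checked by brute-force counting. What follows is only a sketch: the function `prob` and the lambda names are illustrative choices, and the only modeling assumption is a fair six-sided die.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of a fair die, each equally likely

def prob(event, given=lambda k: True):
    """P(event | given), computed by counting equally likely faces."""
    pool = [k for k in OUTCOMES if given(k)]
    return Fraction(sum(1 for k in pool if event(k)), len(pool))

h = lambda k: k == 6         # h: 6 will turn up
h_prime = lambda k: k != 6   # h': 6 will not turn up
x = lambda k: k % 2 == 0     # x: an even number will turn up

p_h, p_h_given_x = prob(h), prob(h, x)
p_hp, p_hp_given_x = prob(h_prime), prob(h_prime, x)

print(p_h, p_h_given_x)      # 1/6 1/3  (x raises the probability of h)
print(p_hp, p_hp_given_x)    # 5/6 2/3  (x lowers the probability of h')
print(p_h_given_x < p_hp_given_x)  # True: h is still less probable than h'
```

So **x** gives *h* a boost and undermines *h*’, yet the posterior of *h* stays below that of *h*’, which is the puzzle.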

Obviously there needs to be a way to distinguish between some absolute confirmation for *h*, and a relative measure of how much it has increased due to **x**. From the start, Rudolf Carnap noted that “the verb ‘to confirm’ is ambiguous” but thought it had “the connotation of ‘making firmer’ even more often than that of ‘making firm’.” (Carnap, *Logical Foundations of Probability* (2nd ed.), xviii). **x** can increase the firmness of *h*, but C(*h*, **x**) < C(~*h*, **x**) (*h* is less firm, given **x**, than is ~*h*). As with Carnap, it’s the ‘making firmer’ that is generally assumed in Bayesian confirmation theory.

But there are many different measures of making firmer (Popper, Carnap, Fitelson). Referring to Popper’s example, we can report the ratio R: P(*h*|**x**)/P(*h*) = 2.

(In this case *h*’ = ~*h*).

Or we use the likelihood ratio LR: P(**x**|*h*)/P(**x**|~*h*) = (1/.4) = 2.5.
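Both numbers can be recomputed from the fair-die setup of Popper’s example. This is a hedged sketch: the names `prob`, `R` and `LR` are illustrative, with R and LR defined exactly as the ratio and likelihood ratio just given.

```python
from fractions import Fraction

FACES = range(1, 7)  # fair die

def prob(event, given=lambda k: True):
    """P(event | given), by counting equally likely faces."""
    pool = [k for k in FACES if given(k)]
    return Fraction(sum(1 for k in pool if event(k)), len(pool))

h = lambda k: k == 6        # h: 6 will turn up
not_h = lambda k: k != 6    # ~h (here the same as h')
x = lambda k: k % 2 == 0    # x: an even number will turn up

R = prob(h, x) / prob(h)          # ratio measure P(h|x) / P(h)
LR = prob(x, h) / prob(x, not_h)  # likelihood ratio P(x|h) / P(x|~h)

print(R)   # 2
print(LR)  # 5/2
```

Here P(**x**|*h*) = 1 because a 6 is even, and P(**x**|~*h*) = 2/5 because two of the five non-6 faces are even, which is where the 1/.4 comes from.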

Many other ways of measuring the increase in confirmation **x** affords *h* could do as well. But what shall we say about the numbers like 2, 2.5? Do they mean the same thing in different contexts? What happens if we get beyond toy examples to scientific hypotheses where ~*h* would allude to all possible theories not yet thought of? What’s P(**x**|~*h*) where ~*h* is “the catchall” hypothesis asserting “something else”? (see, for example, Mayo 1997)

Perhaps this point won’t prevent confirmation logics from accomplishing the role of capturing and justifying intuitions about confirmation. So let’s consider the value of confirmation theories to that role. One of the early leaders of philosophical Bayesian confirmation, Peter Achinstein (2001), began to have doubts about the value of the philosopher’s *a priori* project. He even claims, rather provocatively, that “scientists do not and should not take … philosophical accounts of evidence seriously” (p. 9) because they give us formal syntactical (context-free) measures, whereas scientists look to empirical grounds for confirmation. Philosophical accounts, moreover, make it too easy to confirm. He rejects confirmation as increased firmness, denying it is either necessary or sufficient for evidence. As for making it too easy to get confirmation, there is the classic problem: it appears we can get everything to confirm everything, so long as one thing is confirmed. This is a famous argument due to Glymour (1980).

*Paradox of irrelevant conjunctions*

We now switch to emphasizing that the hypotheses may be statistical hypotheses or substantive theories. Both for this reason and because I think they look better, I move away from Popper and Carnap’s lower case letters for hypotheses.

The problem of irrelevant conjunctions (the “tacking paradox”) is this: If **x** confirms *H*, then **x** also confirms (*H* & *J*), even if hypothesis *J* is just “tacked on” to *H*. As with most of these chestnuts, there is a long history (e.g., Earman 1992, Rosenkrantz 1977), but consider just a leading contemporary representative, Branden Fitelson. Fitelson has importantly emphasized how many different C functions there are for capturing “makes firm”. Fitelson defines:

*J* is an irrelevant conjunct to *H*, with respect to **x**, just in case P(**x**|*H*) = P(**x**|*J* & *H*).

For instance, **x** might be radioastronomic data in support of:

H: the deflection of light effect (due to gravity) is as stipulated in the General Theory of Relativity (GTR), 1.75” at the limb of the sun.

and the irrelevant conjunct:

J: the radioactivity of the Fukushima water being dumped in the Pacific ocean is within acceptable levels.

(1) Bayesian (Confirmation) Conjunction: If **x** Bayesian-confirms *H*, then **x** Bayesian-confirms (*H* & *J*), where P(**x**|*H* & *J*) = P(**x**|*H*) for any *J* consistent with *H*.

The reasoning is as follows:

P(**x**|*H*)/P(**x**) > 1 (**x** Bayesian-confirms *H*)

P(**x**|*H* & *J*) = P(**x**|*H*) (given)

So [P(**x**|*H* & *J*)/P(**x**)] > 1

Therefore **x** Bayesian-confirms (*H* & *J*)
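The step connecting these likelihood-to-marginal ratios with “Bayesian confirms” in the earlier sense (posterior exceeds prior) is Bayes’ theorem. A short derivation, assuming only that P(**x**), P(*H*) and P(*H* & *J*) are all nonzero:

```latex
\begin{align*}
\frac{P(H \mid x)}{P(H)}
  &= \frac{P(x \mid H)}{P(x)} > 1
  && \text{(Bayes' theorem; $x$ Bayesian-confirms $H$)} \\[4pt]
\frac{P(H \,\&\, J \mid x)}{P(H \,\&\, J)}
  &= \frac{P(x \mid H \,\&\, J)}{P(x)}
   = \frac{P(x \mid H)}{P(x)} > 1
  && \text{(Bayes' theorem; irrelevance of $J$)}
\end{align*}
```

So P(*H* & *J*|**x**) > P(*H* & *J*), and the two ratios are in fact equal, not merely both greater than 1.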

However, it is also plausible to hold:

(2) Entailment condition: If **x** confirms *T*, and *T* entails *J*, then **x** confirms *J*.

In particular, if **x** confirms (*H* & *J*), then **x** confirms *J*.

(3) From (1) and (2), if **x** confirms *H*, then **x** confirms *J* for any irrelevant *J* consistent with *H*.

(Assume neither *H* nor *J* has probability 0 or 1.)

It follows that if **x** confirms any *H*, then **x** confirms any *J*.

*Branden Fitelson’s solution*

Fitelson (2002), and Fitelson and Hawthorne (2004) offer this “solution”: He will allow that **x** confirms (*H* & *J*), but deny the entailment condition. So, in particular, **x** confirms the conjunction although **x** does not confirm the irrelevant conjunct. Moreover, Fitelson shows, even though (*H* & *J*) is confirmed by **x**, (*H* & *J*) gets less of a confirmation (firmness) boost than does *H*, so long as one doesn’t measure the confirmation boost using R: P(*H*|**x**)/P(*H*). If one does use R, then (*H* & *J*) is just as well confirmed as is *H*, which is disturbing.
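A toy joint distribution makes these claims concrete. This is only a sketch under stated assumptions: the priors and likelihoods below are hypothetical numbers picked for illustration, *J* is built to be probabilistically independent of both *H* and **x**, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_H = Fraction(3, 10)                   # prior for H
P_J = Fraction(1, 2)                    # prior for J, independent of H and x
P_X = {True: Fraction(9, 10),           # P(x | H)
       False: Fraction(1, 5)}           # P(x | ~H)

# Joint distribution over truth values (H, J, x); J has no bearing on x.
joint = {}
for H, J, x in product([True, False], repeat=3):
    p_h = P_H if H else 1 - P_H
    p_j = P_J if J else 1 - P_J
    p_x = P_X[H] if x else 1 - P_X[H]
    joint[(H, J, x)] = p_h * p_j * p_x

def prob(event, given=lambda H, J, x: True):
    """P(event | given), with events as predicates on (H, J, x)."""
    den = sum(p for w, p in joint.items() if given(*w))
    num = sum(p for w, p in joint.items() if given(*w) and event(*w))
    return num / den

def ratio(hyp, ev):
    """R: P(hyp | ev) / P(hyp)."""
    return prob(hyp, ev) / prob(hyp)

def likelihood_ratio(hyp, ev):
    """LR: P(ev | hyp) / P(ev | ~hyp)."""
    return prob(ev, hyp) / prob(ev, lambda *w: not hyp(*w))

is_H = lambda H, J, x: H
is_J = lambda H, J, x: J
is_HJ = lambda H, J, x: H and J
is_x = lambda H, J, x: x

print(ratio(is_H, is_x), ratio(is_HJ, is_x))    # 90/41 90/41
print(likelihood_ratio(is_H, is_x))             # 9/2
print(likelihood_ratio(is_HJ, is_x))            # 153/55
print(ratio(is_J, is_x))                        # 1
```

With these numbers R gives *H* and (*H* & *J*) the identical boost, LR gives the conjunction a smaller boost (about 2.8 versus 4.5), and **x** leaves the probability of *J* itself untouched.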

But even if we use the LR as our firmness boost, I would agree with Glymour that the solution scarcely solves the real problem. Paraphrasing him, we would not be assured by an account that tells us deflection of light data (**x**) confirms both GTR (*H*) and the radioactivity of the Fukushima water is within acceptable levels (*J*), while assuring us that **x** does not confirm the Fukushima water having acceptable levels of radiation (31).

The tacking paradox is to be expected if confirmation is taken as a variation on probabilistic affirming the consequent. Hypothetico-deductivists had the same problem, which is why Popper said we need to supplement each of the measures of confirmation boost with the condition of “severity”. However, he was unable to characterize severity adequately, and ultimately denied it could be formalized. He left it as an intuitive requirement that before applying any C-function, the confirming evidence must be the result of “a sincere (and ingenious) attempt to falsify the hypothesis” in question. I try to supply a more adequate account of severity (e.g., Mayo 1996, 2/3/12 post (no-pain philosophy III)).

How would the tacking method fare on the severity account? We’re not given the details we’d want for an error statistical appraisal, but let’s do the best with their stipulations. From our necessary condition, we have that **x** cannot be taken as evidence for (*H* and *J*) if **x** counts as a highly insevere test of (*H* and *J*). The “test process” with tacking is something like this: having confirmed *H*, tack on any consistent but irrelevant *J* to obtain (*H* & *J*). (Sentence was amended on 10/21/13)

A scrutiny of well-testedness may proceed by denying either condition for severity. To follow the confirmation theorists, let’s grant the fit requirement (since *H* fits or entails **x**). This does not constitute having done anything to detect the falsity of *H* & *J*. The conjunction has been subjected to a radically non-risky test. (See also 1/2/13 post, esp. *5.3.4 Tacking Paradox Scotched.*)

*What they call confirmation we call mere “fit”*

In fact, all their measures of confirmation C, be it the ratio measure R: P(*H*|**x**)/P(*H*) or the (so-called[1]) likelihood ratio LR: P(**x**|*H*)/P(**x**|~*H*), or one of the others, count merely as “fit” or “accordance” measures to the error statistician. There is no problem allowing each to be relevant for different problems and different dimensions of evidence. What we need to add in each case are the associated error probabilities:

P([*H* & *J*] is Bayesian confirmed; ~(*J* & *H*)) = maximal, so **x** is “bad evidence, no test” (BENT) for the conjunction.

We read “;” as “under the assumption that”.

In fact, all their measures of confirmation C are mere “fit” measures, be it the ratio measure R: P(*H*|**x**)/P(*H*) or the LR or other.

The following was added on 10-21-13: The above probability stems from taking the “fit measure” as a statistic, and assessing error probabilities by taking account of the test process, as in error statistics. The result is

SEV[(H & J), tacking test, x] is minimal

I have still further problems with these inductive logic paradigms: an adequate philosophical account should answer questions and explicate principles about the methodology of scientific inference. Yet the Bayesian inductivist starts out assuming the intuition or principle, the task then being the homework problem of assigning priors and likelihoods that mesh with the principles. This often demands beating a Bayesian analysis into line, while still not getting at its genuine rationale. “The idea of putting probabilities over hypotheses delivered to philosophy a godsend, and an entire package of superficiality.” (Glymour 2010, 334). Perhaps philosophers are moving away from analytic reconstructions. Enough tears have been shed. But does an analogous problem crop up in Bayesian logic more generally?

I may update this post, and if I do I will alter the number following the title.

Oct. 20, 2013: I am updating this to reflect corrections pointed out by James Hawthorne, for which I’m very grateful. I will call this draft (ii).

Oct. 21, 2013 (updated in blue). I think another sentence might have accidentally got moved around.

Oct. 23, 2013. Given some issues that cropped up in the discussion (and the fact that certain symbols didn’t always come out right in the comments), I’m placing the point below in Note [2]:

[1] I say “so-called” because there’s no requirement of a proper statistical model here.

[2] Can P = C?

Spoze there’s a case where z confirms hh’ more than z confirms h’: C(hh’,z) > C(h’,z)

Now h’ = (~hh’ or hh’)

So,

(i) C(hh’,z) > C(~hh’ or hh’,z)

Since ~hh’ and hh’ are mutually exclusive, we have from special addition rule

(ii) P(hh’,z) < P(~hh’ or hh’,z)

So if P = C, (i) and (ii) yield a contradiction.

**REFERENCES**

Achinstein, P. (2001). *The Book of Evidence*. Oxford: Oxford University Press.

Carnap, R. (1962). *Logical Foundations of Probability*. Chicago: University of Chicago Press.

Earman, J. (1992). *Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory*. Cambridge MA: MIT Press.

Fitelson, B. (2002). Putting the Irrelevance Back Into the Problem of Irrelevant Conjunction. *Philosophy of Science* *69*(4), 611–622.

Fitelson, B. & Hawthorne, J. (2004). Re-Solving Irrelevant Conjunction with Probabilistic Independence, *Philosophy of Science*, *71*: 505–514.

Glymour, C. (1980). *Theory and Evidence*. Princeton: Princeton University Press.

_____. (2010). Explanation and Truth. In D. G. Mayo & A. Spanos (Eds.), *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science*, 305–314. Cambridge: Cambridge University Press.

Mayo, D. (1996). *Error and the Growth of Experimental Knowledge*. Chicago: University of Chicago Press.

_____. (1997). “Duhem’s Problem, The Bayesian Way, and Error Statistics, or ‘What’s Belief got To Do With It?’” and “Response to Howson and Laudan,” *Philosophy of Science* **64**(1): 222-244 and 323-333.

_____. (2010). Explanation and Testing Exchanges with Clark Glymour. In D. G. Mayo & A. Spanos (Eds.), *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science*, 305–314. Cambridge: Cambridge University Press.

Popper, K. (1959). *The Logic of Scientific Discovery*. New York: Basic Books.

Rosenkrantz, R. (1977). *Inference, Method and Decision: Towards a Bayesian Philosophy of Science.* Dordrecht, The Netherlands: D. Reidel.