Monthly Archives: October 2013

WHIPPING BOYS AND WITCH HUNTERS

Posted on October 31, 2013 by Mayo

This, from 2 years ago, “fits” at least as well today…HAPPY HALLOWEEN! Memory Lane

In an earlier post I alleged that frequentist hypothesis tests often serve as whipping boys, by which I meant “scapegoats,” for the well-known misuses, abuses, and flagrant misinterpretations of tests (both simple Fisherian significance tests and Neyman-Pearson tests, although in different ways). Checking the history of this term, however, I find a certain disanalogy with at least the original meaning of “whipping boy,” namely, an innocent boy who was punished when a medieval prince misbehaved and was in need of discipline. It was thought that seeing an innocent companion, often a friend, beaten for his own transgressions would be an effective way to ensure the prince would not repeat the same mistake. But the floggings of significance tests, rather than serving as a tool for humbled self-improvement and a commitment to avoiding flagrant rule violations, have tended instead to yield declarations that it is the rules that are invalid! The violators are excused as not being able to help it! The situation is more akin to witch hunting, which in some places became an occupation in its own right.

Now some early literature, e.g., Morrison and Henkel’s The Significance Test Controversy (1970), performed an important service decades ago. It alerted social scientists to the fallacies of significance tests: misidentifying a statistically significant difference with one of substantive importance, interpreting insignificant results as evidence for the null hypothesis (especially problematic with insensitive tests), and the like. Chastising social scientists for applying significance tests in slavish and unthinking ways, its contributors called attention to a cluster of pitfalls and fallacies of testing.

The volume describes research studies conducted for the sole purpose of revealing these flaws. Rosenthal and Gaito (1963) document how it is not rare for scientists to mistakenly regard a statistically significant difference, say at level .05, as indicating a greater discrepancy from the null when it arises from a large sample size rather than a smaller one, even though a correct interpretation of tests indicates the reverse. By and large, these critics are not espousing a Bayesian line but rather see themselves as offering “reforms,” e.g., supplementing simple significance tests with power (e.g., Jacob Cohen’s “power analytic” movement) and, most especially, replacing tests with confidence interval estimates of the size of the discrepancy (from the null) indicated by the data. Of course, the use of power is central to (frequentist) Neyman-Pearson tests, and (frequentist) confidence interval estimation even has a duality with hypothesis tests!
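To see why the Rosenthal and Gaito respondents had it backwards, here is a minimal sketch, assuming a one-sided z-test of H0: mu = 0 on a Normal mean with known sigma, and a cutoff of 1.645 for level .05 (these modeling choices are illustrative assumptions, not taken from the studies). The observed mean that is just significant at the cutoff shrinks as the sample size grows, so the same “significant at .05” verdict reflects a smaller discrepancy from the null when n is large.

```python
import math

def just_significant_mean(n, sigma=1.0, z_cutoff=1.645):
    """Sample mean that is just significant at z_cutoff in a one-sided
    z-test of H0: mu = 0, with known sigma and sample size n (assumed setup)."""
    return z_cutoff * sigma / math.sqrt(n)

# The same ".05-significant" verdict corresponds to ever-smaller
# observed discrepancies from the null as the sample size grows.
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: just-significant mean = {just_significant_mean(n):.4f}")
```

Because the just-significant mean scales with 1/sqrt(n), a hundredfold increase in sample size shrinks it tenfold.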

But rather than take on the temporary job of pointing up some understandable fallacies in the use of newly adopted statistical tools by social scientific practitioners, or leading by example with right-headed statistical analyses, the New Reformers seem to have settled into a permanent career of showing the same fallacies. Yes, they advocate “alternative” methods, e.g., “effect size” analysis, power analysis, confidence intervals, meta-analysis. But never having adequately unearthed the essential reasoning and rationale of significance tests (admittedly something that goes beyond many typical expositions), their supplements and reforms often betray the same confusions and pitfalls that underlie the methods they seek to supplement or replace! (I will give readers a chance to demonstrate this in later posts.)

We all reject the highly lampooned, recipe-like uses of significance tests; I and others have long insisted on interpreting tests to reflect the extent of discrepancy indicated or not (going back to when I was writing my doctoral dissertation and EGEK 1996). I never imagined that hypothesis tests (of all stripes) would continue to be flogged again and again, in the same ways!

Frustrated with the limited progress in psychology, apparently inconsistent results, and lack of replication, critics blame an imagined malign conspiracy of significance tests: traditional reliance on statistical significance testing, we hear,

“has a debilitating effect on the general research effort to develop cumulative theoretical knowledge and understanding. However, it is also important to note that it destroys the usefulness of psychological research as a means for solving practical problems in society” (Schmidt 1996, 122)[i].

Meta-analysis was to be the cure that would provide cumulative knowledge to psychology. Lest enthusiasm for revisiting the same cluster of elementary fallacies of tests begin to lose steam, the warnings of danger grow ever shriller: just as the witch is responsible for whatever ails a community, the significance tester is portrayed as so powerful as to be responsible for blocking scientific progress. To keep the gig alive, a certain level of breathless hysteria is common: “statistical significance is hurting people, indeed killing them” (Ziliak and McCloskey 2008, 186)[ii]; significance testers are members of a “cult” led by R.A. Fisher, whom they call “The Wasp.” To the question “What if there were no significance tests?”, as the title of one book inquires[iii], the implication is surely that once tests are extirpated, their research projects would bloom and thrive; so let us have Task Forces [iv] to keep reformers busy at journalistic reforms to banish the test once and for all!

Harlow, L., Mulaik, S., and Steiger, J. (Eds.) (1997), What if there were no significance tests? (pp. 37-64). Mahwah, NJ: Lawrence Erlbaum Associates.

Hunter, J.E. (1997), “Needed: A Ban on the Significance Test,” Psychological Science 8: 3-7.

Morrison, D. and Henkel, R. (eds.) (1970), The Significance Test Controversy, Aldine, Chicago.

MSERA (1998), Research in the Schools, 5(2) “Special Issue: Statistical Significance Testing,” Birmingham, Alabama.

Rosenthal, R. and Gaito, J. (1963), “The Interpretation of Levels of Significance by Psychological Researchers,” Journal of Psychology 55: 33-38.

Ziliak, S.T. and McCloskey, D. (2008), The Cult of Statistical Significance, University of Michigan Press.

[i]Schmidt was the one Erich Lehmann wrote to me about, expressing great concern.

[ii] While setting themselves up as High Priest and Priestess of “reformers,” their own nostrums reveal that they fall into the same fallacy pointed up by Rosenthal and Gaito (among many others) nearly half a century ago. That’s what should scare us!

[iii] In Lisa A. Harlow, Stanley A. Mulaik, and James H. Steiger (Eds.) What if there were no significance tests? (pp. 37-64). Mahwah, NJ: Lawrence Erlbaum Associates.

[iv] MSERA (1998): ‘Special Issue: Statistical Significance Testing,’ Research in the Schools, 5. See also Hunter (1997). The last I heard, they have not succeeded in their attempt at an all-out “test ban”. Interested readers might check the status of the effort, and report back.


Categories: significance tests, Statistics | Tags: reformers, significance test controversies, significance tests | 3 Comments

Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)

Posted on October 26, 2013 by Mayo

Our favorite high school student, Isaac, gets a better shot at showing his college readiness using one of the comparative measures of support or confirmation discussed last week. Their assessment thus seems more in sync with the severe tester’s, but they do not purport that z is evidence for inferring (or even believing) an H to which z affords a high B-boost*. Their measures identify a third category that reflects the degree to which H would predict z (where the comparison might be predicting without z, or under ~H, or the like), at least if we give it an empirical, rather than a purely logical, reading. Since it’s Saturday night, let’s listen in to one of the comedy hours at the Bayesian retreat, as reblogged from May 5, 2012.

Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?

The problem was, the epistemic probability in H was so low that H couldn’t be believed! Instead we believe its denial H’! So, she will infer hypotheses that are simply unbelievable!

So it appears the error statistical testing account fails to serve as an account of knowledge or evidence (i.e., an epistemic account). However severely I might wish to say that a hypothesis H has passed a test, this Bayesian critic assigns a sufficiently low prior probability to H so as to yield a low posterior probability in H[i]. But this is no argument about why this counts in favor of, rather than against, their particular Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis H.
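The critic’s computation is just Bayes’ theorem applied to a two-outcome “hypothesis” such as “Isaac is college-ready.” A minimal sketch with hypothetical numbers (not from the original example): a test that college-ready students pass with probability 0.95 and others pass with probability 0.05, and a prior of 0.001 for readiness. That prior is exactly the shift of a population frequency onto the particular sample selected, which the next paragraph questions.

```python
def posterior(prior, p_pass_if_true, p_pass_if_false):
    """P(H | passed) by Bayes' theorem, for a hypothesis with two outcomes, H and ~H."""
    joint_true = prior * p_pass_if_true
    joint_false = (1 - prior) * p_pass_if_false
    return joint_true / (joint_true + joint_false)

# Hypothetical numbers: a fairly stringent test, a very low prior.
print(round(posterior(prior=0.001, p_pass_if_true=0.95, p_pass_if_false=0.05), 4))  # prints 0.0187
```

Same data, same test: whether H “can be believed” here is driven by the prior, which is precisely what is in dispute between the critic and the severe tester.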

To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.” This would not necessarily be problematic were it not for the fact that their criticism requires shifting the probability to the particular sample selected: for example, that a student Isaac is college-ready, or that this null hypothesis (selected from a pool of nulls) is true. This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also, there are just two outcomes, say s and ~s, and no degrees of discrepancy from H.

Categories: Comedy, confirmation theory, Statistics | Tags: Comedy club, criticism of frequentist methods, epistemic probabilsm, Frequentist inference, Peter Achinstein | 20 Comments

Bayesian confirmation theory: example from last post…

Posted on October 25, 2013 by Mayo

Alexandre’s example, by Corey

Before leaving the “irrelevant conjunct” business of the last post, I am setting out that Popper example for future reference, so we don’t have to wade through 50+ comments, if we want to come back to it. Alexandre converted the example to more familiar set theory language, and Corey made a nice picture. Thanks! I now think I’ve gotten to the bottom of the issue*.

Popper (on probability vs confirmation as a B-boost):

“Consider now the conjecture that there are three statements, h, h’, z, such that (i) h and h’ are independent of z (or undermined by z) while (ii) z supports their conjunction hh’. Obviously we should have to say in such a case that z confirms hh’ to a higher degree than it confirms either h or h’; in symbols,

(4.1) C(h,z) < C(hh’, z) > C(h’, z)

But this would be incompatible with the view that C(h,z) is a probability, i.e., with

(4.2) C(h,z) = P(h|z)

Since for probabilities we have the generally valid formula

(4.3) P(h|z) ≥ P(hh’|z) ≤ P(h’|z)…..” (Popper, LSD, 396-7)

“Take coloured counters, called ‘a’,’b’,…, with four exclusive and equally probable properties, blue, green, red, and yellow.

h: ‘b is blue or green’;

h’: ‘b is blue or red’

z: ‘b is blue or yellow’.

Then all our conditions are satisfied. h and h’ are independent of z. (That z supports hh’ is obvious: z follows from hh’, and its presence raises the probability of hh’ to twice the value it has in the absence of z.” (LSD 398) (The conjunction of h and h’ yields ‘b is blue’.)[i]

Alexandre Patriota:

Let me provide a simple example in terms of sets; hopefully this will make your point clear:

Let O = {a,b,c,d} be the universe set endowed with a probability measure P such that:

P({a}) = P({b})=P({c}) = P({d}) = 1/4.


Define the subsets X = {a,b}; Y = {a,c} and Z = {a,d}. Hence, we have:

P(X) = P(Y) = P(Z) = 1/2

and

P(X /\ Y) = P(X /\ Z) = P(Y /\ Z) = P({a}) = 1/4,

where the symbol /\ stands for the intersection. Then, the conditional probabilities are

P(X|Z) = P(X /\ Z)/P(Z) = 1/2 = P(X),

P(Y|Z) = P(Y /\ Z)/P(Z) = 1/2 = P(Y)

and

P(X /\ Y |Z) = P(X /\ Y /\ Z)/P(Z) = P({a})/ P(Z) = 1/2.

It means that X and Y are both independent of Z, but (X /\ Y) is not.

Assume that: C(w,q) = P(w|q)/P(w) is our confirmation measure, then

C(X,Z) = 1, that is, Z does not support X

C(Y,Z) = 1, that is, Z does not support Y

C(X /\ Y,Z) = 2, that is, Z does support X /\ Y

In Deborah Mayo’s words:

C(X,Z) is less than C(X /\ Y, Z) that is greater than C(Y, Z).
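Alexandre’s numbers can be checked mechanically. A minimal sketch using exact fractions (the helper names P, P_given, and C are illustrative choices); Popper’s counters map onto the four points as a = blue, b = green, c = red, d = yellow, so X, Y, Z are h, h’, z.

```python
from fractions import Fraction

# Four equally probable outcomes; a = blue, b = green, c = red, d = yellow.
OMEGA = frozenset("abcd")

def P(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def P_given(event, cond):
    """Conditional probability P(event | cond)."""
    return P(event & cond) / P(cond)

def C(w, q):
    """Ratio (B-boost) confirmation measure C(w, q) = P(w|q) / P(w)."""
    return P_given(w, q) / P(w)

X = frozenset("ab")  # h : 'b is blue or green'
Y = frozenset("ac")  # h': 'b is blue or red'
Z = frozenset("ad")  # z : 'b is blue or yellow'
XY = X & Y           # hh': 'b is blue'

print("C(X,Z) =", C(X, Z), " C(XY,Z) =", C(XY, Z), " C(Y,Z) =", C(Y, Z))
print("P(XY|Z) =", P_given(XY, Z), " P(Y|Z) =", P_given(Y, Z))
```

The ratio measure ranks the conjunction above each conjunct, while the conditional probabilities still obey P(XY|Z) ≤ P(Y|Z) (here with equality), so C and P cannot coincide.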

*Once I write it up I’ll share it. I am grateful for insights arising from the last discussion. I will post at least one related point over the weekend.

(Continued discussion should go here I think, the other post is too crowded.)

[i]From Note (2) of previous post:

[2] Can P = C? (i.e., can “confirmation,” defined as a B-boost, equal probability P?)

Spoze there’s a case where z confirms hh’ more than z confirms h’: C(hh’,z) > C(h’,z)

Now h’ = (~hh’ or hh’)
So,
(i) C(hh’,z) > C(~hh’ or hh’,z)

Since ~hh’ and hh’ are mutually exclusive, we have from special addition rule
(ii) P(hh’,z) ≤ P(~hh’ or hh’,z)

So if P = C, (i) and (ii) yield a contradiction.

Categories: confirmation theory | 9 Comments

Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*

Posted on October 19, 2013 by Mayo

*addition of note [2].

A long-running research program in philosophy is to seek a quantitative measure

C(h,x)

to capture intuitive ideas about “confirmation” and about “confirmational relevance”. The components of C(h,x) are allowed to be any statements; no reference to a probability model or to joint distributions is required. Then h is “confirmed” or supported by x if P(h|x) > P(h), disconfirmed (or undermined) if P(h|x) < P(h), and otherwise x is confirmationally irrelevant to h. This is the generally accepted view of philosophers of confirmation (or Bayesian formal epistemologists) up to the present. There is generally a background “k” included, but to avoid a blinding mass of symbols I omit it. (We are rarely told how to get the probabilities anyway; but I’m going to leave that to one side, as it will not really matter here.)

A test of any purported philosophical confirmation theory is whether it elucidates or is even in sync with intuitive methodological principles about evidence or testing. One of the first problems that arises stems from asking…

Is Probability a good measure of confirmation?

A natural move then would be to identify the degree of confirmation of h by x with probability P(h|x), (which philosophers sometimes write as P(h,x)). Statement x affords hypothesis h higher confirmation than it does h’ iff P(h|x) > P(h’|x).

Some puzzles immediately arise. Hypothesis h can be confirmed by x while h’ is disconfirmed by x, and yet P(h|x) < P(h’|x). In other words, we can have P(h|x) > P(h) and P(h’|x) < P(h’) and yet P(h|x) < P(h’|x).

Popper (The Logic of Scientific Discovery, 1959, 390) gives this example, (I quote from him, only changing symbols slightly):

Consider the next toss with a homogeneous die.

h: 6 will turn up

h’: 6 will not turn up

x: an even number will turn up.

P(h) = 1/6, P(h’) = 5/6, P(x) = 1/2

The probability of h is raised by information x, while h’ is undermined by x. (Its probability goes from 5/6 to 4/6.) If we identify probability with degree of confirmation, x confirms h and disconfirms h’ (i.e., P(h|x) > P(h) and P(h’|x) < P(h’)). Yet because P(h|x) < P(h’|x), h is less well confirmed given x than is h’. (This happens because P(h) is sufficiently low.) So P(h|x) cannot just be identified with the degree of confirmation that x affords h.
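Popper’s numbers can be checked by counting equally likely faces. A minimal sketch with exact fractions (the helper P and the predicate names are illustrative): h rises from 1/6 to 1/3, h’ falls from 5/6 to 2/3 (Popper’s 4/6), and yet P(h|x) < P(h’|x).

```python
from fractions import Fraction

FACES = range(1, 7)  # one toss of a homogeneous (fair) die

def P(event, given=None):
    """P(event | given) for predicates on die faces, counting equally likely faces."""
    pool = [k for k in FACES if given is None or given(k)]
    return Fraction(sum(1 for k in pool if event(k)), len(pool))

def h(k):
    return k == 6        # h : 6 will turn up

def h_prime(k):
    return k != 6        # h': 6 will not turn up

def x(k):
    return k % 2 == 0    # x : an even number will turn up

print("P(h) =", P(h), "-> P(h|x) =", P(h, x))
print("P(h') =", P(h_prime), "-> P(h'|x) =", P(h_prime, x))
```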

Note, these are not real statistical hypotheses but statements of events.

Obviously there needs to be a way to distinguish between some absolute confirmation for h and a relative measure of how much it has increased due to x. From the start, Rudolf Carnap noted that “the verb ‘to confirm’ is ambiguous” but thought it had “the connotation of ‘making firmer’ even more often than that of ‘making firm’.” (Carnap, Logical Foundations of Probability (2nd ed.), xviii). x can increase the firmness of h, and yet C(h,x) < C(~h,x) (h is less firm, given x, than is ~h). As with Carnap, it’s the ‘making firmer’ that is generally assumed in Bayesian confirmation theory.

But there are many different measures of making firmer (Popper, Carnap, Fitelson). Referring to Popper’s example, we can report the ratio R: P(h|x)/P(h) = 2.

(In this case h’ = ~h).

Or we use the likelihood ratio LR: P(x|h)/P(x|~h) = (1/.4) = 2.5.
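Both numbers follow from counting faces. A minimal sketch (variable names are illustrative): P(x|~h) is the chance of an even face among {1, 2, 3, 4, 5}, i.e., 2/5 = .4.

```python
from fractions import Fraction

# Popper's die example: h = "6", ~h = "not 6", x = "even".
P_h = Fraction(1, 6)
P_h_given_x = Fraction(1, 3)        # {6} out of {2, 4, 6}
P_x_given_h = Fraction(1)           # a 6 is always even
P_x_given_not_h = Fraction(2, 5)    # {2, 4} out of {1, 2, 3, 4, 5}

R = P_h_given_x / P_h               # ratio measure
LR = P_x_given_h / P_x_given_not_h  # likelihood ratio

print("R =", R, " LR =", LR, "=", float(LR))
```

The two measures give different numbers (2 and 2.5) for the same increase in firmness, which is why the question of what such numbers mean across contexts arises.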

Many other ways of measuring the increase in confirmation x affords h could do as well. But what shall we say about numbers like 2 and 2.5? Do they mean the same thing in different contexts? What happens if we get beyond toy examples to scientific hypotheses where ~h would allude to all possible theories not yet thought of? What’s P(x|~h) where ~h is “the catchall” hypothesis asserting “something else”? (see, for example, Mayo 1997)

Perhaps this point won’t prevent confirmation logics from accomplishing the role of capturing and justifying intuitions about confirmation. So let’s consider the value of confirmation theories to that role. One of the early leaders of philosophical Bayesian confirmation, Peter Achinstein (2001), began to have doubts about the value of the philosopher’s a priori project. He even claims, rather provocatively, that “scientists do not and should not take … philosophical accounts of evidence seriously” (p. 9) because they give us formal syntactical (context-free) measures, whereas scientists look to empirical grounds for confirmation. Philosophical accounts, moreover, make it too easy to confirm. He rejects confirmation as increased firmness, denying it is either necessary or sufficient for evidence. As far as making it too easy to get confirmation, there is the classic problem: it appears we can get everything to confirm everything, so long as one thing is confirmed. This is a famous argument due to Glymour (1980).

Paradox of irrelevant conjunctions

We now switch to emphasizing that the hypotheses may be statistical hypotheses or substantive theories. Both for this reason and because I think they look better, I move away from Popper and Carnap’s lower case letters for hypotheses.

The problem of irrelevant conjunctions (the “tacking paradox”) is this: If x confirms H, then x also confirms (H & J), even if hypothesis J is just “tacked on” to H. As with most of these chestnuts, there is a long history (e.g., Earman 1992, Rosenkrantz 1977), but consider just a leading contemporary representative, Branden Fitelson. Fitelson has importantly emphasized how many different C functions there are for capturing “makes firm”. Fitelson defines:

J is an irrelevant conjunct to H, with respect to x just in case P(x|H) = P(x|J & H).

For instance, x might be radioastronomic data in support of:

H: the deflection of light effect (due to gravity) is as stipulated in the General Theory of Relativity (GTR), 1.75” at the limb of the sun.

and the irrelevant conjunct:

J: the radioactivity of the Fukushima water being dumped in the Pacific ocean is within acceptable levels.

(1) Bayesian (Confirmation) Conjunction: If x Bayesian confirms H, then x Bayesian-confirms (H & J), where P(x| H & J ) = P(x|H) for any J consistent with H.

The reasoning is as follows:

P(x|H) /P(x) > 1 (x Bayesian confirms H)

P(x|H & J) = P(x|H) (given)

So [P(x|H & J) /P(x)]> 1

Therefore x Bayesian confirms (H & J)
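A minimal numerical sketch of the tacking argument, under a hypothetical joint distribution chosen only for illustration: H and J are independent fair coins, and x depends on H alone, with P(x|H) = 4/5 and P(x|~H) = 2/5. Then J is an irrelevant conjunct in Fitelson’s sense, x B-confirms both H and (H & J), and yet P(J|x) = P(J), so x does not confirm J itself.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: H and J independent; x depends on H alone.
P_H = Fraction(1, 2)
P_J = Fraction(1, 2)
P_X_GIVEN_H = {True: Fraction(4, 5), False: Fraction(2, 5)}

def weight(h, j, x):
    """Joint probability of one world (truth values of H, J, x)."""
    ph = P_H if h else 1 - P_H
    pj = P_J if j else 1 - P_J
    px = P_X_GIVEN_H[h] if x else 1 - P_X_GIVEN_H[h]
    return ph * pj * px

WORLDS = list(product((True, False), repeat=3))

def prob(event, given=lambda h, j, x: True):
    """P(event | given), where event and given are predicates on worlds."""
    den = sum(weight(*w) for w in WORLDS if given(*w))
    num = sum(weight(*w) for w in WORLDS if given(*w) and event(*w))
    return num / den

def ev_x(h, j, x):
    return x

def hyp_H(h, j, x):
    return h

def hyp_J(h, j, x):
    return j

def hyp_HJ(h, j, x):
    return h and j

print("P(x|H)/P(x) =", prob(ev_x, hyp_H) / prob(ev_x))
print("P(x|H&J)/P(x) =", prob(ev_x, hyp_HJ) / prob(ev_x))
print("P(J|x) =", prob(hyp_J, ev_x), " P(J) =", prob(hyp_J))
```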

However, it is also plausible to hold :

(2) Entailment condition: If x confirms T, and T entails J, then x confirms J.

In particular, if x confirms (H & J), then x confirms J.

(3) From (1) and (2) , if x confirms H, then x confirms J for any irrelevant J consistent with H.

(Assume neither H nor J have probabilities 0 or 1).

It follows that if x confirms any H, then x confirms any J.

Branden Fitelson’s solution

Fitelson (2002), and Fitelson and Hawthorne (2004), offer this “solution”: they allow that x confirms (H & J), but deny the entailment condition. So, in particular, x confirms the conjunction although x does not confirm the irrelevant conjunct. Moreover, Fitelson shows, even though (H & J) is confirmed by x, (H & J) gets less of a confirmation (firmness) boost than does H, so long as one doesn’t measure the confirmation boost using R: P(H|x)/P(H). If one does use R, then (H & J) is just as well confirmed as is H, which is disturbing.
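The contrast between R and LR can be checked in a small hypothetical model (numbers chosen for illustration, not Fitelson’s): J is independent of H and of x, with P(H) = P(J) = 1/2, P(x|H) = 4/5, P(x|~H) = 2/5. R gives H and (H & J) the same boost, while LR gives the conjunction a smaller one, because ~(H & J) includes worlds where H holds.

```python
from fractions import Fraction

# Hypothetical model: J independent of H and of x; x depends on H alone.
p_H, p_J = Fraction(1, 2), Fraction(1, 2)
p_x_given_H, p_x_given_not_H = Fraction(4, 5), Fraction(2, 5)

p_x = p_H * p_x_given_H + (1 - p_H) * p_x_given_not_H
p_HJ = p_H * p_J
p_x_given_HJ = p_x_given_H  # J is irrelevant: P(x|H&J) = P(x|H)
# P(x | ~(H&J)) from the law of total probability.
p_x_given_not_HJ = (p_x - p_HJ * p_x_given_HJ) / (1 - p_HJ)

# R = P(.|x)/P(.) equals P(x|.)/P(x) by Bayes' theorem.
R_H, R_HJ = p_x_given_H / p_x, p_x_given_HJ / p_x
LR_H = p_x_given_H / p_x_given_not_H
LR_HJ = p_x_given_HJ / p_x_given_not_HJ

print("R:  H =", R_H, " H&J =", R_HJ)    # equal boosts
print("LR: H =", LR_H, " H&J =", LR_HJ)  # conjunction gets a smaller boost
```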

But even if we use the LR as our firmness boost, I would agree with Glymour that the solution scarcely solves the real problem. Paraphrasing him, we would not be assured by an account that tells us deflection of light data (x) confirms the conjunction of GTR (H) and the claim that the radioactivity of the Fukushima water is within acceptable levels (J), while assuring us that x does not confirm the Fukushima water having acceptable levels of radiation (Glymour 1980, 31).

The tacking paradox is to be expected if confirmation is taken as a variation on probabilistic affirming the consequent. Hypothetico-deductivists had the same problem, which is why Popper said we need to supplement each of the measures of confirmation boost with the condition of “severity”. However, he was unable to characterize severity adequately, and ultimately denied it could be formalized. He left it as an intuitive requirement that before applying any C-function, the confirming evidence must be the result of “a sincere (and ingenious) attempt to falsify the hypothesis” in question. I try to supply a more adequate account of severity (e.g., Mayo 1996, 2/3/12 post (no-pain philosophy III)).

How would the tacking method fare on the severity account? We’re not given the details we’d want for an error statistical appraisal, but let’s do the best we can with their stipulations. From our necessary condition, x cannot warrant taking it as evidence for (H & J) if x results from a highly insevere test of (H & J). The “test process” with tacking is something like this: having confirmed H, tack on any consistent but irrelevant J to obtain (H & J). (Sentence was amended on 10/21/13)

A scrutiny of well-testedness may proceed by denying either condition for severity. To follow the confirmation theorists, let’s grant the fit requirement (since H fits or entails x). This does not constitute having done anything to detect the falsity of H & J. The conjunction has been subjected to a radically non-risky test. (See also 1/2/13 post, esp. 5.3.4 Tacking Paradox Scotched.)

What they call confirmation we call mere “fit”

In fact, all their measures of confirmation C, be it the ratio measure R: P(H|x)/P(H) or the (so-called[1]) likelihood ratio LR: P(x|H)/P(x|~H), or one of the others, count merely as “fit” or “accordance” measures to the error statistician. There is no problem allowing each to be relevant for different problems and different dimensions of evidence. What we need to add in each case are the associated error probabilities:

P([H & J] is Bayesian confirmed; ~(J&H)) = maximal, so x is “bad evidence, no test” (BENT) for the conjunction.

We read “;” as “under the assumption that”.


The following was added on 10-21-13: The above probability stems from taking the “fit measure” as a statistic, and assessing error probabilities by taking account of the test process, as in error statistics. The result is

SEV[(H & J), tacking test, x] is minimal
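The error-statistical point can be sketched with a hypothetical model (an illustrative assumption, not the confirmation theorists’ own): x ~ Normal(mu, 1), H: mu = 1 versus ~H: mu = 0, with “x Bayesian confirms H” whenever x > 0.5, which, for any prior strictly between 0 and 1, is exactly where x raises the probability of H. The tacking procedure then declares (H & J) confirmed whenever H is confirmed; the function name below is hypothetical.

```python
import math

def normal_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

CUTOFF = 0.5  # x > 0.5 iff the N(1,1) density exceeds the N(0,1) density

def prob_tacking_confirms(mu, j_is_true):
    """P(the tacking procedure declares (H & J) confirmed), with x ~ N(mu, 1).

    The procedure confirms H whenever x > CUTOFF and then tacks on J,
    so j_is_true never enters the calculation: that is the point.
    """
    return 1 - normal_cdf(CUTOFF - mu)

# H true (mu = 1): the conjunction "passes" just as often when J is false.
print(prob_tacking_confirms(1.0, j_is_true=True),
      prob_tacking_confirms(1.0, j_is_true=False))
```

Since the probability of passing is the same whether or not J holds, the procedure has no capacity to detect J’s falsity: the conjunction passes, even when false because J is, as often as it possibly could under this model.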

I have still further problems with these inductive logic paradigms: an adequate philosophical account should answer questions and explicate principles about the methodology of scientific inference. Yet the Bayesian inductivist starts out assuming the intuition or principle, the task then being the homework problem of assigning priors and likelihoods that mesh with the principles. This often demands beating a Bayesian analysis into line, while still not getting at its genuine rationale. “The idea of putting probabilities over hypotheses delivered to philosophy a godsend, and an entire package of superficiality.” (Glymour 2010, 334). Perhaps philosophers are moving away from analytic reconstructions. Enough tears have been shed. But does an analogous problem crop up in Bayesian logic more generally?

I may update this post, and if I do I will alter the number following the title.

Oct. 20, 2013: I am updating this to reflect corrections pointed out by James Hawthorne, for which I’m very grateful. I will call this draft (ii).

Oct. 21, 2013 (updated in blue). I think another sentence might have accidentally got moved around.

Oct. 23, 2013. Given some issues that cropped up in the discussion (and the fact that certain symbols didn’t always come out right in the comments), I’m placing the point below in Note [2]:

[1] I say “so-called” because there’s no requirement of a proper statistical model here.

[2] Can P = C?

Spoze there’s a case where z confirms hh’ more than z confirms h’: C(hh’,z) > C(h’,z)

Now h’ = (~hh’ or hh’)
So,
(i) C(hh’,z) > C(~hh’ or hh’,z)

Since ~hh’ and hh’ are mutually exclusive, we have from special addition rule
(ii) P(hh’,z) ≤ P(~hh’ or hh’,z)

So if P = C, (i) and (ii) yield a contradiction.

REFERENCES

Achinstein, P. (2001). The Book of Evidence. Oxford: Oxford University Press.

Carnap, R. (1962). Logical Foundations of Probability. Chicago: University of Chicago Press.

Earman, J. (1992). Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press.

Fitelson, B. (2002). Putting the Irrelevance Back Into the Problem of Irrelevant Conjunction. Philosophy of Science 69(4), 611–622.

Fitelson, B. & Hawthorne, J. (2004). Re-Solving Irrelevant Conjunction with Probabilistic Independence, Philosophy of Science, 71: 505–514.

Glymour, C. (1980). Theory and Evidence. Princeton: Princeton University Press.

_____. (2010). Explanation and Truth. In D. G. Mayo & A. Spanos (Eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, 305–314. Cambridge: Cambridge University Press.

Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.

_____. (1997). “Duhem’s Problem, The Bayesian Way, and Error Statistics, or ‘What’s Belief got To Do With It?‘” and “Response to Howson and Laudan,” Philosophy of Science 64(1): 222-244 and 323-333.

_____. (2010). Explanation and Testing Exchanges with Clark Glymour. In D. G. Mayo & A. Spanos (Eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, 305–314. Cambridge: Cambridge University Press.

Popper, K. (1959). The Logic of Scientific Discovery. New York: Basic Books.

Rosenkrantz, R. (1977). Inference, Method and Decision: Towards a Bayesian Philosophy of Science. Dordrecht, The Netherlands: D. Reidel.

Categories: confirmation theory, Philosophy of Statistics, Statistics | Tags: Fitelson | 76 Comments

Blog Contents: September 2013

Posted on October 19, 2013 by Mayo

a tough month in exile

(9/2) Is Bayesian Inference a Religion?

(9/3) Gelman’s response to my comment on Jaynes

(9/5) Stephen Senn: Open Season (guest post)

(9/7) First blog: “Did you hear the one about the frequentist…”? and “Frequentists in Exile”

(9/10) Peircean Induction and the Error-Correcting Thesis (Part I)

(9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis

(9/12) (Part 3) Peircean Induction and the Error-Correcting Thesis

(9/14) “When Bayesian Inference Shatters” Owhadi, Scovel, and Sullivan (guest post)

(9/18) PhilStock: Bad news is good news on Wall St.

(9/18) How to hire a fraudster chauffeur

(9/22) Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”

(9/23) Barnard’s Birthday: background, likelihood principle, intentions

(9/24) Gelman est effectivement une erreur statistician

(9/26) Blog Contents: August 2013

(9/29) Highly probable vs highly probed: Bayesian/ error statistical differences

Compiled by Nicole Jinn

Categories: Metablog | Leave a comment

Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?”

Posted on October 12, 2013 by Mayo

Sir David Cox

David Cox sent me a letter relating to my post of Oct.5, 2013. He has his own theory as to who might have been doing the teasing! I’m posting it here, with his permission:

Dear Deborah

I was interested to see the correspondence about Jeffreys and the possible teasing by Neyman’s associate. It brought a number of things to mind.

While I am not at all convinced that any teasing was involved, if there was it seems to me much more likely that Jeffreys was doing the teasing. He, correctly surely, disapproved of that definition and was putting up a highly contrived illustration of its misuse.

In his work he was not writing about a subjective view of probability but about objective degree of belief. He did not disapprove of more physical definitions, such as needed to describe radioactive decay; he preferred to call them chances.

In assessing his work it is important that the part on probability was perhaps 10% of what he did. He was most famous for The Earth (1924), which is said to have started the field of geophysics. (The first edition of his 1939 book on probability was in a series of monographs in physics.) The later book with his wife, Bertha, Methods of Mathematical Physics, is a masterpiece.

I heard him speak from time to time and met him personally on a couple of occasions. He was superficially very mild and said very little. He was involved in various controversies but, and I am not sure about this, I don’t think they ever degenerated into personal bitterness. He lived to be 98, and a mark of his determination is that in his early 90s he cycled in Cambridge, having a series of minor accidents. He was stopped only when Bertha removed the tires from his bike. Bertha was a highly respected teacher of mathematics.

He and R.A. Fisher were not only towering figures in statistics in the first part of the 20th century but surely among the major applied mathematicians of that era in the world.

Neyman was not at all Germanic, in the sense that one of your correspondents described. He could certainly be autocratic but not in personal manner. While all the others at Berkeley were Professor this or Dr that, he insisted on being called Mr Neyman.

The remarks [i] about how people addressed one another 50 plus years ago in the UK are broadly accurate, although they were not specific to Cambridge and certainly could be varied. From about age 11 boys in school, students and men in the workplace addressed one another by surname only. Given names were for family and very close friends. Women did use given names or were Miss or Mrs, certainly never Madam unless they were French aristocrats. Thus in 1950 or so I worked with, published with and was very friendly with two physical scientists, R.C. Palmer and S.L. Anderson. I have no idea what their given names were; it was irrelevant. To address someone you did not know by name you used Sir or Madam. It would be very foolish to think that meant unfriendliness or that the current practice of calling absolutely everyone by their given name means universal benevolence.

Best wishes

David

D.R.Cox
Nuffield College
Oxford
UK

[i]In comments to this post.

Categories: phil/history of stat, Statistics | Tags: D.R. Cox | 13 Comments

Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock

Posted on October 9, 2013 by Mayo

There’s an update (with overview) on the infamous Harkonen case in Nature with the dubious title “Uncertainty on Trial”, first discussed in my (11/13/12) post “Bad statistics: Crime or Free speech”, and continued here. The new Nature article quotes from Steven Goodman:

“You don’t want to have on the books a conviction for a practice that many scientists do, and in fact think is critical to medical research,” says Steven Goodman, an epidemiologist at Stanford University in California who has filed a brief in support of Harkonen……

Goodman, who was paid by Harkonen to consult on the case, contends that the government’s case is based on faulty reasoning, incorrectly equating an arbitrary threshold of statistical significance with truth. “How high does probability have to be before you’re thrown in jail?” he asks. “This would be a lot like throwing weathermen in jail if they predicted a 40% chance of rain, and it rained.”

I don’t think the case at hand is akin to the exploratory research that Goodman likely has in mind, and the rain analogy seems very far-fetched. (There’s much more to the context, but the links should suffice.) Lawyer Nathan Schachtman also has an update on his blog today. He and I usually concur, but we largely disagree on this one[i]. I see no new information that would lead me to shift my earlier arguments on the evidential issues. From a Dec. 17, 2012 post on Schachtman (“multiplicity and duplicity”):

So what’s the allegation that the prosecutors are being duplicitous about statistical evidence in the case discussed in my two previous (‘Bad Statistics’) posts? As a non-lawyer, I will ponder only the evidential (and not the criminal) issues involved.

“After the conviction, Dr. Harkonen’s counsel moved for a new trial on grounds of newly discovered evidence. Dr. Harkonen’s counsel hoisted the prosecutors with their own petards, by quoting the government’s amicus brief to the United States Supreme Court in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011). In Matrixx, the securities fraud plaintiffs contended that they need not plead ‘statistically significant’ evidence for adverse drug effects.” (Schachtman’s part 2, ‘The Duplicity Problem – The Matrixx Motion’)

The Matrixx case is another philstat/law/stock example taken up in this blog here, here, and here. Why are the Harkonen prosecutors “hoisted with their own petards” (a great expression, by the way)?

Categories: PhilStatLaw, PhilStock, statistical tests, Statistics | Tags: philstock

Was Janina Hosiasson pulling Harold Jeffreys’ leg?

Posted on October 5, 2013 by Mayo

Hosiasson 1899-1942

The very fact that Jerzy Neyman considers she might have been playing a “mischievous joke” on Harold Jeffreys (concerning probability) is enough to intrigue and impress me (with Hosiasson!). I’ve long been curious about what really happened. Eleonore Stump, a leading medieval philosopher and friend (and one-time colleague), and I pledged to travel to Vilnius to research Hosiasson. I first heard her name from Neyman’s dedication of Lectures and Conferences in Mathematical Statistics and Probability: “To the memory of: Janina Hosiasson, murdered by the Gestapo” along with around 9 other “colleagues and friends lost during World War II.” (He doesn’t mention her husband Lindenbaum, shot alongside her*.) Hosiasson is responsible for Hempel’s Raven Paradox, and I definitely think we should be calling it Hosiasson’s (Raven) Paradox, to restore some of the credit lost for her contributions to Carnapian confirmation theory[i].

But what about this mischievous joke she might have pulled off with Harold Jeffreys? Or did Jeffreys misunderstand what she intended to say about this howler? Or something else? Since it’s a weekend and all of the U.S. monuments and parks are shut down, you might read this snippet and share your speculations…. The following is from Neyman 1952:

“Example 6. The inclusion of the present example is occasioned by certain statements of Harold Jeffreys (1939, 300) which suggest that, in spite of my insistence on the phrase, “probability that an object A will possess the property B,” and in spite of the five foregoing examples, the definition of probability given above may be misunderstood. Jeffreys is an important proponent of the subjective theory of probability designed to measure the “degree of reasonable belief.” His ideas on the subject are quite radical. He claims (1939, 303) that no consistent theory of probability is possible without the basic notion of degrees of reasonable belief. His further contention is that proponents of theories of probabilities alternative to his own forget their definitions “before the ink is dry.” In Jeffreys’ opinion, they use the notion of reasonable belief without ever noticing that they are using it and, by so doing, contradict the principles which they have laid down at the outset.

The necessity of any given axiom in a mathematical theory is something which is subject to proof. … However, Dr. Jeffreys’ contention that the notion of degrees of reasonable belief and his Axiom 1 are necessary for the development of the theory of probability is not backed by any attempt at proof. Instead, he considers definitions of probability alternative to his own and attempts to show by example that, if these definitions are adhered to, the results of their application would be totally unreasonable and unacceptable to anyone. Some of the examples are striking. On page 300, Jeffreys refers to an article of mine in which probability is defined exactly as it is in the present volume. Jeffreys writes:

The first definition is sometimes called the “classical” one, and is stated in much modern work, notably that of J. Neyman.

However, Jeffreys does not quote the definition that I use but chooses to reword it as follows:

If there are n possible alternatives, for m of which p is true, then the probability of p is defined to be m/n.

He goes on to say:

The first definition appears at the beginning of De Moivre’s book (Doctrine of Chances, 1738). It often gives a definite value to a probability; the trouble is that the value is one that its user immediately rejects. Thus suppose that we are considering two boxes, one containing one white and one black ball, and the other one white and two black. A box is to be selected at random and then a ball at random from that box. What is the probability that the ball will be white? There are five balls, two of which are white. Therefore, according to the definition, the probability is 2/5. But most statistical writers, including, I think, most of those that professedly accept the definition, would give (1/2)•(1/2) + (1/2)•(1/3) = 5/12. This follows at once on the present theory, the terms representing two applications of the product rule to give the probability of drawing each of the two white balls. These are then added by the addition rule. But the proposition cannot be expressed as the disjunction of five alternatives out of twelve. My attention was called to this point by Miss J. Hosiasson.

The solution, 2/5, suggested by Jeffreys as the result of an allegedly strict application of my definition of probability is obviously wrong. The mistake seems to be due to Jeffreys’ apparently harmless rewording of the definition. If we adhere to the original wording (p. 4) and, in particular, to the phrase “probability of an object A having the property B,” then, prior to attempting a solution, we would probably ask ourselves the questions: “What are the ‘objects A’ in this particular case?” and “What is the ‘property B,’ the probability of which it is desired to compute?” Once these questions have been asked, the answer to them usually follows and determines the solution.

In the particular example of Dr. Jeffreys, the objects A are obviously not balls, but pairs of random selections, the first of a box and the second of a ball. If we like to state the problem without dangerous abbreviations, the probability sought is that of a pair of selections ending with a white ball. All the conditions of there being two boxes, the first with two balls only and the second with three, etc., must be interpreted as picturesque descriptions of the F.P.S. of pairs of selections. The elements of this set fall into four categories, conveniently described by pairs of symbols (1,w), (1,b), (2,w), (2,b), so that, for example, (2,w) stands for a pair of selections in which the second box was selected in the first instance, and then this was followed by the selection of the white ball. Denote by $n_{1,w}$, $n_{1,b}$, $n_{2,w}$, and $n_{2,b}$ the (unknown) numbers of the elements of the F.P.S. belonging to each of the above categories, and by $n$ their sum. Then the probability sought is” (Neyman 1952, 10-11).

Then there are the detailed computations from which Neyman gets the right answer (entered 10/9/13):

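$$P\{w \mid \text{pair of selections}\} = \frac{n_{1,w} + n_{2,w}}{n}.$$

The conditions of the problem imply

$$P\{1 \mid \text{pair of selections}\} = \frac{n_{1,w} + n_{1,b}}{n} = \frac{1}{2},$$

$$P\{2 \mid \text{pair of selections}\} = \frac{n_{2,w} + n_{2,b}}{n} = \frac{1}{2},$$

$$P\{w \mid \text{pair of selections beginning with box No. 1}\} = \frac{n_{1,w}}{n_{1,w} + n_{1,b}} = \frac{1}{2},$$

$$P\{w \mid \text{pair of selections beginning with box No. 2}\} = \frac{n_{2,w}}{n_{2,w} + n_{2,b}} = \frac{1}{3}.$$

It follows that

$$n_{1,w} = \tfrac{1}{2}\,(n_{1,w} + n_{1,b}) = \frac{n}{4}, \qquad n_{2,w} = \tfrac{1}{3}\,(n_{2,w} + n_{2,b}) = \frac{n}{6},$$

$$P\{w \mid \text{pair of selections}\} = \frac{n/4 + n/6}{n} = \frac{5}{12}.$$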
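For readers who want to check the arithmetic, here is a minimal Python sketch of a direct enumeration for Jeffreys’ two boxes. It is not Neyman’s own procedure: it assumes, purely for illustration, that the F.P.S. can be represented by a finite list of equally weighted (box, ball) pairs in which each box contributes the same number of pairs (the least common multiple of the box sizes) and each ball within a box appears equally often. The helper names `lcm_all`, `direct_method`, `by_theorems` and `pooled_count` are hypothetical. Counting the white pairs gives 5/12 with n = 12, agreeing with the multiplication and addition theorems, while pooling all five balls, as in Jeffreys’ rewording, gives 2/5.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm_all(sizes):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), sizes, 1)


def direct_method(boxes):
    """Hypothetical direct enumeration: build equally weighted (box, ball) pairs.

    Each box contributes the same number of pairs (so each box is equally likely),
    and every ball within a box appears equally often (so balls are equally likely).
    Returns (proportion of pairs ending in a white ball, number of pairs n).
    """
    per_box = lcm_all([len(balls) for balls in boxes])
    fps = []
    for i, balls in enumerate(boxes, start=1):
        copies = per_box // len(balls)
        for ball in balls:
            fps.extend([(i, ball)] * copies)
    whites = sum(1 for _, ball in fps if ball == "w")
    return Fraction(whites, len(fps)), len(fps)


def by_theorems(boxes):
    """Multiplication theorem within each box, addition theorem across boxes."""
    p_box = Fraction(1, len(boxes))
    return sum(p_box * Fraction(balls.count("w"), len(balls)) for balls in boxes)


def pooled_count(boxes):
    """The reworded 'm out of n' rule Jeffreys attacks: pool every ball."""
    all_balls = [ball for balls in boxes for ball in balls]
    return Fraction(all_balls.count("w"), len(all_balls))


# Box 1: one white, one black. Box 2: one white, two black.
jeffreys_boxes = [["w", "b"], ["w", "b", "b"]]
p_direct, n = direct_method(jeffreys_boxes)
print(n, p_direct)                   # 12 5/12
print(by_theorems(jeffreys_boxes))   # 5/12
print(pooled_count(jeffreys_boxes))  # 2/5
```

The same functions accept any list of boxes, so the direct count can be compared with the theorem-based answer on other configurations, the kind of check Neyman goes on to describe setting for his students.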

The method of computing probability used here is a direct enumeration of elements of the F.P.S. For this reason it is called the “direct method.” As we can see from this particular example, the direct method is occasionally cumbersome and the correct solution is more easily reached through the application of certain theorems basic in the theory of probability. These theorems, the addition theorem and the multiplication theorem, are very easy to apply, with the result that students frequently manage to learn the machinery of application without understanding the theorems. To check whether or not a student does understand the theorems, it is advisable to ask him to solve problems by the direct method. If he cannot, then he does not understand what he is doing.

Checks of this kind were part of the regular program of instruction in Warsaw where Miss Hosiasson was one of my assistants. Miss Hosiasson was a very talented lady who has written several interesting contributions to the theory of probability. One of these papers deals specifically with various misunderstandings which, under the high sounding name of paradoxes, still litter the scientific books and journals. Most of these paradoxes originate from lack of precision in stating the conditions of the problems studied. In these circumstances, it is most unlikely that Miss Hosiasson could fail in the application of the direct method to a simple problem like the one described by Dr. Jeffreys. On the other hand, I can well imagine Miss Hosiasson making a somewhat mischievous joke.

Some of the paradoxes solved by Miss Hosiasson are quite amusing…….”  (Neyman 1952, 10-13)
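Neyman’s definition treats a probability as the proportion of objects A (here, pairs of selections) possessing property B. As an informal illustration only, the Python sketch below generates a long run of simulated pairs of selections, treating Python’s pseudo-random generator as a stand-in for selection “at random,” and tallies the four categories (1,w), (1,b), (2,w), (2,b). The functions `select_pair` and `category_proportions`, the seed, and the run length of 120,000 are arbitrary assumptions, not anything in Neyman’s text. The proportions of (1,w) and (2,w) should settle near 1/4 and 1/6, so the share of pairs ending with a white ball settles near 5/12 rather than 2/5.

```python
import random
from collections import Counter

# Box 1 holds one white and one black ball; box 2 holds one white and two black.
BOXES = {1: ["w", "b"], 2: ["w", "b", "b"]}


def select_pair(rng):
    """One hypothetical 'object A': a random box, then a random ball from that box."""
    box = rng.choice([1, 2])
    ball = rng.choice(BOXES[box])
    return (box, ball)


def category_proportions(n, seed=0):
    """Simulate n pairs of selections; return the share of each of the four categories."""
    rng = random.Random(seed)
    counts = Counter(select_pair(rng) for _ in range(n))
    categories = [(1, "w"), (1, "b"), (2, "w"), (2, "b")]
    return {cat: counts[cat] / n for cat in categories}


props = category_proportions(120_000)
p_white = props[(1, "w")] + props[(2, "w")]
for cat, share in props.items():
    print(cat, round(share, 3))
print("share ending in white:", round(p_white, 4), "vs 5/12 =", round(5 / 12, 4))
```

A simulation only approximates the proportions; the exact value 5/12 comes from the enumeration or the theorems, not from the long run itself.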

What think you? I will offer a first speculation in a comment.

The entire book, Neyman (1952), may be found here, and in plain text here.

*June 2017: I read somewhere today that her husband was killed in 1941, before she was, but all the references I know of are sketchy.

[i] Of course there are many good, recent sources on the philosophy and history of Carnap, some of which mention her, but obviously do not touch on this matter. I read that Hosiasson was trying to build a Carnapian-style inductive logic setting out axioms (which to my knowledge Carnap never did). That was what some of my fledgling graduate school attempts had tried, but the axioms always seemed to admit counterexamples (if non-trivial). So much for the purely syntactic approach. But I wish I’d known of her attempts back then, and especially her treatment of paradoxes of confirmation. (I’m sometimes tempted to give a logic for severity, but I fight the temptation.)

REFERENCES

Hosiasson, J. (1931). Why do we prefer probabilities relative to many data? Mind 40(157): 23-36.

Hosiasson-Lindenbaum, J. (1940). On confirmation. Journal of Symbolic Logic 5(4): 133-148.

Hosiasson, J. (1941). Induction et analogie: Comparaison de leur fondement. Mind 50(200): 351-365.

Hosiasson-Lindenbaum, J. (1948). Theoretical Aspects of the Advancement of Knowledge. Synthese 7(4/5): 253-261.

Jeffreys, H. (1939). Theory of Probability (1st ed.). Oxford: The Clarendon Press.

Neyman, J. (1952). Lectures and Conferences in Mathematical Statistics and Probability. Graduate School, U.S. Dept. of Agriculture.

Categories: Hosiasson, phil/history of stat, Statistics | Tags: Jerzy Neyman, Neyman

Will the Real Junk Science Please Stand Up? (critical thinking)

Posted on October 3, 2013 by Mayo

Equivocations about “junk science” came up in today’s “critical thinking” class; if anything, the current situation is worse than 2 years ago when I posted this.

Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied. Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases?

Take, say, global warming, genetically modified crops, electric-power lines, medical diagnostic testing. Group 1 alleges that those who point up the risks (actual or potential) have a vested interest in construing the evidence that exists (and the gaps in the evidence) accordingly, which may bias the relevant science and pressure scientists to be politically correct. Group 2 alleges the reverse, pointing to industry biases in the analysis or reanalysis of data and pressures on scientists doing industry-funded work to go along to get along.

When the battle between the two groups is joined, issues of evidence (what counts as bad/good evidence for a given claim) and issues of regulation and policy (what are “acceptable” standards of risk/benefit) may become so entangled that no one recognizes how much of the disagreement stems from divergent assumptions about how models are produced and used, as well as from contrary stands on the foundations of uncertain knowledge and statistical inference. The core disagreement is mistakenly attributed to divergent policy values, at least for the most part.

Categories: critical thinking, junk science, Objectivity | Tags: David Michaels, evidence based policy, Evidence-based medicine, Junk science, risk assessment

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.
