April 3, 2014: We interspersed discussion with slides; these cover the main readings of the day (check syllabus four): Duhem’s Problem and the Bayesian Way, and “Highly Probable vs Highly Probed”. Slides are below (followers of this blog will be familiar with most of this, e.g., here). We also did further work on misspecification testing.

Monday, April 7, is an optional outing, “a seminar class trip” you might say, here at Thebes, at which time we will analyze the statistical curves of the mountains, pie charts of pizza, and (seriously) study some experiments on the problem of replication in “the Hamlet Effect in social psychology”. If you’re around, please bop in!

Mayo’s slides on Duhem’s Problem and more from April 3 (Day#9):

Mayo: I am trying to solve the hypothesis testing problem from the first few slides and feel like I am missing a bit of information.

You give the priors p(A)=0.6, p(H)=0.9 as well as three likelihoods p(e’|A,-H)=1e-3, p(e’|-A,H)=5e-2 and p(e’|-A,-H)=5e-2. What is the fourth likelihood value p(e’|A,H)? I figure I could back-calculate it from the results on page 6, but that wouldn’t be a proper test.

Scratch that. I just can’t do math properly.

West: Good, I was going to just link you to:

http://www.phil.vt.edu/dmayo/personal_website/(1997) Duhem’s Problem the Bayesian Way and Error Statistics or What’s Belief Got to Do with It.pdf

These are the computations from what I call Dorling’s “homework problem”, and I grant they work.

Mayo: The bit I was missing was the assumption that the likelihood p(e’|A,H) = 0. Here are the priors and posteriors for each composite hypothesis:

>>> hypothesis   prior   posterior
>>> H,A          0.54    0.000
>>> H,-A         0.36    0.897
>>> -H,-A        0.04    0.100
>>> -H,A         0.06    0.003

These results don’t seem too surprising. You have a strong prior for p(H), and while the data doesn’t match (H,A) at all, it provides little support for (-H,*) either. The auxiliary hypothesis gets thrown out… which seems sensible.
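For anyone who wants to check the arithmetic, here is a minimal sketch of the Bayes computation behind the priors and posteriors quoted above. It assumes H and A are independent a priori (which is what the joint priors 0.54, 0.36, 0.04 and 0.06 imply) and uses the four likelihoods stated in this thread; the function and variable names are my own, not from the slides.

```python
from itertools import product

# Priors given in the thread; H and A are assumed independent a priori.
p_H, p_A = 0.9, 0.6

# Likelihoods p(e' | H-state, A-state), keyed by (H true?, A true?).
likelihood = {
    (True, True): 0.0,     # assumed deductive anomaly: e' contradicts H & A
    (True, False): 5e-2,
    (False, False): 5e-2,
    (False, True): 1e-3,
}


def posteriors(p_H, p_A, likelihood):
    """Return (prior, posterior) dicts over the four composite hypotheses."""
    prior = {}
    for h, a in product([True, False], repeat=2):
        prior[(h, a)] = (p_H if h else 1 - p_H) * (p_A if a else 1 - p_A)
    joint = {k: prior[k] * likelihood[k] for k in prior}
    evidence = sum(joint.values())  # p(e')
    post = {k: joint[k] / evidence for k in joint}
    return prior, post


prior, post = posteriors(p_H, p_A, likelihood)

for (h, a), p in prior.items():
    label = f"{'H' if h else '-H'},{'A' if a else '-A'}"
    print(f"{label:6s} {p:.2f} {post[(h, a)]:.3f}")

# Marginal posteriors for H and for A alone.
post_H = post[(True, True)] + post[(True, False)]
post_A = post[(True, True)] + post[(False, True)]
print(f"p(H|e') = {post_H:.3f}, p(A|e') = {post_A:.3f}")
```

To three decimals this reproduces the figures above, and the marginals show the same upshot: H keeps a posterior near 0.9 while A drops to roughly 0.003.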

In neither the lectures nor the linked paper do I see the error statistical method applied to this data (i.e., with calculations). Is it possible to do so in this example, or do you need additional information?

West: Yes, it would be 0, since it’s an assumed deductive anomaly. My non-probabilistic rendering indeed shows the upshot not to be surprising, because it’s essentially what you get deductively. That’s been my point: there’s nothing added with the probs.

I’m not sure what you mean by applying error stat to this data; I don’t agree with this (reconstruction of the) data. I do criticize the reconstruction for allowing A to be blamed solely on grounds of high belief in Newton’s theory. (Thanks to the single probability pie.) More generally, Dorling’s “homework problem” can warrant any assignment of blame, whether to auxiliaries or theory, whether well or poorly probed.

Mayo: I have a slightly different formulation of H & A that makes the Bayesian inference seem perfectly reasonable.

>> H: Newtonian theory of gravity (Force ~ 1/r^2)

>> A: experiment accurately measured position of the moon in its orbit

When outside observers are presented with evidence that a rigorously tested physical model is wrong, the reasonable response is to first question whether the observations were done correctly. Quantifying this deductive inference is quite a boon.

Computing the posteriors for the composite hypotheses is not the end of the investigation, but it does point to which things need further investigation.

With the information you have provided on slide #5, is there a way to quantify “how poorly probed” a single or composite hypothesis is? If not, what additional data is necessary to do so?

The above is really what I am looking for. Sorry to make you recapitulate your class lecture.

West: Yes, in formal statistical settings we use the error probabilities of a procedure to assess and control “how poorly probed” claims are. For example, if a method would very often result in an accordance between data and H even though H is specifiably false, then H has passed a test that lacks probativeness. It’s an insevere test. In informal settings, we identify procedures that block or advance severe probing. You can find many discussions of severity in my articles and on this blog. Thanks for your interest.
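As a rough illustration of the idea in the reply above (not taken from the slides or from the Dorling example), here is a sketch of a severity assessment in one textbook setting: a one-sided test about a Normal mean with known sigma. The setting, the function names, and all the numbers are assumptions chosen for illustration. Having observed a sample mean xbar_obs, the severity with which the claim “mu > mu1” passes is taken to be the probability that the test would have produced a result less accordant with that claim (a sample mean no larger than xbar_obs) were mu equal to mu1.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def severity(xbar_obs, mu1, sigma, n):
    """Severity for the claim mu > mu1, given observed mean xbar_obs.

    Computed as P(Xbar <= xbar_obs; mu = mu1) for n Normal observations
    with known standard deviation sigma (an assumed textbook setting).
    """
    standard_error = sigma / sqrt(n)
    return normal_cdf((xbar_obs - mu1) / standard_error)


# Hypothetical numbers: sigma = 1, n = 100, observed mean 0.2.
for mu1 in (0.0, 0.1, 0.2, 0.3):
    print(f"SEV(mu > {mu1:.1f}) = {severity(0.2, mu1, 1.0, 100):.3f}")
```

With these hypothetical numbers, “mu > 0” passes with high severity, while “mu > 0.3” passes with low severity: the same data would very often be this accordant with “mu > 0.3” even if that claim were false, which is the sense in which it is poorly probed.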