The comment box was too small for my reply to Sober on falsification, so I will post it here:

I want to understand better Sober’s position on falsification. A pervasive idea to which many still subscribe, myself included, is that the heart of what makes inquiry scientific is the critical attitude: that if a claim or hypothesis or model fails to stand up to critical scrutiny it is rejected as false, and not propped up with various “face-saving” devices.

Now Sober writes: “I agree that we can get rid of models that deductively entail (perhaps with the help of auxiliary assumptions) observational outcomes that do not happen. But as soon as the relation is nondeductive, is there ‘falsification’?”

My answer is yes, else we could scarcely retain the critical attitude for any but the most trivial scientific claims. While at one time philosophers imagined that “observational reports” were given, and could therefore form the basis for a deductive falsification of scientific claims, certainly since Popper, Kuhn and the rest of the post-positivists, we recognize that observations are error prone, as are appeals to auxiliary hypotheses. Here is Popper:

*“We say that a theory is falsified only if we have accepted basic statements which contradict it…. This condition is necessary but not sufficient; for we have seen that non-reproducible single occurrences are of no significance to science. Thus a few stray basic statements contradicting a theory will hardly induce us to reject it as falsified. We shall take it as falsified only if we discover a reproducible effect which refutes the theory. In other words, we only accept the falsification if a low level empirical hypothesis which describes such an effect is proposed and corroborated.” (Popper, LSD, 1959, 203)*

To stress that the anomalous evidence is itself a hypothesis, if at a low level, he calls it a *falsifying hypothesis*.

Even in arguing from coincidence to the existence of a “real effect” one is falsifying a hypothesis that the effect is mere chance, due to artifacts, non-reproducible, spurious, or the like. In the GTR high-precision null hypothesis tests, bounds for the parameters are inferred by rejecting (or falsifying) discrepancies beyond the indicated limits. So I wonder how restrictive or local or empirical the hypotheses have to be in order for Sober to allow them to be open to genuine falsification.

We have at various times discussed related issues on this blog, e.g., http://errorstatistics.com/2012/02/01/no-pain-philosophy-skepticism-rationality-popper-and-all-that-part-2-duhems-problem-methodological-falsification/

I can’t answer for the likelihoodists, but I can give my own Bayesian view. Strictly speaking, a hypothesis is only falsified by data that entails its falsehood; otherwise it has positive probability. However, a restricted analysis that gives the catchall hypothesis zero probability provides an upper bound to a hypothesis’s unrestricted posterior probability. Likewise, the restricted analysis provides an upper bound to a hypothesis’s contribution to the posterior predictive distribution. When a hypothesis’s posterior probability is so low that it contributes negligibly to all predictions of interest, it may be regarded as falsified for all practical purposes.
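The upper-bound claim can be spelled out in a few lines. This is only a sketch under assumed notation that does not appear in the comment itself: $H_1,\dots,H_n$ are the explicitly modelled hypotheses, $C$ is the catchall, $D$ is the data, and $y$ is a future observation. The hypotheses are assumed mutually exclusive and, together with $C$, exhaustive.

```latex
\begin{align*}
P(H_i \mid D) &= \frac{P(D \mid H_i)\,P(H_i)}{\sum_{j=1}^{n} P(D \mid H_j)\,P(H_j) + P(D \mid C)\,P(C)} \\[4pt]
P(H_i \mid D, \neg C) &= \frac{P(D \mid H_i)\,P(H_i)}{\sum_{j=1}^{n} P(D \mid H_j)\,P(H_j)} \\[4pt]
P(H_i \mid D) &= P(H_i \mid D, \neg C)\,P(\neg C \mid D) \;\le\; P(H_i \mid D, \neg C) \\[4pt]
P(y \mid H_i, D)\,P(H_i \mid D) &\le P(y \mid H_i, D)\,P(H_i \mid D, \neg C)
\end{align*}
```

The second line uses $P(H_j \mid \neg C) = P(H_j)/P(\neg C)$, so the factor $1/P(\neg C)$ cancels between numerator and denominator. The third line is the bound on the posterior, and the fourth is the corresponding bound on the hypothesis's term in the posterior predictive mixture.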

How low is “low” to warrant falsification? Is there a convention?

There’s no convention that I know of, but it really comes down to practical concerns: we’ll only calculate a prediction to finite precision because we can only measure data to finite precision. Under these conditions, there will be some threshold such that if the posterior probability of a hypothesis is lower than the threshold, neglecting it will have a *strictly nil* effect on the prediction.
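The finite-precision point can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the two hypotheses, their predicted probabilities, and the choice to report predictions to three decimal places are all hypothetical.

```python
def predictive(weights, preds):
    """Posterior predictive probability as a weighted mixture of per-hypothesis predictions."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, preds)) / total


def negligible(weights, preds, index, decimals):
    """True if dropping hypothesis `index` (and renormalizing) leaves the
    prediction unchanged once it is rounded to `decimals` places."""
    full = predictive(weights, preds)
    kept_w = [w for i, w in enumerate(weights) if i != index]
    kept_p = [p for i, p in enumerate(preds) if i != index]
    reduced = predictive(kept_w, kept_p)
    return round(full, decimals) == round(reduced, decimals)


# Hypothetical numbers: each hypothesis predicts a probability for some event.
preds = [0.30, 0.90]

# Posterior of 1e-9 on the second hypothesis: invisible at 3 decimal places.
print(negligible([1 - 1e-9, 1e-9], preds, index=1, decimals=3))

# Posterior of 0.05 on the second hypothesis: it visibly shifts the prediction.
print(negligible([0.95, 0.05], preds, index=1, decimals=3))
```

Note that the threshold is not a single universal number in this sketch. It depends on the reporting precision and on how far apart the hypotheses' predictions are, and a prediction that happens to sit very close to a rounding boundary could still be nudged across it by a tiny weight.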

John: Not sure if you’re asking me or Corey. If me, in some cases, the “falsifying hypothesis” takes the form of inferring a discrepancy of given size (from the claim to be falsified). In other cases, H is falsified by warranting, with reasonable severity, the existence of an effect that is qualitatively at odds with a given claim, e.g., there is evidence for an effect in the opposite direction to what H predicts. Many other kinds of cases exist. In general the falsification of H goes hand in hand with inferring “not-H” with severity (i.e., inferring the presence of an error that H claims is absent). The only way to uphold H as “fitting” the data in such cases is by means of methods that are highly unreliable.

I was asking Corey. It is a little puzzling, but interesting to see a Bayesian talk about something like a cutoff value. Their philosophy does not seem to warrant it, ever, as far as I can tell. The severity argument is easier for me to understand.

The short version of my statement: it’s possible to collect so much Bayesian evidence against a hypothesis that testing it further is beyond the capabilities of our measuring instruments.

So this kind of “falsification” is relative to what we can practically measure, which can change if our measurement technology improves. Hmm… also, I guess a hypothesis should not be regarded as falsified per se unless the measuring instruments are already very precise.

Corey: Even if one added a falsification rule to the Bayesian account, how do you ever get anything new if one had to distribute the prior probabilities over all possibilities at any given time? Why would scientists probe and criticize and try to find a domain on which to reject some aspect of a theory if adequate prediction were all that was needed? If the posterior probability in Newton, say, was .99, why look for rivals (e.g., relativistic) which are not even accorded probability in the Newtonian universe? Starting out assuming that every proposition has a probability assignment that a scientist will use in inquiry, and in interpreting new data, is just vastly at odds with practice. One can at best retrospectively (“painting by numbers”) make up an assignment that could have been had, but that’s no grounds for such an account of inference.

Mayo: That’s a lot of questions!

“Starting out assuming that every proposition has a probability assignment that a scientist will use in inquiry, and in interpreting new data, is just vastly at odds with practice.”

I knew that eventually we’d reach a point where our interests diverge, and that point is normative versus descriptive accounts of inference. I’m not equipped to philosophize about how science is *actually* done beyond my own personal experience. I find your description in EGEK of how science is generally conducted more plausible than the Bayesian accounts you discuss. (You won’t find me defending Howson and Urbach.) I do have a set of desiderata that determine a calculus of plausible inference that I find satisfactory. Science comes to plausible (to me) conclusions to the extent that its methods approximate that calculus. I think that science doesn’t find accurate descriptions of the world as fast as it could if it used better approximations.

“Even if one added a falsification rule to the Bayesian account, how do you ever get anything new if one had to distribute the prior probabilities over all possibilities at any given time?”

In theory, one doesn’t get anything new. The only hypotheses we humans can handle are computable ones, that is, the ones we can code up as algorithms. The set of algorithms (ok, the set of inputs for a given prefix-free universal Turing machine) is enumerable, so it is possible to assign a prior probability to all hypotheses. In practice, one doesn’t do this; even in theory such an approach would founder on the halting problem. To understand actual practice, it is critical to keep in mind the distinction between the prior information and a prior distribution chosen to encode that information. *As an approximation* one usually concentrates all of the prior probability on a set of hypotheses having some structure thought to be useful. But perhaps the data are informative enough to resurrect some “dead” hypothesis or hypotheses that actually had *nearly* zero prior probability and so were assigned zero prior probability in the approximation. (See chapter 4 of Jaynes’s “Probability Theory: The Logic of Science” for a toy example.) Thus the approximation has turned out to be poor, and one expands the hypothesis space and reruns the analysis.
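A small numerical sketch may help show how a “dead” hypothesis can be resurrected. The numbers below are hypothetical and chosen only for illustration; they are not taken from this thread, and the hypothesis labels A, B, C and the helper `posterior` are assumptions. Each hypothesis fixes the chance that a sampled item is defective, and the data are 20 defective items in a row.

```python
def posterior(priors, likelihoods):
    """Normalize prior * likelihood over a finite set of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)
    return [j / z for j in joint]


# Hypothetical per-item defect probabilities under each hypothesis.
p_defect = {"A": 1 / 3, "B": 1 / 6, "C": 0.99}

m = 20  # number of consecutive defective items observed (illustrative)
lik = {h: p ** m for h, p in p_defect.items()}

# Approximate analysis: C is given exactly zero prior and dropped entirely.
restricted = posterior([1 / 11, 10 / 11], [lik["A"], lik["B"]])

# Expanded analysis: C gets a tiny but nonzero prior of 1e-6.
eps = 1e-6
expanded = posterior(
    [(1 - eps) / 11, 10 * (1 - eps) / 11, eps],
    [lik["A"], lik["B"], lik["C"]],
)

print("restricted P(A | data):", restricted[0])
print("expanded   P(C | data):", expanded[2])
```

In this sketch the restricted analysis ends up nearly certain of A, while the expanded analysis puts almost all of the posterior on C. That mismatch is the signal that the zero-prior approximation was poor and the hypothesis space needs to be enlarged.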

“Why would scientists probe and criticize and try to find a domain on which to reject some aspect of a theory if adequate prediction were all that was needed?”

This one’s a biggie, and I’m out of time. Let’s have lunch if I’m ever in the same city as you.

Corey: I was reviewing comments on the blog, and I find yours quite interesting (and not just this one). The distinction you mention between prior info and “coding” it Bayesianly is relevant to our current discussion, but recall the use of prior info in my recent ESP posts (which consisted of a repertoire of flaws and foibles that prevent any observed successes from counting as corroborating ESP). How do we code this? And why bother trying to when we know darn well how to directly use this information for appraising and designing inquiries?

“Let’s have lunch if I’m ever in the same city as you.” Sure. That would be Blacksburg, VA, London, UK, New York City, or Elba, Italy (each with various probabilities).

If your analysis gives you something continuous and you want to make a yes/no-decision, I guess there is no other way than using a cutoff.
