Posts Tagged With: severity

Jean Miller: Happy Sweet 16 to EGEK #2 (Hasok Chang Review of EGEK)

Jean Miller here, reporting back from the island. Tonight we complete our “sweet sixteen” celebration of Mayo’s EGEK (1996) with the book review by Dr. Hasok Chang (currently the Hans Rausing Professor of History and Philosophy of Science at the University of Cambridge). His was chosen as our top favorite in the category of ‘reviews by philosophers’. Enjoy!

REVIEW: British Journal for the Philosophy of Science 48 (1997), 455-459
DEBORAH MAYO Error and the Growth of Experimental Knowledge, 
The University of Chicago Press, 1996
By: Hasok Chang

Deborah Mayo’s Error and the Growth of Experimental Knowledge is a rich, useful, and accessible book. It is also a large volume which few people can realistically be expected to read cover to cover. Considering those factors, the main focus of this review will be on providing various potential readers with guidelines for making the best use of the book.

As the author herself advises, the main points can be grasped by reading the first and the last chapters. The real benefit, however, would only come from studying some of the intervening chapters closely. Below I will offer comments on several of the major strands that can be teased apart, though they are found rightly intertwined in the book.

Categories: philosophy of science, Statistics

Jean Miller: Happy Sweet 16 to EGEK! (Shalizi Review: “We Have Ways of Making You Talk”)

Jean Miller here.  (I obtained my PhD with D. Mayo in Phil/STS at VT.) Some of us “island philosophers” have been looking to pick our favorite book reviews of EGEK (Mayo 1996; Lakatos Prize 1999) to celebrate its “sweet sixteen” this month. This review, by Dr. Cosma Shalizi (CMU, Stat) has been chosen as the top favorite (in the category of reviews outside philosophy).  Below are some excerpts–it was hard to pick, as each paragraph held some new surprise, or unique way to succinctly nail down the views in EGEK. You can read the full review here. Enjoy.

“We Have Ways of Making You Talk, or, Long Live Peircism-Popperism-Neyman-Pearson Thought!”
by Cosma Shalizi

After I’d bungled teaching it enough times to have an idea of what I was doing, one of the first things students in my introductory physics classes learned (or anyway were taught), and which I kept hammering at all semester, was error analysis: estimating the uncertainty in measurements, propagating errors from measured quantities into calculated ones, and some very quick and dirty significance tests, tests for whether or not two numbers agree, within their associated margins of error. I did this for purely pragmatic reasons: it seemed like one of the most useful things we were supposed to teach, and also one of the few areas where what I did had any discernible effect on what they learnt. Now that I’ve read Mayo’s book, I’ll be able to offer another excuse to my students the next time I teach error analysis, namely, that it’s how science really works.

I exaggerate her conclusion slightly, but only slightly. Mayo is a dues-paying philosopher of science (literally, it seems), and like most of the breed these days is largely concerned with questions of method and justification, of “ampliative inference” (C. S. Peirce) or “non-demonstrative inference” (Bertrand Russell). Put bluntly and concretely: why, since neither can be deduced rigorously from unquestionable premises, should we put more trust in David Grinspoon’s ideas about Venus than in those of Immanuel Velikovsky? A nice answer would be something like, “because good scientific theories are arrived at by employing thus-and-such a method, which infallibly leads to the truth, for the following self-evident reasons.” A nice answer, but not one which is seriously entertained by anyone these days, apart from some professors of sociology and literature moonlighting in the construction of straw men. In the real world, science is alas fallible, subject to constant correction, and very messy. Still, mess and all, we somehow or other come up with reliable, codified knowledge about the world, and it would be nice to know how the trick is turned: not only would it satisfy curiosity (“the most agreeable of all vices” — Nietzsche), and help silence such people as do, in fact, prefer Velikovsky to Grinspoon, but it might lead us to better ways of turning the trick. Asking scientists themselves is nearly useless: you’ll almost certainly just get a recital of whichever school of methodology we happened to blunder into in college, or impatience at asking silly questions and keeping us from the lab. If this vice is to be indulged in, someone other than scientists will have to do it: namely, the methodologists.

Categories: philosophy of science, Statistics

Going Where the Data Take Us

A reader, Cory J, sent me a question in relation to a talk of mine he once attended:

I have the vague ‘memory’ of an example that was intended to bring out a central difference between broadly Bayesian methodology and broadly classical statistics.  I had thought it involved a case in which a Bayesian would say that the data should be conditionalized on, and supports H, whereas a classical statistician effectively says that the data provides no support to H.  …We know the data, but we also know of the data that only ‘supporting’ data would be given us.  A Bayesian was then supposed to say that we should conditionalize on the data that we have, even if we know that we wouldn’t have been given contrary data had it been available.

That only “supporting” data would be presented need not be problematic in itself; it all depends on how this is interpreted.  There might be no negative results to be had (H might be true), and thus none to “be given us”.  Your last phrase, however, does describe a pejorative case for a frequentist error statistician, in that, if “we wouldn’t have been given contrary data” to H (in the sense of data in conflict with what H asserts), even “had it been available”, then the procedure had no chance of finding or reporting flaws in H.  Thus only data in accordance with H would be presented, even if H is false; so H passes a “test” with minimal stringency or severity. I discuss several examples in papers below (I think the reader had in mind Mayo and Kruse 2001).

Categories: double-counting, Statistics

Neyman’s Nursery (NN5): Final Post

I want to complete the Neyman’s Nursery (NN) meanderings while we have some numbers before us, and while there is a particular example, test T+, on the table.  Despite my warm and affectionate welcoming of the “power analytic” reasoning I unearthed in those “hidden Neyman” papers (see post from Oct. 22), reasoning that is, admittedly, largely lost in the standard decision-behavior model of tests, it still retains an unacceptable coarseness: power is always calculated relative to the cut-off point cα for rejecting H0.  But rather than throw out the baby with the bathwater, we should keep the logic and take account of the actual value of the statistically insignificant result.

__________________________________

(For those just tuning in, power analytic reasoning aims to avoid the age-old fallacy of taking a statistically insignificant result as evidence of zero discrepancy from the null hypothesis, by identifying discrepancies that can and cannot be ruled out.  For our test T+, we reason from insignificant results to inferences of the form:  μ < μ0 + γ.

We are illustrating (as does Neyman) with a one-sided test T+, with μ0 = 0 and α = .025.  Spoze σ = 1, n = 25, so the observed sample mean X is statistically significant only if it exceeds μ0 + 1.96σ/√n = .392.

Power-analytic reasoning says (in relation to our test T+):

If X is statistically insignificant and the POW(T+, μ= μ1) is high, then X indicates, or warrants inferring (or whatever phrase you like) that  μ < μ1.)
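As a rough check on the cut-off used in this illustration, here is a minimal Python sketch. It assumes X is the sample mean of n Normal observations with known σ; the function name `cutoff` is only illustrative and does not come from Neyman or from any statistics package.

```python
from math import sqrt
from statistics import NormalDist


def cutoff(mu0, sigma, n, alpha):
    """Value the sample mean must exceed to reject H0: mu <= mu0 at level alpha.

    Assumes a one-sided Normal test with known sigma (test T+ in the post).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.96 for alpha = .025
    return mu0 + z_alpha * sigma / sqrt(n)


c = cutoff(mu0=0.0, sigma=1.0, n=25, alpha=0.025)
print(round(c, 3))  # 0.392
```

With σ/√n = .2, the cut-off is just 1.96 standard errors above μ0.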

_______________________________

Suppose one had an insignificant result from test T+  and wanted to evaluate the inference:   μ < .4

(it doesn’t matter why just now, this is an illustration).

Since POW(T+, μ =.4) is hardly more than .5, Neyman would say “it was a little rash” to regard the observed mean as indicating μ < .4 . He would say this regardless of the actual value of the statistically insignificant result.  There’s no place in the power calculation, as defined, to take into account the actual observed value. (1)
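The “hardly more than .5” can be reproduced with a short sketch, again assuming a Normal sample mean with known σ. The function `power` and its default arguments are my own illustrative choices matching the numbers of test T+; note that the observed mean never appears as an argument.

```python
from math import sqrt
from statistics import NormalDist


def power(mu1, mu0=0.0, sigma=1.0, n=25, alpha=0.025):
    """POW(T+, mu = mu1): probability the sample mean exceeds the cut-off
    when the true mean is mu1. Depends only on the cut-off, not on the data."""
    se = sigma / sqrt(n)
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # cut-off, about .392 here
    return 1 - NormalDist(mu1, se).cdf(c)


pow_at_04 = power(0.4)
print(round(pow_at_04, 3))  # 0.516
```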

That is why, although  high power to detect  μ as large as  μ1 is sufficient for regarding the data as good evidence that   μ < μ1 , it is too coarse to be a necessary condition.  Spoze, for example, that X = 0.

Were μ as large as .4, we would, with high probability (~.98), have observed a larger difference from the null than we did. Therefore, our data provide evidence that μ < .4. (2)
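The ~.98 figure can be checked with a minimal sketch under the same assumptions (Normal sample mean, σ = 1, n = 25). The name `severity_upper` is hypothetical, chosen here for illustration; it computes the probability of a sample mean larger than the one observed, were μ equal to μ1.

```python
from math import sqrt
from statistics import NormalDist


def severity_upper(x_obs, mu1, sigma=1.0, n=25):
    """Probability of a sample mean larger than x_obs if mu were mu1.

    Used here to assess the inference mu < mu1 from an insignificant result.
    """
    se = sigma / sqrt(n)
    return 1 - NormalDist(mu1, se).cdf(x_obs)


sev_at_zero = severity_upper(x_obs=0.0, mu1=0.4)
sev_near_cutoff = severity_upper(x_obs=0.39, mu1=0.4)
print(round(sev_at_zero, 3))      # 0.977
print(round(sev_near_cutoff, 2))  # 0.52
```

Unlike the power calculation, the result changes with the observed mean: an insignificant result of 0 gives about .98 for μ < .4, while one just under the cut-off gives little more than .5.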

We might say that the severity associated with μ < .4 is high.  There are many ways to articulate the associated justification—I have done so at length earlier; and of course it is no different from “power analytic reasoning”.  Why consider a miss as good as a mile?

When I first introduced this idea in my Ph.D. dissertation, I assumed researchers already did this, in real life, since it introduces no new logic.  But I’ve been surprised not to find it.

I was (and am) puzzled to discover under “observed power” the Shpower computation, which we have already considered and (hopefully) gotten past—at least for present purposes, namely, reasoning from insignificant results to inferences of the form: μ < μ0 + γ.

Granted, there are some computations which you might say lead to virtually the same results as SEV, e.g., certain confidence limits, but even so there are differences of interpretation. (3)  Let me know if you think I am wrong; there may well be something out there I haven’t seen….
____________
(1) This does not mean the place to enter it is in the hypothesized value of μ under which the power is computed (as with Shpower). This is NOT power, and as we have seen in two posts, it is fallacious to equate it to power or to power analytic reasoning. Note that the Shpower associated with X = 0 is .025—that we are interested in μ < .4 does not enter.

(2) It doesn’t matter here if we use ≤ or < .

(3) For differences between SEV and confidence intervals, see Mayo 1996, Mayo and Spanos 2006, 2011.
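The .025 in note (1) can be checked with a small sketch. It assumes, following that note, that Shpower is the power formula evaluated with the observed mean plugged in as the hypothesized μ; the function names `power` and `shpower` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.025):
    """Probability the sample mean exceeds the cut-off of test T+ when the mean is mu."""
    se = sigma / sqrt(n)
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist(mu, se).cdf(c)


def shpower(x_obs, **test_settings):
    """Assumed reading of note (1): power computed at mu = observed mean."""
    return power(x_obs, **test_settings)


shpower_at_zero = shpower(0.0)
print(round(shpower_at_zero, 3))  # 0.025
print(round(power(0.4), 3))       # 0.516
```

The value .4 never enters the first computation, which is the point of the note.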

Categories: Neyman's Nursery, Statistics
