Author Archives: Mayo

King Tut Includes ErrorStatistics in Top 50 Statblogs!

http://www.thebestcolleges.org/best-statistics-blogs/

I didn’t think our little rag tag blog-in-exile was even noticed. I’m glad to discover several other sites I was unaware of (providing yet more grist for our mills).

(Note: I am not at all happy with the way the comments are appearing here; there’s insufficient space. I’m aware of the problem and will be investigating better solutions.)

I will soon be departing from this cushy chateau, where even King Tut reads EGEK.

(1) PSX-Second International Workshop on the Philosophy of Scientific Experimentation, 21-2 October, University of Konstanz

http://www.uni-konstanz.de/FuF/Philo/Philosophie/philosophie/329-1-PSX2.html

(2) Lorentz Center: Error in the Sciences, 24-28 October

http://www.lorentzcenter.nl/lc/web/2011/460/info.php3?wsid=460

Then on to England, in time for Halloween.

My Halloween costume?  You can take a guess to win  an authentic, chef-signed napkin from the Elbar room (no winners last week).

Categories: Statistics | 1 Comment

Objectivity #2: The “Dirty Hands” Argument for Ethics in Evidence

Some argue that generating and interpreting data for purposes of risk assessment invariably introduces ethical (and other value) considerations that might not only go beyond, but might even conflict with, the “accepted canons of objective scientific reporting.” This thesis, which we may call the thesis of ethics in evidence and inference, is thought by some to show that an ethical interpretation of evidence may warrant violating canons of scientific objectivity, and even that a scientist must choose between norms of morality and objectivity.

The reasoning is that since the scientists’ hands must invariably get “dirty” with policy and other values, they should opt for interpreting evidence in a way that promotes ethically sound values, or maximizes public benefit (in some sense).

I call this the “dirty hands” argument, alluding to a term used by philosopher Carl Cranor (1994).(1)

I cannot say how far its proponents would endorse taking the argument.(2) However, it seems that if this thesis is accepted, it may be possible to regard as “unethical” the objective reporting of scientific uncertainties in evidence. This consequence is worrisome: in fact, it would conflict with the generally accepted imperative for an ethical interpretation of scientific evidence.

Nevertheless, the “dirty hands” argument as advanced has apparently plausible premises, one or more of which would need to be denied to avoid the conclusion which otherwise follows deductively. It goes roughly as follows:

  1. Whether observed data are taken as evidence of a risk depends on a methodological decision as to when to reject the null hypothesis of no risk H0 (and infer that the data are evidence of a risk).
  2. Thus, in interpreting data to feed into policy decisions with potentially serious risks to the public, the scientist is actually engaged in matters of policy (what is generally framed as an issue of evidence and science is actually an issue of policy values, ethics, and politics).
  3. The public funds scientific research and the scientist should be responsible for promoting the public good, so scientists should interpret risk evidence so as to maximize public benefit.
  4. Therefore, a responsible (ethical) interpretation of scientific data on risks is one that maximizes public benefit, and one that does not do so is irresponsible or unethical.
  5. Public benefit is maximized by minimizing the chance of failing to find a risk. This leads to the conclusion in 6:
  6. CONCLUSION: In situations of risk assessment the ethical interpreter of evidence will maximize the chance of inferring there is a risk, even if this means inferring a risk when there is none with high probability (or at least a probability much higher than is normally countenanced).

The argument about ethics in evidence is often put in terms of balancing type I and type II errors.

Type I error: test T finds evidence of an increased risk (H0 is rejected), when in fact the risk is absent (false positive).

Type II error: test T does not find evidence of an increased risk (H0 is accepted), when in fact an increased risk δ is present (false negative).

The traditional balance of type I and type II error probabilities, wherein type I errors are minimized, some argue, is unethical. Rather than minimize type I errors, it might be  claimed, an “ethical” tester should minimize type II errors.
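To make the balancing act concrete, here is a minimal sketch of how the two error probabilities trade off in a one-sided test of H0: no increased risk. Everything numerical in it is an assumption for illustration only (a normal model, a hypothetical increased risk δ of 0.5 in units of a known σ = 1, and n = 25 observations); none of it comes from Cranor or from any actual risk assessment.

```python
from statistics import NormalDist


def error_probabilities(alpha, delta, sigma, n):
    """One-sided z-test of H0: mu = 0 against mu > 0.

    H0 is rejected when the sample mean exceeds a cutoff chosen so that
    the type I error probability equals alpha. Returns the pair
    (type I error probability, type II error probability at mu = delta).
    """
    z = NormalDist()
    se = sigma / n ** 0.5                  # standard error of the sample mean
    cutoff = z.inv_cdf(1 - alpha) * se     # reject H0 when the mean exceeds this
    type2 = z.cdf((cutoff - delta) / se)   # P(mean <= cutoff) when mu = delta
    return alpha, type2


# Hypothetical values: delta, sigma and n are illustrative assumptions.
for alpha in (0.01, 0.05, 0.20, 0.50):
    a, b = error_probabilities(alpha, delta=0.5, sigma=1.0, n=25)
    print(f"type I = {a:.2f}   type II (at delta) = {b:.3f}")
```

With the sample size held fixed, pushing the type II error probability toward zero is bought only by letting the type I error probability climb, which is the trade that the conclusion in 6 asks the “ethical” tester to accept.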

I claim that at least 3 of the premises, while plausible-sounding, are false.  What do you think?
_____________________________________________________

(1) Cranor (to my knowledge) was among the first to articulate the argument in philosophy, in relation to statistical significance tests (it is echoed by more recent philosophers of evidence based policy):

Scientists should adopt more health protective evidentiary standards, even when they are not consistent with the most demanding inferential standards of the field.  That is, scientists may be forced to choose between the evidentiary ideals of their fields and the moral value of protecting the public from exposure to toxins, frequently they cannot realize both (Cranor 1994, pp. 169-70).

Kristin Shrader-Frechette has advanced analogous arguments in numerous risk research contexts.

(2) I should note that Cranor is aware that properly scrutinizing statistical tests can advance matters here.

Cranor, C. (1994), “Public Health Research and Uncertainty”, in K. Shrader-Frechette, Ethics of Scientific Research, Rowman and Littlefield, pp. 169-186.

Shrader-Frechette, K. (1994), Ethics of Scientific Research, Rowman and Littlefield.

Categories: Objectivity, Statistics | 17 Comments

Objectivity #1. Will the Real Junk Science Please Stand Up?

Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied.

Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases?

Take, say, global warming, genetically modified crops, electric-power lines, medical diagnostic testing. Group 1 alleges that those who point up the risks (actual or potential) have a vested interest in construing the evidence that exists (and the gaps in the evidence) accordingly, which may bias the relevant science and pressure scientists to be politically correct. Group 2 alleges the reverse, pointing to industry biases in the analysis or reanalysis of data and pressures on scientists doing industry-funded work to go along to get along.

When the battle between the two groups is joined, issues of evidence—what counts as bad/good evidence for a given claim—and issues of regulation and policy—what are “acceptable” standards of risk/benefit—may become so entangled that no one recognizes how much of the disagreement stems from divergent assumptions about how models are produced and used, as well as from contrary stands on the foundations of uncertain knowledge and statistical inference. The core disagreement is mistakenly attributed to divergent policy values, at least for the most part.

Over the years I have tried my hand at sorting out these debates (e.g., Mayo and Hollander 1991). My account of testing actually came into being to systematize reasoning from statistically insignificant results in evidence based risk policy: no evidence of risk is not evidence of no risk! (see October 5). Unlike the disputants who get the most attention, I have argued that the current polarization cries out for critical or meta-scientific scrutiny of the uncertainties, assumptions, and risks of error that are part and parcel of the gathering and interpreting of evidence on both sides. Unhappily, the disputants tend not to welcome this position, and are even hostile to it. This used to shock me when I was starting out: why would those who were trying to promote greater risk accountability not want to avail themselves of ways to hold the agencies and companies responsible when they bury risks in fallacious interpretations of statistically insignificant results? By now, I am used to it.

This isn’t to say that there’s no honest self-scrutiny going on, but only that all sides are so used to anticipating conspiracies of bias that my position is likely viewed as yet another politically motivated ruse. So what we are left with is scientific evidence having less and less a role in constraining or adjudicating disputes. Even to suggest an evidential adjudication risks being attacked as a paid insider.

I agree with David Michaels (2008, 61) that “the battle for the integrity of science is rooted in issues of methodology,” but winning the battle would demand something that both sides are increasingly unwilling to grant. It comes as no surprise that some of the best scientists stay as far away as possible from such controversial science.

Mayo, D. and Hollander, R. (eds.). 1991. Acceptable Evidence: Science and Values in Risk Management, Oxford.

Mayo, D. 1991. Sociological versus Metascientific Views of Risk Assessment, in D. Mayo and R. Hollander (eds.), Acceptable Evidence: 249-79.

Michaels, D. 2008. Doubt Is Their Product, Oxford.

Categories: Objectivity, Statistics | 3 Comments

RMM-1: Special Volume on Stat Sci Meets Phil Sci

Little by little the articles on Stat Sci Meets Phil Sci are appearing in “Rationality, Markets and Morals,”  online.

The article “Statistical Science and Philosophy of Science: Where Do/Should They Meet in 2011 (and Beyond)?” has now been published.

Categories: philosophy of science, Philosophy of Statistics, Statistics | 4 Comments

Blogging the (Strong) Likelihood Principle

I am guilty of not having provided the detailed responses that are owed to the several entries in Christian Robert’s blog on Mayo and Spanos (eds.), ERROR AND INFERENCE: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (E.R.R.O.R.S.) (2010, CUP). Today, I couldn’t resist writing a (third) follow-up comment having to do with my argument on the (strong) Likelihood Principle, even though I wasn’t planning to jump into that issue on this blog just yet. Having been lured to react, and even sketch the argument, I direct interested readers to his blog:

http://xianblog.wordpress.com/

As you can guess, hard copies of our book play a useful role in propping open doors to breeze through marble floors in a wheelchair!  Since I’m nearly free of it (thanks to the ministrations of the recovery team here at Chatfield Chateau), a picture seemed in order!

For an interesting, longish review of the book by Adam La Caze (Notre Dame Philosophical Reviews) that I just encountered, see: http://ndpr.nd.edu/news/24435-error-and-inference-recent-exchanges-on-experimental-reasoning-reliability-and-the-objectivity-and-rationality-of-science/

Categories: Likelihood Principle, Statistics

Formaldehyde Hearing: How to Tell the Truth With Statistically Insignificant Results

One of the first examples I came across of problems in construing statistically insignificant (or “negative”) results was a House Science and Technology investigation of an EPA ruling on formaldehyde in the 1980s. Investigators of the EPA (led by Senator Al Gore!) used rather straightforward, day-to-day reasoning: No evidence of risk is not evidence of no risk. Given the growing interest in science and values both in philosophy and in science and technology studies, I made the “principle” explicit. I thought it was pretty obvious, aside from my Popperian leanings. I’m surprised it’s still an issue.

The case involved the Occupational Safety and Health Administration (OSHA), and possible risks of formaldehyde in the workplace. In 1982, the new EPA assistant administrator, who had come in with Ronald Reagan, “reassessed” the data from the previous administration and, reversing an earlier ruling, announced: “There does not appear to be any relationship, based on the existing data base on humans, between exposure [to formaldehyde] and cancer” (Hearing p. 260). Continue reading

Categories: Statistics

Part 3: Prionvac: How the Reformers Should Have Done Their Job

Here’s how the Prionvac appraisal should have ended:

Prionvac: Our experiments yield a statistically significant increase in survival  among scrapie-infected mice who are given our new vaccine compared to infected mice who are treated with a placebo (p = .01). The data indicate H: an increased survival rate of 9 months, compared to untreated mice.

Reformer: You are exaggerating what your data show. In fact, there is a fairly high probability, more than .5, that your study would produce a p = .01 difference, even if the actual increased rate of survival were only 1 month! (That is, the power to reject the null and infer H: increase of 1 month, is more than .5.) Continue reading
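The Reformer’s power claim can be sketched numerically. The exchange gives no sample sizes or standard deviations, so the standard error below (0.4 months for the difference in mean survival) is purely a hypothetical value chosen so that the arithmetic can be followed, and the one-sided normal test is likewise an assumption rather than Prionvac’s actual analysis.

```python
from statistics import NormalDist


def power(delta, se, alpha=0.01):
    """Probability that a one-sided z-test at level alpha rejects
    'no increase in survival' when the true increase is delta
    (delta and se in the same units, here months)."""
    z = NormalDist()
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - delta / se)


SE_MONTHS = 0.4  # hypothetical standard error; not given in the post

for delta in (0, 1, 3, 9):
    print(f"true increase = {delta} month(s): "
          f"P(result significant at .01) = {power(delta, SE_MONTHS):.3f}")
```

Under these assumed numbers a result significant at the .01 level would turn up more than half the time even if the true increase were only 1 month, so such a result is poor grounds for inferring an increase as large as 9 months.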

Categories: Reformers: Prionvac, Statistics | 3 Comments

Part 2 Prionvac: The Will to Understand Power

As a Nietzschean, I am fond of the statistical notion of power; yet it is often misunderstood by critics of testing. Consider leaders of the reform movement in economics, Ziliak and McCloskey (Michigan, 2009).

In this post, I will adhere precisely to the text, and offer no new interpretation of tests. Type 1 and 2 errors and power are just formal notions with formal definitions.  But we need to get them right (especially if we are giving expert advice).  You can hate them; just define them correctly please.  They write: Continue reading

Categories: Reformers: Prionvac, Statistics | 9 Comments

Part 1: Imaginary scientist at an imaginary company, Prionvac, and an imaginary Reformer

Prionvac: Our experiments yield a statistically significant increase in survival among scrapie-infected mice who are given our new vaccine (p = .01) compared to infected mice who are treated with a placebo. The data indicate H: an increased survival time of 9 months, compared to untreated mice.* Continue reading

Categories: Reformers: Prionvac, Statistics | 2 Comments

WHIPPING BOYS AND WITCH HUNTERS

In an earlier post I alleged that frequentist hypothesis tests often serve as whipping boys, by which I meant “scapegoats”, for the well-known misuses, abuses, and flagrant misinterpretations of tests (both simple Fisherian significance tests and Neyman-Pearson tests, although in different ways). Checking the history of this term, however, there is a certain disanalogy with at least the original meaning of a “whipping boy,” namely, an innocent boy who was punished when a medieval prince misbehaved and was in need of discipline. It was thought that seeing an innocent companion, often a friend, beaten for his own transgressions would supply an effective way to ensure the prince would not repeat the same mistake. But significance test floggings, rather than serving as a tool for humbled self-improvement and commitment to avoiding flagrant rule violations, have tended instead to yield declarations that it is the rules that are invalid! The violators are excused as not being able to help it! The situation is more akin to that of witch hunting, which in some places became an occupation in its own right. Continue reading

Categories: Statistics | 8 Comments

LUCKY 13 (Criticisms)

Given some slight recuperation delays, interested readers might wish to poke around the multiple layers of goodies on the left-hand side of this web page, wherein all manner of foundational/statistical controversies are considered. In a recent attempt by Aris Spanos and me to address the age-old criticisms from the perspective of the “error statistical philosophy,” we delineate 13 criticisms. Here they are:

  • (#1) Error statistical tools forbid using any background knowledge.
  • (#2) All statistically significant results are treated the same.
  • (#3) The p-value does not tell us how large a discrepancy is found.
  • (#4) With large enough sample size even a trivially small discrepancy from the null can be detected.
  • (#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.
  • (#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
  • (#7) Error probabilities are invariably misinterpreted as posterior probabilities.
  • (#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
  • (#9) Specifying statistical tests is too arbitrary.
  • (#10) We should be doing confidence interval estimation rather than significance tests.
  • (#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
  • (#12) All models are false anyway.
  • (#13) Testing assumptions involves illicit data-mining.
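The arithmetic behind criticism #4 is easy to sketch. The snippet below is only an illustration under assumed values: a one-sided normal test at the .05 level, and a hypothetical “trivially small” discrepancy of one hundredth of a standard deviation. It shows how the chance of detecting that same tiny discrepancy grows with the sample size; it says nothing about how the resulting significance should be interpreted.

```python
from statistics import NormalDist


def power_against(discrepancy, n, alpha=0.05):
    """Power of a one-sided z-test at level alpha to detect a true
    discrepancy from the null, measured in standard deviation units,
    with n observations."""
    z = NormalDist()
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - discrepancy * n ** 0.5)


TINY = 0.01  # hypothetical "trivially small" discrepancy, in sigma units

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: P(statistically significant) = {power_against(TINY, n):.3f}")
```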

HAVE WE LEFT ANY OUT?

Mayo & Spanos “Error Statistics” 2011

…more soon.

(for problems  accessing links, please write to: jemille6@vt.edu)

Categories: Statistics | 4 Comments

A Highly Anomalous Event

The journey to San Francisco was smooth sailing with no plane delays; within two hours of landing I found myself in the E.R. of St. Francis Hospital (with the philosopher of science Ronald Giere), unable to walk. I have just described an unexpected, “anomalous”, highly unusual event, but no one would suppose it was anomalous FOR, i.e., evidence against, some theory, say, in molecular biology. Yet I am getting e-mails (from readers) saying, in effect, that since the improbable coin toss result is very unexpected/anomalous in its own right, it therefore is anomalous for any and all theories, which is patently absurd. What had happened, in case you want to know, is that just as I lunged forward to grab my (bulging) suitcase off the airline baggage thingy, out of the corner of my eye I saw my computer bag being pulled away by someone on my left, and as I simultaneously yanked it back, I tumbled over, very gently it seemed, twisting my knee in a funny way. To my surprise/alarm, much as I tried, I could put no weight on my right leg without succumbing to a Geppetto-puppet-like collapse. The event, of course, could rightly be regarded as anomalous for hypotheses about my invulnerability to such mishaps, because it runs counter to them. I will assume this issue is now settled for our discussions, yes?

Categories: Statistics | 29 Comments

Getting It Right But for the Wrong Reason

Sitting in the airport . . . a temporary escape from Elba, which I’m becoming more and more loath to leave. I fear that some might agree, rightly, that Kadane’s “trivial test” is no indictment of significance tests and yet for the WRONG reason. I don’t want to beat a dead horse, but perhaps a certain confusion is going to obstruct understanding later on. Let us abbreviate “tails” on a coin toss that lands tails 5% of the time, as “a rare coin toss outcome”. Some seem to reason: since a rare coin toss outcome is an event with probability .05 REGARDLESS of the truth or falsity of a hypothesis H, then the test is still a legitimate significance test with significance level .05; it is just a lousy one, with no discriminating ability. I claim it is no significance test at all, and that there is an important equivocation going on (in some letters I’ve received), one which I hoped would be skirted by the analogy with ordinary hypothesis testing in science. Heading off this confusion was the key rationale for my discussion in the Kuru post. Finding no nucleic acid in prions is inconsistent, or virtually so, with the hypothesis H: all pathogens are transmitted with nucleic acid. The observed results are anomalous for the central dogma H BECAUSE they are counter to what H says we would expect. If you maintain that the “rare coin toss outcome” is anomalous for a statistical null hypothesis H, then you would also have to say it is anomalous for H: all pathogens have nucleic acid. But it is obvious this is false in the case of the scientific hypothesis. It must also be rejected in the case of the statistical hypothesis (Rule #1).

A legitimate statistical test hypothesis must tell us (i.e., let us compute) how improbably far different experimental outcomes are from what would be expected under H. It is correct to regard experimental results as anomalous for a hypothesis H only if, and only because, they run counter to what H tells us would occur in a universe where H is correct. A hypothesis on pathogen transmission, say, does not tell us the improbability of the rare coin toss outcome. Thus it is no significance test at all. As I wrote in the Kuru post:  It is not that infectious protein events are “very improbable” in their own right (however one construes this); it is rather that these events are counter to, and forbidden under, the assumption of the hypothesis H.
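A small simulation may help fix the contrast. It is only a sketch under assumed values: H is taken to be “the mean is 0” in a normal model with known standard deviation 1, the alternative mean of 0.5 and the sample size of 25 are hypothetical, and the function names are mine. The first procedure computes how far the data are from what H leads us to expect; the second consults nothing but the coin.

```python
import random
from statistics import NormalDist

CUTOFF = NormalDist().inv_cdf(0.95)  # one-sided 5% cutoff for a z statistic


def z_test_rejects(mu, n, rng):
    """Reject H: mu = 0 when the standardized sample mean is improbably
    large under H. The data are generated with true mean mu."""
    xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
    return xbar * n ** 0.5 > CUTOFF


def coin_rejects(mu, n, rng):
    """'Reject H' whenever a coin with P(tails) = .05 lands tails.
    mu and n are ignored: H says nothing about the coin."""
    return rng.random() < 0.05


def rejection_rate(procedure, mu, n=25, trials=20_000, seed=0):
    """Relative frequency of rejection when the true mean is mu."""
    rng = random.Random(seed)
    return sum(procedure(mu, n, rng) for _ in range(trials)) / trials


for mu in (0.0, 0.5):
    print(f"true mean {mu}: z-test rejects {rejection_rate(z_test_rejects, mu):.3f}, "
          f"coin 'test' rejects {rejection_rate(coin_rejects, mu):.3f}")
```

The coin procedure “rejects” about 5% of the time whatever the truth about H, because nothing in it depends on what H says would occur; that is the sense in which the outcome is rare in its own right without being anomalous for H.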

Categories: Statistics | 4 Comments

SF conferences & E. Lehmann

I’m jumping off the Island for a bit. Destination: San Francisco, a conference on “The Experimental Side of Modeling” http://www.isabellepeschard.org/ . Kuru makes a walk-on appearance in my presentation, “How Experiment Gets a Life of its Own”. It does not directly discuss statistics, but I will post my slides.

The last time I was in SF was in 2003 with my econometrician colleague, Aris Spanos.  We were on our way to Santa Barbara to engage in an unusual powwow on statistical foundations at NCEAS*, and stopped off in SF to meet with Erich Lehmann and his wife, Julie Shaffer.   We discussed, among other things, this zany idea of mine to put together a session for the Second Lehmann conference in 2004 that would focus on philosophical foundations of statistics. (Our session turned out to include David Freedman and D.R. Cox). Continue reading

Categories: philosophy of science, Statistics | 1 Comment

In Exile, Clinging to Old Ideas?

To take up the first criticism, we can consider J. Kadane’s new book, Principles of Uncertainty (2011, CRC Press*). Kadane, to his credit, does not beat around the bush as regards his subjective Bayesian perspective; his is a leading Bayesian voice in the tradition of Savage.  He takes up central criticisms of frequentist methods in Chapter 12 called “Exploration of Old Ideas”. So now I am not only in foundational exile, I am clinging to ideas that are in need of Juvederm! Continue reading

Categories: Statistics | 5 Comments

KURU

I have been reading about a disorder that intrigues me, Kuru (which means “shaking”), widespread among the Fore people of New Guinea in the 1960s. In around 3-6 months, Kuru victims go from having difficulty walking, to outbursts of laughter, to inability to swallow and death. Kuru and (what we now know to be) related diseases (e.g., Mad Cow, Creutzfeldt-Jakob, scrapie) are “spongiform” diseases, causing brains to appear spongy. (They are also called TSEs: transmissible spongiform encephalopathies.) Kuru clustered in families, in particular among Fore women and their children, or elderly parents. Continue reading

Categories: philosophy of science, Reformers: Prionvac, Statistics

Drilling Rule #1*

A simple rule before getting started: In presenting their arguments, philosophers sometimes appear to go off into far distant islands entirely, and then act as if they have shown something about the case at hand. The mystery evaporates if one keeps in mind the following rule of argument:

  • If one argument is precisely analogous to another, in all relevant respects, and the second argument is pretty clearly fishy, then so is the first. Likewise, if one argument is precisely analogous to another, in all relevant respects, and the second argument passes swimmingly, then so must the first.

If the argument at hand is murky, while the one in the distant land crystal clear, then appealing to the latter is a powerful way to make a point.  Because the relevance for the case at hand seems obvious, details may be left unstated.  Of course you may avoid these conclusions by showing just where the analogies break down.

*Full disclosure:  I own a fair amount of Diamond Offshore (DO), but do not plan to purchase more in the next 72 hours.

Categories: philosophy of science, Statistics

Overheard at the comedy hour at the Bayesian retreat:

Did you hear the one about the frequentist . . .

  • “who claimed that observing “heads” on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”

or

  • “who defended the reliability of his radiation reading, despite using a broken radiometer, on the grounds that most of the time he uses one that works, so on average he’s pretty reliable?”

Such jests may work for an after-dinner laugh, but if it turns out that, despite being retreads of “straw-men” fallacies, they form the basis of why some reject frequentist methods, then they are not such a laughing matter.   But surely the drubbing of frequentist methods could not be based on a collection of howlers, could it?  I invite the curious reader to stay and find out. Continue reading

Categories: Statistics

Frequentists in Exile: The Purpose of this Blog

Confronted with the position that “arguments for this personalistic theory were so persuasive that anything to any extent inconsistent with that theory should be discarded” (Cox 2006, 196), frequentists might have seen themselves in a kind of exile when it came to foundations, even those who had been active in the dialogues of an earlier period.  Sometime around the late 1990s there were signs that this was changing.  Regardless of the explanation, the fact that it did occur and is occurring is of central importance to statistical philosophy.

Now that Bayesians have stepped off their a priori pedestal, it may be hoped that a genuinely deep scrutiny of the frequentist and Bayesian accounts will occur.  In some corners of practice it appears that frequentist error statistical foundations are being discovered anew.  Perhaps frequentist foundations, never made fully explicit, but at most lying deep below the ocean floor, are finally being disinterred.  But let’s learn from some of the mistakes in the earlier attempts to understand it.  With this goal I invite you to join me in some deep water drilling, here as I cast about on my Isle of Elba.

Cox, D. R. (2006), Principles of Statistical Inference, CUP.

Categories: Statistics | 1 Comment
