Monthly Archives: October 2011

Oxford Gaol: Statistical Bogeymen

Oxford Jail is an entirely fitting place to be on Halloween!

Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba!  My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should I think be shelved, not that names should matter.

In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory.  Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended.   But for Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).

Criticisms then follow readily, taking the form of one or both of the following:

  • Error probabilities do not supply posterior probabilities in hypotheses; interpreted as if they do (and some say we just can’t help it), they lead to inconsistencies.
  • Methods with good long-run error rates can give rise to counterintuitive inferences in particular cases.

I have proposed an alternative philosophy that replaces these tenets with different ones:

  • The role of probability in inference is to quantify how reliably or severely claims (or discrepancies from claims) have been tested.
  • The severity goal directs us to the relevant error probabilities, avoiding the oft-repeated statistical fallacies due to tests that are overly sensitive, as well as those insufficiently sensitive to particular errors.
  • Control of long-run error probabilities, while necessary, is not sufficient for good tests or warranted inferences.


Categories: Statistics

Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs

Increasingly, I am discovering that one of the biggest sources of confusion about the foundations of statistics has to do with what it means or should mean to use “background knowledge” and “judgment” in making statistical and scientific inferences. David Cox and I address this in our “Conversation” in RMM (2011); it is one of the three or four topics in that special volume that I am keen to take up.

Insofar as humans conduct science and draw inferences, and insofar as learning about the world is not reducible to a priori deductions, it is obvious that “human judgments” are involved. True enough, but too trivial an observation to help us distinguish among the very different ways judgments should enter according to contrasting inferential accounts. When Bayesians claim that frequentists do not use or are barred from using background information, what they really mean is that frequentists do not use prior probabilities of hypotheses, at least when those hypotheses are regarded as correct or incorrect, if only approximately. So, for example, we would not assign relative frequencies to the truth of hypotheses such as (1) prion transmission is via protein folding without nucleic acid, or (2) the deflection of light is approximately 1.75” (as if, as Peirce puts it, “universes were as plenty as blackberries”). How odd it would be to try to model these hypotheses as themselves having distributions: to us, statistical hypotheses assign probabilities to outcomes or values of a random variable.

Categories: Statistics

RMM-3: Special Volume on Stat Sci Meets Phil Sci

The article “Empirical Economic Model Discovery and Theory Evaluation” by Sir David Hendry has now been published in our special volume of the on-line journal, Rationality, Markets, and Morals (Special Topic: “Statistical Science and Philosophy of Science: Where Do/Should They Meet?”).

Economies are so high dimensional and non-constant that many features of models cannot be derived by prior reasoning, intrinsically involving empirical discovery and requiring theory evaluation. Despite important differences, discovery and evaluation in economics are similar to those of science. Fitting a pre-specified equation limits discovery, but automatic methods can formulate much more general initial models with many possible variables, long lag lengths and non-linearities, allowing for outliers, data contamination, and parameter shifts; then select congruent parsimonious-encompassing models even with more candidate variables than observations, while embedding the theory; finally rigorously evaluate selected models to ascertain their viability.

Categories: Philosophy of Statistics, Statistics

The Will to Understand Power: Neyman’s Nursery (NN1)

Way back when, although I’d never met him, I sent my doctoral dissertation, Philosophy of Statistics, to one person only: Professor Ronald Giere. (And he would read it, too!) I knew from his publications that he was a leading defender of frequentist statistical methods in philosophy of science, and that he’d worked for a time with Birnbaum in NYC.

Categories: Neyman's Nursery, Statistics

ReBlogging the Likelihood Principle #2: Solitary Fishing: SLP Violations

Reblogging from a year ago. The Appendix of the “Cox/Mayo Conversation” (linked below [i]) is an attempt to quickly sketch Birnbaum’s argument for the strong likelihood principle (SLP), and its sins.  A couple of notes: First, I am a philosopher (of science and statistics), not a statistician.  That means my treatment will show all of the typical (and perhaps annoying) signs of being a trained philosopher-logician.  I’ve no doubt statisticians would want to use different language, which is welcome.  Second, this is just a blog (although perhaps my published version is still too informal for some).

Categories: Likelihood Principle

RMM-2: "A Conversation Between Sir David Cox & D.G. Mayo"

Published today in Rationality, Markets and Morals

Studies at the Intersection of Philosophy and Economics

 “A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo”

(as recorded, June, 2011)

Categories: Philosophy of Statistics, Statistics

Objectivity #3: Clean(er) Hands With Metastatistics

I claim that all but the first of the “dirty hands” argument’s five premises are flawed. Even the first premise too directly identifies a policy decision with a statistical report. But the key flaws begin with premise 2. Although risk policies may be based on a statistical report of evidence, it does not follow that the considerations suitable for judging risk policies are the ones suitable for judging the statistical report. They are not. The latter, of course, should not be reduced to some kind of unthinking accept/reject report. If responsible, it must clearly and completely report the nature and extent of (risk-related) effects that are and are not indicated by the data, making plain how the methodological choices made in the generation, modeling, and interpreting of data raise or lower the chances of finding evidence of specific risks. These choices may be called risk assessment policy (RAP) choices.

Categories: Objectivity, Statistics

King Tut Includes ErrorStatistics in Top 50 Statblogs!

I didn’t think our little rag tag blog-in-exile was even noticed. I’m glad to discover several other sites I was unaware of (providing yet more grist for our mills).

(Note: I am not at all happy with the way the comments are appearing here; there’s insufficient space.  I will be investigating better solutions…..I’m aware of the problem.)

I will soon be departing from this cushy chateau, where even King Tut reads EGEK.

(1) PSX-Second International Workshop on the Philosophy of Scientific Experimentation, 21-2 October, University of Konstanz

(2) Lorentz Center: Error in the Sciences, 24-28 October

Then on to England, in time for Halloween.

My Halloween costume?  You can take a guess to win  an authentic, chef-signed napkin from the Elbar room (no winners last week).

Categories: Statistics

Objectivity #2: The “Dirty Hands” Argument for Ethics in Evidence

Some argue that generating and interpreting data for purposes of risk assessment invariably introduces ethical (and other value) considerations that might not only go beyond, but might even conflict with, the “accepted canons of objective scientific reporting.”  This thesis, we may call it the thesis of ethics in evidence and inference, some think, shows that an ethical interpretation of evidence may warrant violating canons of scientific objectivity, and even that a scientist must choose between norms of morality and objectivity.

The reasoning is that since the scientists’ hands must invariably get “dirty” with policy and other values, they should opt for interpreting evidence in a way that promotes ethically sound values, or maximizes public benefit (in some sense).

I call this the “dirty hands” argument, alluding to a term used by philosopher Carl Cranor (1994).(1)

I cannot say how far its proponents would endorse taking the argument.(2) However, it seems that if this thesis is accepted, it may be possible to regard as “unethical” the objective reporting of scientific uncertainties in evidence.  This consequence is worrisome: in fact, it would conflict with the generally accepted imperative for an ethical interpretation of scientific evidence.

Nevertheless, the “dirty hands” argument as advanced has apparently plausible premises, one or more of which would need to be denied to avoid the conclusion which otherwise follows deductively. It goes roughly as follows:

  1. Whether observed data are taken as evidence of a risk depends on a methodological decision as to when to reject the null hypothesis of no risk H0 (and infer the data are evidence of a risk).
  2. Thus interpreting data to feed into policy decisions with potentially serious risks to the public, the scientist is actually engaged in matters of policy (what is generally framed as an issue of evidence and science, is actually an issue of policy values, ethics, and politics).
  3. The public funds scientific research and the scientist should be responsible for promoting the public good, so scientists should interpret risk evidence so as to maximize public benefit.
  4. Therefore, a responsible (ethical) interpretation of scientific data on risks is one that maximizes public benefit–and one that does not do so is irresponsible or unethical.
  5. Public benefit is maximized by minimizing the chance of failing to find a risk.  This leads to the conclusion in 6:
  6. CONCLUSION: In situations of risk assessment the ethical interpreter of evidence will maximize the chance of inferring there is a risk–even if this means inferring a risk when there is none with high probability (or at least a probability much higher than is normally countenanced).

The argument about ethics in evidence is often put in terms of balancing type I and type II errors.

Type I error: test T finds evidence of an increased risk (H0 is rejected), when in fact the risk is absent (false positive).

Type II error: test T does not find evidence of an increased risk (H0 is accepted), when in fact an increased risk δ is present (false negative).

The traditional balance of type I and type II error probabilities, wherein type I errors are minimized, some argue, is unethical. Rather than minimize type I errors, it might be claimed, an “ethical” tester should minimize type II errors.
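To see the trade-off being appealed to, here is a minimal sketch, assuming a one-sided z-test of H0 (no increased risk) with a known standard deviation. The function name and all of the numbers (δ = 1, σ = 4, n = 25) are hypothetical and chosen only for illustration; nothing here comes from an actual risk study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def type_ii_error(alpha, delta, sigma, n):
    """Type II error probability of a one-sided z-test of H0: no increased risk.

    The test rejects H0 when the standardized sample mean exceeds the
    cutoff that gives type I error probability `alpha`. `delta` is the
    true increase in risk, `sigma` the (assumed known) standard deviation,
    and `n` the sample size. All inputs are hypothetical.
    """
    cutoff = STD_NORMAL.inv_cdf(1 - alpha)
    shift = delta * n ** 0.5 / sigma  # true increase in standard-error units
    return STD_NORMAL.cdf(cutoff - shift)


if __name__ == "__main__":
    # Hold the data-generating setup fixed and vary only the type I error rate.
    for alpha in (0.01, 0.05, 0.20, 0.50):
        beta = type_ii_error(alpha, delta=1.0, sigma=4.0, n=25)
        print(f"type I = {alpha:.2f}   type II = {beta:.3f}")
```

With the sample size held fixed, lowering one error probability raises the other. In this made-up setup, pushing the type II error probability down to about .11 requires tolerating a type I error probability of .5, which is the kind of test the conclusion in 6 would have the “ethical” interpreter adopt.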

I claim that at least 3 of the premises, while plausible-sounding, are false.  What do you think?

(1) Cranor (to my knowledge) was among the first to articulate the argument in philosophy, in relation to statistical significance tests (it is echoed by more recent philosophers of evidence based policy):

Scientists should adopt more health protective evidentiary standards, even when they are not consistent with the most demanding inferential standards of the field.  That is, scientists may be forced to choose between the evidentiary ideals of their fields and the moral value of protecting the public from exposure to toxins, frequently they cannot realize both (Cranor 1994, pp. 169-70).

Kristin Shrader-Frechette has advanced analogous arguments in numerous risk research contexts.

(2) I should note that Cranor is aware that properly scrutinizing statistical tests can advance matters here.

Cranor, C. (1994), “Public Health Research and Uncertainty”, in K. Shrader-Frechette, Ethics of Scientific Research.  Rowman and Littlefield, pp. 169-186.

Shrader-Frechette, K. (1994), Ethics of Scientific Research, Rowman and Littlefield.

Categories: Objectivity, Statistics

Objectivity #1. Will the Real Junk Science Please Stand Up?

Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied.

Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases?

Take, say, global warming, genetically modified crops, electric-power lines, medical diagnostic testing. Group 1 alleges that those who point up the risks (actual or potential) have a vested interest in construing the evidence that exists (and the gaps in the evidence) accordingly, which may bias the relevant science and pressure scientists to be politically correct. Group 2 alleges the reverse, pointing to industry biases in the analysis or reanalysis of data and pressures on scientists doing industry-funded work to go along to get along.

When the battle between the two groups is joined, issues of evidence—what counts as bad/good evidence for a given claim—and issues of regulation and policy—what are “acceptable” standards of risk/benefit—may become so entangled that no one recognizes how much of the disagreement stems from divergent assumptions about how models are produced and used, as well as from contrary stands on the foundations of uncertain knowledge and statistical inference. The core disagreement is mistakenly attributed to divergent policy values, at least for the most part.

Over the years I have tried my hand in sorting out these debates (e.g., Mayo and Hollander 1991). My account of testing actually came into being to systematize reasoning from statistically insignificant results in evidence based risk policy: no evidence of risk is not evidence of no risk! (see October 5). Unlike the disputants who get the most attention, I have argued that the current polarization cries out for critical or meta-scientific scrutiny of the uncertainties, assumptions, and risks of error that are part and parcel of the gathering and interpreting of evidence on both sides. Unhappily, the disputants tend not to welcome this position—and are even hostile to it.  This used to shock me when I was starting out—why would those who were trying to promote greater risk accountability not want to avail themselves of ways to hold the agencies and companies responsible when they bury risks in fallacious interpretations of statistically insignificant results?  By now, I am used to it.

This isn’t to say that there’s no honest self-scrutiny going on, but only that all sides are so used to anticipating conspiracies of bias that my position is likely viewed as yet another politically motivated ruse. So what we are left with is scientific evidence having less and less a role in constraining or adjudicating disputes. Even to suggest an evidential adjudication risks being attacked as a paid insider.

I agree with David Michaels (2008, 61) that “the battle for the integrity of science is rooted in issues of methodology,” but winning the battle would demand something that both sides are increasingly unwilling to grant. It comes as no surprise that some of the best scientists stay as far away as possible from such controversial science.

Mayo, D. and Hollander, R. (eds.). 1991. Acceptable Evidence: Science and Values in Risk Management, Oxford.

Mayo, D. 1991. Sociological versus Metascientific Views of Risk Assessment, in D. Mayo and R. Hollander (eds.), Acceptable Evidence: 249-79.

Michaels, D. 2008. Doubt Is Their Product, Oxford.

Categories: Objectivity, Statistics

RMM-1: Special Volume on Stat Sci Meets Phil Sci

Little by little the articles on Stat Sci Meets Phil Sci are appearing in “Rationality, Markets and Morals,”  online.

The article “Statistical Science and Philosophy of Science: Where Do/Should They Meet in 2011 (and Beyond)?” has now been published.

Categories: philosophy of science, Philosophy of Statistics, Statistics

Blogging the (Strong) Likelihood Principle

I am guilty of not having provided the detailed responses that are owed to the several entries in Christian Robert’s blog on Mayo and Spanos (eds.), ERROR AND INFERENCE: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (E.R.R.O.R.S.)  (2010, CUP).  Today, I couldn’t resist writing a (third) follow-up comment having to do with my argument on the (strong) Likelihood Principle, even though I wasn’t planning to jump into that issue on this blog just yet. Having been lured to react, and even sketch the argument, I direct interested readers to his blog:

As you can guess, hard copies of our book play a useful role in propping open doors to breeze through marble floors in a wheelchair!  Since I’m nearly free of it (thanks to the ministrations of the recovery team here at Chatfield Chateau), a picture seemed in order!

For an interesting, longish review of the book that I just encountered by Adam La Caze (Notre Dame Philosophical Reviews) see:

Categories: Likelihood Principle, Statistics

Formaldehyde Hearing: How to Tell the Truth With Statistically Insignificant Results

One of the first examples I came across of problems in construing statistically insignificant (or “negative”) results was a House Science and Technology investigation of an EPA ruling on formaldehyde in the 1980’s. Investigators of the EPA (led by Senator Al Gore!) used rather straightforward, day-to-day reasoning: No evidence of risk is not evidence of no risk. Given the growing interest in science and values both in philosophy and in science and technology studies, I made the “principle” explicit. I thought it was pretty obvious, aside from my Popperian leanings. I’m surprised it’s still an issue.

The case involved the Occupational Safety and Health Administration (OSHA), and possible risks of formaldehyde in the workplace. In 1982, the new EPA assistant administrator, who had come in with Ronald Reagan, “reassessed” the data from the previous administration and, reversing an earlier ruling, announced: “There does not appear to be any relationship, based on the existing data base on humans, between exposure [to formaldehyde] and cancer” (Hearing p. 260).

Categories: Statistics

Part 3: Prionvac: How the Reformers Should Have Done Their Job

Here’s how the Prionvac appraisal should have ended:

Prionvac: Our experiments yield a statistically significant increase in survival  among scrapie-infected mice who are given our new vaccine compared to infected mice who are treated with a placebo (p = .01). The data indicate H: an increased survival rate of 9 months, compared to untreated mice.

Reformer: You are exaggerating what your data show. In fact, there is a fairly high probability, more than .5, that your study would produce a p = .01 difference, even if the actual increased rate of survival were only 1 month! (That is, the power to reject the null and infer H: increase of 1 month, is more than .5.)
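The Reformer's point can be checked numerically. What follows is only a sketch, assuming a one-sided z-test on the difference in mean survival with a known standard error; the function name and the standard error of 0.4 months are made up for illustration and are not figures from the Prionvac exchange.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power(true_increase, std_error, alpha=0.01):
    """Probability that a one-sided z-test rejects H0 at level `alpha`
    when the true increase in mean survival is `true_increase`.

    `std_error` is the standard error of the estimated difference, in the
    same units (months). Its value below is hypothetical.
    """
    cutoff = STD_NORMAL.inv_cdf(1 - alpha)  # rejection cutoff in standard-error units
    return 1 - STD_NORMAL.cdf(cutoff - true_increase / std_error)


if __name__ == "__main__":
    HYPOTHETICAL_STD_ERROR = 0.4  # months; assumed for illustration only
    for increase in (0.0, 1.0, 9.0):
        p = power(increase, HYPOTHETICAL_STD_ERROR)
        print(f"true increase = {increase:.0f} month(s): power = {p:.3f}")
```

Under these assumed numbers the test rejects at the .01 level more than half the time even when the true increase is only 1 month, so a just-significant result is poor grounds for inferring an increase as large as 9 months.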

Categories: Reformers: Prionvac, Statistics

Part 2 Prionvac: The Will to Understand Power

As a Nietzschean, I am fond of the statistical notion of power; yet it is often misunderstood by critics of testing. Consider leaders of the reform movement in economics, Ziliak and McCloskey (Michigan, 2009).

In this post, I will adhere precisely to the text, and offer no new interpretation of tests. Type 1 and 2 errors and power are just formal notions with formal definitions.  But we need to get them right (especially if we are giving expert advice).  You can hate them; just define them correctly please.  They write:

Categories: Reformers: Prionvac, Statistics
