Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)

Posted on January 4, 2015 by Mayo

too strict/not strict enough

Given the daily thrashing significance tests receive because of how preposterously easy it is claimed to be to satisfy the .05 significance level requirement, it’s surprising[i] to hear Naomi Oreskes blaming the .05 standard as demanding too high a burden of proof for accepting climate change. “Playing Dumb on Climate Change,” N.Y. Times Sunday Rev. at 2 (Jan. 4, 2015). Is there anything for which significance levels do not serve as convenient whipping boys? Thanks to lawyer Nathan Schachtman for alerting me to her opinion piece today (congratulations to Oreskes!), and to his current blogpost. I haven’t carefully read her article, but one claim jumped out: scientists, she says, “practice a form of self-denial, denying themselves the right to believe anything that has not passed very high intellectual hurdles.” If only! *I add a few remarks at the end. Anyhow here’s Schachtman’s post:


“Playing Dumb on Statistical Significance”
by Nathan Schachtman

Naomi Oreskes is a professor of the history of science at Harvard University. Her writings on the history of geology are well respected; her writings on climate change tend to be more adversarial, rhetorical, and ad hominem. See, e.g., Naomi Oreskes, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming (N.Y. 2010). Oreskes’ abuse of the meaning of significance probability for her own rhetorical ends is on display in today’s New York Times. Naomi Oreskes, “Playing Dumb on Climate Change,” N.Y. Times Sunday Rev. at 2 (Jan. 4, 2015).

Oreskes wants her readers to believe that those who are resisting her conclusions about climate change are hiding behind an unreasonably high burden of proof, which follows from the conventional standard of significance in significance probability. In presenting her argument, Oreskes consistently misrepresents the meaning of statistical significance and confidence intervals to be about the overall burden of proof for a scientific claim:

“Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20. But it also means that if there’s more than even a scant 5 percent possibility that an event occurred by chance, scientists will reject the causal claim. It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.”

Although the confidence interval is related to the pre-specified Type I error rate, alpha, and so a conventional alpha of 5% does lead to a coefficient of confidence of 95%, Oreskes has misstated the confidence interval to be a burden of proof consisting of a 95% posterior probability. The “relationship” is either true or not; the p-value or confidence interval provides a probability for the sample statistic, or one more extreme, on the assumption that the null hypothesis is correct. The 95% probability of confidence intervals derives from the long-term frequency that 95% of all confidence intervals, based upon samples of the same size, will contain the true parameter of interest.
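The long-run reading of the 95% figure can be checked with a small simulation. What follows is only an illustrative sketch under stated assumptions, not anything taken from Oreskes’ article: normally distributed data, a known standard deviation, and a hypothetical function name (ci_coverage) with made-up parameter values. It draws many samples of the same size, builds the usual interval around each sample mean, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist

def ci_coverage(true_mean=0.0, sigma=1.0, n=50, level=0.95, reps=2000, seed=1):
    """Fraction of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level is 0.95
    half_width = z * sigma / n ** 0.5            # sigma is treated as known (assumption)
    hits = 0
    for _ in range(reps):
        # One hypothetical study: n independent normal observations.
        xbar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

coverage = ci_coverage()
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share printed should sit close to 0.95. That number describes the procedure over repeated samples; it is not a probability that any one computed interval, or any causal claim, is correct.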

Oreskes is an historian, but her history of statistical significance appears equally ill considered. Here is how she describes the “severe” standard of the 95% confidence interval:

“Where does this severe standard come from? The 95 percent confidence level is generally credited to the British statistician R. A. Fisher, who was interested in the problem of how to be sure an observed effect of an experiment was not just the result of chance. While there have been enormous arguments among statisticians about what a 95 percent confidence level really means, working scientists routinely use it.”

First, Oreskes, the historian, gets the history wrong. The confidence interval is due to Jerzy Neyman, not to Sir Ronald A. Fisher. Jerzy Neyman, “Outline of a theory of statistical estimation based on the classical theory of probability,” 236 Philos. Trans. Royal Soc’y Lond. Ser. A 333 (1937). Second, although statisticians have debated the meaning of the confidence interval, they have not wandered from its essential use as an estimation of the parameter (based upon the use of an unbiased, consistent sample statistic) and a measure of random error (not systematic error) about the sample statistic. Oreskes provides a fallacious history, with a false and misleading statistics tutorial.

Oreskes, however, goes on to misidentify the 95% coefficient of confidence with the legal standard known as “beyond a reasonable doubt”:

“But the 95 percent level has no actual basis in nature. It is a convention, a value judgment. The value it reflects is one that says that the worst mistake a scientist can make is to think an effect is real when it is not. This is the familiar “Type 1 error.” You can think of it as being gullible, fooling yourself, or having undue faith in your own ideas. To avoid it, scientists place the burden of proof on the person making an affirmative claim. But this means that science is prone to ‘Type 2 errors’: being too conservative and missing causes and effects that are really there.

Is a Type 1 error worse than a Type 2? It depends on your point of view, and on the risks inherent in getting the answer wrong. The fear of the Type 1 error asks us to play dumb; in effect, to start from scratch and act as if we know nothing. That makes sense when we really don’t know what’s going on, as in the early stages of a scientific investigation. It also makes sense in a court of law, where we presume innocence to protect ourselves from government tyranny and overzealous prosecutors — but there are no doubt prosecutors who would argue for a lower standard to protect society from crime.

When applied to evaluating environmental hazards, the fear of gullibility can lead us to understate threats. It places the burden of proof on the victim rather than, for example, on the manufacturer of a harmful product. The consequence is that we may fail to protect people who are really getting hurt.”

The truth of climate change opinions does not turn on sampling error, but rather on the desire to draw an inference from messy, incomplete, non-random, and inaccurate measurements, fed into models of uncertain validity. Oreskes suggests that significance probability is keeping us from acknowledging a scientific fact, but the climate change data sets are amply large to rule out sampling error if that were a problem. And Oreskes’ suggestion that somehow statistical significance is placing a burden upon the “victim,” is simply assuming what she hopes to prove; namely, that there is a victim (and a perpetrator).

Oreskes’ solution seems to have a Bayesian ring to it. She urges that we should start with our a priori beliefs, intuitions, and pre-existing studies, and allow them to lower our threshold for significance probability:

“And what if we aren’t dumb? What if we have evidence to support a cause-and-effect relationship? Let’s say you know how a particular chemical is harmful; for example, that it has been shown to interfere with cell function in laboratory mice. Then it might be reasonable to accept a lower statistical threshold when examining effects in people, because you already have reason to believe that the observed effect is not just chance.

This is what the United States government argued in the case of secondhand smoke. Since bystanders inhaled the same chemicals as smokers, and those chemicals were known to be carcinogenic, it stood to reason that secondhand smoke would be carcinogenic, too. That is why the Environmental Protection Agency accepted a (slightly) lower burden of proof: 90 percent instead of 95 percent.”

Oreskes’ rhetoric misstates key aspects of scientific method. The demonstration of causality in mice, or only some perturbation of cell function in non-human animals, does not warrant lowering our standard for studies in human beings. Mice and rats are, for many purposes, poor predictors of human health effects. All medications developed for human use are tested in animals first, for safety and efficacy. A large majority of such medications, efficacious in rodents, fail to satisfy the conventional standards of significance probability in randomized clinical trials. And that standard is not lowered because the drug sponsor had previously demonstrated efficacy in mice, or some other furry rodent.

The EPA meta-analysis of passive smoking and lung cancer is a good example of how not to conduct science. The protocol for the EPA meta-analysis called for a 95% confidence interval, but the agency scientists manipulated their results by altering the pre-specified coefficient of confidence in their final report. Perhaps even more disgraceful was the selectivity of included studies for the meta-analysis, which biased the agency’s result in a way not reflected in p-values or confidence intervals. See “EPA Cherry Picking (WOE) – EPA 1992 Meta-Analysis of ETA & Lung Cancer – Part 1” (Dec. 2, 2012); “EPA Post Hoc Statistical Tests – One Tail vs Two” (Dec. 2, 2012).

Of course, the scientists preparing for and conducting a meta-analysis on environmental tobacco smoke began with a well-justified belief that active smoking causes lung cancer. Passive smoking, however, involves very different exposure levels and raises serious issues of the human body’s defensive mechanisms to protect against low-level exposure. Insisting on a reasonable quality meta-analysis for passive smoking and lung cancer was not a matter of “playing dumb”; it was a recognition of our actual ignorance and uncertainty about the claim being made for low-exposure effects. The shifty confidence intervals and slippery methodology exemplify how agency scientists assume their probandum to be true, and then manipulate or adjust their methods to provide the result they had assumed all along.

Oreskes then analogizes not playing dumb on environmental tobacco smoke to not playing dumb on climate change:

“In the case of climate change, we are not dumb at all. We know that carbon dioxide is a greenhouse gas, we know that its concentration in the atmosphere has increased by about 40 percent since the industrial revolution, and we know the mechanism by which it warms the planet.

WHY don’t scientists pick the standard that is appropriate to the case at hand, instead of adhering to an absolutist one? The answer can be found in a surprising place: the history of science in relation to religion. The 95 percent confidence limit reflects a long tradition in the history of science that valorizes skepticism as an antidote to religious faith.”

I will leave the substance of the climate change issue to others, but Oreskes’ methodological misidentification of the 95% coefficient of confidence with burden of proof is wrong. Regardless of motive, the error obscures the real debate, which is about data quality. More disturbing is that Oreskes’ error confuses significance and posterior probabilities, and distorts the meaning of burden of proof. To be sure, the article by Oreskes is labeled opinion, and Oreskes is entitled to her opinions about climate change and whatever. To the extent that her opinions, however, are based upon obvious factual errors about statistical methodology, they are entitled to no weight at all.

*I’m not sure Oreskes is guilty of any misinterpretation of p-values, or statistical methodology (except in too closely connecting statistical and substantive conclusions), never minding the bit about Fisher, or the problems with trials on second-hand smoke. I take her to be alluding to an informal standard of proof, at a substantive, not a formal statistical level, and at the level of reaching policy. At that level, background knowledge and theories, not to mention costs, do enter–and not in any way that invokes Bayesianism. Whether some scientists are being too strict in this case, is a separate issue.

[i] I almost wrote “refreshing”, since I’m so fed up with blatant chump effects being blamed on significance tests.

Categories: evidence-based policy, science communication, Statistics | 61 Comments

61 thoughts on “Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)”

January 4, 2015

Mayo

I don’t think she’s guilty of assigning a .95 probability to the claims in question, I think she’s alluding to the observed relationships or observed correlations, and the probs they would be produced without the underlying causes operating. This is reminiscent of my posts about “probable flukes” being entirely kosher.

January 4, 2015

Tom

Nathan, I think you are indulging in just the same kind of ad-hominem attack on Dr. Oreskes that you accuse her of perpetrating. You write, “First, Oreskes, the historian, gets the history wrong.” Rather than simply state that the statement in question is incorrect, you said that she, herself, was wrong, and emphasized the personal nature of the attack by adding “the historian”.

Furthermore, your statement regarding the origins of the confidence interval is irrelevant. Dr. Oreskes never once uses the phrase “confidence interval”, though you attribute it to her multiple times. Furthermore, she is describing the origin of the 0.05 significance level, not significance testing per se, and not confidence intervals. There is no universal agreement on the matter, but I believe Dr. Oreskes is not far from the mark in her accounting of where the 0.05 came from historically.

In the end, policy on climate change might reasonably be evaluated in the context of decision theory rather than significance testing, even if science itself utilizes the latter. Indeed, what could be more obvious than balancing the costs and the uncertainties in a systematic and comprehensive manner?


January 4, 2015

Mayo

Tom: She does refer to confidence limits, the .95 of CIs and .05 of significance tests are often interchanged this way.

One thing the logician in me wants to clear up. I don’t see there’s any ad hominem saying, “person X got it wrong in stating S”. Especially in going on to give reasons why S is incorrect. That is, there is no slip of discrediting S, by discrediting the person X, or claiming S is incorrect because of some irrelevant characteristic of person X who has uttered S.
Or is it you think Schachtman is subjecting her to a bit of ridicule in saying “Oreskes, the historian, gets the history wrong” ?


January 4, 2015

Nathan Schachtman

Mayo,

Thanks for the cross-post. The problem in my view is that Oreskes is, as you suggest, talking about standard of proof, but she has confused this with evaluation of random error and estimation in the use of confidence intervals. As for informal, non-quantified standards of proof, I agree that we modify our standard based upon values, costs, etc., but not in the completely elastic way that Oreskes suggests.

Oreskes does not frame her argument in terms of p-values, but rather in terms of confidence intervals, which she mistakenly attributes to Fisher rather than Neyman. Her description of the operation of C.I.s is no better in my view. First, data analysts do not accept or reject a causal claim based solely upon observational studies. None of Oreskes’ concerns revolve around randomized clinical trials, but rather around very noisy epidemiologic studies on passive smoking, and even noisier research on climate change. Second, scientists of any stripe do not reject a causal claim simply because the p-value > 0.05; they may have failed to reject the null, but without more they would not accept the null and thus reject an association. (Oreskes again reduces evaluation of an association to a binary decision for accepting or rejecting causation.) Third, scientists do not apply a 95% C.I. to accept a causal claim “only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20.” To me, it is at best ambiguous, and at worst just wrong, to describe a C.I. this way to a lay audience, which will not appreciate that the odds are based upon the assumption that the null hypothesis is correct, and that the p-value “relationship” at issue is a cumulative tail probability based upon the sample estimate. There is no probability that the parameter lies within the limits of the single confidence interval. So all in all, a rather bad performance, and a misguided critique of those who would not agree with her on passive smoking or on climate change.

Best.

Nathan


January 4, 2015

Mayo

Nate: yes, I added a remark at the end after rereading you and her. You may be right that there’s some ambiguity in saying “only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20” but, as you know, I defend what is meant by such locutions fairly extensively (my posts on probability the results are a stat fluke).

But on the larger way of framing her position: it is common in literature by Cranor and others. Controlling a small type 1 error prob is kind of a metaphor for viewing science as being so afraid of false positives as to overly increase the chance of false negatives. This is at odds, it is claimed, with precautionary stands.

Coincidentally, the “trade-off” fact is discussed in my previous post in explaining where Ziliak and McCloskey go wrong in their discussion of power.
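The trade-off itself is easy to compute in a toy case. The sketch below is only an illustration under assumptions that are not in the post or the op-ed: a one-sided z-test of a zero mean against a positive mean, a known standard deviation, and hypothetical values for the effect size (0.3) and sample size (30). The function name type2_error is likewise made up for the example.

```python
from statistics import NormalDist

def type2_error(alpha, effect=0.3, sigma=1.0, n=30):
    """Type II error probability of a one-sided z-test of H0: mu = 0 against mu > 0."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - alpha)    # reject H0 when the z statistic exceeds this
    shift = effect * n ** 0.5 / sigma         # mean of the z statistic when the effect is real
    return std_normal.cdf(cutoff - shift)     # P(fail to reject | effect is real)

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}  type II error={type2_error(alpha):.3f}")
```

With the sample size and effect held fixed, loosening alpha from .05 to .10 lowers the type II error probability, and tightening it to .01 raises it. Increasing n lowers the type II error at any fixed alpha, which is why the trade-off bites hardest when data are sparse.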

January 5, 2015

Mayo

Nathan: I hadn’t paid much attention to her remark: “It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.” With that analogy she is assigning a posterior of .95 to the non-null, as it were. So your criticism would find its mark here.


January 4, 2015

matloff

Deborah, you seem to have a knack for finding instances of statistical illiteracy in the NYT. Good for you! (And Nathan.)

Of course, my own position is that the problem is less one of alpha levels than in using tests instead of confidence intervals. (And by CIs I mean using the intervals intelligently, not just checking whether they contain 0.) As long as the consumer knows the alpha level, and has an estimate of the size of the effect (center of the CI), it’s up to the consumer to decide how strong the evidence is. As Oreskes points out, the 0.05 level, though traditional, is arbitrary.

That all assumes that the consumer understands the basic process. Oreskes may or may not understand it, but in any case we can’t let her off the hook because she is “only” a historian. First of all, she is writing publicly about statistics, so she should get no free pass, right?

Second, I believe that every educated person should have a good understanding of the basic concepts of statistics. That’s actually a rather high bar — I contend that many who teach statistics really don’t have a good understanding — but it is a reasonable one, as it affects all of us. Every university should make statistics a firm requirement for graduation for all students. Once that is in place, we’ll address the question of how to find instructors who know what they are talking about. 🙂


January 4, 2015

Mayo

Matloff: Well I don’t know if I have a knack, people send me things….

CIs require testing reasoning to avoid fallacies:
https://errorstatistics.com/2013/06/05/do-cis-avoid-fallacies-of-tests-reforming-the-reformers-reblog-51712/

CIs alone are inadequate without the justification for essentially applying testing reasoning to interpret them.

Severity reasoning will do, e.g., to justify looking at the upper bound in interpreting a one-sided lower interval, to justify computing several benchmarks, for making sense of wide intervals, etc.


January 4, 2015

Nathan Schachtman

Norm, Tom, Mayo,

Thanks for the comments above. I confess that my original post was perhaps a bit “ad hominem” in the sense that I was taking Professor Oreskes to task about a historical statement. She is, after all, an historian of science, and I think there is a professional issue of accuracy and scholarship in the misattribution, but that was hardly the reason I wrote.

As for Oreskes’ not ever mentioning “confidence interval,” I think we should interpret her opinion piece as sympathetically as possible, while striving for accuracy, and not straining at a gnat, and swallowing a camel. When Oreskes states that “[t]ypically, scientists apply a 95 percent confidence limit,” and goes on at length about 95% something or other, my interpreting the “confidence limit,” as what statisticians and scientists call a “confidence interval,” is as sympathetic as I can get. I am open to suggestions about what “confidence limit” means if not a confidence interval. And then, whence comes the 95% burden of proof if not from a mistaken identification between the coefficient of confidence and the posterior probability for the claim, given all the available evidence?

Without going Bayesian crazy, I acknowledge that we must make decisions at various levels of personal, social, and political organization to preserve our environment, and that those decisions must take into account the costs of deciding and not deciding. My larger point was that assessment of random error was just one aspect of any scientific conclusion or decision process; assessing data quality is often more important, and Oreskes unfortunately never addresses data quality in any of the examples she provides. There are qualitative and quantitative approaches to assessing bias and confounding, but for observational studies (such as are involved in climate change, and passive smoking), many of the confounders are residual and unquantified, and many of the biases

When we are not dealing with randomized experiments (as Fisher was, typically), we should not lose sight of the fact that random error is a minimal statement of total error or uncertainty, and that bias and confounding often swamp any estimate of error due to sampling. For a helpful, recent statement of the issues, see Timothy L. Lash, et al., “Good practices for quantitative bias analysis,” 43 Internat’l J. Epidem. 1969 (2014).

Nathan


January 5, 2015

john byrd

My read of this NYT piece is that she is applying public pressure on scientists to just declare the answer and stop fretting over the stats. The same argument could have been made a few years ago to the scientists trying to test for the Higgs boson. Perhaps funding agencies would see better value in lowered standards– at least in the short run. I think Nathan had it right, and I also think that she does not really understand how the stat tests embed in the overall research program. They should not be a final answer, but more an opportunity to find fault with the hypotheses in question. The standard for concluding fault should be set by those who understand the overall issue (to include messy data, etc).


January 5, 2015

john byrd

I just remembered that some statisticians have made that criticism against the Higgs scientists as discussed in this blog.

January 5, 2015

Mayo

John: Ironically, Oreskes gives cover to scientists in a way that is radically at odds with the current move to label skeptics as “deniers”. Here they’re given a justification in upholding the high standards of science. I really don’t see it as like the Higgs. Maybe some people were saying that years ago, but it just shows they didn’t grasp the particle physicist’s attitude. They already expected/believed in a Higgs of some sort, but it was and is majorly important to figure out its properties. This year, the machine goes back on, and the hope is to find something more interesting than the plain vanilla standard Higgs.


January 6, 2015

john byrd

I do not understand your response, partly because it is unclear what is quoted, and partly because my read is that she is criticizing what she believes to be too high a standard. Some said the same about the 5 sigma standard…


January 6, 2015

Mayo

John: For starters, they announced their discovery based on the 5 sigma, whereas I’ve never heard any “climate skeptic” say, we just haven’t reached 2.5 (or .05) sigma yet. Second, the Higgs’ scientists found the significant result (at least after July 4, 2012). She’s implying the climate scientists have not. There’s a difference.


January 6, 2015

Keith O'Rourke

Why is your post not mainly about why 95% coverage or whatever is totally meaningless in the setting of Epidemiological studies?

You have given one of the best references (Lash paper) and Larry/NormalDeviate has provided one way to see this.

The way I put the point when I was teaching undergrads at Duke was to consider what happens to the coverage of a standard confidence interval from an Epi study as the sample size increases.

As it concentrates about the unknown effect + bias and we know the bias is not zero, the coverage approaches zero.

The same point about p-values can be seen in that under the null they are not anywhere near uniformly distributed. Even some well published statistical authors seem to have a hard time getting this.
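Both points can be seen in a toy simulation. This is only a sketch under assumptions of my own choosing for illustration: the true effect is exactly zero, the estimate is normal with a fixed, uncorrected bias of 0.1, and the standard error shrinks as one over the square root of n. The function name biased_study_summary and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist

def biased_study_summary(n, bias=0.1, sigma=1.0, reps=4000, seed=7):
    """Coverage of the nominal 95% interval, and share of p < 0.05, when the true effect is zero."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    z95 = std_normal.inv_cdf(0.975)
    se = sigma / n ** 0.5                      # random error shrinks with n; the bias does not
    covered = rejected = 0
    for _ in range(reps):
        # The estimate centres on the true effect (0) plus the uncorrected bias.
        estimate = rng.gauss(0.0 + bias, se)
        covered += abs(estimate - 0.0) <= z95 * se          # does the interval contain 0?
        p_value = 2 * (1 - std_normal.cdf(abs(estimate) / se))
        rejected += p_value < 0.05
    return covered / reps, rejected / reps

for n in (25, 400, 6400):
    coverage, reject_rate = biased_study_summary(n)
    print(f"n={n:5d}  coverage={coverage:.3f}  share of p<0.05 under the null={reject_rate:.3f}")
```

If the p-values were uniform under the null, about 5% would fall below 0.05 at every n. With a nonzero bias that share climbs toward 1 as n grows, while the coverage of the nominal 95% interval falls toward zero.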

Sander Greenland has written about this often; essentially, it seems that most of the statistically trained can’t help but generalise statistical techniques appropriate with randomisation to non-randomised settings where they are totally inappropriate.


January 6, 2015

Mayo

Keith: Sorry, are you asking why Nathan’s post isn’t about epi studies? Oh you must be referring to the second hand smoke business Schachtman mentions. You’re saying all epi studies yield confidence intervals with terrible coverage? Anyway I’ll let Schachtman answer.


January 6, 2015

phaneron0

Mayo: My comment was to Nathan Schachtman – I clicked the reply button beneath his comment where he gave the Lash reference 😦

Assessment of climate change mostly involves epidemiological type studies (hard to randomise areas to decreases or increases in whatever and get independent responses).

I am mostly just restating Larry’s point.


January 4, 2015

normaldeviate

I think there is a more fundamental problem with her
op-ed.
I wrote the following letter to the editor of the NY times:

Naomi Oreskes (“Playing Dumb on Climate Change”, NY times Jan 4, 2015)
has confused two different issues. She writes that scientists
“… will accept a causal claim only if they can show that the odds
of the relationship’s occurring by chance are no more than one in 20.”
Ms. Oreskes is confusing statistical significance (is it due to
chance?) with the problem of causation (correlations due to unobserved
confounding factors, not due to causation). If we had an infinite
amount of data, the limitations of statistical significance would
disappear, as they are due to finite samples. But the problem that
“correlation does not imply causation” would remain. The former is
an issue related to variability; the latter is about bias.
They are unrelated issues.

Of course, there is also the question of data quality that
you mention and I agree that that is indeed an issue.

Larry


January 4, 2015

Mayo

ND: So glad to have you on the blog, long time no comment. I’m impressed you wrote to the NYT on this. You are right of course. Still, especially in the context of the topic being discussed, I took it she was talking of applying the p-value threshold idea to the substantive causal claim. It isn’t as if one never uses p-values in assessing causes, and given that no one would dream of literally reaching conclusions about climate change via significance tests alone, I took her to be making the point that many make when arguing for lowering the (assumed) scientific levels of stringency required to infer risks in the context of areas with dire consequences.


January 4, 2015

diffanon

Oh this is an absolutely classic line from Wasserman. I got to remember this one:

“If we had an infinite amount of data, the limitations of statistical significance would disappear, as they are due to finite samples.”

that’s about like saying:

“If I had infinite mouths, the limitations of eating hot lava would disappear, as they are due to my having just one mouth.”


January 5, 2015

Mayo

diffanon: I take it this refers to a “different” anonymous, which means you’re not a different anonymous at all, but the same.
I guess N.D. didn’t think your remark worth his response. I frankly didn’t get the relevance of the infinite mouths eating hot lava either.

January 6, 2015

normaldeviate

you missed my point which was in the following sentence.
She (the NY times writer) was confusing a sampling issue
(significance) with the problem of causation vs correlation.
The latter is independent of finite sample issues.
It would persist even with an infinite sample.

January 6, 2015

Nathan Schachtman

Keith, and All,

My original post was a response to Professor Oreskes’ opinion piece in the New York Times. Her focus was mostly on climate change debate, but she generalized to scientific claims generally, and wandered into the passive smoking/lung cancer issue. I was surprised by Oreskes’ focus on the interpretation of statistical evidence in the climate-change issues because I did not see statistical significance as a problem in the realm of “big data” that comes out of climate studies, but I am not terribly familiar with the models or the data in climate science.

I am considerably more familiar with the passive smoking/lung cancer issue, and I responded to Oreskes on this score as well. Again, this is not an area in which the data are sparse in the sense that we should, in my view, be worried about a Type II random error result. Instead, the problem with this epidemiologic example is the internal and external validity of the studies involved, substantial bias and confounding concerns, and obvious cherry-picking of studies by the EPA in its meta-analysis. In the context of a meta-analysis that aggregates a huge amount of data, for an outcome that is hardly rare, the move from the pre-specified 95% to a 90% confidence interval, was a fairly shameless post-hoc, ad hoc maneuver. Just sayin’. I believe that one of the Peto brothers has written that in the context of a meta-analysis, we should be looking for a more stringent test to exclude random error as an explanation for an observed data discrepancy from a null hypothesis.

So, any way, my post was not mainly about epidemiology because Oreskes’ piece was not exclusively about epidemiology, although she certainly was painting with a broad brush. I am not sure that the conventional standard embedded in the p-value is “totally” meaningless. I don’t think Tim Lash would agree, and even Sander Greenland would probably insist that sampling error is something that must be evaluated in some form or another. Both would likely insist that in many areas of epidemiology, bias and confounding are much greater threats to validity than random error. I believe I said something similar in pointing out that random error is a minimum of uncertainty for any conclusion based upon observational studies. And I acknowledge that many randomized studies can have systematic errors as well.

I do, however, agree that many writers inappropriately interpret observational epidemiologic data with the same statistical lens as they use for randomized trials. Sander Greenland’s work in this area is helpful; his testimony as an expert witness less so. (In the interest of partial disclosure, Sander has been an expert witness for plaintiffs in thimerosal, silicone breast implant, and other litigations, and I have been defense counsel in some of those cases.)

As for what happens to the coverage of the C.I. as n (sample size) increases, I do think that is a valuable exercise, and I do something similar in my class, but I point out that we are not entitled to believe that effect size remains constant as the sample size increases. See, e.g., John Wood et al., “Trap of trends to statistical significance: Likelihood of near significant P-value becoming more significant with extra data,” 348 Brit. Med. J. 2215 (2014). If we have an unbiased, consistent estimator, then yes, coverage approaches zero as n becomes very large. I agree with your suggestion that this idealized model never exists in observational epidemiology, but there are papers with well-controlled logistic regressions, or with propensity score controls, which require us to take the reported associations seriously, and sometimes, and when they have passed muster with something akin to Bradford Hill’s factors, take as showing causality.

Nathan

January 6, 2015

phaneron0

Nathan:

> “In stead, the problem with this epidemiologic example is the internal and external validity of the studies involved, substantial bias and confounding concerns”

> “I am not terribly familiar with the models or the data in climate science.”

I will agree with the above and the below

> “idealized model never exists in observational epidemiology, but there are papers with well-controlled logistic regressions, or with propensity score controls, which require us to take the reported associations seriously”

with the caveat that, in my experience, this rarely reduces the uncertainty to that which is quantified in type 1 and type 2 errors.

So why not make folks clearly aware of this?

Keith

January 6, 2015

Nathan Schachtman

Keith,

Good question. I would hope that professional epidemiologists carry around with them an understanding of these limitations, caveats, etc., and that they apply them to any study they pick up and analyze. (Rothman has made this point with respect to not correcting for multiple testing.) In my experience, some epidemiologists and journal editors are not so well trained, and that creates problems. More problematic is what happens when journalists, press agents, lawyers, and others get their hands on studies. I know that my life would be much easier (but probably less busy) if the limitations were clearly stated again within each published work.

Nathan

January 6, 2015

Sander

This is a narrow two-part reply to Nathan’s comment “Sander Greenland’s work in this area is helpful; his testimony as an expert witness less so. In the interest of partial disclosure, Sander has been an expert witness for plaintiffs in thimerosal, silicone breast implant, and other litigations, and I have been defense counsel in some of those cases.”
Certainly my testimony has been less helpful to defense lawyers than my educational writings. But beware of partial disclosures, and beware of legal testimony – in that process, you are essentially being edited by opposing lawyers to look bad via this mechanism: They choose the questions and you are limited in your reply.

Please Nathan, the next time you or your colleagues cross-examine me, how about asking up front whether I actually think the treatment at issue is harmful based on the evidence as I know it. I routinely forewarn those who ask me to testify that they may not like the answers I give. Interestingly, this has only seemed to turn away defense lawyers, not plaintiff lawyers.

Unfortunately, defense lawyers often claim that P>0.05 means the null of no harm is supported (if not proven) by the data. I am happy to testify that this sort of statistical misinference is false – even when I would rule against the plaintiffs were I the trier of fact. And when, as happens, expert witnesses make this false claim, it’s statistical malpractice and needs to be called out, even if it seems to aid a meritless case to do so; see Greenland, Preventive Medicine 2011;53:225-228 and Annals of Epid 2012;22:364-368. The 2012 article was rejected by Pharmacoepidemiology and Drug Safety solely because it identified a statistics professor who used this claim in expert testimony; I have the editor’s e-mail to prove that (perhaps they were afraid of getting sued).

January 6, 2015

Sander

Part II, the cases Nathan mentioned, both of which illustrate nicely how the judicial system is not a friendly environment for honest testimony about the subtleties debated in this blog…

For thimerosal (which involved petitioners, not plaintiffs): the government never deposed me to see what I thought beforehand, and didn’t even ask until the end of my vaccine-court appearance whether I thought thimerosal causes autism – when they asked that as the very last question of my cross examination, I answered “no.” I agreed to testify for the petitioners solely regarding the government claim that the epidemiologic data showed there was no elevated risk in the subgroup delineated by the petitioners’ theory. The claim was rubbish – and all you’d have to do is compute the 95% confidence interval for that subgroup (which was NOT cherry picked from some ensemble of subgroups) to see that the epi data was not by itself definitive about anything for that subgroup. Why? Small numbers.
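
To illustrate the small-numbers point, here is a rough sketch with invented counts (they are not the thimerosal data, and the function name is hypothetical). It computes a risk ratio with a standard Wald interval on the log scale. With only three cases in the subgroup, a point estimate of exactly 1 comes with an interval running from roughly 0.3 to 3.2, so "no elevated risk observed" is far from "no elevated risk".

```python
import math
from statistics import NormalDist


def risk_ratio_ci(cases_exp, n_exp, cases_unexp, n_unexp, level=0.95):
    """Risk ratio with a Wald confidence interval computed on the log scale."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Large-sample standard error of log(RR) for cumulative-incidence data.
    se_log = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical counts: 3 cases among 200 in the subgroup, 30 among 2000 outside it.
rr, lower, upper = risk_ratio_ci(3, 200, 30, 2000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The Wald interval is itself an approximation that is shaky at counts this small; an exact or profile-likelihood interval would differ in detail but would be similarly wide.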

For breast implants: I refused to testify on behalf of any claimant or case. The plaintiff’s committee still asked that I testify at the 1996 rule 706 hearing about what I thought. I did because I had been a consultant to defendant Dow Corning’s own meta-analysis, which produced an estimate of 80% risk increase for certain rare connective-tissue diseases. I said that needed to be studied further. And it was: Our subsequent Dow-Corning funded studies observed that sort of risk elevation for reconstructive breast implants; see Greenland & Finkle, Ann Epid 2000;10:205-213, table 3. Dow Corning pulled our funding after that finding.

Would you, Nathan, have more to disclose about these or other cases?

January 6, 2015

Nathan A. Schachtman

Sander,

Thanks for the cross-examination tips; they might come in handy. I did not have a representation in the thimerosal litigation. For the silicone gel breast implant litigation, I had a good deal of responsibility for Bristol-Myers Squibb’s defense. And so I can confirm that when pressed in proceedings before the court-appointed expert witnesses, you did acknowledge that you could not endorse the causal conclusion that was being advanced by plaintiffs. As I recall, you expressed a concern that a decision by the court might prematurely end research interest in the causal hypothesis. (When I walked out of that Alabama court room, I had a microphone thrust in my face, with TV cameras rolling, and I was asked what I thought about your testimony. I said that you were a very capable biostatistician, but I hoped their audience heard that you said that you could not endorse the plaintiffs’ causal claims as scientific conclusions. The microphone instantly disappeared, and the TV lights went off. Within 10 seconds, I was standing all alone at the top of the stairs to the courthouse.)

Your points are fair comment, but remember that a defense verdict, or a defense outcome in a so-called Daubert hearing does not require a showing that the exposure (drug, workplace exposure, etc.) does NOT cause the outcome of interest. It suffices for the defense to have the court, or the jury, accept that the plaintiffs cannot make out their causal claim (assuming that they have the burden of proof) by reliable or accepted methods. In my view, your concern was misplaced given what the legal process can and does accomplish.

As for my own involvement in litigation, I have been involved almost entirely on the defense side of pharmaceutical and other tort litigations. Of course, when I succeed in cutting off legal claims at an early stage of the legal process, I put myself out of business. (And indeed, there are some defense lawyers who are less than zealous in their pursuit of so-called Daubert motions, acting perhaps in their own financial self-interest. Dunno for sure; I am not a psychoanalyst, but I have suspicions.)

If I didn’t say it earlier (although I think I did), your methodological works, and especially the papers you have written about errors and fallacies in testimony in legal proceedings have generally been very helpful. I have cited them here, on my blog, and I urge my students to read them. From your perspective, you see erroneous testimony given by defense expert witnesses but not plaintiffs’ expert witnesses. I would be happy to provide examples of errors of statistical interpretation on the other side. Regardless of which side’s expert witnesses are being criticized, error is error wherever you find it.

I find the low level of statistical sophistication in testifying expert witnesses on both sides often to be very distressing. I don’t think you were at all wrong to name the defense expert witness who so butchered his interpretation of significance probability. Your editor was probably concerned about defamation liability, which in the U.K. is a big problem. Recall the mischief that William McBride made with his suits against Robert Brent and others, over claims arising from the Bendectin litigation. My only criticism of your naming the defense expert witness is that you might easily have found some examples of plaintiffs’ expert witnesses who give erroneous interpretations to statistical concepts.

And yes, legal proceedings are often not friendly fora for expressing nuanced views and opinions. For my part, I have advocated greater use of court-appointed expert witnesses, or even science courts. On occasion, the IOM has been helpful, but usually judges are not paying attention, and take a “let-it-all in” attitude, which I have also criticized.

Best.

Nathan

January 6, 2015

Mayo

Nate: I will come back to study your comments, just to say I’m very impressed you threw any potential legal cautions to the wind in the interest of furthering our understanding of these matters. (Thinking about the “no comments” on your blog, I mean.) You can be free here, and it’s extremely valuable for readers to get these perspectives from the nitty gritty players. Thank you!

January 6, 2015

Sander

Thanks Nathan for your thoughtful response. I know I’ve received a biased sample of expert reports to examine, but I seem to have been on the go-to list when plaintiff lawyers spot defense-expert distortions, yet on a DNC list for defense (or at least no call back after I recite my preconditions for retention: that I will reach whatever conclusion I reach and retain the right to publish that conclusion and my reasons for it, whether it helps or hurts their case).

I initially found it paradoxical that plaintiffs would seek me out and defense would walk, because, as you note, I’ve long written about methodological problems in reaching any conclusion, and the burden of proof is on the plaintiff. My subsequent experience has helped explain this paradox, however. In any event, I have not to my knowledge implied that only defense experts commit statistical distortions, and I would be grateful if you sent me some clear and egregious examples of statistical malpractice by plaintiff expert statisticians and epidemiologists to balance out my accounts.

I certainly agree that the system would be far better if the courts, not the lawyers, hired the expert witnesses, subject to acceptance of their choices by both sides (perhaps in a way analogous to jury selection). That would not only help the court; it would also help the defendants in cases (as I’ve witnessed) where the defense lawyers seemed to be dragging out the litigation to the point that it would have cost the defendant less to settle early.

January 6, 2015

Nathan Schachtman

Sander,

Understood. I would welcome the chance to share some examples of what I think is egregious testimony. Here is a start:

http://schachtmanlaw.com/pritchard-v-dow-agro-gatekeeping-exemplified/

I can send you the opinions of the courts if you like, and I will keep you in mind for other examples.

I also understand that it is sometimes politically difficult to call out expert witnesses or judges in cases in which you are involved. The ethical boundaries are not well defined for legal counsel, but most lawyers, on both sides of the “v.” will usually refrain from commenting upon a pending case. For my part, I have tried to be an equal opportunity curmudgeon, but truth be told, I have criticized plaintiffs’ expert witnesses more often than I have defense witnesses.

Nathan

January 7, 2015

Mayo

Nathan: I see the source of my record hits stemmed from this outfit recommending your piece on Oreskes and linking to me:
http://bishophill.squarespace.com/blog/2015/1/6/oreskes-on-statistics.html

There are numerous comments that I think readers would find interesting (haven’t read them all).

January 7, 2015

Sander

Thanks Nathan…your link is a bit disappointing though because the expert there was merely a county medical examiner, and his junk analysis was duly struck. The expert I quoted in my citations was a full professor of biostatistics at a major public university, a Fellow of the American Statistical Association, a holder of large NIH grants, and his analysis (more subtle in its transgressions) was admitted. What I would like to get are equivalent examples involving similarly well-credentialed, professionally accomplished plaintiff experts whose testimony was likewise admitted – in other words, where the gatekeeper was unable to recognize the malpractice thanks to its relative subtlety (even though participants on this blog would have spotted it immediately).

January 7, 2015

Nathan A. Schachtman

Sander,

For sure, your target was of a higher quality, and perhaps more in need of public correction, but from the perspective of the legal system, the danger of deception and error exists whenever an expert witness takes the stand. The professor you called out in your published article should have known better. My goal is to make sure that lawyers know better before submitting such reports, and your articles (cited above) are helpful to avoid such embarrassments.

The expert witness at issue in the Pritchard case was a pathologist, currently a coroner, but he also has an MPH, and holds himself out as an epidemiologist. You may recognize him as the fellow who claims to have identified the problem of chronic traumatic encephalopathy (CTE) among pro football players. Lesser qualified witnesses testify all the time about statistical issues. Often the witnesses have strong credentials in clinical medicine but are ignoramuses when it comes to statistics and probability. Consider the example of Sir Roy Meadow, who testified in Regina v. Sally Clark, to the dismay of the Royal Statistical Society.

On the internet, no one knows you’re a dog. In the courtroom, no one knows you’re not a real expert. The legal criteria for expertise are low.

Nathan

January 7, 2015

Keith O'Rourke

Nathan: Fascinating discussion (I once did an independent study of a law school’s material on civil dispute processing supervised by a lawyer for a semiotics project).

> “Often the witnesses have strong credentials in clinical medicine but are ignoramuses when it comes to statistics and probability.”

This does suggest that those retaining the experts do not see it in their best interests to find the most qualified or do not know how to. Sander’s comments suggest this is systematically different between plaintiff and defense counsel. You seem to disagree – at least somewhat.

p.s. thanks for your answer to my last question.

January 5, 2015

diffanon

Inspired by John Byrd’s comment:

I know this is sacrilege on this blog, but I strongly reject the whole “stat testing is an opportunity to find fault with the hypothesis in question” line or Mayo’s severity requirement for statistical tests. Those are just her opinions (or Popper’s opinion or whatever) and not ground truth. They’re open to debate.

Fundamentally, experiments can be good/sensitive/severe or bad/wasteful/ineffectual tests of a theory, but the statistical analysis only serves to process the data. The statistical analysis isn’t a “test” at all; either literally or figuratively. It’s just a way to get at the information inherent in the data.

Let me make this concrete with a realistic example of good science. Imagine I propose a modification to the law of gravity which says that “mass” needs to be replaced by “mass+mu” where mu is a small term representing some new effect.

Loosely speaking I could say I want a “severe” test of this theory but that has nothing to do with the statistical analysis. It means I want an experimental setup/equipment capable of measuring mu precisely. If the experimental setup isn’t sensitive enough, I won’t get a resolution to my quandary either way. Again: it’s a property of my equipment not the statistical test!!!

Suppose then I’m a good experimenter and get a precise measurement of mu. One of two things will happen. The logic used here is important, because regardless of whether I accept or reject the new theory my reasoning in no way uses what Mayo claims is a requirement for good “statistical testing” or anything that was taught in stat 101 for that matter.

First case: suppose mu is found to be within the range 1 +/- 10^{-25}. This is clear evidence that mu isn’t zero, so hooray for my new theory.

Case two: suppose mu is found to be 0 +/- 10^{-25}. In this case, mu may not be zero*, but it has to be so close to zero that if I just use mu=0 my future gravity equations will still be highly accurate. In other words, regardless of whether the new theory is true or not there’s no harm in using the old theory.

Notice that all this talk about “fail-to-reject”, “statistical significance”, the severity of the statistical test is completely irrelevant. The only thing I need the statistical analysis to do is give me an interval which contains all values of mu reasonably consistent with the experimental data and everything else I may know about mu**.

And that is exactly what the Bayesian credibility interval for mu does: and what CI’s do not do except in simple cases where it mimics the Bayesian answer.

So if I’m doing this kind of good science—which I claim is far superior to what’s going on in even the best of social sciences—then why exactly should I care about “severity of my statistical tests” which everyone around here claims is essential?

*We could suppose for the sake of argument that all null hypotheses are wrong if desired.

**obviously mu can’t be large because that would radically change the observed properties of the solar system.
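
The two cases in this comment amount to a simple reading of an interval, which can be sketched as follows. This is only an illustration: the function name, the labels, and the `tolerance` (how large mu could be before the old theory's predictions suffer) are assumptions introduced here, not part of the comment, and the sketch is silent on how the interval was obtained or whether it is warranted.

```python
def read_interval(center, half_width, tolerance):
    """Classify an interval for mu as 'nonzero', 'negligible', or 'inconclusive'.

    tolerance is a hypothetical bound below which mu makes no practical
    difference to predictions from the old theory.
    """
    lo, hi = center - half_width, center + half_width
    if lo > 0 or hi < 0:
        return "nonzero"        # interval excludes zero (case one)
    if max(abs(lo), abs(hi)) <= tolerance:
        return "negligible"     # mu=0 is a harmless approximation (case two)
    return "inconclusive"       # equipment not sensitive enough to decide


print(read_interval(1.0, 1e-25, tolerance=1e-20))
print(read_interval(0.0, 1e-25, tolerance=1e-20))
print(read_interval(0.0, 1.0, tolerance=1e-20))
```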

January 5, 2015

Mayo

diffanon: I think your main complaint isn’t sacrilege or even radical–just semantics.
“Fundamentally, experiments can be good/sensitive/severe or bad/wasteful/ineffectual tests of a theory, but the statistical analysis only serves to process the data.”
Statistical tests rarely directly test a substantive theory, but I have written quite a lot on relating stat tests to learning about & testing claims in experimental general relativity and other areas. As I show, progress is in terms of essentially setting intervals for parameters of substantive theory, e.g., deflection of light, by means of parameters in statistical models.
Properties of tests ARE properties of the “equipment” used to probe the phenomenon.
Reporting the values reasonably consistent with data is fine but it’s not really enough for an inference (but I’d have to see the full warrant.) You don’t have to call it a test when you produce an interval of values, but you’d need to tell me why your interval is warranted, and then you’d be giving what I deem testing criteria. But I’d require more: even if I was interested in reporting such an interval, I wouldn’t want to treat all the values in the interval on par. That’s what’s wrong with confidence intervals, at least without a supplement leading to some benchmarks. Some of the values within the interval are poorly warranted.

I don’t see anything to require a Bayesian prior in your analysis, by the way. Is it a non-subjective prior or a subjective prior? If the latter, why do I want to mix that with the data?

So basically, I see your remark as making some “word usage” points which likely reflect your not being familiar with my general philosophy of statistics. Thanks.
Sorry I’m in a moving vehicle and have had to correct some flaws in the first version of this comment.

January 6, 2015

Mayo

Sander: Thanks so much for your very informative comment. Maybe my blog can improve or in some way alter legal cross-examinations, that would be interesting:

“Please Nathan, the next time you or your colleagues cross-examine me, how about asking up front whether I actually think the treatment at issue is harmful based on the evidence as I know it.”

January 7, 2015

john byrd

“The logic used here is important, because regardless of whether I accept or reject the new theory my reasoning in no way uses what Mayo claims is a requirement for good “statistical testing” or anything that was taught in stat 101 for that matter.” This statement is followed by, “First case: suppose mu is found to be within the range 1 +/- 10^{-25}. This is clear evidence that mu isn’t zero, so hooray for my new theory.” Clear evidence, because it is based on the same reasoning as error statistics, which is the same reasoning as the 5 sigma standard for the Higgs… I do not think you have sorted out the underlying principles for the statistics.

“The only thing I need the statistical analysis to do is give me an interval which contains all values of mu reasonably consistent with the experimental data and everything else I may know about mu**.” This statement seems out of place with the other points, which seem to me uncontroversial. Building an interval that must be based on what you think you know sounds like a self-licking ice cream cone.

Pingback: Naomi Oreskes Plays Dumb On Statistics And Climate Change | William M. Briggs
January 6, 2015

Mayo

I see that Briggs has a post on Oreskes and links back to my post. Too bad Briggs commits a number of misstatements about error probabilities himself! No time to explain (though readers of this blog will spot them), have to celebrate my birthday.
http://wmbriggs.com/blog/?p=15179

January 6, 2015

Mayo

For some reason I have a record # of hits today, higher than any post for the past 3 years. Maybe because it’s my birthday or something.

January 8, 2015

Russell Cook (@questionAGW)

Perhaps because Marc Morano’s ClimateDepot.com linked to this post. It is what brings me here.

If you and Mr Schachtman are unaware of it, Ms Oreskes has serious and unreported problems with her ad hominem attacks of skeptic climate scientists as ‘paid fossil fuel industry shills’. Please see my best efforts to get that unreported story out: “Naomi Oreskes’ Problems, pt 1” http://gelbspanfiles.com/?p=2009 and “Pt II” http://gelbspanfiles.com/?p=2039

January 6, 2015

alQpr

Thank you Nathan for admitting that challenging Oreskes’ claim regarding the origin of “confidence levels” is effectively an ad hominem when it comes to the issue of how they should be used.

Not that ad hominems are always improper. Something that undermines the credibility of a source, while not actually invalidating their argument, may well be taken as a good reason for not bothering to read it – but in this case your attack backfires and it is your own credibility that is undermined.

What Oreskes actually said is “The 95 percent confidence level is generally credited to the British statistician R. A. Fisher” and this is undoubtedly true, for even though Fisher was not the originator of confidence *intervals* , long before they were invented he did so much to popularize p=5% as an appropriate indicator of significance that our friend Briggs exemplifies the masses by saying “That rotten 95-percent ‘confidence’ came from Fisher… ”

P.S. Speaking of Briggs, I see a lot of nonsense there but nothing I can clearly identify as meaningful “misstatements about error probabilities”. Perhaps after a well deserved birthday break DMayo can give dimwits like me a pointer to something he says on that score that is both meaningful and wrong.

January 6, 2015

Mayo

alQpr: As I said in reply to Tom, I fail to see the “ad hominem” here, especially as reasons are given for the claim itself. A bit of ridicule, perhaps.
*Addition: Rereading, I see the allegation was that Schachtman was taking her flawed claim on the origins of method M as grounds to criticize her use of method M, or something like that, but I didn’t see him doing that. I’d be glad to “give dimwits like [the author of the comment] a pointer to something [Briggs] says on [error probs] that is both meaningful and wrong”. Of course, these things are constantly discussed and clarified on my blog. Thanks for your comment.

January 7, 2015

alQpr

Thanks for the reply. With regard to the “ad hominem” I guess we just disagree about Nathan’s intent in commenting (I think a bit unfairly) on Oreskes’ history. But in my defense he did acknowledge (in the Jan 4 comment) that his post was “perhaps a bit” ad hominem. And I actually think that, although not logically compelling, an ad hominem can sometimes legitimately be used to attack the “credibility of the witness” when people have to decide how much time to devote to following an argument. To my mind the relevance of Oreskes’ error could have been more clearly established by excusing her *historical* error as resulting from a confusion (which Nathan failed to effectively point out) between the concept of a confidence *interval* and the complement of the significance *level* with which we reject a null hypothesis. But in the end I think the point is moot since her article struck me largely as confused technobabble in support of a point which could have been expressed much more simply.

On Briggs, I am aware of some of your differences and of what I too see as “misstatements about error probabilities” elsewhere in his work. I couldn’t see what you were referring to in his particular post on Oreskes, but perhaps you were speaking more generally (or I have missed something which I *would* appreciate having pointed out if you can find the time).

Pingback: alQpr » Blog Archive » Significance Levels and Climate Change
January 7, 2015

Richard S.J. Tol

NormalDeviate rightly notes that Oreskes mixes up significance and causation.

Oreskes overlooks that greenhouse gas concentrations and climate are non-stationary, so her talk of correlations is misguided.

There is a third categorical error: Although we can have a statistical discussion about the relationship between greenhouse gas emissions and climate, and infer whether or not climate change is (at least partly) human-made, we cannot deduce from that, as Oreskes does, whether climate change is dangerous or not.

Data on the impacts of climate change are of much lower quality than data on climate change itself, the issue of confounding variables is much more important, and different impacts have different sign and sizes.

January 7, 2015

Mayo

Richard: Thanks for your comment. It would advance the level of climate debate if the disputants focused on the distinction you note:

“Data on the impacts of climate change are of much lower quality than data on climate change itself, the issue of confounding variables is much more important, and different impacts have different sign and sizes.”

January 7, 2015

Nathan Schachtman

Keith,

Last comment, and then I must get back to the pleasant task of reading students’ papers.

When it comes to statistical evidence, I think that both plaintiffs’ and defendants’ counsel exhibit an aversion to statisticians and to statistically sophisticated epidemiologists. To be sure, I am generalizing about how lawyers make selections, and there are counterexamples on both sides. (I would like to think that I have worked with some outstandingly qualified epidemiologists and statisticians, and that they have worked hard to make their testimonies understandable without being inaccurate; not always an easy task.)

The reasons for the lawyers’ aversion are complex. In the first place, juries and judges generally do not understand the statistical issues, and they have little patience with attempts to have them explained. The trial process, which now frequently includes “chess-clock” time limits, makes trials an unfriendly forum for addressing these issues carefully. Second, lawyers believe, with some justification, that clinicians (physicians with specialization in the area of medicine at issue) are more credible with jurors than statisticians who lack medical training, and who are not involved in patient care and treatment. Judges do not realize that many physicians have never had even a basic course in statistics. (I believe that the new MCAT exam now covers some statistical concepts for the first time.) Third, there is a further problem in that communication skills and “jury appeal” do not always correlate with statistical acumen. I could go on, but you get the picture.

There are asymmetries between plaintiffs’ and defense counsel, to which Sander alluded. For one thing, a products liability case is about more than medical causation. Plaintiff typically must show that the product carried a risk of serious harm, which was not adequately warned about, and that the manufacturer should have known of the product’s ability to cause harm at the time of marketing. For drugs and medical devices, the warning is owed to the prescribing physician, not the ultimate consumer. And the product must have been the cause in fact of the plaintiffs’ claimed harm. You would be surprised how often the defense is not whether the drug can cause the harm (because it does) but whether it caused plaintiffs’ harm, or whether the warning was adequate, or whether the physician would have prescribed the drug even with the most Draconian warning. The point is that plaintiffs must show every element of the case they need to prevail, whereas the defense need only defeat one of the necessary elements. In my view, this asymmetry leads to more indiscriminate use of expert witness opinion testimony by plaintiffs. There are, however, plenty of examples of questionable testimony on the defense side.

One last point. Sander noted that his retention agreement insists upon reserving the right to publish about the case. I have never refused such a request. When expert witnesses have expressed the intent to publish their work, begun in litigation, or their observations about the case, I have been careful to remind them about the importance of disclosing their status as a retained and compensated expert witness, and to avoid sharing any manuscripts with me before acceptance for publication.

OK; back to work.

Nathan

January 8, 2015

Miner49er

Leave it to a “social scientist” to get the “scientist” part wrong.

January 8, 2015

gofigure560

A worthy debate, no doubt, but this obfuscation needs to be bypassed to deal with the claims.

The physicist Richard Feynman said that it doesn’t matter how smart or powerful you are, if your hypothesis is contradicted by the empirical data, you need a new hypothesis. The anthropogenic global warming (AGW) hypothesis claims that the increasing level of carbon dioxide (co2) due to human activities causes global warming. However, there is no evidence attributing co2 increase to global warming. During the past few decades, as co2 continually increased, there have been both cooling and warming periods. In fact, with co2 now at its highest level in a very long time, there’s been no additional increase in global temperature for almost two decades. The co2 level has been steadily increasing since the mid 1800s, but the science is clear that co2 capability to contribute to warming (if it indeed does) diminishes as its level increases. An old experiment which showed an increased temperature by adding more co2 to an enclosed container is hardly adequate for making assumptions about co2 influence on weather related activities in the open atmosphere.

“The seas are rising!” The seas have been rising for the past 18,000 years, since the last (real) ice age began melting (except possibly for a few hundred years of reversal during the more recent Little Ice Age). Sea level has risen 400+ feet. The current annual sea level rise is a minuscule 1 to 3 mm per YEAR! (1 mm = .0393701 inches.) The rate of annual increase has actually been dropping for the past several thousand years.

The Antarctic ice extent has been growing since satellite measurements began. While the Arctic lost some of its ice earlier it has regained most of that in the past couple of years, continues to increase, and is now within 2 standard deviations of its long term average. The western portion of the Antarctic, a small component of Antarctica, is apparently being influenced otherwise by one or more underwater active volcanos.

Some perspective helps. There have been 13 ice ages in the past 1.3 million years, the average duration of each being 90,000 years. Each ice age was followed by a warming period (commonly called an interglacial period, such as the one we now enjoy), average duration 10,000 years. When there is no further increase in sea level and when glaciers are no longer shrinking, it’s a good bet that the next ice age, or at least a Little Ice Age, is underway. (Cooling is much less desirable than a modest warming.)

Actual data for the past few decades clearly demonstrate that extreme weather events (typhoons, hurricanes, tornados, floods, droughts) have been less frequent and less severe. The only conclusion that can be drawn is that weather is more pleasant when the earth is warmer. Even most scientists who otherwise believe AGW is significant are embarrassed by the uninformed folks who continue, in the face of facts, to blame these common weather events on human-caused global warming. The UN’s IPCC (Intergovernmental Panel on Climate Change), in its most recent report, admitted there is no indication that co2 level has any impact on our climate insofar as bizarre weather events are concerned.

The IPCC has also now (once again) recognized that our global temperature is now at a record level over the past 800 years. That declaration is a back-handed admission that the Medieval Warming Period (MWP) was both a global event and was warmer than now. No surprise, since there have been numerous studies from around the globe confirming this. Watch out for significant variations between the IPCC actual report and the “summary” report it supplies to politicians and the news media. This should not be surprising once you understand that the IPCC is basically a political organization tasked specifically to find anthropogenic warming. Its funding would instantly disappear if it admitted that our current warming was mostly due to natural climate variation.

Human activity was clearly not responsible for climate change during the MWP. At that time the co2 level was constant (and lower, around 280 ppmv.) What’s more, earlier warming periods (before the MWP) and during this interglacial were at even higher temperatures. Our current warming is well within the bounds of natural climate variation.

The beginning of our current warming (such as it is) is invariably associated with the beginning of the industrial revolution (mid 1800s) and the associated rise in co2 level. But there is no justification for that cherry-picked start-date. Our current warming actually began, by definition, at the bottom (the low temperature) of the Little Ice Age, which took place during the mid 1600s. That implies two centuries of natural warming BEFORE the industrial revolution and before co2 began to increase.

The only known correlation between global temperature and co2 variation is over geologic periods, and during that era temperature variations always occurred first and were reflected, hundreds of years later, by similar variations in co2 level. That is the carbon cycle at work. Because of the oceans’ much greater heat capacity than the atmosphere, oceans cool and warm much more slowly. Oceans outgas when warmer, and absorb gas when cooler. Notice that the two recent periods of cooling or flat temperature both cover significantly longer durations than was needed for alarmists to begin claiming that the warming was due to human activity. Those same alarmists now argue that two decades of temperature “hiatus” are not sufficient to refute their AGW hypothesis. In the more distant past co2 has been 10 to 20 times higher than now. Co2 has also been much higher during two ice ages and going into one ice age, so neither does there appear to be any nearby “trigger”.

Is the greenhouse gas theory applicable to our atmosphere? The barriers in a real greenhouse confine heat much more effectively than our open atmosphere and satellite measurements show that heat is escaping to space. What’s more, the oceans (70% of the earth surface) are basically impervious to the long wave radiation that supposedly heats the planet.

All the computer models projecting global warming assume that the real greenhouse gas culprit is water vapor. Water vapor, according to the models’ authors, supposedly provides a positive feedback, bringing on temperature increases 2 to 3 times greater than was brought on by increasing co2. But this feedback assumption is speculative at best because NO ONE yet understands climate feedbacks. Cloud cover, one aspect of water vapor, likely provides a negative (offsetting) feedback. Sufficient time has now passed to show that practically every imaginable computer model scenario has grossly over-estimated actual temperature increase. In any case, computer model output is NOT evidence of anything apart from demonstrating the understanding (and perhaps confirmation bias) of its authors!

Co2 is a trace gas, and represents 4/100 of one percent, by volume, of the atmosphere. This is also referred to as 400 parts per million by volume (ppmv), or .0004. The recent average annual increase in co2 is about 2 ppmv (.000002). The administration is promising to reduce US emissions by 17% over the next several years. But the economic analysis, using the alarmists’ own numbers, indicates that the cost to our economy (which does not take into account the impact on other countries) would be enormous, and, even assuming success, would have an impact on temperature too minuscule to measure. Such a policy would lead to massive disruption – hundreds of billions (if not trillions) in cost and NO IMPROVEMENT! This has to be obvious to everyone but rent-seekers.
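The unit conversions quoted in the paragraph above (400 ppmv, 4/100 of one percent, .0004) can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic; the function names are hypothetical and chosen for illustration, and the input figures are simply the ones stated in the comment, not independently sourced measurements.

```python
# Illustrative only: function names are hypothetical, inputs are the
# figures quoted in the comment above (400 ppmv level, 2 ppmv per year).

PARTS_PER_MILLION = 1_000_000


def ppmv_to_fraction(ppmv: float) -> float:
    """Convert a concentration in ppmv to a plain volume fraction."""
    return ppmv / PARTS_PER_MILLION


def ppmv_to_percent(ppmv: float) -> float:
    """Convert ppmv to percent by volume (1 percent = 10,000 ppmv)."""
    return ppmv / 10_000


if __name__ == "__main__":
    for label, ppmv in [("quoted level", 400), ("quoted annual increase", 2)]:
        print(
            f"{label}: {ppmv} ppmv = {ppmv_to_fraction(ppmv):.6f} "
            f"(fraction) = {ppmv_to_percent(ppmv):.4f}% by volume"
        )
```

The sketch only confirms that the three ways of writing the concentration are the same number; it says nothing about the physical effect of that concentration.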

There has not been even one coherent attempt at rebuttal of the issues raised herein. In fact, many of the alarmists still deny that global temperature has stalled or that bizarre weather is still natural climate variation. So far the only responses to questions or criticism of AGW involve circular logic, name-calling, “appeals to authority” (hardly relevant when it is “authority” which is in question), “consensus” claims, or that the science is “settled”. Michael Mann (infamous “hockey stick graph” author) responds to scientific criticisms by ignoring the facts presented, and instead asks whether the reader prefers to have their gall bladder taken out by a dentist rather than a surgeon. Science is never settled and certainly not decided by votes. The “consensus” claims are invariably based on completely debunked surveys. In any other situation such “consensus” studies would have long-since been exposed as major embarrassments for both the authors and the involved institutions.

When defenders of any theory supposedly based on science will not debate the science, instead attacking skeptics or providing irrelevant “rebuttals”, the proponents deserve no credibility. Government agencies keep announcing that the latest year, or month, has set a new high temperature record. But (1) they are using the terrestrial global temperature data, which covers less than 30% of the earth and most of which requires revisions because of station environment, and the differences involve hundredths of a degree, which is minuscule and so useless because it is smaller than the known error in temperature measurement, and (2) satellite temperature recordings (which agree with weather balloon measurements) cover most of the globe. One of the satellite measurements shows no additional warming for the past 18+ years. The average across the two satellite data-sets plus the three terrestrial data-sets shows no additional warming for the past 13+ years. Readers should ask why any supposed scientific government agency would continue using the land thermometer data for global temperature measurement when even NASA has long since admitted that satellite readings are more accurate and so preferable.

It is clear that human activity is contributing to the increase in carbon dioxide. However, some perspective, again. By 2099 the co2 level is projected to reach 600 ppmv (this assumes a continuation of the annual increase of about 2 ppmv). A crowded gym with poor venting is likely to be at 1,000 ppmv. Submarine crews work, for months, in atmospheres of 3,000 to 5,000+ ppmv. Plants LOVE the increased co2 level and, in that environment, require less water and provide more oxygen. Scientists have also acknowledged that lifeforms not unlike our own survived in co2 levels which were many times higher than now. Some scientists have concluded that the optimum level for co2 would be about 4 times higher than it is now.

Skeptics of these alarmist claims are at least as interested in saving the planet as the alarmists – we have grandchildren too. There is time, and technology will likely come up with sensible solutions, if needed, long before the co2 level becomes a problem. In the meantime, we will all enjoy a healthier environment, at least until the arrival of the next ice age. Invoking the “precautionary principle” to address an implausible hypothesis would bring on catastrophic economic results. Do not permit the politicians and alarmists to foist this hobgoblin on us!


January 8, 2015

Mayo

go figure: We had a recent post with some of these points:
https://errorstatistics.com/2014/12/13/s-stanley-young-are-there-mortality-co-benefits-to-the-clean-power-plan-it-depends-guest-post/

Maybe there are signs the debate will become more evidence oriented and less political. I’m not up on it, but I am bothered by the politicization.

January 9, 2015

cleanwater2

gofigure560
Thank you for posting some scientific facts.
From an environmental engineer with 50+ years of design and construction experience and a knowledge of quantum physics.

January 9, 2015

john byrd

I am not read in on all of these studies, but did a lot of work in archaeology in the past. The general public is largely unaware of the normal climatic shifts we have seen on geologic and archaeological time scales. It is indeed the norm. What scares me is the possibility of combining a natural warm trend with human-induced warming, if such can indeed occur. In any event, I find it disturbing to see scientists on either side of the debate subjected to personal attacks. This practice will compromise our ability to reach good policy.


January 9, 2015

cleanwater2

As more and more evidence becomes accepted that the Greenhouse gas effect does not exist, all of Naomi Oreskes’ writings will have to be moved from the science fiction section to the children’s fairy tales shelf alongside Pinocchio.

Pingback: The Rhetoric of Playing Dumb on Statistical Significance – Further Comments on Oreskes | Schachtman Law
January 20, 2015

Mayo

Schachtman further comments on Oreskes in a new blogpost:

http://schachtmanlaw.com/the-rhetoric-of-playing-dumb-on-statistical-significance-further-comments-on-oreskes/

Pingback: Sander Greenland on “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics” | Schachtman Law
December 28, 2017

climatewise101

Thank you for your post! Dr. Oreskes appears to be unaware that Svante Arrhenius amended his view on climate change in a paper he wrote in German in 1906 – translation here: https://www.friendsofscience.org/assets/documents/Arrhenius%201906,%20final.pdf She frequently claims there is ‘dark money’ out there funding ‘denial’ but clearly there is much bigger money funding ‘consensus.’ https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3029939 – Michelle Stirling, Communications Manager


January 24, 2018

Roger Jones

This misrepresents Arrhenius’ revision. He reduced the temperature change for a doubling of CO2 from 5 degrees to 3.9 degrees, rounded to 4. This is a “look, there’s a squirrel” tactic. It makes no difference to Oreskes’ underlying position on the science, irrespective of how her view on the statistics of warming could be interpreted.



© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.