# Posts Tagged With: Evidence-based medicine

## Excerpts from S. Senn’s Letter on “Replication, p-values and Evidence”


I first blogged this letter here. Below the references are some more recent blog links of relevance to this issue.

Dear Reader: I am typing in some excerpts from a letter Stephen Senn shared with me in relation to my April 28, 2012 blogpost. It is a letter to the editor of Statistics in Medicine in response to S. Goodman. It contains several important points that get to the issues we’ve been discussing. You can read the full letter here. Sincerely, D. G. Mayo

STATISTICS IN MEDICINE, LETTER TO THE EDITOR

From: Stephen Senn*

Some years ago, in the pages of this journal, Goodman gave an interesting analysis of ‘replication probabilities’ of p-values. Specifically, he considered the possibility that a given experiment had produced a p-value that indicated ‘significance’ or near significance (he considered the range p=0.10 to 0.001) and then calculated the probability that a study with equal power would produce a significant result at the conventional level of significance of 0.05. He showed, for example, that given an uninformative prior, and (subsequently) a resulting p-value that was exactly 0.05 from the first experiment, the probability of significance in the second experiment was 50 per cent. A more general form of this result is as follows. If the first trial yields p=α then the probability that a second trial will be significant at significance level α (and in the same direction as the first trial) is 0.5.

Categories: 4 years ago!, reproducibility, S. Senn, Statistics

## Will the Real Junk Science Please Stand Up?

Junk Science (as first coined).* Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. (Yes, this was the first popular use of “junk science”, to my knowledge.) For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied.

Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases?

Categories: 4 years ago!, junk science, Objectivity, Statistics

## Stephen Senn: Randomization, ratios and rationality: rescuing the randomized clinical trial from its critics


Stephen Senn
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

This post first appeared here. An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example, is what John Worrall has to say:

Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence is evidence-based medicine? Philosophy of Science 2002; 69: S316-S330: see p. S324)

It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.

Categories: RCTs, S. Senn, Statistics

## Will the Real Junk Science Please Stand Up? (critical thinking)

Equivocations about “junk science” came up in today’s “critical thinking” class; if anything, the situation is worse now than it was two years ago, when I posted this.

Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied. Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases?

Take, say, global warming, genetically modified crops, electric-power lines, medical diagnostic testing. Group 1 alleges that those who point up the risks (actual or potential) have a vested interest in construing the evidence that exists (and the gaps in the evidence) accordingly, which may bias the relevant science and pressure scientists to be politically correct. Group 2 alleges the reverse, pointing to industry biases in the analysis or reanalysis of data and pressures on scientists doing industry-funded work to go along to get along.

When the battle between the two groups is joined, issues of evidence—what counts as bad/good evidence for a given claim—and issues of regulation and policy—what are “acceptable” standards of risk/benefit—may become so entangled that no one recognizes how much of the disagreement stems from divergent assumptions about how models are produced and used, as well as from contrary stands on the foundations of uncertain knowledge and statistical inference. The core disagreement is mistakenly attributed to divergent policy values, at least for the most part.

Categories: critical thinking, junk science, Objectivity

## Stephen Senn: Randomization, ratios and rationality: rescuing the randomized clinical trial from its critics

Stephen Senn
Head of the Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS), Luxembourg

An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example, is what John Worrall has to say:

Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence is evidence-based medicine? Philosophy of Science 2002; 69: S316-S330: see p. S324)

It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.

The second point is that in the absence of a treatment effect, where randomization has taken place, the statistical theory predicts probabilistically how the variation in outcome between groups relates to the variation within.
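The second point lends itself to a quick simulation. The toy sketch below is an illustration under stated assumptions, not Senn's own analysis: it assumes 30 patients per arm, 20 independent standard-normal prognostic covariates that affect outcome additively, no treatment effect at all, and a cut-off of 1.96 as a rough stand-in for the t critical value. Every name and number in it (`simulate`, `t_stat`, the seed, the sample sizes) is hypothetical. It counts how often at least one covariate looks "significantly" unbalanced between the randomized arms, and how often the outcome comparison falsely signals a treatment effect.

```python
import random


def mean_var(xs):
    """Sample mean and (n - 1)-denominator sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def t_stat(a, b):
    """Pooled two-sample t statistic for mean(a) - mean(b)."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    na, nb = len(a), len(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / (pooled * (1 / na + 1 / nb)) ** 0.5


def simulate(n_trials=500, n_per_arm=30, n_covariates=20, crit=1.96, seed=1):
    """Return (share of trials with some 'significant' covariate imbalance,
    share of trials whose outcome test is 'significant') when the null is true."""
    rng = random.Random(seed)
    n = 2 * n_per_arm
    imbalanced = false_positives = 0
    for _ in range(n_trials):
        # Hypothetical prognostic covariates: independent standard normals.
        covs = [[rng.gauss(0, 1) for _ in range(n_covariates)] for _ in range(n)]
        # Outcome depends on the covariates plus noise, but NOT on treatment.
        outcome = [sum(c) / n_covariates ** 0.5 + rng.gauss(0, 1) for c in covs]
        # Randomize: shuffle patient indices and split them into two arms.
        idx = list(range(n))
        rng.shuffle(idx)
        arm_a, arm_b = idx[:n_per_arm], idx[n_per_arm:]
        # Is at least one covariate "significantly" unbalanced between arms?
        if any(
            abs(t_stat([covs[i][k] for i in arm_a], [covs[i][k] for i in arm_b])) > crit
            for k in range(n_covariates)
        ):
            imbalanced += 1
        # Does the outcome comparison wrongly suggest a treatment effect?
        if abs(t_stat([outcome[i] for i in arm_a], [outcome[i] for i in arm_b])) > crit:
            false_positives += 1
    return imbalanced / n_trials, false_positives / n_trials


imbalance_rate, false_positive_rate = simulate()
print(f"trials with some covariate imbalance: {imbalance_rate:.2f}")
print(f"trials with a false 'treatment effect': {false_positive_rate:.2f}")
```

Under these assumptions the first rate should be high (with independent covariates it is roughly 1 − 0.95^20, about two trials in three), while the second should stay near the nominal 5 per cent, because the within-group variation already carries the covariates' contribution to outcome.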

Categories: Statistics

## PhilStatLaw: “Let’s Require Health Claims to Be ‘Evidence Based'” (Schachtman)

I see that Nathan Schachtman has had many interesting posts during the time I was away. His recent post endorses the idea of “a hierarchy of evidence,” but philosophers of “evidence-based” medicine generally question or oppose it, at least partly because of disagreement over where to place RCTs in the hierarchy. What do people think?

Litigation arising from the FDA’s refusal to approve “health claims” for foods and dietary supplements is a fertile area for disputes over the interpretation of statistical evidence. A “health claim” is “any claim made on the label or in labeling of a food, including a dietary supplement, that expressly or by implication … characterizes the relationship of any substance to a disease or health-related condition.” 21 C.F.R. § 101.14(a)(1); see also 21 U.S.C. § 343(r)(1)(A)-(B).

Unlike the federal courts exercising their gatekeeping responsibility, the FDA has committed to pre-specified principles of interpretation and evaluation. By regulation, the FDA gives notice of standards for evaluating complex evidentiary displays for the “significant scientific agreement” required for approving a food or dietary supplement health claim. 21 C.F.R. § 101.14. See FDA, Guidance for Industry: Evidence-Based Review System for the Scientific Evaluation of Health Claims, Final (2009).

If the FDA’s refusal to approve a health claim requires pre-specified criteria of evaluation, then we should be asking ourselves why the federal courts have failed to develop a set of criteria for evaluating health-effects claims as part of their Rule 702 (“Daubert”) gatekeeping responsibilities. Why, close to 20 years after the Supreme Court decided Daubert, can lawyers make “health claims” without having to satisfy evidence-based criteria?

Categories: philosophy of science, Statistics

## Excerpts from S. Senn’s Letter on “Replication, p-values and Evidence”

Dear Reader: I am typing in some excerpts from a letter Stephen Senn shared with me in relation to my April 28, 2012 blogpost. It is a letter to the editor of Statistics in Medicine in response to S. Goodman. It contains several important points that get to the issues we’ve been discussing, and you may wish to track down the rest of it. Sincerely, D. G. Mayo

Statist. Med. 2002; 21:2437–2444. https://errorstatistics.files.wordpress.com/2013/12/goodman.pdf

STATISTICS IN MEDICINE, LETTER TO THE EDITOR

A comment on replication, p-values and evidence: S.N. Goodman, Statistics in Medicine 1992; 11:875–879

From: Stephen Senn*

Some years ago, in the pages of this journal, Goodman gave an interesting analysis of ‘replication probabilities’ of p-values. Specifically, he considered the possibility that a given experiment had produced a p-value that indicated ‘significance’ or near significance (he considered the range p=0.10 to 0.001) and then calculated the probability that a study with equal power would produce a significant result at the conventional level of significance of 0.05. He showed, for example, that given an uninformative prior, and (subsequently) a resulting p-value that was exactly 0.05 from the first experiment, the probability of significance in the second experiment was 50 per cent. A more general form of this result is as follows. If the first trial yields p=α then the probability that a second trial will be significant at significance level α (and in the same direction as the first trial) is 0.5.
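The general form of the result can be checked numerically. The sketch below is a minimal illustration, not Senn's or Goodman's own calculation: it assumes a normally distributed test statistic with known variance and a flat (uninformative) prior on the true effect, so that the predictive distribution of the second trial's z-statistic is normal with mean equal to the first trial's z and variance 2 (posterior variance 1 plus sampling variance 1). The function name `replication_probability` and its parameters are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def replication_probability(p_first, alpha=0.05, two_sided=True):
    """P(second trial significant at level alpha, same direction as the first),
    given the first trial's p-value, under a flat prior (hypothetical setup)."""
    tails = 2 if two_sided else 1
    z_first = STD_NORMAL.inv_cdf(1 - p_first / tails)  # first trial's observed z
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / tails)     # threshold for significance
    # Predictive distribution of the second z is N(z_first, 2): posterior
    # variance 1 for the true effect plus sampling variance 1 for the new trial.
    return 1 - STD_NORMAL.cdf((z_crit - z_first) / 2 ** 0.5)


for p in (0.10, 0.05, 0.01, 0.001):
    print(f"first p = {p}: replication probability {replication_probability(p):.2f}")
```

Because the predictive distribution is centred on the first trial's z, a first result sitting exactly on the threshold (p = α) gives 0.5 whatever the predictive variance; a first trial at p = 0.01, judged at 0.05 in the second trial, comes out at roughly two in three under these assumptions.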

Categories: Statistics

## Objectivity #1. Will the Real Junk Science Please Stand Up?

Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied.

Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases?

Take, say, global warming, genetically modified crops, electric-power lines, medical diagnostic testing. Group 1 alleges that those who point up the risks (actual or potential) have a vested interest in construing the evidence that exists (and the gaps in the evidence) accordingly, which may bias the relevant science and pressure scientists to be politically correct. Group 2 alleges the reverse, pointing to industry biases in the analysis or reanalysis of data and pressures on scientists doing industry-funded work to go along to get along.

When the battle between the two groups is joined, issues of evidence—what counts as bad/good evidence for a given claim—and issues of regulation and policy—what are “acceptable” standards of risk/benefit—may become so entangled that no one recognizes how much of the disagreement stems from divergent assumptions about how models are produced and used, as well as from contrary stands on the foundations of uncertain knowledge and statistical inference. The core disagreement is mistakenly attributed to divergent policy values, at least for the most part.

Over the years I have tried my hand at sorting out these debates (e.g., Mayo and Hollander 1991). My account of testing actually came into being to systematize reasoning from statistically insignificant results in evidence-based risk policy: no evidence of risk is not evidence of no risk! (see October 5). Unlike the disputants who get the most attention, I have argued that the current polarization cries out for critical or meta-scientific scrutiny of the uncertainties, assumptions, and risks of error that are part and parcel of gathering and interpreting evidence on both sides. Unhappily, the disputants tend not to welcome this position, and are even hostile to it. This used to shock me when I was starting out: why would those trying to promote greater risk accountability not want to avail themselves of ways to hold agencies and companies responsible when they bury risks in fallacious interpretations of statistically insignificant results? By now, I am used to it.
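Why no evidence of risk is not evidence of no risk can be illustrated with a simple power calculation. This is a simplified, power-based sketch, not the full account of testing mentioned above: it assumes a one-sided z-test of "no increased risk" with a known standard error, and the function names (`power_one_sided`, `smallest_well_detected`) and the standard error of 2.0 are hypothetical. A non-significant result is taken as evidence against only those risk increases the test would very probably have detected.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def power_one_sided(delta, se, alpha=0.05):
    """Probability that a one-sided z-test of 'no increased risk' rejects
    when the true risk increase is delta and the estimate has standard error se."""
    z_crit = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z_crit - delta / se)


def smallest_well_detected(se, alpha=0.05, power=0.95):
    """Smallest risk increase the test would detect with at least `power`."""
    return (Z.inv_cdf(1 - alpha) + Z.inv_cdf(power)) * se


se = 2.0  # hypothetical standard error, e.g. excess cases per 1,000 exposed
for delta in (1.0, 3.0, smallest_well_detected(se)):
    print(f"true increase {delta:.1f}: power {power_one_sided(delta, se):.2f}")
```

With the hypothetical standard error of 2.0, `smallest_well_detected(2.0)` is about 6.6, so a non-significant result from such a study speaks against risk increases of that size or larger, but says little about smaller ones, which the test had a poor chance of picking up.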

This isn’t to say that there’s no honest self-scrutiny going on, only that all sides are so used to anticipating conspiracies of bias that my position is likely viewed as yet another politically motivated ruse. So what we are left with is scientific evidence playing less and less of a role in constraining or adjudicating disputes. Anyone who so much as suggests an evidential adjudication risks being attacked as a paid insider.

I agree with David Michaels (2008, 61) that “the battle for the integrity of science is rooted in issues of methodology,” but winning the battle would demand something that both sides are increasingly unwilling to grant. It comes as no surprise that some of the best scientists stay as far away as possible from such controversial science.

Mayo, D. and Hollander, R. (eds.). 1991. Acceptable Evidence: Science and Values in Risk Management. Oxford.

Mayo, D. 1991. Sociological versus Metascientific Views of Risk Assessment. In D. Mayo and R. Hollander (eds.), Acceptable Evidence: 249-79.

Michaels, D. 2008. Doubt Is Their Product. Oxford.

Categories: Objectivity, Statistics