Judea Pearl* wrote to me to invite readers of *Error Statistics Philosophy* to comment on a recent post of his (from his Causal Analysis blog here) pertaining to a guest post by Stephen Senn (“Being a Statistician Means Never Having to Say You Are Certain”). He has added a special addendum for us.[i]

Challenging the Hegemony of Randomized Controlled Trials: Comments on Deaton and Cartwright

Judea Pearl

I was asked to comment on a recent article by Angus Deaton and Nancy Cartwright (D&C), which touches on the foundations of causal inference. The article is titled “Understanding and misunderstanding randomized controlled trials,” and can be viewed here: https://goo.gl/x6s4Uy

My comments are a mixture of a welcome and a puzzle; I welcome D&C’s stand on the status of randomized trials, and I am puzzled by how they choose to articulate the alternatives.

D&C’s main theme is as follows: “We argue that any special status for RCTs is unwarranted. Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what is already known.” (Quoted from their introduction)

As a veteran challenger of the supremacy of the RCT, I welcome D&C’s challenge wholeheartedly. Indeed, *The Book of Why* (forthcoming, May 2018, http://bayes.cs.ucla.edu/WHY/) quotes me as saying:

If our conception of causal effects had anything to do with randomized experiments, the latter would have been invented 500 years before Fisher.

In this, as well as in my other writings, I go so far as to claim that the RCT earns its legitimacy by mimicking the do-operator, not the other way around. In addition, considering the practical difficulties of conducting an ideal RCT, observational studies have a definite advantage: they interrogate populations in their natural habitats, not in artificial environments choreographed by experimental protocols.

Deaton and Cartwright’s challenge of the supremacy of the RCT consists of two parts:

- The first (internal validity) deals with the curse of dimensionality and argues that, in any single trial, the outcome of the RCT can be quite distant from the target causal quantity, which is usually the average treatment effect (ATE). In other words, this part concerns imbalance due to finite samples, and reflects the traditional bias-precision tradeoff in statistical analysis and machine learning.
- The second part (external validity) deals with biases created by inevitable disparities between the conditions and populations under study versus those prevailing in the actual implementation of the treatment program or policy. Here, Deaton and Cartwright propose alternatives to the RCT, calling for the integration of a web of multiple information sources, including observational, experimental, quasi-experimental, and theoretical inputs, all collaborating towards the goal of estimating “what we are trying to discover”.

My only qualm with D&C’s proposal is that, in their passion to advocate the integration strategy, they have failed to notice that, in the past decade, a formal theory of integration strategies has emerged from the brewery of causal inference and is currently ready and available for empirical researchers to use. I am referring, of course, to the theory of Data Fusion, which formalizes the integration scheme in the language of causal diagrams and provides theoretical guarantees of feasibility and performance (see http://www.pnas.org/content/pnas/113/27/7345.full.pdf).

Let us examine closely D&C’s main motto: “Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what is already known.” Clearly, to cast this advice in practical settings, we must devise notation, vocabulary, and logic to represent “what we are trying to discover” as well as “what is already known” so that we can infer the former from the latter. To accomplish this nontrivial task we need tools, theorems, and algorithms to assure us that what we conclude from our integrated study indeed follows from those precious pieces of knowledge that are “already known.” D&C are notably silent about the language and methodology in which their proposal should be carried out. One is left wondering, therefore, whether they intend their proposal to remain an informal, heuristic guideline, similar to Bradford Hill’s criteria of the 1960s, or to be explicated in some theoretical framework that can distinguish valid from invalid inference. If they aspire to embed their integration scheme within a coherent framework, then they should celebrate: such a framework has been worked out and is now fully developed.

To be more specific, the Data Fusion theory described in http://www.pnas.org/content/pnas/113/27/7345.full.pdf provides us with notation to characterize the nature of each data source, the nature of the population interrogated, whether the source is an observational or experimental study, which variables are randomized and which are measured and, finally, the theory tells us how to fuse all these sources together to synthesize an estimand of the target causal quantity at the target population. Moreover, if we feel uncomfortable about the assumed structure of any given data source, the theory tells us whether an alternative source can furnish the needed information and whether we can weaken any of the model’s assumptions.[i]

You can read the rest of Pearl’s original article here.

…..

Addendum to “Challenging the Hegemony of RCTs”

March 11, 2018

—————–

Upon re-reading the post above I realized that I have assumed readers to be familiar with Data Fusion theory. This Addendum aims at readers who are not familiar with the theory, and who would probably be asking: “Who needs a new theory to do what statistics does so well?” “Once we recognize the importance of diverse sources of data, statistics can be helpful in making decisions and quantifying uncertainty.” [Quoted from Andrew Gelman’s blog]. The reason I question the sufficiency of statistics to manage the integration of diverse sources of data is that statistics lacks the vocabulary needed for the job. Let us demonstrate it with a couple of toy examples, taken from BP-2015.

Example 1

————–

Suppose we wish to estimate the average causal effect of X on Y, and we have two diverse sources of data:

(1) an RCT in which Z, not X, is randomized, and

(2) an observational study in which X, Y, and Z are measured.

What substantive assumptions are needed to facilitate a solution to our problem? Put another way, how can we be sure that, once we make those assumptions, we can solve our problem?
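For concreteness, here is one purely illustrative route, under assumptions of my own choosing (not necessarily the solution given in BP-2015): if one assumes the diagram Z → X → Y with an unobserved confounder U of X and Y, so that the randomized Z acts as an instrument for X, then under a linear model the two pieces of information combine via the classical Wald ratio. All structural equations and numbers below are hypothetical.

```python
# Illustrative sketch only: assumes Z -> X -> Y plus U -> X and U -> Y,
# with U unobserved, so the randomized Z is an instrument for X.
# Under linearity, ACE = (E[Y|Z=1] - E[Y|Z=0]) / (E[X|Z=1] - E[X|Z=0]).
import random

rng = random.Random(1)
n = 200_000
sum_y = {0: 0.0, 1: 0.0}
sum_x = {0: 0.0, 1: 0.0}
cnt = {0: 0, 1: 0}
for _ in range(n):
    z = rng.random() < 0.5                   # Z is randomized
    u = rng.gauss(0, 1)                      # unobserved confounder
    x = 1.0 * z + u + rng.gauss(0, 1)        # Z shifts X; U confounds X and Y
    y = 2.0 * x + 3.0 * u + rng.gauss(0, 1)  # true ACE of X on Y is 2.0
    sum_y[z] += y
    sum_x[z] += x
    cnt[z] += 1

mean = lambda d, z: d[z] / cnt[z]
# Wald ratio: naive regression of Y on X would be badly confounded by U,
# but the instrument recovers the true effect of 2.0 (up to sampling noise).
ace_wald = (mean(sum_y, 1) - mean(sum_y, 0)) / (mean(sum_x, 1) - mean(sum_x, 0))
```

The point of the sketch is the one in the text: the estimator is only licensed by the assumed diagram (Z affects Y only through X, and Z shares no cause with Y), and nothing in the joint distribution alone can certify those assumptions.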

Example 2

————-

Suppose we wish to estimate the average causal effect (ACE) of X on Y, and we have two diverse sources of data:

(1) an RCT in which the effect of X on both Y and Z is measured, but the recruited subjects had non-typical values of Z, and

(2) an observational study conducted in the target population, in which both X and Z (but not Y) were measured.

What substantive assumptions would enable us to estimate the ACE, and how should we combine data from the two studies so as to synthesize a consistent estimate of the ACE?

The nice thing about a toy example is that the solution is known to us in advance, and so we can check any alternative solution for correctness. Curious readers can find the solutions for these two examples in http://ftp.cs.ucla.edu/pub/stat_ser/r450-reprint.pdf. More ambitious readers will probably try to solve them using statistical techniques, such as meta-analysis or partial pooling. The reason I am confident that the second group will end up disappointed comes from a profound statement made by Nancy Cartwright in 1989: “No Causes In, No Causes Out”. It means not only that you need substantive assumptions to derive causal conclusions; it also means that the vocabulary of statistical analysis, since it is built entirely on properties of distribution functions, is inadequate for expressing those substantive assumptions that are needed for getting causal conclusions.

In our examples, although part of the data is provided by an RCT, hence it is causal, one can still show that the needed assumptions must invoke causal vocabulary; distributional assumptions are insufficient. As someone versed in both graphical modeling and counterfactuals, I would go even further and state that it would be a miracle if anyone succeeds in translating the needed assumptions into a comprehensible language other than causal diagrams. (See http://ftp.cs.ucla.edu/pub/stat_ser/r452-reprint.pdf Appendix, Scenario 3.)

Armed with these examples and findings, we can go back and examine why D&C do not embrace the Data Fusion methodology in their quest for integrating diverse sources of data. The answer, I conjecture, is that D&C were not intimately familiar with what this methodology offers and how vastly different it is from previous attempts to operationalize Cartwright’s dictum: “No causes in, no causes out”.

[i] Pearl’s blog post, originally posted here, ends with the following; I hope that readers take him up on his invitation:

I would be very interested in seeing other readers’ reactions to D&C’s article, as well as to my optimistic assessment of what causal inference can do for us in this day and age. I have read the reactions of Andrew Gelman (on his blog) and Stephen J. Senn (on Deborah Mayo’s blog https://errorstatistics.com/2018/01/), but they seem to be unaware of the latest developments in Data Fusion analysis. I also invite Angus Deaton and Nancy Cartwright to share a comment or two on these issues. I hope they respond positively.

* Chancellor’s Professor of Computer Science and Statistics,

Director, Cognitive Systems Laboratory

University of California Los Angeles,

http://www.cs.ucla.edu/~judea/

http://bayes.cs.ucla.edu/csl_papers.html

Re: “RCT earns its legitimacy by mimicking the do-operator, not the other way around”

Is there a short definition of the do-operator? Sorry, busy with other business, a few seconds search on JP’s blog did not enlighten.

“Interventions and counterfactuals are defined through a mathematical operator called do(x), which simulates physical interventions by deleting certain functions from the model, replacing them with a constant X = x, while keeping the rest of the model unchanged.”
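To make the quoted definition concrete, here is a minimal, purely illustrative structural model (all equations and numbers are my own, chosen for exposition): do(X = x) deletes the equation that normally determines X and replaces it with the constant x, while every other mechanism in the model is kept unchanged.

```python
# Minimal structural causal model: U -> X, U -> Y, X -> Y.
# The do-operator deletes the mechanism for X and fixes X to a constant,
# leaving the mechanism for Y (and the distribution of U) untouched.
import random

def sample(do_x=None, n=100_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u = rng.random() < 0.5               # unobserved confounder U
        if do_x is None:
            x = u                            # natural mechanism: X := U
        else:
            x = do_x                         # do(X = x): mechanism deleted
        # Y's mechanism depends on both X and U and is never altered:
        y = (0.3 + 0.4 * x + 0.3 * u) > rng.random()
        hits += y
    return hits / n

# Interventional contrast P(Y=1 | do(X=1)) - P(Y=1 | do(X=0)):
# cutting the U -> X arrow isolates X's own contribution of 0.4.
effect = sample(do_x=1) - sample(do_x=0)
```

In this toy model the observational contrast P(Y=1 | X=1) − P(Y=1 | X=0) is 0.7, not 0.4, because observing X also carries information about U; the do-operator is precisely what strips out that confounded component.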

(I posted this reply with a link to arxiv preprint 1210.4852 yesterday, but it is awaiting moderation. Maybe because of the link, we’ll see…)

Carlos,

Thanks for looking that up. I’m still in do(ubt) about what it says, but it seems to say that doing(x) in a model is a substitute for doing something in reality, in other words, that reality mimics the model rather than the other way around. That sounds like a fundamental (but very familiar) error to me. It calls to mind a statement of John Dewey, one I regard as calling attention to the reality of the information dimension —

“If doubt and indeterminateness were wholly within the mind — whatever that may signify — purely mental processes ought to get rid of them. But experimental procedure signifies that actual alteration of an external situation is necessary to effect the conversion. A situation undergoes, through operations directed by thought, transition from problematic to settled, from internal discontinuity to coherency and organization.” (Dewey, The Quest for Certainty)

☞ https://www.academia.edu/1266493/Interpretation_as_Action_The_Risk_of_Inquiry

I would say that doing(x) is used to represent in the model the fact that something is being done in reality. I’m not sure why you think it’s an error. Don’t you find it worthwhile to have a way to distinguish in the model whether an association is the result of an intervention or an observation on the “natural” state of the system?

A blog commenter pointed me to the above post which includes the following remark by Pearl:

I think Pearl and I are in agreement. I don’t think that statistical data analysis is sufficient to manage the integration of diverse sources of data. Substantive knowledge is needed too. What I wrote above (and what Pearl quoted) is that statistics can be helpful, not that statistics is all that is needed. To put it another way, I don’t think it’s statistical or causal thinking; I think it’s statistical and causal thinking.

I don’t see the need for a “painful transition from statistical to causal thinking.” Causal thinking is important, but causal thinking does not remove various purely statistical issues relating to sampling and measurement. Causal reasoning does not make statistical reasoning obsolete; it just makes statistical reasoning more relevant to real-world causal problems of interest.

I agree with Gelman.

Andrew does not see the need for a “painful transition from statistical to causal thinking”. The fact that this transition has been painful (in fact super-traumatic) to most statisticians is attested by the facts that: (1) most statisticians do not speak the language of causality, (2) the overwhelming majority of statistics textbooks do not have “causal” in their index, and (3) most statisticians cannot write a mathematical expression for even the simplest causal relation, say that “The rooster crow does not cause the sun to rise.”

Saying that “Causal reasoning does not make statistical reasoning obsolete” is like saying that calculus does not make arithmetic obsolete. Of course it does not, but this would not justify giving PhD degrees in mathematics to students who took no class in calculus. Most PhDs in statistics do not take a class in causal reasoning, and most readers of your and Deborah’s blogs, for example, would not understand a formula in causal calculus (my conjecture).

True, the transition from statistical to causal thinking should not be painful to those who decide to make it, but it has been painful to the field as a whole, which still provides no incentives to those who have not made the transition yet, and who are totally bewildered by recent discussions like the one by Deaton and Cartwright. What these good people need is an incentive to study causal modeling, not an assurance that statistics is not obsolete.

Judea

I appreciate Judea coming over to our blog, in search of feedback. Judea seems very concerned that statistics has been delinquent “in neglecting causality,” and his evidence for this is that causality isn’t reducible to probability distributions. My response to this (in our informal email exchange) is: I don’t see how statistical science can be said to be “delinquent … in neglecting causality” (or particle physics, or any number of other areas to which statistics is applied) on grounds that those areas aren’t reducible to formal “properties of distribution functions.” If you have a method for arriving at causal inferences that is highly capable of ruling out ways a causal claim may be false, then the claim passes with severity. If your method reliably blocks causal inferences whenever flaws haven’t been probed and ruled out, then it satisfies a minimal condition for warranted inference. It is methods, inferences, and inquiries that are qualified probabilistically when one uses statistics to study problem X; it is not that X itself needs to be reducible to probability distributions. I suggested that he might be equating statistical inference with a type of Bayesian inference (where background knowledge must enter by way of prior distributions).

Randomization is part of a philosophy of statistics that employs experimental design to create and control a probabilistic connection between data and claims of interest. If you don’t care about vouchsafing and using error probabilities you might not care about randomization, although plenty of default Bayesians are keen to find ways to justify deeming randomization of value.

What Judea mentions is a formal system for articulating causal assumptions and making causal claims. You can’t do that with statistics alone; that’s a formal statement that can be easily proved. You can’t have a method to reach causal claims without causal assumptions, and since we use these causal assumptions every day, we need an efficient and formal language to encode and discuss them.

Statistical method is relevant for inquiring into many domains without itself supplying all the background knowledge for the inquiry. It’s absurd to suppose otherwise.

You can’t have a method that applies to particle physics without assumptions about particle physics. So what? A truism.

By the way, I’m not saying one way or another whether Pearl’s causal fusion method is good; it might be great, and may offer the clearest way to represent certain causal claims & emulate experimental causal inquiry when that’s not possible. But I agree with Senn’s very important retort to the criticisms of randomization.

Mayo, your comment doesn’t answer the following question, however: after you randomized and have results from population A, which assumptions do you need to have to allow you transporting these results to population B? How can you tell if those assumptions are sufficient? Or necessary? Or what is the correct formula for transportation of the results?

David Cox and Nancy Cartwright were both contributors to a conference I ran in 2010, and Cox wrote something special (as background for the conference) to address her concerns about this. I think it is well known that randomization enables learning about causes in the experimental population, and that unless it’s a random sample from some larger population, it’s fallacious to make that inference. However, learning about causes in the experimental population is extremely important.

There are two trivial answers for transportability: we can either claim the populations are equal (or, like you said, one is a random sample of the other), and results are trivially transportable; or we can claim they are different in every aspect, so results are trivially not transportable. I think we can agree this adds nothing to our current knowledge of external validity. Now let us consider the interesting cases, where the populations are equal in some aspects and different in others. Then my previous question remains: what are the assumptions that allow one to transport results from A to B? When the result is transportable, how can we find the correct adjustment?
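In the simplest non-trivial case the adjustment has a closed form. If one is willing to assume that a variable Z captures every factor that both modifies the effect of X on Y and differs between the populations (an assumption that must itself be defended, e.g., by a causal diagram), the transport formula of Bareinboim and Pearl reduces to re-weighting the Z-specific experimental results by the target population’s distribution of Z: P_B(y | do(x)) = Σ_z P_A(y | do(x), z) P_B(z). The numbers below are hypothetical.

```python
# Transporting an experimental finding from population A to population B,
# assuming (illustratively) that age group Z is the ONLY relevant
# difference between the populations:
#     P_B(y | do(x)) = sum_z P_A(y | do(x), z) * P_B(z)

# Z-specific effects measured in the RCT run on population A:
p_y_do_x_given_z = {"young": 0.30, "old": 0.60}

# Distribution of Z in each population (A mostly young, B mostly old):
p_z_A = {"young": 0.8, "old": 0.2}
p_z_B = {"young": 0.3, "old": 0.7}

# Re-weight the same experimental strata by each population's Z-mix:
effect_in_A = sum(p_y_do_x_given_z[z] * p_z_A[z] for z in p_z_A)
effect_in_B = sum(p_y_do_x_given_z[z] * p_z_B[z] for z in p_z_B)
```

Here the naive transport (reporting A’s overall effect of 0.36 for B) misses the correct target of 0.51; the re-weighting is only valid under the stated assumption about Z, which is exactly the kind of assumption that cannot be expressed, let alone tested, in purely distributional vocabulary.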

Deborah,

You have to decide whether causality resides within the province of statistics proper, or outside it. Your analogy of statistics and particle physics suggests that you view causality as residing outside statistics, but then I have a hard time understanding why all revered statisticians, from Pearson to Neyman to Dempster, to Freedman, to Cox, felt the need to claim that they have something to say about causality. There are many times more papers titled “A statistician’s view on causality” than “A statistician’s view on particle physics”. Why?

If nothing else, the public and the scientific community as a whole expect statisticians to say something meaningful about causal relations, especially when we wish such relations to be supported by data. Can they? Please inspect your most favorite statistics textbook; does it have “causal” in the index? What do you call this kind of negligence if not delinquent?

I hope The Book of Why will change some of it: http://bayes.cs.ucla.edu/WHY/

The idea of “inside/outside” when it comes to statistics (or philosophy, for that matter) is silly. Statistical science, while it has a theoretical side, is applied, and the inquiries to which it is applied needn’t be “reducible” to probability distributions. Insofar as statistical causality is involved, statistical methods are relevant to control, assess and model causal flaws.

Mayo, I don’t understand your position though. Do you think it’s necessary to have a formal language to discuss causality? Or should we just discuss things informally in science, when talking about cause and effect? When someone says we can transport experimental result from population A to population B, how are we to discipline the conversation and understand: (a) whether the assumptions that person is giving are sufficient/necessary for the task; (b) after establishing that, how to judge whether the assumptions are warranted; (c) how to properly adjust after we conclude it’s possible; (d) decide where to gather more information, if we judge it’s not possible. And so on. This has been formalized, and I don’t understand if what you are saying is that you believe the formalization is not necessary, and scientists should keep discussing these things informally.

John: I was just responding to Pearl’s charge that statistical science was “delinquent” and unable to help with causal inquiries. The fact that a scientific domain isn’t reducible to formal statistics doesn’t entail that statistical methods aren’t relevant for inquiry into that domain.

Ok, with that I think we all agree — statistics can help us tell signal from noise in small samples, experimental design literature can help us design efficient experiments and so on. What I believe Pearl means is that associational (aka statistical) language does not give us a proper semantics for discussing causality, encoding causal assumptions and formally/efficiently discussing the logical consequences of these assumptions.

Deborah,

The idea of “inside/outside” when it comes to statistics and causality would indeed be silly were statisticians not to claim authoritative leadership, if not sole ownership, over causal inference. But from the days of Pearson, statisticians have insisted on such ownership, as is attested, for example, by their vicious attacks on Sewall Wright, starting with Niles in the 1920s and ending with Karlin in 1983 (see chapter 2 of The Book of Why, http://bayes.cs.ucla.edu/WHY/).

Or take Berkeley’s sex discrimination in admissions as another example. Who did Berkeley’s Dean call for help when Simpson’s paradox struck Berkeley’s admission records? He called Peter Bickel, a statistician. He did not call a logician, or an educator, or a philosopher. And to this very day, who is considered most qualified to comment on Deaton and Cartwright’s paper on RCTs? Statisticians, of course, even though the issues involved are causal: the target quantity to be estimated is causal (i.e., “the causal effect”), hence the methods required for deciding whether RCTs bring us any closer to our target are causal.

Given this historical background, I think you would agree that statisticians’ clinging to causation cannot be compared to their peripheral interest in particle physics. There is a qualitative difference between the two. Now, given this clinging, I think you would also agree that my use of the term “negligence” is not an exaggeration. Those who cling to A, and see A as part of their professional responsibility, should learn to talk the language of A. I am not speaking about reducing one language to another; I am speaking about learning the universal language of causation, which transcends specific applications and which, like Boolean calculus, needs to be mastered if you want to speak logic.

Judea
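Since the Berkeley episode does the argumentative work here, it may help to see the arithmetic of Simpson’s paradox itself. The numbers below are hypothetical, chosen only to reproduce the reversal pattern of the Berkeley data: within each department one group is admitted at a higher rate, yet in aggregate the comparison flips because the groups apply to departments of very different selectivity.

```python
# Hypothetical Berkeley-style admissions data:
# (applicants, admitted) per (sex, department).
data = {
    ("men",   "easy"): (800, 480),   # 60% admitted
    ("men",   "hard"): (200,  60),   # 30% admitted
    ("women", "easy"): (200, 130),   # 65% admitted
    ("women", "hard"): (800, 280),   # 35% admitted
}

def overall_rate(sex):
    # Aggregate admission rate across departments for one group.
    apps = sum(a for (s, _), (a, _) in data.items() if s == sex)
    adm = sum(m for (s, _), (_, m) in data.items() if s == sex)
    return adm / apps

# Women do better in BOTH departments (65% vs 60%, 35% vs 30%), yet the
# aggregate reverses because women mostly applied to the hard department:
men_rate = overall_rate("men")       # 540 / 1000 = 0.54
women_rate = overall_rate("women")   # 410 / 1000 = 0.41
```

Which of the two comparisons answers the discrimination question depends on the causal story (does department choice mediate or confound the effect of sex on admission?), which is exactly why the distribution alone cannot adjudicate.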

People used to do things without formal probability as well; ignorance is not a valid argument whatsoever.

Oliver’s claim, however, is incorrect: physics uses structural quantities all the time, and that is why physicists can tell correlation from causation, not passive observation of data.

Now let me ask you a simple question: exactly how would you teach a machine to reason about cause and effect without a formal language?

Here’s what David Colquhoun wrote a few years ago about randomization (which, by the way, is our palindrome word for March):

http://www.dcscience.net/2011/10/28/why-philosophy-is-largely-ignored-by-science/

Having long since decided that it was Fisher, rather than philosophers, who had the answers to my questions, why bother to write about philosophers at all? It was precipitated by joining the London Evidence Group. Through that group I became aware that there is a group of philosophers of science who could, if anyone took any notice of them, do real harm to research. It seems surprising that the value of randomisation should still be disputed at this stage, and of course it is not disputed by anybody in the business.

It was thoroughly established after the start of small sample statistics at the beginning of the 20th century. Fisher’s work on randomisation and the likelihood principle put inference on a firm footing by the mid-1930s. His popular book, The Design of Experiments, made the importance of randomisation clear to a wide audience, partly via his famous example of the lady tasting tea. The development of randomisation tests made it transparently clear (perhaps I should do a blog post on their beauty). By the 1950s, the message got through to medicine, in large part through Austin Bradford Hill.

Despite this, there is a body of philosophers who dispute it. And of course it is disputed by almost all practitioners of alternative medicine (because their treatments usually fail the tests). Here are some examples.

“Why there’s no cause to randomise” is the rather surprising title of a report by Worrall (2004; see also Worrall, 2010), from the London School of Economics. The conclusion of this paper is

“don’t believe the bad press that ‘observational studies’ or ‘historically controlled trials’ get – so long as they are properly done (that is, serious thought has gone in to the possibility of alternative explanations of the outcome), then there is no reason to think of them as any less compelling than an RCT.”

In my view this conclusion is seriously, and dangerously, wrong. It ignores the enormous difficulty of getting evidence for causality in real life, and it ignores the fact that historically controlled trials have very often given misleading results in the past, as illustrated by the diet problem. Worrall’s fellow philosopher, Nancy Cartwright (Are RCTs the Gold Standard?, 2007), has made arguments that in some ways resemble those of Worrall.

Many words are spent on defining causality but, at least in the clinical setting, the meaning is perfectly simple. If the association between eating bacon and colorectal cancer is causal, then if you stop eating bacon you’ll reduce the risk of cancer. If the relationship is not causal, then if you stop eating bacon it won’t help at all. No amount of Worrall’s “serious thought” will substitute for the real evidence for causality that can come only from an RCT: Worrall seems to claim that sufficient brain power can fill in missing bits of information. It can’t.

@Mayo Thanks for reposting that. It was written in 2011, and I wouldn’t change it substantially now.

Pearl sent me this response to Colquhoun:

David,

No one will dispute the fact that, if textbook RCTs can be performed, they provide indisputable evidence for causality. But let’s not lose sight of the fact that it was “brain power”, not RCTs, that provided the decisive connection between smoking and lung cancer in the 1960s. It was Cornfield’s sensitivity argument, about how implausibly strong a “tobacco gene” would have to be to explain the data, that convinced the public and the experts, not Fisherian experiments. Plausibility judgment is none other than “brain power.” If we accepted this brain power in 1960, why not today?

My argument with Deaton and Cartwright addresses the imperfections in RCTs. Can observational studies (on heterogeneous sources) assist us in mitigating those imperfections? I am curious to hear your opinion on this question, because observational studies rest on subject-matter assumptions, and such assumptions come from “brain power”, not data.

So how do we manage imperfections?

Guilty as charged! I am woefully ignorant of fusion analysis but look forward to learning about it. However, I am surprised to see Professor Pearl repeat without comment Cartwright and Deaton’s confounders claim, since I dealt with it. If I may be permitted a little joke, it’s a problem that people ‘see’ very readily and are then stuck with because they don’t ‘do’ anything to understand it. If they tried to simulate the problem they would soon realise that (1) it is the total joint effect of confounders that matters, (2) there is one such total, not indefinitely many, (3) the effect of this total is bounded, and (4) it affects not only between- but within-group variability. Thus Fisher’s prescription is seen to be rather more profound than many suppose.