Judea Pearl* wrote to me to invite readers of Error Statistics Philosophy to comment on a recent post of his (from his Causal Analysis blog here) pertaining to a guest post by Stephen Senn (“Being a Statistician Means Never Having to Say You Are Certain”). He has added a special addendum for us.
I was asked to comment on a recent article by Angus Deaton and Nancy Cartwright (D&C), which touches on the foundations of causal inference. The article is titled: “Understanding and misunderstanding randomized controlled trials,” and can be viewed here: https://goo.gl/x6s4Uy
My comments are a mixture of a welcome and a puzzle; I welcome D&C’s stand on the status of randomized trials, and I am puzzled by how they choose to articulate the alternatives.
D&C’s main theme is as follows: “We argue that any special status for RCTs is unwarranted. Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what is already known.” (Quoted from their introduction)
As a veteran challenger of the supremacy of the RCT, I welcome D&C’s challenge wholeheartedly. Indeed, “The Book of Why” (forthcoming, May 2018, http://bayes.cs.ucla.edu/WHY/) quotes me as saying:
If our conception of causal effects had anything to do with randomized experiments, the latter would have been invented 500 years before Fisher.
In this, as well as in my other writings, I go so far as to claim that the RCT earns its legitimacy by mimicking the do-operator, not the other way around. In addition, considering the practical difficulties of conducting an ideal RCT, observational studies have a definite advantage: they interrogate populations in their natural habitats, not in artificial environments choreographed by experimental protocols.
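To illustrate what I mean by the RCT mimicking the do-operator, here is a minimal simulation sketch (mine, under an assumed toy model with one binary confounder, not taken from D&C): conditioning on X in confounded observational data is biased, while coin-flip assignment of X, the hallmark of the RCT, recovers the interventional contrast.

```python
# Minimal sketch, assuming a toy model with a single binary confounder U.
# Conditioning on X in observational data does not equal intervening on X;
# randomizing X (as an ideal RCT does) mimics do(X) and recovers the true effect.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

U = rng.binomial(1, 0.5, n)                      # unobserved confounder
X_obs = rng.binomial(1, 0.2 + 0.6 * U)           # observational exposure depends on U
Y_obs = rng.binomial(1, 0.1 + 0.2 * X_obs + 0.5 * U)

naive = Y_obs[X_obs == 1].mean() - Y_obs[X_obs == 0].mean()   # E[Y|X=1] - E[Y|X=0]

X_rct = rng.binomial(1, 0.5, n)                  # coin-flip assignment: X independent of U
Y_rct = rng.binomial(1, 0.1 + 0.2 * X_rct + 0.5 * U)
rct = Y_rct[X_rct == 1].mean() - Y_rct[X_rct == 0].mean()     # estimates E[Y|do(X=1)] - E[Y|do(X=0)]

print(f"observational contrast: {naive:.3f}")    # roughly 0.5, biased by U
print(f"randomized contrast:    {rct:.3f}")      # roughly 0.2, the true effect
```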
Deaton and Cartwright’s challenge of the supremacy of the RCT consists of two parts:
- The first (internal validity) deals with the curse of dimensionality and argues that, in any single trial, the outcome of the RCT can be quite distant from the target causal quantity, which is usually the average treatment effect (ATE). In other words, this part concerns imbalance due to finite samples, and reflects the traditional bias-precision tradeoff in statistical analysis and machine learning (a short simulation after this list illustrates the point).
- The second part (external validity) deals with biases created by inevitable disparities between the conditions and populations under study and those prevailing in the actual implementation of the treatment program or policy. Here, Deaton and Cartwright propose alternatives to the RCT, calling for the integration of a web of multiple information sources (observational, experimental, quasi-experimental, and theoretical), all collaborating toward the goal of estimating “what we are trying to discover.”
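To illustrate the first point, here is a minimal sketch of my own (not taken from D&C): a small trial with a strong prognostic covariate. Randomization makes the estimator unbiased over repeated trials, yet any single trial can land far from the true ATE.

```python
# Hedged sketch of the internal-validity point: with a strong prognostic covariate
# and a small trial, a single randomized comparison can land far from the true ATE,
# even though the estimator is unbiased across repeated trials.
import numpy as np

rng = np.random.default_rng(1)
true_ate, n, trials = 1.0, 50, 5000

estimates = []
for _ in range(trials):
    covariate = rng.normal(0, 3, n)                       # strong prognostic factor
    treat = rng.permutation(np.repeat([0, 1], n // 2))    # balanced random assignment
    outcome = true_ate * treat + covariate + rng.normal(0, 1, n)
    estimates.append(outcome[treat == 1].mean() - outcome[treat == 0].mean())

estimates = np.array(estimates)
print(f"average over {trials} trials: {estimates.mean():.2f}  (unbiased, near {true_ate})")
print(f"standard deviation of a single-trial estimate: {estimates.std():.2f}")
print(f"share of single trials off by more than 1.0: {(np.abs(estimates - true_ate) > 1.0).mean():.1%}")
```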
My only qualm with D&C’s proposal is that, in their passion to advocate the integration strategy, they have failed to notice that, in the past decade, a formal theory of integration strategies has emerged from the brewery of causal inference and is currently ready and available for empirical researchers to use. I am referring, of course, to the theory of Data Fusion, which formalizes the integration scheme in the language of causal diagrams and provides theoretical guarantees of feasibility and performance (see http://www.pnas.org/content/pnas/113/27/7345.full.pdf).
Let us examine closely D&C’s main motto: “Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what is already known.” Clearly, to cast this advice in practical settings, we must devise notation, vocabulary, and logic to represent “what we are trying to discover” as well as “what is already known” so that we can infer the former from the latter. To accomplish this nontrivial task we need tools, theorems, and algorithms to assure us that what we conclude from our integrated study indeed follows from those precious pieces of knowledge that are “already known.” D&C are notably silent about the language and methodology in which their proposal should be carried out. One is left wondering, therefore, whether they intend their proposal to remain an informal, heuristic guideline, similar to Bradford Hill’s criteria of the 1960s, or to be explicated in some theoretical framework that can distinguish valid from invalid inference. If they aspire to embed their integration scheme within a coherent framework, then they should celebrate; such a framework has been worked out and is now fully developed.
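To give a flavor of what such notation and logic can look like, here is a minimal hand-rolled sketch (my own illustration, not D&C’s proposal and not the fusion calculus itself): the target quantity is written as an interventional contrast, the background knowledge as a causal diagram, and the inference step is the back-door adjustment that the diagram licenses.

```python
# Minimal sketch: the target query is E[Y|do(X=1)] - E[Y|do(X=0)]; the assumed knowledge
# is the diagram Z -> X, Z -> Y, X -> Y with Z observed, which licenses back-door
# adjustment over Z. The data below are simulated only to make the sketch runnable.
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

Z = rng.binomial(1, 0.4, n)                      # observed back-door variable
X = rng.binomial(1, 0.3 + 0.5 * Z)
Y = rng.binomial(1, 0.1 + 0.3 * X + 0.4 * Z)     # true effect of X is 0.3

def adjusted_mean(x):
    # back-door adjustment: sum_z E[Y | X=x, Z=z] * P(Z=z)
    return sum(Y[(X == x) & (Z == z)].mean() * (Z == z).mean() for z in (0, 1))

print(f"adjusted estimate of the effect: {adjusted_mean(1) - adjusted_mean(0):.3f}")  # ~0.3
print(f"naive observational contrast:    {Y[X == 1].mean() - Y[X == 0].mean():.3f}")  # biased
```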
To be more specific, the Data Fusion theory described in http://www.pnas.org/content/pnas/113/27/7345.full.pdf provides us with notation to characterize the nature of each data source: the nature of the population interrogated, whether the source is an observational or experimental study, and which variables are randomized and which are merely measured. Finally, the theory tells us how to fuse all these sources together to synthesize an estimand of the target causal quantity in the target population. Moreover, if we feel uncomfortable about the assumed structure of any given data source, the theory tells us whether an alternative source can furnish the needed information and whether we can weaken any of the model’s assumptions.[i]
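For readers who want a feel for what “notation to characterize the nature of each data source” could amount to in practice, here is a toy bookkeeping sketch (again my own, not the formal notation of the fusion paper): each source records its population, its design, which variables were randomized, and which were merely measured.

```python
# Toy bookkeeping sketch (not the formal fusion notation): each data source is
# described by its population, design, randomized variables, and measured variables,
# so that a fusion procedure can decide what each source can contribute.
from dataclasses import dataclass, field

@dataclass
class DataSource:
    name: str
    population: str                 # e.g. "study population" or "target population"
    design: str                     # "experimental" or "observational"
    randomized: frozenset = field(default_factory=frozenset)
    measured: frozenset = field(default_factory=frozenset)

sources = [
    DataSource("RCT", "study population", "experimental",
               randomized=frozenset({"Z"}), measured=frozenset({"X", "Y", "Z"})),
    DataSource("survey", "target population", "observational",
               measured=frozenset({"X", "Y", "Z"})),
]

for s in sources:
    print(f"{s.name}: {s.design} study of the {s.population}; "
          f"randomized={sorted(s.randomized)}, measured={sorted(s.measured)}")
```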
You can read the rest of Pearl’s original article here.
Addendum to “Challenging the Hegemony of RCTs”
March 11, 2018
Upon re-reading the post above, I realized that I had assumed readers to be familiar with Data Fusion theory. This Addendum is aimed at readers who are not familiar with the theory, and who would probably be asking: “Who needs a new theory to do what statistics does so well?” “Once we recognize the importance of diverse sources of data, statistics can be helpful in making decisions and quantifying uncertainty” [quoted from Andrew Gelman’s blog]. The reason I question the sufficiency of statistics to manage the integration of diverse sources of data is that statistics lacks the vocabulary needed for the job. Let us demonstrate this with a couple of toy examples, taken from BP-2015.
Example 1. Suppose we wish to estimate the average causal effect of X on Y, and we have two diverse sources of data:
(1) an RCT in which Z, not X, is randomized, and
(2) an observational study in which X, Y, and Z are measured.
What substantive assumptions are needed to facilitate a solution to our problem? Put another way, how can we be sure that, once we make those assumptions, we can solve our problem?
Example 2. Suppose we wish to estimate the average causal effect (ACE) of X on Y, and we have two diverse sources of data:
(1) an RCT in which the effect of X on both Y and Z is measured, but the recruited subjects had non-typical values of Z, and
(2) an observational study conducted in the target population, in which both X and Z (but not Y) were measured.
What substantive assumptions would enable us to estimate the ACE, and how should we combine the data from the two studies so as to synthesize a consistent estimate of the ACE?
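To make Example 1 concrete (Example 2 can be encoded in the same spirit), here is one possible data-generating sketch under a causal structure I have assumed purely for illustration; the code only produces the two data sources, while the fusion step, that is, how to combine them into an estimate of the effect of X on Y, is exactly the question posed above.

```python
# One possible concretization of Example 1 (structure assumed by me, purely for
# illustration): an unobserved U confounds Z, X, and Y; Z affects X; X affects Y.
# The code only generates the two data sources; fusing them into an estimate of
# the effect of X on Y is the identification question posed in the example.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def simulate(randomize_z: bool):
    U = rng.binomial(1, 0.5, n)                          # unobserved confounder
    if randomize_z:
        Z = rng.binomial(1, 0.5, n)                      # RCT: Z set by coin flip, do(Z)
    else:
        Z = rng.binomial(1, 0.3 + 0.4 * U)               # observational: Z depends on U
    X = rng.binomial(1, 0.15 + 0.4 * Z + 0.3 * U)
    Y = rng.binomial(1, 0.1 + 0.25 * X + 0.4 * U)
    return Z, X, Y

# Source (1): RCT in which Z, not X, is randomized; X and Y are recorded.
Z1, X1, Y1 = simulate(randomize_z=True)
# Source (2): observational study in which X, Y, and Z are all measured.
Z2, X2, Y2 = simulate(randomize_z=False)

print(f"RCT:           P(Y=1 | do(Z=1)) = {Y1[Z1 == 1].mean():.3f}")
print(f"observational: P(Y=1 | Z=1)     = {Y2[Z2 == 1].mean():.3f}  (differs: Z is confounded)")
```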
The nice thing about a toy example is that the solution is known to us in advance, so we can check any proposed solution for correctness. Curious readers can find the solutions to these two examples in http://ftp.cs.ucla.edu/pub/stat_ser/r450-reprint.pdf. More ambitious readers will probably try to solve them using statistical techniques such as meta-analysis or partial pooling. The reason I am confident that the second group will end up disappointed comes from a profound statement made by Nancy Cartwright in 1989: “No Causes In, No Causes Out.” It means not only that you need substantive assumptions to derive causal conclusions; it also means that the vocabulary of statistical analysis, built as it is entirely on properties of distribution functions, is inadequate for expressing the substantive assumptions that are needed to reach causal conclusions.
In our examples, although part of the data is provided by an RCT, and is hence causal, one can still show that the needed assumptions must invoke causal vocabulary; distributional assumptions are insufficient. As someone versed in both graphical modeling and counterfactuals, I would go even further and state that it would be a miracle if anyone succeeded in translating the needed assumptions into a comprehensible language other than causal diagrams. (See http://ftp.cs.ucla.edu/pub/stat_ser/r452-reprint.pdf, Appendix, Scenario 3.)
Armed with these examples and findings, we can go back and examine why D&C do not embrace the Data Fusion methodology in their quest for integrating diverse sources of data. The answer, I conjecture, is that D&C were not intimately familiar with what this methodology offers and how vastly different it is from previous attempts to operationalize Cartwright’s dictum: “No causes in, no causes out”.
[i] Pearl’s blog post, originally posted here, ends with the following; I hope that readers take him up on his invitation:
I would be very interested in seeing other readers’ reactions to D&C’s article, as well as to my optimistic assessment of what causal inference can do for us in this day and age. I have read the reactions of Andrew Gelman (on his blog) and Stephen J. Senn (on Deborah Mayo’s blog https://errorstatistics.com/2018/01/), but they seem to be unaware of the latest developments in Data Fusion analysis. I also invite Angus Deaton and Nancy Cartwright to share a comment or two on these issues. I hope they respond positively.
* Chancellor’s Professor of Computer Science and Statistics,
Director, Cognitive Systems Laboratory
University of California, Los Angeles