Part 2 (of 2) (Please begin with part 1)
The first thing one wants to know about a search method is what it is searching for: what would count as getting it right. One might want to estimate a probability distribution, or to get correct forecasts of some probabilistic function of the distribution (e.g., out-of-sample means), or a causal structure, or some probabilistic function of the distribution resulting from some class of interventions. Second, one wants to know what decision theorists call a loss function, or, less formally, the comparative importance of various errors of measurement; in other terms, what makes some approximations better than others. Third, one wants a limiting consistency proof: sufficient conditions for the search to reach the goal in the large sample limit. There are various kinds of consistency (pointwise versus uniform, for example), and one wants to know which of them, if any, hold for a search method, and under what assumptions about the hypothesis space and the sampling distribution. Fourth, one wants to know as much as possible about the behavior of the search method on finite samples. In simple cases of statistical estimation there are analytic results; more often, for search methods, only simulation results are possible, but then one wants the simulations to explore the bounds of failure, not just the easy cases. And, of course, one wants a rationale for limiting the search space, as well as some sense of how wrong the search can be if those limits are violated in various ways.
There are other important economic features of search procedures. Probability distributions (or likelihood functions) can instantiate any number of constraints: vanishing partial correlations, for example, or inequalities among correlations. Suppose the hypothesis space delimits some big class of probability distributions, and suppose the search proceeds by testing constraints (the points that follow apply as well if the procedure computes posterior probabilities for particular hypotheses and applies a decision rule). There is a natural partial ordering of classes of constraints: B is weaker than A if and only if every distribution that satisfies class A satisfies class B. Other things equal, a weakest class might be preferred because it requires fewer tests. But more important is what the test of a constraint does to guide the search efficiently. A test that eliminates a single hypothesis is not much help. A test that eliminates a big class of hypotheses is a lot of help.
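As a concrete instance, here is a minimal sketch (in Python with NumPy; the function names and the choice of a Fisher z-test are mine, not drawn from Hendry or from any particular package) of testing one vanishing-partial-correlation constraint:

```python
import math
import numpy as np

def partial_corr(data, i, j, k):
    """Sample partial correlation of columns i and j given column k."""
    c = np.corrcoef(data, rowvar=False)
    num = c[i, j] - c[i, k] * c[j, k]
    den = math.sqrt((1 - c[i, k] ** 2) * (1 - c[j, k] ** 2))
    return num / den

def constraint_holds(data, i, j, k, alpha=0.05):
    """Fisher z-test of the constraint rho(i, j | k) = 0.
    Returns True when the data fail to reject the constraint."""
    n = data.shape[0]
    r = partial_corr(data, i, j, k)
    # n - 4: sample size, minus conditioning-set size (1), minus 3
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 4)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return p > alpha
```

The point about efficiency is visible here: in a causal search, a surviving constraint of this kind licenses removing an edge, and so eliminates at one stroke every graph in which the two variables are adjacent.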
Other factors matter as well: the power of the requisite tests; the number of tests (or posterior probability assessments) required; the computational cost of individual tests (or posterior probability assessments). And so on. Finally, search algorithms have varying degrees of generality. There are general algorithms, such as the widely used PC search algorithm for graphical causal models, that are essentially search schemata: plug in any decision procedure for conditional independence, and PC becomes a search procedure using that conditional independence oracle. By contrast, some searches are so embedded in a particular hypothesis space that it is difficult to see any generality.
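The schematic character of PC can be made vivid with a sketch of its adjacency-elimination phase (in Python; this is my own compressed rendering, not code from any published implementation). The oracle is an arbitrary function: hand it a frequentist test, a Bayesian decision rule, or a lookup table, and the same search runs:

```python
from itertools import combinations

def pc_skeleton(variables, independent):
    """Adjacency phase of the PC schema. `independent(x, y, S)` is an
    arbitrary conditional-independence oracle returning True or False."""
    adj = {v: set(variables) - {v} for v in variables}
    depth = 0
    # Grow the conditioning-set size until no node has enough neighbors.
    while any(len(adj[x] - {y}) >= depth for x in variables for y in adj[x]):
        for x in variables:
            for y in list(adj[x]):
                # Condition on subsets of x's other current neighbors.
                for S in combinations(sorted(adj[x] - {y}), depth):
                    if independent(x, y, set(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        depth += 1
    return {frozenset((x, y)) for x in variables for y in adj[x]}
```

Nothing in the search refers to how independence judgments are made, which is exactly the sense in which PC is a schema rather than a single procedure.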
I am sure I am not qualified to comment on the details of Hendry’s search procedure, and even if I were, for reasons of space his presentation is too compressed for that. Still, I can make some general remarks. I do not know from his essay the answers to many of the questions pertinent to evaluating a search procedure that I raised above. For example, his success criterion is “congruence,” and I have no idea what that is. That is likely my fault, since I have read only one of his books, and that long ago.
David Hendry dismisses “priors,” meaning, I think, Bayesian methods, with an argument from language acquisition: kids don’t need priors to learn a language. I am not sure of Hendry’s logic. Particular grammars within a parametric “universal grammar” could in principle be learned by a Bayesian procedure, although I have no reason to think they are. But either way, that has no import for whether Bayesian procedures are the most advantageous for various search problems by any of the criteria I have noted above. Sometimes they may be, sometimes not; there is no uniform answer, in part because computational requirements vary. I could give examples, but space forbids.
Abstractly, one could think there are two possible ways of searching when the set of relationships to be uncovered may form a complex web: start by positing all possible relationships and eliminate from there, or start by positing no relationships and build up. Hendry dismisses the latter, with what generality I do not know. What I do know is that the relations between “bottom-up” and “top-down,” or “forward” and “backward,” search can be intricate, and in some cases one may need both for consistency. Sometimes either will do. Graphical models, for example, can be searched starting with the assumption that every variable influences every other and eliminating, or starting with the assumption that no variable influences any other and adding. There are pointwise consistent searches in both directions. The real difference is in complexity.
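The two directions can be illustrated in a variable-selection setting (a sketch of my own, not Hendry’s procedure; the greedy strategy and the BIC criterion are my illustrative choices): a bottom-up search that adds regressors and a top-down search that eliminates them.

```python
import numpy as np

def bic(X, y, cols):
    """BIC of an OLS regression of y on the selected columns plus a constant."""
    n = len(y)
    Xs = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ beta) ** 2)
    return n * np.log(rss / n) + (len(cols) + 1) * np.log(n)

def forward(X, y):
    """Bottom-up: greedily add the regressor that most improves BIC."""
    selected, rest = [], list(range(X.shape[1]))
    while rest:
        best = min(rest, key=lambda c: bic(X, y, selected + [c]))
        if bic(X, y, selected + [best]) >= bic(X, y, selected):
            break
        selected.append(best)
        rest.remove(best)
    return sorted(selected)

def backward(X, y):
    """Top-down: greedily drop the regressor whose removal most improves BIC."""
    selected = list(range(X.shape[1]))
    while selected:
        best = min(selected, key=lambda c: bic(X, y, [s for s in selected if s != c]))
        if bic(X, y, [s for s in selected if s != best]) >= bic(X, y, selected):
            break
        selected.remove(best)
    return sorted(selected)
```

On well-separated problems the two directions often agree; they diverge on strongly correlated regressors, and their computational costs differ with the size of the true model, which is where the real choice between them lies.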
Finally, I am struck by a substantive assumption that seems to vary with discipline, and I am not sure why. Like Spanos, from whom I learned what little I know of econometrics, Hendry wants to Normalize distributions, at least enough so that tests of hypotheses based on the Normal distribution assumption can be used. But in linear systems, when searching for causal relations, Normal distributions are the worst case, not the best. For linear systems with non-Gaussian distributions, higher moments provide information about causal direction that cannot be recovered from Gaussian distributions. The residuals and the recorded variables in psychological studies with magnetic resonance imaging time series are not Gaussian, and that fact has proved critical in reliably estimating directions of influence.
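The point can be seen in a toy version of the idea behind LiNGAM-style methods (a sketch of my own, assuming a linear model with non-Gaussian noise; the dependence score below is a crude stand-in for the mutual-information or higher-moment measures used in practice): regress in both directions, and the true causal direction is the one whose residual is independent of the regressor. With Gaussian data the two directions are indistinguishable; with non-Gaussian data they are not.

```python
import numpy as np

def dependence(a, b):
    """Crude dependence score: correlation of centered squares.
    Tends to zero when a and b are independent."""
    a2 = (a - a.mean()) ** 2
    b2 = (b - b.mean()) ** 2
    return abs(np.corrcoef(a2, b2)[0, 1])

def residual(cause, effect):
    """OLS residual of effect regressed on cause (no intercept term;
    adequate for the roughly centered data used here)."""
    b = np.cov(cause, effect)[0, 1] / np.var(cause, ddof=1)
    return effect - b * cause

def lingam_direction(x, y):
    """Pairwise direction for a linear model with non-Gaussian noise:
    prefer the direction whose residual looks independent of the regressor."""
    if dependence(x, residual(x, y)) < dependence(y, residual(y, x)):
        return "x->y"
    return "y->x"
```

With uniform (hence non-Gaussian) noise the asymmetry is pronounced; with Gaussian noise both dependence scores tend to zero and the comparison carries no information, which is the sense in which the Normal case is the worst case for this search problem.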
In general, I think it is hard to make sound, informative generalizations about search strategies. Just as with ordinary statistical estimation, judgments of appropriateness or optimality (by whatever criteria) require a careful analysis of the very structure of the search problem. Hendry’s generalizations may be right about search in econometrics, and indeed about some other domains as well, but deciding one way or the other would require a very different forum than this. Until then, I congratulate him for helping defeat the dogmas of the context distinction, and urge him to recognize that he has done that very thing.
*Hendry, D. (2011). “Empirical Economic Model Discovery and Theory Evaluation,” Rationality, Markets and Morals, Vol. 2, Special Topic: Statistical Science and Philosophy of Science, edited by Deborah G. Mayo, Aris Spanos, and Kent W. Staley: 115–145.