*Note: I find this to be an intriguing, if perhaps little-known, discussion, long before the conflicts reflected in the three articles (the “triad”) below. Here Fisher links his tests to the Neyman and Pearson lemma in terms of power. I invite your deconstructions/comments.*

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

To Thomas Bayes must be given the credit of broaching the problem of using the concepts of mathematical probability in discussing problems of inductive inference, in which we argue from the particular to the general; or, in statistical phraseology, argue from the sample to the population, from which, *ex hypothesi*, the sample was drawn. Bayes put forward, with considerable caution, a method by which such problems could be reduced to the form of problems of probability. His method of doing this depended essentially on postulating *a priori* knowledge, not of the particular population of which our observations form a sample, but of an imaginary population of populations from which this population was regarded as having been drawn at random. Clearly, if we have possession of such *a priori* knowledge, our problem is not properly an inductive one at all, for the population under discussion is then regarded merely as a particular case of a general type, of which we already possess exact knowledge, and are therefore in a position to draw exact deductive inferences.

To the merit of broaching a fundamentally important problem, Bayes added that of perceiving, much more clearly than some of his followers have done, the logical weakness of the form of solution he put forward. Indeed we are told that it was his doubts respecting the validity of the postulate needed for establishing the method of inverse probability that led to his withholding his entire treatise from publication. Actually it was not published until after his death…(285-6)

As an axiom this supposition of Bayes fails, since the truth of an axiom should be manifest to all who clearly apprehend its meaning, and to many writers, including, it would seem, Bayes himself, the truth of the supposed axiom has not been apparent. It has, however, been frequently pointed out that, even if our assumed form for [the prior density] be somewhat inaccurate, our conclusions, if based on a considerable sample of observations, will not greatly be affected; and, indeed, subject to certain restrictions as to the true form of [the prior], it may be shown that our errors from this cause will tend to zero as the sample of observations is increased indefinitely. The conclusions drawn will depend more and more entirely on the facts observed, and less and less upon the supposed knowledge *a priori* introduced into the argument. This property of increasingly large samples has been sometimes put forward as a reason for accepting the postulate of knowledge *a priori*. It appears, however, more natural to infer from it that it should be possible to draw valid conclusions from the data alone, and without *a priori* assumptions. If the justification for any particular form of [prior for θ] is merely that it makes no difference whether the form is right or wrong, we may well ask what the expression is doing in our reasoning at all, and whether, if it were altogether omitted, we could not without its aid draw whatever inferences may, with validity, be inferred from the data. In particular we may question whether the whole difficulty has not arisen in an attempt to express in terms of the single concept of mathematical probability, a form of reasoning which requires for its exact statement different though equally well-defined concepts. (286-7)…
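*Editorial illustration (not part of Fisher's text):* the claim that the influence of the prior fades as the sample grows can be checked numerically in a simple case. The sketch below assumes a binomial proportion with two quite different Beta priors; the priors, the observed rate of 0.3, and the function names are hypothetical choices made only for illustration.

```python
def posterior_mean(a, b, successes, n):
    """Posterior mean of a binomial proportion under a Beta(a, b) prior."""
    return (a + successes) / (a + b + n)


def prior_gap(n, rate=0.3, prior_1=(1.0, 1.0), prior_2=(8.0, 2.0)):
    """Absolute difference between the posterior means under two priors.

    The data are fixed at round(rate * n) successes in n trials, so the
    only thing that differs between the two calculations is the prior.
    """
    successes = round(rate * n)
    mean_1 = posterior_mean(*prior_1, successes, n)
    mean_2 = posterior_mean(*prior_2, successes, n)
    return abs(mean_1 - mean_2)


# The disagreement between the two priors shrinks as n increases.
for n in (10, 100, 1000, 10000):
    print(n, round(prior_gap(n), 5))
```

The gap between the two posterior means shrinks roughly in proportion to 1/n here, which is the behaviour Fisher takes as a hint that the prior is doing no real work in large samples.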

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H_{0} is more powerful than any other equivalent test, with regard to an alternative hypothesis H_{1}, when it rejects H_{0} in a set of samples having an assigned aggregate frequency ε when H_{0} is true, and the greatest possible aggregate frequency when H_{1} is true.

If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H_{1} is less than that of any other group of samples outside the region, but is not less on the hypothesis H_{0}, then the test can evidently be made more powerful by substituting the one group for the other.

Consequently, for the most powerful test possible the ratio of the probabilities of occurrence on the hypothesis H_{0} to that on the hypothesis H_{1} is less in all samples in the region of rejection than in any sample outside it. For samples involving continuous variation the region of rejection will be bounded by contours for which this ratio is constant. The regions of rejection will then be required in which the likelihood of H_{0} bears to the likelihood of H_{1}, a ratio less than some fixed value defining the contour. (295)…
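*Editorial illustration (not part of Fisher's text):* in modern notation the argument of the last three paragraphs can be written compactly. The symbols x (the sample), L (the likelihood), W (the region of rejection), c (the constant defining the contour), and g, h (the factors of the likelihood) are assumptions introduced here, not Fisher's own notation.

```latex
% Most powerful region of size \varepsilon for H_0 against H_1:
W = \left\{ x : \frac{L(H_{0}; x)}{L(H_{1}; x)} < c \right\},
\qquad \text{with } c \text{ chosen so that } \Pr(x \in W \mid H_{0}) = \varepsilon .

% If a sufficient statistic T exists, then L(\theta; x) = g\bigl(T(x), \theta\bigr)\, h(x),
% and the factor h(x) cancels in the ratio:
\frac{L(\theta_{0}; x)}{L(\theta_{1}; x)}
  = \frac{g\bigl(T(x), \theta_{0}\bigr)}{g\bigl(T(x), \theta_{1}\bigr)} .
```

The second line is the step Fisher appeals to at the start of the passage: the contours of constant likelihood ratio can depend on the sample only through T.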

It is evident, at once, that such a system is only possible when the class of hypotheses considered involves only a single parameter θ, or, what comes to the same thing, when all the parameters entering into the specification of the population are definite functions of one of their number. In this case, the regions defined by the uniformly most powerful test of significance are those defined by the estimate of maximum likelihood, T. For the test to be uniformly most powerful, moreover, these regions must be independent of θ, showing that the statistic must be of the special type distinguished as sufficient. Such sufficient statistics have been shown to contain all the information which the sample provides relevant to the value of the appropriate parameter θ. It is inevitable therefore that if such a statistic exists it should uniquely define the contours best suited to discriminate among hypotheses differing only in respect of this parameter; and it is surprising that Neyman and Pearson should lay it down as a preliminary consideration that ‘the testing of statistical hypotheses cannot be treated as a problem in estimation.’ When tests are considered only in relation to sets of hypotheses specified by one or more variable parameters, the efficacy of the tests can be treated directly as the problem of estimation of these parameters. Regard for what has been established in that theory, apart from the light it throws on the results already obtained by their own interesting line of approach, should also aid in treating the difficulties inherent in cases in which no sufficient statistic exists. (296)
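*Editorial illustration (not part of Fisher's text):* a minimal sketch of the point about sufficiency and uniformly most powerful tests, assuming the textbook case of a normal mean with known standard deviation and a one-sided alternative θ1 > θ0. The function names and the numerical values are hypothetical. The log likelihood ratio computed from the whole sample agrees with the one computed from the sample mean alone, and the cut-off on the mean does not involve θ1, so the same region of rejection is most powerful against every such alternative.

```python
import math
from statistics import NormalDist, fmean


def log_lr(sample, theta0, theta1, sigma=1.0):
    """log[L(theta0) / L(theta1)] computed from the full sample."""
    return sum(((x - theta1) ** 2 - (x - theta0) ** 2) / (2 * sigma ** 2)
               for x in sample)


def log_lr_from_mean(xbar, n, theta0, theta1, sigma=1.0):
    """The same quantity computed from the sufficient statistic xbar alone."""
    return n * (theta1 - theta0) * ((theta0 + theta1) / 2 - xbar) / sigma ** 2


def critical_mean(n, theta0, eps, sigma=1.0):
    """Reject H0 when xbar exceeds this value; note that theta1 does not appear."""
    return theta0 + NormalDist().inv_cdf(1 - eps) * sigma / math.sqrt(n)


def power(n, theta0, theta1, eps, sigma=1.0):
    """Probability of rejecting H0 when the true mean is theta1."""
    cutoff = critical_mean(n, theta0, eps, sigma)
    return 1 - NormalDist(theta1, sigma / math.sqrt(n)).cdf(cutoff)


sample = [0.2, -0.5, 1.3, 0.9, 0.4]  # made-up observations
print(log_lr(sample, 0.0, 1.0), log_lr_from_mean(fmean(sample), len(sample), 0.0, 1.0))
for theta1 in (0.2, 0.5, 1.0):
    print(theta1, round(power(25, 0.0, theta1, 0.05), 3))
```

With a two-sided alternative, or with nuisance parameters, no single region is best against every alternative, which is the kind of more complex case in which likelihood and power can pull apart.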

I think that the mention of power by Fisher is not so surprising if one interprets the difference between Fisher and Neyman and Pearson like this. Fisher’s point of view (which I share as regards this) is that likelihood is fundamental and power is a concomitant bonus in some cases. In fixed sample problems with no nuisance parameters and continuous outcomes there is no conflict. Neyman’s point of view is that power is fundamental. In the very early days of the theory there may have been an insufficient appreciation that in more complex cases there would be a divergence in practice. (I say “may have”; one would need to study the written record more carefully than I have.) One interpretation of Fisher’s attitude would be that he tolerated power until the point at which Neyman asserted that it was more fundamental than likelihood. (My evidence that Neyman did think so is partly indirect and is influenced by Constance Reid’s book, but it is Lehmann speaking. See p. 92, so unfortunately it is Reid’s account of Lehmann’s view of Neyman!)

If my interpretation is correct, it is in particular the fact that Neyman’s peculiar analysis of Latin squares originated from the same period that power began to be asserted as being more important than likelihood that led to the breakdown in relations between Fisher and Neyman.

Hadn’t noticed this, but was just rereading this material. Actually I think one could say there is evidence that it’s somewhat the reverse: that Fisher didn’t want to mention power, even where the concept seemed relevant, after the breakdown. I see this point also in Lehmann’s (2011) book, *Fisher, Neyman, and the Creation of Classical Statistics*.