Today is R.A. Fisher’s birthday. I’ll reblog some Fisherian items this week with a few new remarks. This paper comes just before the conflicts with Neyman and Pearson (N-P) erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It’s as if we may see Fisher and N-P as ending up in a similar place while starting from different origins, as David Cox might say [1]. Unfortunately, the blow-up that occurred soon after is behind today’s misdirected war against statistical significance tests.* I quote just the most relevant portions…the full article is linked below.** Happy Birthday Fisher! Continue reading
Posts Tagged With: induction
Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
Today is R.A. Fisher’s birthday. I will post some Fisherian items this week in recognition of it*. This paper comes just before the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. We may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!
“Two New Properties of Mathematical Likelihood”
by R.A. Fisher, F.R.S.
Proceedings of the Royal Society, Series A, 144: 285-307 (1934)
The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading
Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
Today is R.A. Fisher’s birthday. I’ll post some Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!
“Two New Properties of Mathematical Likelihood”
by R.A. Fisher, F.R.S.
Proceedings of the Royal Society, Series A, 144: 285-307 (1934)
The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading
Guest Blog: ARIS SPANOS: The Enduring Legacy of R. A. Fisher
By Aris Spanos
One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:
Mθ(x) = {f(x; θ); θ ∈ Θ}, x ∈ R^n, Θ ⊂ R^m, m < n.   (1)
where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.
Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to the description of the distributional features of the data in hand using the histogram and the first few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn). As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.
Fisher was able to recast statistical inference by turning Karl Pearson’s approach, proceeding from data x0 in search of a frequency curve f(x;ϑ) to describe its histogram, on its head. He proposed to begin with a prespecified Mθ(x) (a ‘hypothetical infinite population’), and view x0 as a ‘typical’ realization thereof; see Spanos (1999). Continue reading
R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
Today is R.A. Fisher’s birthday. I’ll post some different Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!
“Two New Properties of Mathematical Likelihood”
by R.A. Fisher, F.R.S.
Proceedings of the Royal Society, Series A, 144: 285-307 (1934)
The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. Continue reading
R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’: Just before breaking up (with N-P)
In recognition of R.A. Fisher’s birthday tomorrow, I will post several entries on him. I find this (1934) paper to be intriguing – immediately before the conflicts with Neyman and Pearson erupted. It represents essentially the last time he could take their work at face value, without the professional animosities that almost entirely caused, rather than being caused by, the apparent philosophical disagreements and name-calling everyone focuses on. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a very similar place (no pun intended) while starting from different origins. I quote just the most relevant portions…the full article is linked below. I’d blogged it earlier here. You may find some gems in it.
‘Two New Properties of Mathematical Likelihood’
by R.A. Fisher, F.R.S.
Proceedings of the Royal Society, Series A, 144: 285-307 (1934)
The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.
If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading
R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
Exactly 1 year ago: I find this to be an intriguing discussion – before some of the conflicts with N and P erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below.
by R.A. Fisher, F.R.S.
Proceedings of the Royal Society, Series A, 144: 285-307 (1934)
The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.
If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading
Aris Spanos: The Enduring Legacy of R. A. Fisher
More Fisher insights from A. Spanos, this from 2 years ago:
One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:
Mθ(x) = {f(x; θ); θ ∈ Θ}, x ∈ R^n, Θ ⊂ R^m, m < n.   (1)
where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.
Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to the description of the distributional features of the data in hand using the histogram and the first few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn). As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.
Fisher was able to recast statistical inference by turning Karl Pearson’s approach, proceeding from data x0 in search of a frequency curve f(x;ϑ) to describe its histogram, on its head. He proposed to begin with a prespecified Mθ(x) (a ‘hypothetical infinite population’), and view x0 as a ‘typical’ realization thereof; see Spanos (1999).
In my mind, Fisher’s most enduring contribution is his devising a general way to ‘operationalize’ errors by embedding the material experiment into Mθ(x), and taming errors via probabilification, i.e. to define frequentist error probabilities in the context of a statistical model. These error probabilities are (a) deductively derived from the statistical model, and (b) provide a measure of the ‘effectiveness’ of the inference procedure: how often a certain method will give rise to correct inferences concerning the underlying ‘true’ Data Generating Mechanism (DGM). This cast aside the need for a prior. Both of these key elements, the statistical model and the error probabilities, have been refined and extended by Mayo’s error statistical approach (EGEK 1996). Learning from data is achieved when an inference is reached by an inductive procedure which, with high probability, will yield true conclusions from valid inductive premises (a statistical model); see Mayo and Spanos (2011). Continue reading
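Spanos’s two key elements, a prespecified statistical model and error probabilities deduced from it, can be made concrete with a small simulation. The following sketch is my addition, not part of the guest post: it assumes the simplest case, a Normal model N(θ, 1) with n = 25 and a one-sided test of θ = 0, and checks by simulating repeated samples from the model that the test’s error probabilities are what the model implies.

```python
# Minimal sketch (not from the original post): a prespecified statistical model
# M_theta(x) = {N(theta, 1): theta in R}, viewed before any data are in hand.
# The test's error probabilities are fixed by the model; here they are checked
# by simulating repeated samples from it rather than derived analytically.
import numpy as np

rng = np.random.default_rng(0)
n = 25
z_crit = 1.645                      # one-sided 5% cutoff for a standard Normal

def test_rejects(x, theta0=0.0):
    """Reject H0: theta = theta0 when the standardized sample mean exceeds z_crit."""
    z = (x.mean() - theta0) / (1.0 / np.sqrt(n))
    return z > z_crit

reps = 100_000
# (a) Type I error: relative frequency of rejection when H0 (theta = 0) is true.
type_I = np.mean([test_rejects(rng.normal(0.0, 1.0, n)) for _ in range(reps)])
# (b) Power: relative frequency of rejection under the alternative theta = 0.5.
power = np.mean([test_rejects(rng.normal(0.5, 1.0, n)) for _ in range(reps)])

print(f"simulated type I error ~ {type_I:.3f} (nominal 0.05)")
print(f"simulated power at theta = 0.5 ~ {power:.3f}")
```

The point is only that both numbers are fixed by the model Mθ(x) itself, before any particular data x0 are in hand.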
R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
I find this to be an intriguing discussion – before some of the conflicts with N and P erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below.
by R.A. Fisher, F.R.S.
Proceedings of the Royal Society, Series A, 144: 285-307 (1934)
The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.
If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other.
Consequently, for the most powerful test possible the ratio of the probabilities of occurrence on the hypothesis H0 to that on the hypothesis H1 is less in all samples in the region of rejection than in any sample outside it. For samples involving continuous variation the region of rejection will be bounded by contours for which this ratio is constant. The regions of rejection will then be required in which the likelihood of H0 bears to the likelihood of H1 a ratio less than some fixed value defining the contour. (295)…
It is evident, at once, that such a system is only possible when the class of hypotheses considered involves only a single parameter θ, or, what comes to the same thing, when all the parameters entering into the specification of the population are definite functions of one of their number. In this case, the regions defined by the uniformly most powerful test of significance are those defined by the estimate of maximum likelihood, T. For the test to be uniformly most powerful, moreover, these regions must be independent of θ, showing that the statistic must be of the special type distinguished as sufficient. Such sufficient statistics have been shown to contain all the information which the sample provides relevant to the value of the appropriate parameter θ. It is inevitable therefore that if such a statistic exists it should uniquely define the contours best suited to discriminate among hypotheses differing only in respect of this parameter; and it is surprising that Neyman and Pearson should lay it down as a preliminary consideration that ‘the testing of statistical hypotheses cannot be treated as a problem in estimation.’ When tests are considered only in relation to sets of hypotheses specified by one or more variable parameters, the efficacy of the tests can be treated directly as the problem of estimation of these parameters. Regard for what has been established in that theory, apart from the light it throws on the results already obtained by their own interesting line of approach, should also aid in treating the difficulties inherent in cases in which no sufficient statistic exists. (296)
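Fisher’s reading of the Neyman and Pearson result, that the most powerful rejection region collects the samples in which the likelihood of H0 bears the smallest ratio to the likelihood of H1, can be illustrated numerically. The toy comparison below is mine, not from the article: two simple hypotheses about a Normal mean, a rejection region built from the likelihood ratio, and an arbitrary competing region of the same aggregate frequency ε under H0.

```python
# Toy illustration (mine, not Fisher's): for two simple hypotheses about a
# Normal mean, H0: mu = 0 vs H1: mu = 1 (sigma = 1, n = 10), the region that
# rejects when the likelihood ratio L(H0)/L(H1) is small is the region where
# the sample mean is large, and it is more powerful than another region of
# the same aggregate frequency eps under H0.
import numpy as np

rng = np.random.default_rng(1)
n, reps, eps = 10, 200_000, 0.05

def sample_means(mu):
    """Sample means of `reps` datasets of size n drawn from N(mu, 1)."""
    return rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)

m0, m1 = sample_means(0.0), sample_means(1.0)

# The log likelihood ratio log L(H0)/L(H1) is decreasing in the sample mean,
# so "ratio below a fixed value" is equivalent to "sample mean above a cutoff".
cutoff = np.quantile(m0, 1 - eps)          # region of size eps under H0
power_lr = np.mean(m1 > cutoff)

# A competing region of the same size eps under H0: reject when the sample
# mean is unusually *small* instead.
alt_cut = np.quantile(m0, eps)
power_alt = np.mean(m1 < alt_cut)

print(f"power of likelihood-ratio region: {power_lr:.3f}")
print(f"power of competing region of equal size: {power_alt:.3f}")
```

Because the likelihood ratio here is a monotone function of the sample mean, a sufficient statistic, the ratio-based region reduces to a cutoff on that statistic, which is the link to sufficiency Fisher is drawing.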
Clark Glymour: The Theory of Search Is the Economics of Discovery (part 2)
“Some Thoughts Prompted by David Hendry’s Essay *, (RMM) Special Topic: Statistical Science and Philosophy of Science,” by Professor Clark Glymour
Part 2 (of 2) (Please begin with part 1)
The first thing one wants to know about a search method is what it is searching for, what would count as getting it right. One might want to estimate a probability distribution, or get correct forecasts of some probabilistic function of the distribution (e.g., out-of-sample means), or a causal structure, or some probabilistic function of the distribution resulting from some class of interventions. Secondly, one wants to know about what decision theorists call a loss function, but less precisely, what is the comparative importance of various errors of measurement, or, in other terms, what makes some approximations better than others. Third, one wants a limiting consistency proof: sufficient conditions for the search to reach the goal in the large sample limit. There are various kinds of consistency—pointwise versus uniform for example—and one wants to know which of those, if any, hold for a search method under what assumptions about the hypothesis space and the sampling distribution. Fourth, one wants to know as much as possible about the behavior of the search method on finite samples. In simple cases of statistical estimation there are analytic results; more often for search methods only simulation results are possible, but if so, one wants them to explore the bounds of failure, not just easy cases. And, of course, one wants a rationale for limiting the search space, as well as some sense of how wrong the search can be if those limits are violated in various ways.
There are other important economic features of search procedures. Probability distributions (or likelihood functions) can instantiate any number of constraints—vanishing partial correlations for example, or inequalities of correlations. Suppose the hypothesis space delimits some big class of probability distributions. Suppose the search proceeds by testing constraints (the points that follow apply as well if the procedure computes posterior probabilities for particular hypotheses and applies a decision rule). There is a natural partial ordering of classes of constraints: B is weaker than A if and only if every distribution that satisfies class A satisfies class B. Other things equal, a weakest class might be preferred because it requires fewer tests. But more important is what the test of a constraint does in efficiently guiding the search. A test that eliminates a particular hypothesis is not much help. A test that eliminates a big class of hypotheses is a lot of help.
Other factors: the power of the requisite tests; the numbers of tests (or posterior probability assessments) required; the computational requirements of individual tests (or posterior probability assessments). And so on. And, finally, search algorithms have varying degrees of generality. For example, there are general algorithms, such as the widely used PC search algorithm for graphical causal models, that are essentially search schema: stick in whatever decision procedure for conditional independence and PC becomes a search procedure using that conditional independence oracle. By contrast, some searches are so embedded in a particular hypothesis space that it is difficult to see the generality.
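To make the ‘search schema’ point concrete, here is a stripped-down sketch (mine, not Glymour’s) of the skeleton-recovery step behind PC-style search: begin with the complete graph over the variables and delete an edge whenever some conditioning set renders the pair independent, where the independence judgment comes from whatever pluggable oracle is supplied. The toy oracle below, encoding an assumed chain A → B → C, is purely illustrative.

```python
# Simplified sketch of the schema behind PC-style search (skeleton step only,
# not the full algorithm): start from the complete undirected graph and delete
# an edge whenever some conditioning set makes the pair independent, as judged
# by whatever conditional-independence oracle is plugged in.
from itertools import combinations

def skeleton_search(variables, indep_oracle, max_cond=2):
    """indep_oracle(x, y, cond) -> True if x and y are judged independent given cond."""
    edges = {frozenset(pair) for pair in combinations(variables, 2)}
    for size in range(max_cond + 1):
        for x, y in [tuple(e) for e in list(edges)]:
            others = [v for v in variables if v not in (x, y)]
            for cond in combinations(others, size):
                if indep_oracle(x, y, set(cond)):
                    edges.discard(frozenset((x, y)))
                    break
    return edges

# Purely illustrative oracle for an assumed chain A -> B -> C:
# A and C are independent given B, and no other independencies hold.
def toy_oracle(x, y, cond):
    return {x, y} == {"A", "C"} and "B" in cond

# Recovers the two chain edges A-B and B-C; the A-C edge is removed.
print(skeleton_search(["A", "B", "C"], toy_oracle))
```

Swapping in a different decision procedure for conditional independence (a partial-correlation test, a kernel test, a posterior-based rule) changes the search’s behavior without changing the schema.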
I am sure I am not qualified to comment on the details of Hendry’s search procedure, and even if I were, for reasons of space his presentation is too compressed for that. Still, I can make some general remarks. I do not know from his essay the answers to many of the questions pertinent to evaluating a search procedure that I raised above. For example, his success criterion is “congruence” and I have no idea what that is. That is likely my fault, since I have read only one of his books, and that long ago.
David Hendry dismisses “priors,” meaning, I think, Bayesian methods, with an argument from language acquisition. Kids don’t need priors to learn a language. I am not sure of Hendry’s logic. Particular grammars within a parametric “universal grammar” could in principle be learned by a Bayesian procedure, although I have no reason to think they are. But one way or the other, that has no import for whether Bayesian procedures are the most advantageous for various search problems by any of the criteria I have noted above. Sometimes they may be, sometimes not; there is no uniform answer, in part because computational requirements vary. I could give examples, but space forbids.
Abstractly, one could think there are two possible ways of searching when the set of relationships to be uncovered may form a complex web: start by positing all possible relationships and eliminate from there, or start by positing no relationships and build up. Hendry dismisses the latter, with what generality I do not know. What I do know is that the relations between “bottom-up” and “top-down” or “forward” and “backward” search can be intricate, and in some cases one may need both for consistency. Sometimes either will do. Graphical models, for example, can be searched starting with the assumption that every variable influences every other and eliminating, or starting with the assumption that no variable influences any other and adding. There are pointwise consistent searches in both directions. The real difference is in complexity.
Peter Grünwald: Follow-up on Cherkassky’s Comments
A comment from Professor Peter Grünwald
Head, Information-theoretic Learning Group, Centrum voor Wiskunde en Informatica (CWI)
Part-time full professor at Leiden University.
This is a follow-up on Vladimir Cherkassky’s comments on Deborah’s blog. First of all let me thank Vladimir for taking the time to clarify his position. Still, there’s one issue where we disagree and which, at the same time, I think, needs clarification, so I decided to write this follow-up.[related posts 1]
The issue is about how central VC (Vapnik-Chervonenkis)-theory is to inductive inference.
I agree with Vladimir that VC-theory is one of the most important achievements in the field ever, and indeed, that it fundamentally changed our way of thinking about learning from data. Yet I also think that there are many problems of inductive inference to which it has no direct bearing. Some of these are concerned with hypothesis testing, but even when one is concerned with prediction accuracy – which Vladimir considers the basic goal – there are situations where I do not see how it plays a direct role. One of these is sequential prediction with log-loss or its generalization, Cover’s loss. This loss function plays a fundamental role in (1) language modeling, (2) on-line data compression, (3a) gambling and (3b) sequential investment on the stock market (here we need Cover’s loss). [a superquick intro to log-loss as well as some references are given below under [A]; see also my talk at the Ockham workshop (slides 16-26 about weather forecasting!)]
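For readers who have not met log-loss, here is a minimal illustration (my addition, not Grünwald’s): a sequential forecaster announces a probability before each outcome and is charged the negative log of the probability it assigned to what actually happened; the cumulative loss is also the code length, in bits, that an ideal coder using those forecasts would need.

```python
# Minimal illustration of sequential prediction under log-loss (base 2):
# each round the forecaster announces P(rain) before seeing the outcome and
# pays -log2 of the probability it assigned to what actually happened.
# The running total is the number of bits an ideal coder using these
# forecasts would need to encode the outcome sequence.
import math

forecasts = [0.8, 0.3, 0.9, 0.5, 0.6]   # announced P(rain) each day
outcomes  = [1,   0,   1,   1,   0]     # 1 = rain, 0 = no rain

total_loss = 0.0
for p, y in zip(forecasts, outcomes):
    prob_of_outcome = p if y == 1 else 1 - p
    total_loss += -math.log2(prob_of_outcome)

print(f"cumulative log-loss: {total_loss:.3f} bits over {len(outcomes)} days")
```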
Vladimir Cherkassky Responds on Foundations of Simplicity
I thank Dr. Vladimir Cherkassky for taking up my general invitation to comment. I don’t have much to add to my original post[i], except to make two corrections at the end of this post. I invite readers’ comments.
As I could not participate in the discussion session on Sunday, I would like to address several technical issues and points of disagreement that became evident during this workshop. All opinions are mine, and may not be representative of the “machine learning community.” Unfortunately, the machine learning community at large is not very much interested in the philosophical and methodological issues. This breeds a lot of fragmentation and confusion, as evidenced by the existence of several technical fields: machine learning, statistics, data mining, artificial neural networks, computational intelligence, etc.—all of which are mainly concerned with the same problem of estimating good predictive models from data.
Occam’s Razor (OR) is a general metaphor in the philosophy of science, and it has been discussed for ages. One of the main goals of this workshop was to understand the role of OR as a general inductive principle in the philosophy of science and, in particular, its importance in data-analytic knowledge discovery for statistics and machine learning.
Data-analytic modeling is concerned with estimating good predictive models from finite data samples. This is directly related to the philosophical problem of inductive inference. The problem of learning (generalization) from finite data had been formally investigated in VC-theory ~ 40 years ago. This theory starts with a mathematical formulation of the problem of learning from finite samples, without making any assumptions about parametric distributions. This formalization is very general and relevant to many applications in machine learning, statistics, life sciences, etc. Further, this theory provides necessary and sufficient conditions for generalization. That is, a set of admissible models (hypotheses about the data) should be constrained, i.e., should have finite VC-dimension. Therefore, any inductive theory or algorithm designed to explain the data should satisfy VC-theoretical conditions. Continue reading
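One way to see what a finite VC dimension buys is through a standard VC-style bound on the gap between empirical and true error. The small numerical sketch below is my addition, not Cherkassky’s, and uses one common form of the bound: it shrinks as the sample size n grows but loosens as the VC dimension h of the admissible class grows, and without finite h there is no such guarantee at all.

```python
# One common form of the VC generalization bound: with probability at least
# 1 - eta, the gap between true and empirical error is at most
# sqrt((h * (ln(2n/h) + 1) + ln(4/eta)) / n), where h is the VC dimension of
# the admissible class. The guarantee exists only because h is finite, and it
# tightens as n grows relative to h.
import math

def vc_gap_bound(n, h, eta=0.05):
    return math.sqrt((h * (math.log(2 * n / h) + 1) + math.log(4 / eta)) / n)

for n in (100, 1_000, 10_000):
    for h in (5, 50):
        print(f"n={n:>6}, h={h:>3}: gap <= {vc_gap_bound(n, h):.3f}")
```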
Elliott Sober Responds on Foundations of Simplicity
Here are a few comments on your recent blog about my ideas on parsimony. Thanks for inviting me to contribute!
You write that in model selection, “’parsimony fights likelihood,’ while, in adequate evolutionary theory, the two are thought to go hand in hand.” The second part of this statement isn’t correct. There are sufficient conditions (i.e., models of the evolutionary process) that entail that parsimony and maximum likelihood are ordinally equivalent, but there are cases in which they are not. Biologists often have data sets in which maximum parsimony and maximum likelihood disagree about which phylogenetic tree is best.
You also write that “error statisticians view hypothesis testing as between exhaustive hypotheses H and not-H (usually within a model).” I think that the criticism of Bayesianism that focuses on the problem of assessing the likelihoods of “catch-all hypotheses” applies to this description of your error statistical philosophy. The General Theory of Relativity, for example, may tell us how probable a set of observations is, but its negation does not. I note that you have “usually within a model” in parentheses. In many such cases, two alternatives within a model will not be exhaustive even within the confines of a model and of course they won’t be exhaustive if we consider a wider domain.
Deviates, Sloths, and Exiles: Philosophical Remarks on the Ockham’s Razor Workshop*
My flight out of Pittsburgh has been cancelled, and as I may be stuck in the airport for some time, I will try to make a virtue of it by jotting down some of my promised reflections on the “simplicity and truth” conference at Carnegie Mellon (organized by Kevin Kelly). My remarks concern only the explicit philosophical connections drawn by (4 of) the seven non-philosophers who spoke. For more general remarks, see the blogs of Larry Wasserman (Normal Deviate) and Cosma Shalizi (Three-Toed Sloth). (The following, based on my notes and memory, may include errors/gaps, but I trust that my fellow bloggers and sloggers will correct me.)
First to speak were Vladimir Vapnik and Vladimir Cherkassky, from the field of machine learning, a discipline I know of only formally. Vapnik, of the Vapnik–Chervonenkis (VC) theory, is known for his seminal work here. Their papers, both of which addressed directly the philosophical implications of their work, share enough themes to merit being taken up together.
Vapnik and Cherkassky find a number of striking dichotomies in the standard practice of both philosophy and statistics. They contrast the “classical” conception of scientific knowledge as essentially rational with the more modern, “data-driven” empirical view:
The former depicts knowledge as objective, deterministic, rational. Ockham’s razor is a kind of synthetic a priori statement that warrants our rational intuitions as the foundation of truth with a capital T, as well as a naïve realism (we may rely on Cartesian “clear and distinct” ideas; God does not deceive; and so on). The latter empirical view, illustrated by machine learning, is enlightened. It settles for predictive successes and instrumentalism, views models as mental constructs (in here, not out there), and exhorts scientists to restrict themselves to problems deemed “well posed” by machine-learning criteria.
But why suppose the choice is between assuming “a single best (true) theory or model” and the extreme empiricism of their instrumental machine learner? Continue reading
The Error Statistical Philosophy and The Practice of Bayesian Statistics: Comments on Gelman and Shalizi
The following is my commentary on a paper by Gelman and Shalizi, forthcoming (some time in 2013) in the British Journal of Mathematical and Statistical Psychology* (submitted February 14, 2012).
_______________________
“The Error Statistical Philosophy and the Practice of Bayesian Statistics: Comments on A. Gelman and C. Shalizi: Philosophy and the Practice of Bayesian Statistics”**
Deborah G. Mayo
1. Introduction
I am pleased to have the opportunity to comment on this interesting and provocative paper. I shall begin by citing three points at which the authors happily depart from existing work on statistical foundations.
First, there is the authors’ recognition that methodology is ineluctably bound up with philosophy. If nothing else, “strictures derived from philosophy can inhibit research progress” (p. 4). They note, for example, the reluctance of some Bayesians to test their models because of their belief that “Bayesian models were by definition subjective,” or perhaps because checking involves non-Bayesian methods (4, n4).
Second, they recognize that Bayesian methods need a new foundation. Although the subjective Bayesian philosophy, “strongly influenced by Savage (1954), is widespread and influential in the philosophy of science (especially in the form of Bayesian confirmation theory),” and while many practitioners perceive the “rising use of Bayesian methods in applied statistical work” (2) as supporting this Bayesian philosophy, the authors flatly declare that “most of the standard philosophy of Bayes is wrong” (2 n2). Despite their qualification that “a statistical method can be useful even if its philosophical justification is in error,” their stance will rightly challenge many a Bayesian.
Two New Properties of Mathematical Likelihood
Note: I find this to be an intriguing, if perhaps little-known, discussion, long before the conflicts reflected in the three articles (the “triad”) below. Here Fisher links his tests to the Neyman and Pearson lemma in terms of power. I invite your deconstructions/comments.
by R.A. Fisher, F.R.S.
Proceedings of the Royal Society, Series A, 144: 285-307 (1934)
To Thomas Bayes must be given the credit of broaching the problem of using the concepts of mathematical probability in discussing problems of inductive inference, in which we argue from the particular to the general; or, in statistical phraseology, argue from the sample to the population, from which, ex hypothesi, the sample was drawn. Bayes put forward, with considerable caution, a method by which such problems could be reduced to the form of problems of probability. His method of doing this depended essentially on postulating a priori knowledge, not of the particular population of which our observations form a sample, but of an imaginary population of populations from which this population was regarded as having been drawn at random. Clearly, if we have possession of such a priori knowledge, our problem is not properly an inductive one at all, for the population under discussion is then regarded merely as a particular case of a general type, of which we already possess exact knowledge, and are therefore in a position to draw exact deductive inferences.
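A toy version of the scheme Fisher is describing may help (my illustration, drawn from neither Fisher nor Bayes): treat the unknown population proportion as itself drawn from a known ‘population of populations’, here a uniform prior, so that arguing from sample to population becomes an ordinary deductive probability calculation.

```python
# Toy version of the scheme described above (my illustration): the unknown
# population proportion theta is itself treated as drawn at random from a known
# "population of populations" (a uniform prior), so inference from sample to
# population becomes a probability calculation over that prior.
from math import comb

def posterior_mean_uniform_prior(successes, n, grid=10_000):
    """Posterior mean of theta under a uniform prior, computed on a grid."""
    thetas = [(i + 0.5) / grid for i in range(grid)]
    likelihood = [comb(n, successes) * t**successes * (1 - t)**(n - successes)
                  for t in thetas]
    norm = sum(likelihood)
    return sum(t * l for t, l in zip(thetas, likelihood)) / norm

# 7 successes in 10 trials with a uniform prior: posterior mean is (7+1)/(10+2).
print(round(posterior_mean_uniform_prior(7, 10), 4))   # ~ 0.6667
```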
Guest Blogger. ARIS SPANOS: The Enduring Legacy of R. A. Fisher
By Aris Spanos
One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:
Mθ(x) = {f(x; θ); θ ∈ Θ}, x ∈ R^n, Θ ⊂ R^m, m < n.   (1)
where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.
Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to the description of the distributional features of the data in hand using the histogram and the first few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn). As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.
No-Pain Philosophy: Skepticism, Rationality, Popper and All That (part 2): Duhem’s problem & methodological falsification
(See Part 1)
5. Duhemian Problems of Falsification
Any interesting case of hypothesis falsification, or even a severe attempt to falsify, rests on both empirical and inductive hypotheses or claims. Consider the most simplistic form of deductive falsification (an instance of the valid form of modus tollens): “If H entails O, and not-O, then not-H.” (To infer “not-H” is to infer H is false, or, more often, it involves inferring there is some discrepancy in what H claims regarding the phenomenon in question). Continue reading