# Posts Tagged With: induction

## Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I’ll reblog some Fisherian items this week with a few new remarks. This paper comes just before the conflicts with Neyman and Pearson (N-P) erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power. It’s as if we may see Fisher and N-P as ending up in a similar place while starting from different origins, as David Cox might say [1]. Unfortunately, the blow-up that occurred soon after is behind today’s misdirected war vs statistical significance tests.* I quote just the most relevant portions…the full article is linked below.** Happy Birthday Fisher! Continue reading

Categories: Fisher, phil/history of stat |

## Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I will post some Fisherian items this week in recognition of it*. This paper comes just before the conflicts with Neyman and Pearson erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  We may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading

Categories: Fisher, phil/history of stat, Phil6334/ Econ 6614, Statistics |

## Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I’ll post some Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307  (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading

Categories: Fisher, phil/history of stat, Statistics |

## Guest Blog: ARIS SPANOS: The Enduring Legacy of R. A. Fisher

By Aris Spanos

One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most re­markable, but least recognized, achievement was to initiate the recast­ing of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

Mθ(x)={f(x;θ); θ∈Θ}; x∈Rn ;Θ⊂Rm; m < n; (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the proba­bilistic information in the statistical model.

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily conﬁned to the description of the distributional features of the data in hand using the histogram and the ﬁrst few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn) As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Fisher was able to recast statistical inference by turning Karl Pear­son’s approach, proceeding from data x0 in search of a frequency curve f(x;ϑ) to describe its histogram, on its head. He proposed to begin with a prespeciﬁed Mθ(x) (a ‘hypothetical inﬁnite population’), and view x0 as a ‘typical’ realization thereof; see Spanos (1999). Continue reading

Categories: Fisher, Spanos, Statistics |

## R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I’ll post some different Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. Continue reading

Categories: Fisher, phil/history of stat, Statistics |

## R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’: Just before breaking up (with N-P)

17 February 1890–29 July 1962

In recognition of R.A. Fisher’s birthday tomorrow, I will post several entries on him. I find this (1934) paper to be intriguing –immediately before the conflicts with Neyman and Pearson erupted. It represents essentially the last time he could take their work at face value, without the professional animosities that almost entirely caused, rather than being caused by, the apparent philosophical disagreements and name-calling everyone focuses on. Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a very similar place (no pun intended) while starting from different origins. I quote just the most relevant portions…the full article is linked below. I’d blogged it earlier here.  You may find some gems in it.

‘Two new Properties of Mathematical Likelihood’

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307(1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.

If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading

Categories: Fisher, phil/history of stat, Statistics |

## R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Exactly 1 year ago: I find this to be an intriguing discussion–before some of the conflicts with N and P erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.

If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading

Categories: Fisher, phil/history of stat, Statistics |

## Aris Spanos: The Enduring Legacy of R. A. Fisher

More Fisher insights from A. Spanos, this from 2 years ago:

One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most re­markable, but least recognized, achievement was to initiate the recast­ing of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

Mθ(x)={f(x;θ); θ∈Θ}; x∈Rn ;Θ⊂Rm; m < n; (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the proba­bilistic information in the statistical model.

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily conﬁned to the description of the distributional features of the data in hand using the histogram and the ﬁrst few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn). As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Fisher was able to recast statistical inference by turning Karl Pear­son’s approach, proceeding from data x0 in search of a frequency curve f(x;ϑ) to describe its histogram, on its head. He proposed to begin with a prespeciﬁed Mθ(x) (a ‘hypothetical inﬁnite population’), and view x0 as a ‘typical’ realization thereof; see Spanos (1999).

In my mind, Fisher’s most enduring contribution is his devising a general way to ‘operationalize’ errors by embedding the material ex­periment into Mθ(x), and taming errors via probabiliﬁcation, i.e. to deﬁne frequentist error probabilities in the context of a statistical model. These error probabilities are (a) deductively derived from the statistical model, and (b) provide a measure of the ‘eﬀectiviness’ of the inference procedure: how often a certain method will give rise to correct in­ferences concerning the underlying ‘true’ Data Generating Mechanism (DGM). This cast aside the need for a prior. Both of these key elements, the statistical model and the error probabilities, have been reﬁned and extended by Mayo’s error statistical approach (EGEK 1996). Learning from data is achieved when an inference is reached by an inductive procedure which, with high probability, will yield true conclusions from valid inductive premises (a statistical model); Mayo and Spanos (2011). Continue reading

Categories: Fisher, phil/history of stat, Statistics |

## R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

I find this to be an intriguing discussion–before some of the conflicts with N and P erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.

If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other.

Consequently, for the most powerful test possible the ratio of the probabilities of occurrence on the hypothesis H0 to that on the hypothesis H1 is less in all samples in the region of rejection than in any sample outside it. For samples involving continuous variation the region of rejection will be bounded by contours for which this ratio is constant. The regions of rejection will then be required in which the likelihood of H0 bears to the likelihood of H1, a ratio less than some fixed value defining the contour. (295)…

It is evident, at once, that such a system is only possible when the class of hypotheses considered involves only a single parameter θ, or, what come to the same thing, when all the parameters entering into the specification of the population are definite functions of one of their number.  In this case, the regions defined by the uniformly most powerful test of significance are those defined by the estimate of maximum likelihood, T.  For the test to be uniformly most powerful, moreover, these regions must be independent of θ showing that the statistic must be of the special type distinguished as sufficient.  Such sufficient statistics have been shown to contain all the information which the sample provides relevant to the value of the appropriate parameter θ . It is inevitable therefore that if such a statistic exists it should uniquely define the contours best suited to discriminate among hypotheses differing only in respect of this parameter; and it is surprising that Neyman and Pearson should lay it down as a preliminary consideration that ‘the tesitng of statistical hypotheses cannot be treated as a problem in estimation.’ When tests are considered only in relation to sets of hypotheses specified by one or more variable parameters, the efficacy of the tests can be treated directly as the problem of estimation of these parameters.  Regard for what has been established in that theory, apart from the light it throws on the results already obtained by their own interesting line of approach, should also aid in treating the difficulties inherent in cases in which no sufficient statistics exists. (296)

Categories: phil/history of stat, Statistics |

## Clark Glymour: The Theory of Search Is the Economics of Discovery (part 2)

“Some Thoughts Prompted by David Hendry’s Essay * (RMM) Special Topic: Statistical Science and Philosophy of Science,” by  Professor Clark Glymour

Part 2 (of 2) (Please begin with part 1)

The first thing one wants to know about a search method is what it is searching for, what would count as getting it right. One might want to estimate a probability distribution, or get correct forecasts of some probabilistic function of the distribution (e.g., out-of-sample means), or a causal structure, or some probabilistic function of the distribution resulting from some class of interventions.  Secondly, one wants to know about what decision theorists call a loss function, but less precisely, what is the comparative importance of various errors of measurement, or, in other terms, what makes some approximations better than others. Third, one wants a limiting consistency proof: sufficient conditions for the search to reach the goal in the large sample limit. There are various kinds of consistency—pointwise versus uniform for example—and one wants to know which of those, if any, hold for a search method under what assumptions about the hypothesis space and the sampling distribution. Fourth, one wants to know as much as possible about the behavior of the search method on finite samples. In simple cases of statistical estimation there are analytic results; more often for search methods only simulation results are possible, but if so, one wants them to explore the bounds of failure, not just easy cases. And, of course, one wants a rationale for limiting the search space, as well as, some sense of how wrong the search can be if those limits are violated in various ways.

There are other important economic features of search procedures. Probability distributions (or likelihood functions) can instantiate any number of constraints—vanishing partial correlations for example, or inequalities of correlations. Suppose the hypothesis space delimits some big class of probability distributions. Suppose the search proceeds by testing constraints (the points that follow apply as well if the procedure computes posterior probabilities for particular hypotheses and applies a decision rule.) There is a natural partial ordering of classes of constraints: B is weaker than A if and only if every distribution that satisfies class A satisfies class B.  Other things equal, a weakest class might be preferred because it requires fewer tests.  But more important is what the test of a constraint does in efficiently guiding the search. A test that eliminates a particular hypothesis is not much help. A test that eliminates a big class of hypotheses is a lot of help.

Other factors: the power of the requisite tests; the numbers of tests (or posterior probability assessments) required; the computational requirements of individual tests (or posterior probability assessments.) And so on.  And, finally, search algorithms have varying degrees of generality. For example, there are general algorithms, such as the widely used PC search algorithm for graphical causal models, that are essentially search schema: stick in whatever decision procedure for conditional independence and PC becomes a search procedure using that conditional independence oracle. By contrast, some searches are so embedded in a particular hypothesis space that it is difficult to see the generality.

I am sure I am not qualified to comment on the details of Hendry’s search procedure, and even if I were, for reasons of space his presentation is too compressed for that. Still, I can make some general remarks.  I do not know from his essay the answers to many of the questions pertinent to evaluating a search procedure that I raised above. For example, his success criterion is “congruence” and I have no idea what that is. That is likely my fault, since I have read only one of his books, and that long ago.

David Hendry dismisses “priors,” meaning, I think, Bayesian methods, with an argument from language acquisition. Kids don’t need priors to learn a language. I am not sure of Hendry’s logic.  Particular grammars within a parametric “universal grammar” could in principle be learned by a Bayesian procedure, although I have no reason to think they are. But one way or the other, that has no import for whether Bayesian procedures are the most advantageous for various search problems by any of the criteria I have noted above. Sometimes they may be, sometimes not, there is no uniform answer, in part because computational requirements vary. I could give examples, but space forbids.

Abstractly, one could think there are two possible ways of searching when the set of relationships to be uncovered may form a complex web: start by positing all possible relationships and eliminate from there, or start by positing no relationships and build up.  Hendry dismisses the latter, with what generality I do not know. What I do know is that the relations between “bottom-up” and “top-down” or “forward” and “backward” search can be intricate, and in some cases one may need both for consistency.  Sometimes either will do. Graphical models, for example can be searched starting with the assumption that every variable influences every other and eliminating, or starting with the assumption that no variable influences any other and adding.  There are pointwise consistent searches in both directions. The real difference is in complexity.

## Peter Grünwald: Follow-up on Cherkassky’s Comments

Peter Grünwald

A comment from Professor Peter Grünwald

Head, Information-theoretic Learning Group, Centrum voor Wiskunde en Informatica (CWI)
Part-time full professor  at Leiden University.

This is a follow-up on Vladimir Cherkassky’s comments on Deborah’s blog. First of all let me thank Vladimir for taking the time to clarify his position. Still, there’s one issue where we disagree and which, at the same time, I think, needs clarification, so I decided to write this follow-up.[related posts 1]

The issue is about how central VC (Vapnik-Chervonenkis)-theory is to inductive inference.

I agree with Vladimir that VC-theory is one of the most important achievements in the field ever, and indeed, that it fundamentally changed our way of thinking about learning from data. Yet I also think that there are many problems of inductive inference to which it has no direct bearing. Some of these are concerned with hypothesis testing, but even when one is concerned with prediction accuracy – which Vladimir considers the basic goal – there are situations where I do not see how it plays a direct role. One of these is sequential prediction with log-loss or its generalization, Cover’s loss. This loss function plays a fundamental role in (1) language modeling, (2) on-line data compression, (3a) gambling and (3b) sequential investment on the stock market (here we need Cover’s loss). [a superquick intro to log-loss as well as some references are given below under [A]; see also my talk at the Ockham workshop (slides 16-26 about weather forecasting!) )

Categories: philosophy of science, Statistics |

## Vladimir Cherkassky Responds on Foundations of Simplicity

I thank Dr. Vladimir Cherkassky for taking up my general invitation to comment. I don’t have much to add to my original post[i], except to make two corrections at the end of this post.  I invite readers’ comments.

As I could not participate in the discussion session on Sunday, I would like to address several technical issues and points of disagreement that became evident during this workshop. All opinions are mine, and may not be representative of the “machine learning community.” Unfortunately, the machine learning community at large is not very much interested in the philosophical and methodological issues. This breeds a lot of fragmentation and confusion, as evidenced by the existence of several technical fields: machine learning, statistics, data mining, artificial neural networks, computational intelligence, etc.—all of which are mainly concerned with the same problem of estimating good predictive models from data.

Occam’s Razor (OR) is a general metaphor in the philosophy of science, and it has been discussed for ages. One of the main goals of this workshop was to understand the role of OR as a general inductive principle in the philosophy of science and, in particular, its importance in data-analytic knowledge discovery for statistics and machine learning.

Data-analytic modeling is concerned with estimating good predictive models from finite data samples. This is directly related to the philosophical problem of inductive inference. The problem of learning (generalization) from finite data had been formally investigated in VC-theory ~ 40 years ago. This theory starts with a mathematical formulation of the problem of learning from finite samples, without making any assumptions about parametric distributions. This formalization is very general and relevant to many applications in machine learning, statistics, life sciences, etc. Further, this theory provides necessary and sufficient conditions for generalization. That is, a set of admissible models (hypotheses about the data) should be constrained, i.e., should have finite VC-dimension. Therefore, any inductive theory or algorithm designed to explain the data should satisfy VC-theoretical conditions. Continue reading

Categories: philosophy of science, Statistics |

## Elliott Sober Responds on Foundations of Simplicity

Here are a few comments on your recent blog about my ideas on parsimony.  Thanks for inviting me to contribute!

You write that in model selection, “’parsimony fights likelihood,’ while, in adequate evolutionary theory, the two are thought to go hand in hand.”  The second part of this statement isn’t correct.  There are sufficient conditions (i.e., models of the evolutionary process) that entail that parsimony and maximum likelihood are ordinally equivalent, but there are cases in which they are not.  Biologists often have data sets in which maximum parsimony and maximum likelihood disagree about which phylogenetic tree is best.

You also write that “error statisticians view hypothesis testing as between exhaustive hypotheses H and not-H (usually within a model).”  I think that the criticism of Bayesianism that focuses on the problem of assessing the likelihoods of “catch-all hypotheses” applies to this description of your error statistical philosophy.  The General Theory of Relativity, for example, may tell us how probable a set of observations is, but its negation does not.  I note that you have “usually within a model” in parentheses.  In many such cases, two alternatives within a model will not be exhaustive even within the confines of a model and of course they won’t be exhaustive if we consider a wider domain.

Categories: philosophy of science, Statistics |

## Deviates, Sloths, and Exiles: Philosophical Remarks on the Ockham’s Razor Workshop*

Picking up the pieces…

My flight out of Pittsburgh has been cancelled, and as I may be stuck in the airport for some time, I will try to make a virtue of it by jotting down some of my promised reflections on the “simplicity and truth” conference at Carnegie Mellon (organized by Kevin Kelly). My remarks concern only the explicit philosophical connections drawn by (4 of) the seven non-philosophers who spoke. For more general remarks, see blogs of: Larry Wasserman (Normal Deviate) and Cosma Shalizi (Three-Toed Sloth). (The following, based on my notes and memory, may include errors/gaps, but I trust that my fellow bloggers and sloggers, will correct me.)

First to speak were Vladimir Vapnik and Vladimir Cherkassky, from the field of machine learning, a discipline I know of only formally. Vapnik, of the Vapnik Chervonenkis (VC) theory, is known for his seminal work here. Their papers, both of which addressed directly the philosophical implications of their work, share enough themes to merit being taken up together.

Vapnik and Cherkassky find a number of striking dichotomies in the standard practice of both philosophy and statistics. They contrast the “classical” conception of scientific knowledge as essentially rational with the more modern, “data-driven” empirical view:

The former depicts knowledge as objective, deterministic, rational. Ockham’s razor is a kind of synthetic a priori statement that warrants our rational intuitions as the foundation of truth with a capital T, as well as a naïve realism (we may rely on Cartesian “clear and distinct” ideas; God does not deceive; and so on). The latter empirical view, illustrated by machine learning, is enlightened. It settles for predictive successes and instrumentalism, views models as mental constructs (in here, not out there), and exhorts scientists to restrict themselves to problems deemed “well posed” by machine-learning criteria.

But why suppose the choice is between assuming “a single best (true) theory or model” and the extreme empiricism of their instrumental machine learner? Continue reading

Categories: philosophy of science, Statistics |

## The Error Statistical Philosophy and The Practice of Bayesian Statistics: Comments on Gelman and Shalizi

The following is my commentary on a paper by Gelman and Shalizi, forthcoming (some time in 2013) in the British Journal of Mathematical and Statistical Psychology* (submitted February 14, 2012).
_______________________

1. Introduction

I am pleased to have the opportunity to comment on this interesting and provocative paper. I shall begin by citing three points at which the authors happily depart from existing work on statistical foundations.

First, there is the authors’ recognition that methodology is ineluctably bound up with philosophy. If nothing else “strictures derived from philosophy can inhibit research progress” (p. 4). They note, for example, the reluctance of some Bayesians to test their models because of their belief that “Bayesian models were by definition subjective,” or perhaps because checking involves non-Bayesian methods (4, n4).

Second, they recognize that Bayesian methods need a new foundation. Although the subjective Bayesian philosophy, “strongly influenced by Savage (1954), is widespread and influential in the philosophy of science (especially in the form of Bayesian confirmation theory),”and while many practitioners perceive the “rising use of Bayesian methods in applied statistical work,” (2) as supporting this Bayesian philosophy, the authors flatly declare that “most of the standard philosophy of Bayes is wrong” (2 n2). Despite their qualification that “a statistical method can be useful even if its philosophical justification is in error”, their stance will rightly challenge many a Bayesian.

Categories: Statistics |

## A. Spanos: Jerzy Neyman and his Enduring Legacy

A Statistical Model as a Chance Mechanism

Aris Spanos

Jerzy Neyman (April 16, 1894 – August 5, 1981), was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals.

One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for  non-random samples. Fisher’s original parametric statistical model Mθ(x) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data x0:=(x1,x2,…,xn) can be viewed as a ‘truly representative sample’ from that ‘population’:

“The postulate of randomness thus resolves itself into the question, Of what population is this a random sample? (ibid., p. 313), underscoring that: the adequacy of our choice may be tested a posteriori.’’ (p. 314)

In cases where data x0 come from sample surveys or it can be viewed as a typical realization of a random sample X:=(X1,X2,…,Xn), i.e. Independent and Identically Distributed (IID) random variables, the ‘population’ metaphor can be helpful in adding some intuitive appeal to the inductive dimension of statistical inference, because one can imagine using a subset of a population (the sample) to draw inferences pertaining to the whole population.

This ‘infinite population’ metaphor, however, is of limited value in most applied disciplines relying on observational data. To see how inept this metaphor is consider the question: what is the hypothetical ‘population’ when modeling the gyrations of stock market prices? More generally, what is observed in such cases is a certain on-going process and not a fixed population from which we can select a representative sample. For that very reason, most economists in the 1930s considered Fisher’s statistical modeling irrelevant for economic data! Continue reading

## Two New Properties of Mathematical Likelihood

17 February 1890–29 July 1962

Note: I find this to be an intriguing, if perhaps little-known, discussion, long before the conflicts reflected in the three articles (the “triad”) below,  Here Fisher links his tests to the Neyman and Pearson lemma in terms of power.  I invite your deconstructions/comments.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

To Thomas Bayes must be given the credit of broaching the problem of using the concepts of mathematical probability in discussing problems of inductive inference, in which we argue from the particular to the general; or, in statistical phraselogy, argue from the sample to the population, from which, ex hypothesi, the sample was drawn.  Bayes put forward, with considerable caution, a method by which such problems could be reduced to the form of problems of probability.  His method of doing this depended essentially on postulating a priori knowledge, not of the particular population of which our observations form a sample, but of an imaginary population of populations from which this population was regarded as having been drawn at random.  Clearly, if we have possession of such a priori knowledge, our problem is not properly an inductive one at all, for the population under discussion is then regarded merely as a particular case of a general type, of which we already possess exact knowledge, and are therefore in a position to draw exact deductive inferences.

Categories: Likelihood Principle, Statistics |

## Guest Blogger. ARIS SPANOS: The Enduring Legacy of R. A. Fisher

By Aris Spanos

One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most re­markable, but least recognized, achievement was to initiate the recast­ing of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

Mθ(x)={f(x;θ); θ∈Θ}; x∈Rn ;Θ⊂Rm; m < n; (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the proba­bilistic information in the statistical model.

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily conﬁned to the description of the distributional features of the data in hand using the histogram and the ﬁrst few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn) As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Categories: Statistics |

## E.S. PEARSON: Statistical Concepts in Their Relation to Reality

by E.S. PEARSON (1955)

SUMMARY: This paper contains a reply to some criticisms made by Sir Ronald Fisher in his recent article on “Scientific Methods and Scientific Induction”.

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data.  We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done.  If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955), it is impossible to leave him altogether unanswered.

Categories: Statistics |

## No-Pain Philosophy: Skepticism, Rationality, Popper and All That (part 2): Duhem’s problem & methodological falsification

(See Part 1)

5. Duhemian Problems of Falsification

Any interesting case of hypothesis falsification, or even a severe attempt to falsify, rests on both empirical and inductive hypotheses or claims. Consider the most simplistic form of deductive falsification (an instance of the valid form of modus tollens): “If H entails O, and not-O, then not-H.”  (To infer “not-H” is to infer H is false, or, more often, it involves inferring there is some discrepancy in what H claims regarding the phenomenon in question). Continue reading

Categories: No-Pain Philosophy, philosophy of science |