Spanos

Guest Blog: ARIS SPANOS: The Enduring Legacy of R. A. Fisher

By Aris Spanos

One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most re­markable, but least recognized, achievement was to initiate the recast­ing of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

Mθ(x)={f(x;θ); θ∈Θ}; x∈Rn ;Θ⊂Rm; m < n; (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the proba­bilistic information in the statistical model.

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to the description of the distributional features of the data in hand using the histogram and the first few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn) As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Fisher was able to recast statistical inference by turning Karl Pear­son’s approach, proceeding from data x0 in search of a frequency curve f(x;ϑ) to describe its histogram, on its head. He proposed to begin with a prespecified Mθ(x) (a ‘hypothetical infinite population’), and view x0 as a ‘typical’ realization thereof; see Spanos (1999). Continue reading

Categories: Fisher, Spanos, Statistics | Tags: , , , , , , | Leave a comment

“Error statistical modeling and inference: Where methodology meets ontology” A. Spanos and D. Mayo

copy-cropped-ampersand-logo-blog1

.

A new joint paper….

“Error statistical modeling and inference: Where methodology meets ontology”

Aris Spanos · Deborah G. Mayo

Abstract: In empirical modeling, an important desideratum for deeming theoretical entities and processes real is that they can be reproducible in a statistical sense. Current day crises regarding replicability in science intertwine with the question of how statistical methods link data to statistical and substantive theories and models. Different answers to this question have important methodological consequences for inference, which are intertwined with a contrast between the ontological commitments of the two types of models. The key to untangling them is the realization that behind every substantive model there is a statistical model that pertains exclusively to the probabilistic assumptions imposed on the data. It is not that the methodology determines whether to be a realist about entities and processes in a substantive field. It is rather that the substantive and statistical models refer to different entities and processes, and therefore call for different criteria of adequacy.

Keywords: Error statistics · Statistical vs. substantive models · Statistical ontology · Misspecification testing · Replicability of inference · Statistical adequacy

To read the full paper: “Error statistical modeling and inference: Where methodology meets ontology.”

The related conference.

Mayo & Spanos spotlight

Reference: Spanos, A. & Mayo, D. G. (2015). “Error statistical modeling and inference: Where methodology meets ontology.” Synthese (online May 13, 2015), pp. 1-23.

Categories: Error Statistics, misspecification testing, O & M conference, reproducibility, Severity, Spanos | 2 Comments

Spurious Correlations: Death by getting tangled in bedsheets and the consumption of cheese! (Aris Spanos)

Spanos

Spanos

These days, there are so many dubious assertions about alleged correlations between two variables that an entire website: Spurious Correlation (Tyler Vigen) is devoted to exposing (and creating*) them! A classic problem is that the means of variables X and Y may both be trending in the order data are observed, invalidating the assumption that their means are constant. In my initial study with Aris Spanos on misspecification testing, the X and Y means were trending in much the same way I imagine a lot of the examples on this site are––like the one on the number of people who die by becoming tangled in their bedsheets and the per capita consumption of cheese in the U.S.

The annual data for 2000-2009 are: xt: per capita consumption of cheese (U.S.) : x = (29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8); yt: Number of people who died by becoming tangled in their bedsheets: y = (327, 456, 509, 497, 596, 573, 661, 741, 809, 717)

I asked Aris Spanos to have a look, and it took him no time to identify the main problem. He was good enough to write up a short note which I’ve pasted as slides.

spurious-correlation-updated-4-1024

Aris Spanos

Wilson E. Schmidt Professor of Economics
Department of Economics, Virginia Tech

OfQYQW8

 

*The site says that the server attempts to generate a new correlation every 60 seconds.

Categories: misspecification testing, Spanos, Statistics, Testing Assumptions | 14 Comments

A. Spanos: Jerzy Neyman and his Enduring Legacy

.

A Statistical Model as a Chance Mechanism
Aris Spanos 

Today is the birthday of Jerzy Neyman (April 16, 1894 – August 5, 1981). Neyman was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)

Neyman: 16 April

Neyman: 16 April 1894 – 5 Aug 1981

One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for  non-random samples. Fisher’s original parametric statistical model Mθ(x) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data x0:=(x1,x2,…,xn) can be viewed as a ‘truly representative sample’ from that ‘population’:

photoSmall

Fisher and Neyman

“The postulate of randomness thus resolves itself into the question, Of what population is this a random sample? (ibid., p. 313), underscoring that: the adequacy of our choice may be tested a posteriori.’’ (p. 314)

In cases where data x0 come from sample surveys or it can be viewed as a typical realization of a random sample X:=(X1,X2,…,Xn), i.e. Independent and Identically Distributed (IID) random variables, the ‘population’ metaphor can be helpful in adding some intuitive appeal to the inductive dimension of statistical inference, because one can imagine using a subset of a population (the sample) to draw inferences pertaining to the whole population. Continue reading

Categories: Neyman, phil/history of stat, Spanos, Statistics | Tags: , | Leave a comment

R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)

A SPANOS

.

In recognition of R.A. Fisher’s birthday….

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Spanos, Statistics | 6 Comments

Erich Lehmann: Statistician and Poet

Erich Lehmann 20 November 1917 – 12 September 2009

Erich Lehmann                       20 November 1917 –              12 September 2009

Memory Lane 1 Year (with update): Today is Erich Lehmann’s birthday. The last time I saw him was at the Second Lehmann conference in 2004, at which I organized a session on philosophical foundations of statistics (including David Freedman and D.R. Cox).

I got to know Lehmann, Neyman’s first student, in 1997.  One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that).  He told me he was sitting in a very large room at an ASA meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, dark table sat just one book, all alone, shiny red.  He said he wondered if it might be of interest to him!  So he walked up to it….  It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after. Some related posts on Lehmann’s letter are here and here.

That same year I remember having a last-minute phone call with Erich to ask how best to respond to a “funny Bayesian example” raised by Colin Howson. It is essentially the case of Mary’s positive result for a disease, where Mary is selected randomly from a population where the disease is very rare. See for example here. (It’s just like the case of our high school student Isaac). His recommendations were extremely illuminating, and with them he sent me a poem he’d written (which you can read in my published response here*). Aside from being a leading statistician, Erich had a (serious) literary bent. Continue reading

Categories: highly probable vs highly probed, phil/history of stat, Sir David Cox, Spanos, Statistics | Tags: , | Leave a comment

A. Spanos: Jerzy Neyman and his Enduring Legacy

A Statistical Model as a Chance Mechanism
Aris Spanos 

Jerzy Neyman (April 16, 1894 – August 5, 1981), was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)

Neyman: 16 April

Neyman: 16 April 1894 – 5 Aug 1981

One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for  non-random samples. Continue reading

Categories: phil/history of stat, Spanos, Statistics | Tags: , | 4 Comments

Phil 6334: March 26, philosophy of misspecification testing (Day #9 slides)

 

may-4-8-aris-spanos-e2809contology-methodology-in-statistical-modelinge2809d“Probability/Statistics Lecture Notes 6: An Introduction to Mis-Specification (M-S) Testing” (Aris Spanos)

 

[Other slides from Day 9 by guest, John Byrd, can be found here.]

Categories: misspecification testing, Phil 6334 class material, Spanos, Statistics | Leave a comment

Phil 6334: February 20, 2014 (Spanos): Day #5

may-4-8-aris-spanos-e2809contology-methodology-in-statistical-modelinge2809dPHIL 6334 – “Probability/Statistics Lecture Notes 3 for 2/20/14: Estimation (Point and Interval)”:(Prof. Spanos)*

*This is Day #5 on the Syllabus, as Day #4 had to be made up (Feb 24, 2014) due to snow. Slides for Day #4 will go up Feb. 26, 2014. (See the revised Syllabus Second Installment.)

Categories: Phil6334, Philosophy of Statistics, Spanos | 5 Comments

R. A. Fisher: how an outsider revolutionized statistics

A SPANOSToday is R.A. Fisher’s birthday and I’m reblogging the post by Aris Spanos which, as it happens, received the highest number of views of 2013.

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Spanos, Statistics | 6 Comments

More on deconstructing Larry Wasserman (Aris Spanos)

This follows up on yesterday’s deconstruction:


 Aris Spanos (2012)[i] – Comments on: L. Wasserman “Low Assumptions, High Dimensions (2011)*

I’m happy to play devil’s advocate in commenting on Larry’s very interesting and provocative (in a good way) paper on ‘how recent developments in statistical modeling and inference have [a] changed the intended scope of data analysis, and [b] raised new foundational issues that rendered the ‘older’ foundational problems more or less irrelevant’.

The new intended scope, ‘low assumptions, high dimensions’, is delimited by three characteristics:

“1. The number of parameters is larger than the number of data points.

2. Data can be numbers, images, text, video, manifolds, geometric objects, etc.

3. The model is always wrong. We use models, and they lead to useful insights but the parameters in the model are not meaningful.” (p. 1)

In the discussion that follows I focus almost exclusively on the ‘low assumptions’ component of the new paradigm. The discussion by David F. Hendry (2011), “Empirical Economic Model Discovery and Theory Evaluation,” RMM, 2: 115-145,  is particularly relevant to some of the issues raised by the ‘high dimensions’ component in a way that complements the discussion that follows.

My immediate reaction to the demarcation based on 1-3 is that the new intended scope, although interesting in itself, excludes the overwhelming majority of scientific fields where restriction 3 seems unduly limiting. In my own field of economics the substantive information comes primarily in the form of substantively specified mechanisms (structural models), accompanied with theory-restricted and substantively meaningful parameters.

In addition, I consider the assertion “the model is always wrong” an unhelpful truism when ‘wrong’ is used in the sense that “the model is not an exact picture of the ‘reality’ it aims to capture”. Worse, if ‘wrong’ refers to ‘the data in question could not have been generated by the assumed model’, then any inference based on such a model will be dubious at best! Continue reading

Categories: Philosophy of Statistics, Spanos, Statistics, U-Phil, Wasserman | Tags: , , , , | 5 Comments

Blog at WordPress.com.