Statistics

RMM-7: Commentary and Response on Senn published: Special Volume on Stat Sci Meets Phil Sci

Dear Reader: My commentary, “How Can We Cultivate Senn’s Ability, Comment on Stephen Senn, ‘You May Believe You are a Bayesian But You’re Probably Wrong’” and Senn’s, “Names and Games, A Reply to Deborah G. Mayo” have been published under the Discussion Section of Rationality, Markets, and Morals (Special Topic: “Statistical Science and Philosophy of Science: Where Do/Should They Meet?”).

I encourage you to submit your comments/exchanges on any of the papers in this special volume [this is the first].  (Information may be found on their webpage [no longer active 3/21/2021]. Questions/Ideas: please write to me at error@vt.edu.)

Categories: Philosophy of Statistics, Statistics

Blogologue*

Gelman responds on his blog today: “Gelman on Hennig on Gelman on Bayes”.

http://andrewgelman.com/2012/03/gelman-on-hennig-on-gelman-on-bayes/

I invite comments here….

*An ongoing exchange among a group of blogs that remain distinct (just coined)

Categories: Philosophy of Statistics, Statistics, U-Phil

U-PHIL: A Further Comment on Gelman by Christian Hennig (UCL, Statistics)

Comment on Gelman’s “Induction and Deduction in Bayesian Data Analysis” (RMM)

Dr. Christian Hennig (Senior Lecturer, Department of Statistical Science, University College London)

I have read quite a bit of what Andrew Gelman has written in recent years, including some of his blog. One thing that I find particularly refreshing and important about his approach is that he contrasts the Bayesian and frequentist philosophical conceptions honestly with what happens in the practice of data analysis, which often cannot (or does better not to) proceed according to an inflexible dogmatic book of rules.

I also like the emphasis on the fact that all models are wrong. I personally believe that a good philosophy of statistics should consistently take into account that models are rather tools for thinking than able to “match” reality, and in the vast majority of cases we know clearly that they are wrong (all continuous models are wrong because all observed data are discrete, for a start).

There is, however, one issue on which I find his approach unsatisfactory (or at least not well enough explained), and on which both frequentism and subjective Bayesianism seem superior to me.

Continue reading

Categories: Philosophy of Statistics, Statistics, U-Phil | 5 Comments

Lifting a piece from Spanos’ contribution* will usefully add to the mix

The following two sections from Aris Spanos’ contribution to the RMM volume are relevant to the points raised by Gelman (as regards what I am calling the “two slogans”)**.

 6.1 Objectivity in Inference (From Spanos, RMM 2011, pp. 166-7)

The traditional literature seems to suggest that ‘objectivity’ stems from the mere fact that one assumes a statistical model (a likelihood function), enabling one to accommodate highly complex models. Worse, in Bayesian modeling it is often misleadingly claimed that as long as a prior is determined by the assumed statistical model—the so called reference prior—the resulting inference procedures are objective, or at least as objective as the traditional frequentist procedures:

“Any statistical analysis contains a fair number of subjective elements; these include (among others) the data selected, the model assumptions, and the choice of the quantities of interest. Reference analysis may be argued to provide an ‘objective’ Bayesian solution to statistical inference in just the same sense that conventional statistical methods claim to be ‘objective’: in that the solutions only depend on model assumptions and observed data.” (Bernardo 2010, 117)

This claim brings out the unfathomable gap between the notion of ‘objectivity’ as understood in Bayesian statistics, and the error statistical viewpoint. As argued above, there is nothing ‘subjective’ about the choice of the statistical model $M_{\theta}(z)$ because it is chosen with a view to account for the statistical regularities in data $z_0$, and its validity can be objectively assessed using trenchant M-S testing. Model validation, as understood in error statistics, plays a pivotal role in providing an ‘objective scrutiny’ of the reliability of the ensuing inductive procedures.

Continue reading

Categories: Philosophy of Statistics, Statistics, Testing Assumptions, U-Phil | 43 Comments

Mayo, Senn, and Wasserman on Gelman’s RMM** Contribution

Picking up the pieces…

Continuing with our discussion of contributions to the special topic,  Statistical Science and Philosophy of Science in Rationality, Markets and Morals (RMM),* I am pleased to post some comments on Andrew **Gelman’s paper “Induction and Deduction in Bayesian Data Analysis”.  (More comments to follow—as always, feel free to comment.)

Note: March 9, 2012: Gelman has responded to some of our comments on his blog today: http://andrewgelman.com/2012/03/coming-to-agreement-on-philosophy-of-statistics/

D. Mayo

For now, I will limit my own comments to two. First, a fairly uncontroversial point: while Gelman writes that “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive,” a main point of my series (Parts 1, 2, 3) of “No-Pain” philosophy was that “deductive” falsification involves inductively inferring a “falsifying hypothesis”.

More importantly, and more challengingly, Gelman claims the view he recommends “corresponds closely to the error-statistics idea of Mayo (1996)”. Now the idea that non-Bayesian ideas might afford a foundation for strands of Bayesianism is not as implausible as it may seem. On the face of it, any inference to a claim, whether to the adequacy of a model (for a given purpose), or even to a posterior probability, can be said to be warranted just to the extent that the claim has withstood a severe test (i.e., a test that would, at least with reasonable probability, have discerned a flaw with the claim, were it false). The question is: How well do Gelman’s methods for inferring statistical models satisfy severity criteria? (I’m not sufficiently familiar with his intended applications to say.)
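To make the idea of a severe test a little more concrete, here is a minimal Python sketch for one textbook-style case: a one-sided test of a Normal mean with known standard deviation. The formula used, SEV(mu > mu_1) = P(sample mean <= observed mean; mu = mu_1), and every number in the example are assumptions brought in for illustration; none of them comes from this excerpt or from Gelman's paper.

```python
from statistics import NormalDist


def severity_mu_greater(x_bar, mu_1, sigma, n):
    """Severity for the claim mu > mu_1, given observed sample mean x_bar.

    Assumes n independent Normal observations with known standard
    deviation sigma (an illustrative setup, not one taken from the post).
    Computes P(sample mean <= x_bar; mu = mu_1).
    """
    standard_error = sigma / n ** 0.5
    return NormalDist().cdf((x_bar - mu_1) / standard_error)


if __name__ == "__main__":
    # Hypothetical numbers: observed mean 0.4 from n = 25 draws, sigma = 1.
    for mu_1 in (0.0, 0.2, 0.4, 0.6):
        sev = severity_mu_greater(x_bar=0.4, mu_1=mu_1, sigma=1.0, n=25)
        print(f"claim mu > {mu_1}: severity {sev:.3f}")
```

On this reading, a claim passes with high severity only when a result as small as the one observed would have been very probable were the claim false; claims that reach further beyond what the data show receive lower values.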

Continue reading

Categories: Philosophy of Statistics, Statistics, U-Phil | 1 Comment

Statistical Science Court?

Nathan Schachtman has an interesting blog post on “Scientific illiteracy among the judiciary”:

February 29th, 2012

Ken Feinberg, speaking at a symposium on mass torts, asks what legal challenges do mass torts confront in the federal courts. The answer seems obvious.

Pharmaceutical cases that warrant federal court multi-district litigation (MDL) treatment typically involve complex scientific and statistical issues. The public deserves having MDL cases assigned to judges who have special experience and competence to preside in cases in which these complex issues predominate. There appears to be no procedural device to ensure that the judges selected in the MDL process have the necessary experience and competence, and a good deal of evidence to suggest that the MDL judges are not up to the task at hand.

In the aftermath of the Supreme Court’s decision in Daubert, the Federal Judicial Center assumed responsibility for producing science and statistics tutorials to help judges grapple with technical issues in their cases. The Center has produced videotaped lectures as well as the Reference Manual on Scientific Evidence, now in its third edition. Despite the Center’s best efforts, many federal judges have shown themselves to be incorrigible. It is time to revive the discussions and debates about implementing a “science court.”

I am intrigued to hear Schachtman revive the old and controversial idea of a “science court”, although it has actually never left, but has come up for debate every few years for the past 35 or 40 years! In the 80s, it was a hot topic in the new “science and values” movement, but I do not think it was ever really put to an adequate experimental test. The controversy directly relates to the whole issue of distinguishing evidential and policy issues (in evidence-based policy),

Continue reading

Categories: philosophy of science, PhilStatLaw, Statistics | 2 Comments

Misspecification Tests: (part 4) and brief concluding remarks

The Nature of the Inferences From Graphical Techniques: What is the status of the learning from graphs? In this view, the graphs afford good ideas about the kinds of violations for which it would be useful to probe, much as looking at a forensic clue (e.g., footprint, tire track) helps to narrow down the search for a given suspect, a fault-tree, for a given cause. The same discernment can be achieved with a formal analysis (with parametric and nonparametric tests), perhaps more discriminating than can be accomplished by even the most trained eye, but the reasoning and the justification are much the same. (The capabilities of these techniques may be checked by simulating data deliberately generated to violate or obey the various assumptions.)
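The closing parenthetical, on checking the capabilities of a technique with simulated data, can be sketched in Python. Everything specific here is an illustrative assumption rather than something from the post: the check is a lag-1 sample autocorrelation compared with a rough threshold of 2/sqrt(n), the violating series has a linear trend with slope 0.2, and n = 35 is chosen only to match the number of annual observations in 1955-1989.

```python
import random
import statistics


def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    mean = statistics.fmean(series)
    dev = [value - mean for value in series]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)


def simulate_niid(n, rng):
    """Data generated to obey the assumptions: Normal, independent, identically distributed."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def simulate_trending(n, rng, slope=0.2):
    """Data generated to violate them: the mean increases with time."""
    return [slope * t + rng.gauss(0.0, 1.0) for t in range(n)]


def detection_rate(simulator, n=35, reps=2000, seed=1):
    """Share of simulated series whose |lag-1 autocorrelation| exceeds 2/sqrt(n)."""
    rng = random.Random(seed)
    threshold = 2 / n ** 0.5
    flagged = sum(abs(lag1_autocorr(simulator(n, rng))) > threshold for _ in range(reps))
    return flagged / reps


if __name__ == "__main__":
    print("flag rate when assumptions hold:", detection_rate(simulate_niid))
    print("flag rate with a trending mean: ", detection_rate(simulate_trending))
```

A check that rarely flags the well-behaved series and nearly always flags the trending one has shown some capability to detect that kind of departure. Note that a flag here does not say which assumption failed: a trending mean alone is enough to produce a large autocorrelation.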

The combined indications from the graphs indicate departures from the LRM in the direction of the DLRM, but only, for the moment, as indicating a fruitful model to probe further. We are not licensed to infer that it is itself a statistically adequate model until its own assumptions are subsequently tested. Even when they are checked and found to hold up (which they happen to do in this case), our inference must still be qualified. While we may infer that the model is statistically adequate, this should be understood only as licensing the use of the model as a reliable tool for primary statistical inferences, but not necessarily as representing the substantive phenomenon being modeled.

Continue reading

Categories: Intro MS Testing, Statistics | 6 Comments

Misspecification Testing: (part 3) Subtracting-out effects “on paper”

A Better Way: The traditional approach described in Part 2 did not detect the presence of mean-heterogeneity and so it misidentified temporal dependence as the sole source of misspecification associated with the original LRM.

On the basis of figures 1-3 we can summarize our progress in detecting potential departures from the LRM model assumptions to probe thus far:

Assumption           LRM                       Alternatives
(D) Distribution     Normal                    ?
(M) Dependence       Independent               ?
(H) Heterogeneity    Identically Distributed   mean-heterogeneity

Discriminating and Amplifying the Effects of Mistakes: We could correctly assess dependence if our data were ID and not obscured by the influence of the trending mean. Although we cannot literally manipulate relevant factors, we can ‘subtract out’ the trending mean in a generic way to see what it would be like if there were no trending mean. Here are the detrended $x_t$ and $y_t$.

Fig. 4: Detrended Population (y – trend )
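Subtracting out a trending mean “on paper” can be sketched as follows. This is a minimal illustration that assumes a straight-line trend fitted by least squares on the time index; the post does not say which trend form was used, and the series in the example is made up (it is not the USA population data).

```python
import statistics


def fit_linear_trend(series):
    """Least-squares intercept and slope of the series on t = 0, 1, ..., n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def detrend(series):
    """Return the series with its fitted linear trend subtracted out."""
    intercept, slope = fit_linear_trend(series)
    return [y - (intercept + slope * t) for t, y in enumerate(series)]


if __name__ == "__main__":
    # Made-up trending series: a steady rise plus a small alternating wiggle.
    y = [100.0 + 2.5 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(12)]
    print([round(value, 2) for value in detrend(y)])
```

What remains after detrending can then be examined for dependence without the trend obscuring the picture. A polynomial in t could replace the straight line if a linear trend fits poorly.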

Continue reading

Categories: Intro MS Testing, Statistics | 11 Comments

Misspecification Testing: (part 2) A Fallacy of Error “Fixing”

Part 1 is here.

Graphing t-plots (this is my first experiment with blogging data plots; they have been blown up a bit, so hopefully they are now sufficiently readable).

Here are two plots (t-plots) of the observed data where $y_t$ is the population of the USA in millions, and $x_t$ is our “secret” variable, to be revealed later on, both over time (1955-1989).

Fig 1: USA Population (y)

Fig. 2: Secret variable (x)

Figure 3: A typical realization of a NIID process.

Pretty clearly, there are glaring departures from IID when we compare a typical realization of a NIID process, in fig. 3, with the t-plots of the two series in figures 1-2. In particular, both data series show the mean is increasing with time, that is, strong mean-heterogeneity (trending mean). Our recommended next step would be to continue exploring the probabilistic structure of the data in figures 1 and 2 with a view toward thoroughly assessing the validity of the LRM assumptions [1]-[5] (table 1). But first let us take a quick look at the traditional approach for testing assumptions, focusing just on assumption [4], traditionally viewed as error non-autocorrelation: $E(u_t, u_s) = 0$ for $t \neq s$, $t, s = 1, 2, \ldots, n$.

Continue reading
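For readers who want something concrete for assumption [4], here is a small Python sketch of one familiar textbook diagnostic for residual autocorrelation, the Durbin-Watson statistic, computed from the residuals of a least-squares fit of y on x. Choosing this statistic is an assumption made for illustration; the excerpt stops before naming the test the traditional approach relies on, and the numbers below are made up.

```python
import statistics


def ols_residuals(x, y):
    """Residuals from a least-squares fit of y on x with an intercept."""
    x_mean = statistics.fmean(x)
    y_mean = statistics.fmean(y)
    sxx = sum((a - x_mean) ** 2 for a in x)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals.

    Values near 2 are consistent with no first-order autocorrelation,
    values near 0 suggest positive and values near 4 negative autocorrelation.
    """
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up data in which y curves upward, so a straight-line fit leaves smooth residuals.
    x = [float(t) for t in range(12)]
    y = [5.0 + 1.5 * t + 0.1 * t * t for t in range(12)]
    print(round(durbin_watson(ols_residuals(x, y)), 3))
```

A low value here flags a problem with the residuals but, taken alone, does not identify its source: smooth residuals left by an unmodeled trend or curvature look much the same as genuinely autocorrelated errors.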

Categories: Intro MS Testing, Statistics

Intro to Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)

 

“This is the kind of cure that kills the patient!”

is the line of Aris Spanos that I most remember from when I first heard him talk about testing assumptions of, and respecifying, statistical models in 1999. (The patient, of course, is the statistical model.) On finishing my book, EGEK 1996, I had been keen to fill its central gaps, one of which was fleshing out a crucial piece of the error-statistical framework of learning from error: how to validate the assumptions of statistical models. But the whole problem turned out to be far more philosophically (not to mention technically) challenging than I imagined. I will try (in 3 short posts) to sketch a procedure that I think puts the entire process of model validation on a sound logical footing. Thanks to attending several of Spanos’ seminars (and his patient tutorials, for which I am very grateful), I was eventually able to reflect philosophically on aspects of his already well-worked out approach. (Synergies with the error statistical philosophy, of which this is a part, warrant a separate discussion.)

Continue reading

Categories: Intro MS Testing, Statistics | 20 Comments

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”? (Rejected Post Feb 20)

Dear Reader: Not having been at this very long, I don’t know if it’s common for bloggers to collect a pile of rejected posts that one thinks better of before posting. Well, here’s one that belongs up in a “rejected post” page (and will be tucked away soon enough), but since we have so recently posted the Fisher-Neyman-Pearson “triad”, the blog-elders of Elba have twisted my elbow (repeatedly) to share this post, from back in the fall of 2011, London. Sincerely, D. Mayo

Egon Pearson on a Gate (by D. Mayo)

Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize—only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.)  Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

Continue reading

Categories: Statistics | 13 Comments

Two New Properties of Mathematical Likelihood

17 February 1890–29 July 1962

Note: I find this to be an intriguing, if perhaps little-known, discussion, long before the conflicts reflected in the three articles (the “triad”) below. Here Fisher links his tests to the Neyman and Pearson lemma in terms of power. I invite your deconstructions/comments.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

To Thomas Bayes must be given the credit of broaching the problem of using the concepts of mathematical probability in discussing problems of inductive inference, in which we argue from the particular to the general; or, in statistical phraseology, argue from the sample to the population, from which, ex hypothesi, the sample was drawn.  Bayes put forward, with considerable caution, a method by which such problems could be reduced to the form of problems of probability.  His method of doing this depended essentially on postulating a priori knowledge, not of the particular population of which our observations form a sample, but of an imaginary population of populations from which this population was regarded as having been drawn at random.  Clearly, if we have possession of such a priori knowledge, our problem is not properly an inductive one at all, for the population under discussion is then regarded merely as a particular case of a general type, of which we already possess exact knowledge, and are therefore in a position to draw exact deductive inferences.

Continue reading

Categories: Likelihood Principle, Statistics | 2 Comments

Guest Blogger. ARIS SPANOS: The Enduring Legacy of R. A. Fisher

By Aris Spanos

One of R. A. Fisher’s (17 February 1890–29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

$$M_{\theta}(x) = \{ f(x;\theta);\ \theta \in \Theta \};\quad x \in \mathbb{R}^{n};\ \Theta \subset \mathbb{R}^{m};\ m < n \tag{1}$$

where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.
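As a concrete instance of (1), one may take the simple Normal model. It is a standard textbook example added here for illustration, not something stated in this excerpt:

```latex
\[
M_{\theta}(x):\quad X_k \sim \mathrm{NIID}(\mu, \sigma^{2}),\quad k = 1, 2, \ldots, n,
\]
\[
f(x;\theta) = \prod_{k=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left\{ -\frac{(x_k - \mu)^{2}}{2\sigma^{2}} \right\},
\qquad
\theta := (\mu, \sigma^{2}) \in \Theta := \mathbb{R} \times \mathbb{R}_{+}.
\]
```

Here the parameter space has dimension m = 2, so the requirement m < n holds for any sample of more than two observations.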

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to the description of the distributional features of the data in hand using the histogram and the first few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand $x_0 := (x_1, x_2, \ldots, x_n)$. As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Continue reading

Categories: Statistics | 5 Comments

Guest Blogger. STEPHEN SENN: Fisher’s alternative to the alternative

By: Stephen Senn

This year marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

Continue reading

Categories: Statistics | 4 Comments

E.S. PEARSON: Statistical Concepts in Their Relation to Reality

by E.S. PEARSON (1955)

SUMMARY: This paper contains a reply to some criticisms made by Sir Ronald Fisher in his recent article on “Scientific Methods and Scientific Induction”.

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data.  We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done.  If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955), it is impossible to leave him altogether unanswered.

Continue reading

Categories: Statistics | 3 Comments

JERZY NEYMAN: Note on an Article by Sir Ronald Fisher

 

By Jerzy Neyman (1956)

Summary

(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation.  (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible.  (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values.  The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight.  (4)  The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.

Continue reading

Categories: Statistics | 2 Comments

R.A. FISHER: Statistical Methods and Scientific Inference

In honor of R.A. Fisher’s birthday this week (Feb 17), in a year that will mark 50 years since his death, we will post the “Triad” exchange between Fisher, Pearson and Neyman, and other guest contributions.*

by Sir Ronald Fisher (1955)

SUMMARY

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of  acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

  1. “Repeated sampling from the same population”,
  2. Errors of the “second kind”,
  3. “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

TO CONTINUE READING R. A. FISHER’S  PAPER, CLICK HERE.

*If you wish to contribute something in connection to Fisher, send to error@vt.edu

Categories: Statistics | 6 Comments

Guest Blogger: Interstitial Doubts About the Matrixx

By: Nathan Schachtman, Esq., PC*

When the Supreme Court decided this case, I knew that some people would try to claim that it was a decision about the irrelevance or unimportance of statistical significance in assessing epidemiologic data. Indeed, the defense lawyers invited this interpretation by trying to connect materiality with causation. Having rejected that connection, the Supreme Court’s holding could address only materiality because causation was never at issue. It is a fundamental mistake to include undecided, immaterial facts as part of a court’s holding or the ratio decidendi of its opinion.

Interstitial Doubts About the Matrixx 

Statistics professors are excited that the United States Supreme Court issued an opinion that ostensibly addressed statistical significance. One such example of the excitement is an article, in press, by Joseph B. Kadane, Professor in the Department of Statistics at Carnegie Mellon University, Pittsburgh, Pennsylvania. See Joseph B. Kadane, “Matrixx v. Siracusano: what do courts mean by ‘statistical significance’?” 11[x] Law, Probability and Risk 1 (2011).

Professor Kadane makes the sensible point that the allegations of adverse events did not admit of an analysis that would imply statistical significance or its absence. Id. at 5. See Schachtman, “The Matrixx – A Comedy of Errors” (April 6, 2011); David Kaye, “Trapped in the Matrixx: The U.S. Supreme Court and the Need for Statistical Significance,” BNA Product Safety and Liability Reporter 1007 (Sept. 12, 2011). Unfortunately, the excitement has obscured Professor Kadane’s interpretation of the Court’s holding, and has led him astray in assessing the importance of the case.

Continue reading

Categories: Statistics | 8 Comments

When Can Risk-Factor Epidemiology Provide Reliable Tests?

A commentator brings up risk factor epidemiology, and while I’m not sure the following very short commentary* by Aris Spanos and me directly deals with his query, Greenland happens to mention Popper, and it might be of interest: “When Can Risk-Factor Epidemiology Provide Reliable Tests?”

Here’s the abstract:

Can we obtain interesting and valuable knowledge from observed associations of the sort described by Greenland and colleagues in their paper on risk factor epidemiology? Greenland argues “yes,” and we agree. However, the really important and difficult questions are when and why. Answering these questions demands a clear understanding of the problems involved when going from observed associations of risk factors to causal hypotheses that account for them. Two main problems are that 1) the observed associations could fail to be genuine; and 2) even if they are genuine, there are many competing causal inferences that can account for them. Although Greenland’s focus is on the latter, both are equally important, and progress here hinges on disentangling the two to a much greater extent than is typically recognized. 

* We were commenting on “The Value of Risk-Factor (“Black-Box”) Epidemiology” by Greenland, Sander; Gago-Dominguez, Manuela; Castelao, Jose Esteban. The full citation and abstract can be found at the link above.

Categories: Statistics | 2 Comments

Senn Again (Gelman)

Senn will be glad to see that we haven’t forgotten him! (See this blog Jan. 14, Jan. 15, Jan. 23, and Jan. 24, 2012.) He’s back on Gelman’s blog today.

http://andrewgelman.com/2012/02/philosophy-of-bayesian-statistics-my-reactions-to-senn/

I hope to hear some reflections this time around on the issue often noted but not discussed: updating and down dating (see this blog, Jan. 26, 2012).

Categories: Philosophy of Statistics, Statistics
