Monthly Archives: February 2012

Misspecification Tests: (part 4) and brief concluding remarks

Posted on February 28, 2012 by Mayo

The Nature of the Inferences From Graphical Techniques: What is the status of what we learn from graphs? In this view, the graphs afford good ideas about the kinds of violations it would be useful to probe, much as a forensic clue (e.g., a footprint or tire track) helps narrow the search for a suspect, or a fault tree narrows the search for a cause. The same discernment can be achieved with a formal analysis (using parametric and nonparametric tests), perhaps more discriminating than even the most trained eye, but the reasoning and the justification are much the same. (The capabilities of these techniques can be checked by simulating data deliberately generated to violate or obey the various assumptions.)
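As the parenthetical remark suggests, one can check by simulation how well a diagnostic detects a given departure. The sketch below is an illustration under stated assumptions, not the procedure used in these posts: it generates series that either obey the ID assumption (NIID noise around a constant mean) or violate it through a linear trending mean, and records how often a simple trend t-test flags them. The function names, the choice of a trend t-test, n = 35 (matching the 1955-1989 span) and the critical value 2.03 (roughly the two-sided 5% t value for 33 degrees of freedom) are all hypothetical choices.

```python
import random
import statistics


def simulate(n, trend_slope, rng):
    """Hypothetical generator: x_t = trend_slope * t + u_t, u_t ~ NIID(0, 1).

    trend_slope == 0 obeys the ID assumption; any other value violates it
    through mean-heterogeneity (a trending mean).
    """
    return [trend_slope * t + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]


def trend_t_stat(x):
    """t-statistic for the slope in an OLS regression of x_t on (1, t)."""
    n = len(x)
    t = list(range(1, n + 1))
    t_bar, x_bar = statistics.fmean(t), statistics.fmean(x)
    stt = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x)) / stt
    intercept = x_bar - slope * t_bar
    resid = [xi - intercept - slope * ti for ti, xi in zip(t, x)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance estimate
    return slope / (s2 / stt) ** 0.5


def rejection_rate(trend_slope, n=35, reps=2000, crit=2.03, seed=1):
    """Share of simulated samples in which |t| exceeds the critical value."""
    rng = random.Random(seed)
    hits = sum(
        abs(trend_t_stat(simulate(n, trend_slope, rng))) > crit
        for _ in range(reps)
    )
    return hits / reps


if __name__ == "__main__":
    print("ID data (slope 0):", rejection_rate(0.0))
    print("trending mean (slope 0.1):", rejection_rate(0.1))
```

With data that obey the assumption the rejection rate should sit near the nominal 5%; with the trending mean it should be close to 1. Rerunning with smaller slopes or shorter series shows where this particular diagnostic starts to lose its capability to detect the departure.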

The combined indications from the graphs point to departures from the LRM in the direction of the DLRM, but, for the moment, only as indicating a fruitful model to probe further. We are not licensed to infer that the DLRM is itself statistically adequate until its own assumptions are tested. Even when they are checked and found to hold up, as they happen to be in this case, our inference must still be qualified. While we may infer that the model is statistically adequate, this should be understood only as licensing the use of the model as a reliable tool for primary statistical inferences, not necessarily as representing the substantive phenomenon being modeled.

Categories: Intro MS Testing, Statistics | Tags: Aris Spanos, Duhem's problem, piecemeal inquiry, statistical model, testing model assumptions

Misspecification Testing: (part 3) Subtracting-out effects “on paper”

Posted on February 27, 2012 by Mayo

A Better Way: The traditional approach described in Part 2 did not detect the presence of mean-heterogeneity, and so it misidentified temporal dependence as the sole source of misspecification in the original LRM.

On the basis of figures 1-3, we can summarize our progress so far in detecting potential departures from the LRM assumptions worth probing:

Assumption	LRM	Alternatives
(D) Distribution:	Normal	?
(M) Dependence:	Independent	?
(H) Heterogeneity:	Identically Distributed	mean-heterogeneity

Discriminating and Amplifying the Effects of Mistakes: We could correctly assess dependence if our data were ID and not obscured by the influence of the trending mean. Although we cannot literally manipulate the relevant factors, we can ‘subtract out’ the trending mean in a generic way to see what the data would look like if there were no trending mean. Here are the detrended x_t and y_t.

Fig. 4: Detrended Population (y – trend)
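The ‘subtracting out’ step can be sketched in a few lines. This is a minimal illustration, assuming the trending mean is linear in t (the post does not state the functional form of the trend that was removed): fit a + b·t by least squares and keep the residuals. The function name detrend and the example series are hypothetical.

```python
def detrend(series):
    """Subtract a least-squares linear trend a + b*t (t = 1..n) from series.

    Returns the residuals, i.e. the series with the fitted trending mean
    removed. Assumes a linear trend; higher-order trends would need more
    regressors.
    """
    n = len(series)
    t = range(1, n + 1)
    t_bar = (n + 1) / 2  # mean of 1..n
    y_bar = sum(series) / n
    stt = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series)) / stt
    intercept = y_bar - slope * t_bar
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, series)]


# Hypothetical example: a trending mean plus small alternating wiggles.
example = [10.0 + 2.0 * t + (0.5 if t % 2 else -0.5) for t in range(1, 11)]
print([round(r, 2) for r in detrend(example)])
```

By construction, least-squares residuals sum to zero and are uncorrelated with t, which is what makes them a generic stand-in for "the data with the trend subtracted out"; applying the same function to x_t and y_t would give the two detrended series.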

Categories: Intro MS Testing, Statistics | Tags: Aris Spanos, DLRM, LRM, misspecification testing

Misspecification Testing: (part 2) A Fallacy of Error “Fixing”

Posted on February 23, 2012 by Mayo

Part 1 is here.

Graphing t-plots: (This is my first experiment with blogging data plots; they have been enlarged a bit, so hopefully they are now sufficiently readable.)

Here are two plots (t-plots) of the observed data, where y_t is the population of the USA in millions and x_t is our “secret” variable, to be revealed later on, both plotted over time (1955-1989).

Fig. 1: USA Population (y)

Fig. 2: Secret variable (x)

Fig. 3: A typical realization of a NIID process.

Pretty clearly, there are glaring departures from IID when we compare a typical realization of a NIID process, in fig. 3, with the t-plots of the two series in figures 1-2. In particular, both data series show a mean that increases with time, that is, strong mean-heterogeneity (a trending mean). Our recommended next step would be to continue exploring the probabilistic structure of the data in figures 1 and 2 with a view toward thoroughly assessing the validity of the LRM assumptions [1]-[5] (table 1). But first let us take a quick look at the traditional approach to testing assumptions, focusing only on assumption [4], traditionally viewed as error non-autocorrelation: E(u_t u_s) = 0 for t ≠ s, t, s = 1, 2, …, n.
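For readers who have not met it, the Durbin-Watson statistic (the traditional test of assumption [4], listed among this post's tags) is d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t², computed from residuals e_t. It lies between 0 and 4; values near 2 are consistent with no first-order autocorrelation, and values near 0 suggest positive autocorrelation. The sketch below uses hypothetical helper names and simulated series. Under those assumptions it also illustrates why a small d should not automatically be read as temporal dependence: residuals from a model that ignores a trending mean yield a small d even when the noise around the trend is independent.

```python
import random


def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences of
    the residuals divided by their sum of squares (always in [0, 4])."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def demeaned(x):
    """Residuals from a constant-mean model, i.e. one that ignores any trend."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]


def demo(n=35, slope=0.3, seed=0):
    """Compare d for NIID data and for independent noise around a linear
    trend, both fitted (wrongly, in the second case) with a constant mean."""
    rng = random.Random(seed)
    niid = [rng.gauss(0.0, 1.0) for _ in range(n)]
    trending = [slope * t + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]
    return durbin_watson(demeaned(niid)), durbin_watson(demeaned(trending))


if __name__ == "__main__":
    d_niid, d_trend = demo()
    print(f"NIID: d = {d_niid:.2f}; ignored trend: d = {d_trend:.2f}")
```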

Categories: Intro MS Testing, Statistics | Tags: A-C LRM, Aris Spanos, Duhem's problem, Durbin-Watson Test, Graphing, misspecification testing

Intro to Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)

Posted on February 22, 2012 by Mayo

“This is the kind of cure that kills the patient!”

is the line of Aris Spanos that I most remember from when I first heard him talk, in 1999, about testing the assumptions of, and respecifying, statistical models. (The patient, of course, is the statistical model.) On finishing my book, EGEK 1996, I had been keen to fill its central gaps, one of which was fleshing out a crucial piece of the error-statistical framework of learning from error: how to validate the assumptions of statistical models. But the whole problem turned out to be far more philosophically, not to mention technically, challenging than I had imagined. I will try (in 3 short posts) to sketch a procedure that I think puts the entire process of model validation on a sound logical footing. Thanks to attending several of Spanos’ seminars (and his patient tutorials, for which I am very grateful), I was eventually able to reflect philosophically on aspects of his already well-worked-out approach. (Synergies with the error-statistical philosophy, of which this is a part, warrant a separate discussion.)

Categories: Intro MS Testing, Statistics | Tags: Aris Spanos, Linear regression, LRM, misspecification testing, testing model assumptions

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”? (Rejected Post Feb 20)

Posted on February 20, 2012 by Mayo

Dear Reader: Not having been at this very long, I don’t know if it’s common for bloggers to collect a pile of rejected posts that they think better of before posting. Well, here’s one that belongs on a “rejected post” page (and will be tucked away soon enough), but since we have so recently posted the Fisher–Neyman–Pearson “triad”, the blog-elders of Elba have twisted my elbow (repeatedly) to share this post, from back in the fall of 2011, in London. Sincerely, D. Mayo

Egon Pearson on a Gate (by D. Mayo)

Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize—only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.) Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

Categories: Statistics | Tags: Fisher, George Bernard, Likelihood Principle, Neyman, Pearson

Two New Properties of Mathematical Likelihood

Posted on February 17, 2012 by Mayo

17 February 1890–29 July 1962

Note: I find this to be an intriguing, if perhaps little-known, discussion, long predating the conflicts reflected in the three articles (the “triad”) below. Here Fisher links his tests to the Neyman and Pearson lemma in terms of power. I invite your deconstructions/comments.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

To Thomas Bayes must be given the credit of broaching the problem of using the concepts of mathematical probability in discussing problems of inductive inference, in which we argue from the particular to the general; or, in statistical phraseology, argue from the sample to the population, from which, ex hypothesi, the sample was drawn. Bayes put forward, with considerable caution, a method by which such problems could be reduced to the form of problems of probability. His method of doing this depended essentially on postulating a priori knowledge, not of the particular population of which our observations form a sample, but of an imaginary population of populations from which this population was regarded as having been drawn at random. Clearly, if we have possession of such a priori knowledge, our problem is not properly an inductive one at all, for the population under discussion is then regarded merely as a particular case of a general type, of which we already possess exact knowledge, and are therefore in a position to draw exact deductive inferences.

Categories: Likelihood Principle, Statistics | Tags: Bayesianism, induction, Ronald Fisher, significance tests, sufficient statistic, Thomas Bayes

Guest Blogger. ARIS SPANOS: The Enduring Legacy of R. A. Fisher

Posted on February 15, 2012 by Mayo

By Aris Spanos

One of R. A. Fisher’s (17 February 1890–29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

M_θ(x) = {f(x;θ), θ∈Θ}, x∈ℝⁿ, Θ⊂ℝᵐ, m < n.   (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.
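To make (1) concrete, here is a minimal sketch for one familiar special case, chosen for illustration and not taken from the post: the simple Normal model, with θ = (μ, σ²) and Θ = ℝ × (0, ∞), so m = 2 < n. Under the IID assumption the distribution of the sample factorizes into a product of identical Normal densities, so log f(x;θ) is a sum. The function names and the data set x0 are hypothetical.

```python
import math


def normal_log_density(x, mu, sigma2):
    """log of the N(mu, sigma2) density at x."""
    return -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)


def sample_log_density(xs, theta):
    """log f(x; theta) for the simple Normal model.

    IID means the joint density is the product of n identical marginals,
    so its logarithm is the sum of the marginal log-densities.
    """
    mu, sigma2 = theta
    if sigma2 <= 0:
        raise ValueError("theta lies outside the parameter space: sigma2 must be > 0")
    return sum(normal_log_density(x, mu, sigma2) for x in xs)


# Each theta in Theta picks out one member of the family M_theta(x).
x0 = [1.2, 0.7, 1.9, 1.1]  # a hypothetical data set
for theta in [(0.0, 1.0), (1.0, 1.0), (1.0, 0.25)]:
    print(theta, round(sample_log_density(x0, theta), 3))
```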

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to describing the distributional features of the data in hand using the histogram and the first few sample moments, implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand, x₀:=(x₁,x₂,…,x_n). As late as the 1920s, Karl Pearson understood the problem of statistical induction in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Categories: Statistics | Tags: E S Pearson, Frequentist inference, induction, Jerzy Neyman, Models/Modelling, Ronald Fisher, statistical model

Guest Blogger. STEPHEN SENN: Fisher’s alternative to the alternative

Posted on February 12, 2012 by Mayo

By: Stephen Senn

This year marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions, Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976, p. 473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976, and the lecture on which it was based was given in Detroit on 29 December 1971 (p. 441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre, but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published, and this throws light on many aspects of Fisher’s thought, including on significance tests.

Categories: Statistics | Tags: power, Ronald Fisher, Savage, Statistical hypothesis testing, Stephen Senn

E.S. PEARSON: Statistical Concepts in Their Relation to Reality

Posted on February 12, 2012 by Mayo

by E.S. PEARSON (1955)

SUMMARY: This paper contains a reply to some criticisms made by Sir Ronald Fisher in his recent article on “Statistical Methods and Scientific Induction”.

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955), it is impossible to leave him altogether unanswered.

Categories: Statistics | Tags: Egon Pearson, George Alfred Barnard, induction, Jerzy Neyman, Ronald Fisher, significance tests, Statistical hypothesis testing

JERZY NEYMAN: Note on an Article by Sir Ronald Fisher

Posted on February 11, 2012 by Mayo

By Jerzy Neyman (1956)

Summary

(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation. (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible. (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values. The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight. (4) The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.
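Point (2) refers to the probabilities of errors of the two kinds. As a minimal sketch under assumptions not taken from the paper (a one-sided test of H0: μ ≤ μ0 against H1: μ > μ0 for Normal data with known σ, rejecting when √n(x̄ − μ0)/σ exceeds z_α), the type I error probability is the rejection probability at μ = μ0, and the type II error probability at some μ1 > μ0 is one minus the power there. The function name and default values are hypothetical.

```python
from statistics import NormalDist


def power(mu1, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """P(reject H0: mu <= mu0 | mu = mu1) for the one-sided z-test that
    rejects when sqrt(n) * (xbar - mu0) / sigma > z_alpha."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) * n ** 0.5 / sigma
    return 1 - std_normal.cdf(z_alpha - shift)


if __name__ == "__main__":
    print("type I error at mu0:", round(power(0.0), 4))
    for mu1 in (0.2, 0.4, 0.6):
        print(f"mu1 = {mu1}: power = {power(mu1):.3f}, type II = {1 - power(mu1):.3f}")
```

Evaluating power over a range of μ1 values traces out the power function, the object at issue in Savage's remark about some tests being "more sensitive" than others.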

Categories: Statistics | Tags: Abraham Wald, Egon Pearson, Fiducial inference, Ronald Fisher, significance tests

R.A. FISHER: Statistical Methods and Scientific Inference

Posted on February 11, 2012 by Mayo

In honor of R.A. Fisher’s birthday this week (Feb 17), in a year that will mark 50 years since his death, we will post the “Triad” exchange between Fisher, Pearson and Neyman, along with other guest contributions.*

by Sir Ronald Fisher (1955)

SUMMARY

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

“Repeated sampling from the same population”,
Errors of the “second kind”,
“Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

*If you wish to contribute something in connection with Fisher, send it to error@vt.edu

Categories: Statistics | Tags: Abraham Wald, Egon Pearson, fallacies, Jerzy Neyman, power, Ronald Fisher, Statistical hypothesis testing

Distortions in the Court? (PhilStock Feb 8)

Posted on February 8, 2012 by Mayo

Anyone who trades in biotech stocks knows that the slightest piece of news (rumors of successful or unsuccessful drug trials, upcoming FDA panels, anecdotal side effects, and much, much else) can radically alter a stock price in the space of a few hours. Pre-market, for example, websites are busy disseminating bits of information garnered from anywhere and everywhere, helping to pump or dump biotechs. I think just about every small biotech stock I’ve ever traded has been involved in some kind of lawsuit regarding what the company should have told shareholders during earnings. (Most don’t go very far.) If you ever visit the FDA page, you can find every drug or medical device coming up for consideration, recent letters to the company, etc., etc.

Nevertheless, you might be surprised to learn that companies are not required to inform shareholders of news simply because it is likely to be relevant to an investor’s overall cost-benefit analysis in deciding how much the stock is worth and where its price is likely to move. The requirement is more minimalist than that: a company is only required to provide information which, if not revealed, would render misleading something the company has already said.

So, for example, suppose a drug company M publicly denied any reports claiming a link between its drug Z and effect E, declaring that drug Z had a clean bill of health as regards this risk. Having made that statement, the company would then be in violation of the requirements if it did not also reveal information such as the following: numerous consumers were suing it, alleging the untoward effect E from having taken drug Z; the FDA had written several letters to the company expressing concern about the number of cases in which doctors had reported effect E among patients taking drug Z; and still other letters had warned company M to cease and desist from issuing statements that any alleged links between drug Z and effect E were entirely baseless and unfounded.

Now, the information that company M was not revealing did not, and could not, show a statistically significant correlation between drug Z and effect E. But failing to reveal this information put company M in violation of FDA and stock rules, because of the statements company M had already made about drug Z’s clean bill of health regarding this very effect E (along with bullish price projections). Not revealing this information, and the related information in its possession, rendered misleading things the company had already said, as regards the information shareholders use in deciding on M’s value.

Pretty obvious, right?

Suppose then that company M is found in violation of this rule. And suppose someone inferred from this that evidence of statistical significance is not required for showing a causal connection between a drug and hazardous side-effects.

Well, to infer that would be doubly (or perhaps triply) missing the point. First, the ruling had nothing to do with what is required to show cause and effect, but only with what information a company is required to reveal to its shareholders in order not to mislead them (as regards information that could be relevant to their cost-benefit assessments of the stock’s value and future price).

Secondly, the ruling made it very explicit that it was not making any claim about the actual existence of evidence linking drug Z and effect E: it only declared that drug company M would be in error if it claimed not to have violated the rule of disclosure.[i] (Determining whether there is any link between Z and E was an entirely separate matter.)

This is precisely the situation as regards the drug company Matrixx, its over-the-counter cold remedy Zicam, and side effect E: anosmia (loss or diminution of the sense of smell). It was the focus of lawyer and guest blogger Nathan Schachtman yesterday.

“The potentially fraudulent aspect of Matrixx’s conduct was not that it had “hidden” adverse event reports, but rather that it had adverse event reports and a good deal of additional information, none of which it had disclosed to investors, when at the same time, the company chose to give the investment community particularly bullish projections of future sales.” (Schachtman)

Nevertheless, critics of statistical significance testing wasted no time in declaring that this ruling (which for some inexplicable reason made it to the Supreme Court) just goes to show that statistical significance is not, and should not be, required to show evidence of a causal link.[ii] (See also my Sept. 26 post.) Kadane’s article, which is quite interesting, concludes:

“The fact-based consideration that the Supreme Court endorses is very much in line with the Bayesian decision-theoretic approach that models how to make rational decisions under uncertainty. The presence or absence of statistical significance (in the formal, narrow sense) plays little role in such an analysis.” (Jay Kadane)

I leave it to interested readers to explore the various ins and outs of the case, which our guest poster has summarized in a much more legally correct fashion.

[i] Company M would certainly be in error, if the reason they claimed not to have violated the rule of disclosure is that the information they did not reveal could not have constituted evidence of a statistically significant link between drug Z and effect E!

[ii] There was a session at the ASA last summer on this, including Kadane, Ziliak, and I don’t know who else (I had to leave before it).

Categories: Philosophy of Statistics | Tags: biotech stocks, FDA, Phil Law, Statistical significance testing

Guest Blogger: Interstitial Doubts About the Matrixx

Posted on February 8, 2012 by Mayo

By: Nathan Schachtman, Esq., PC*

When the Supreme Court decided this case, I knew that some people would try to claim that it was a decision about the irrelevance or unimportance of statistical significance in assessing epidemiologic data. Indeed, the defense lawyers invited this interpretation by trying to connect materiality with causation. Having rejected that connection, the Supreme Court could address only materiality in its holding, because causation was never at issue. It is a fundamental mistake to include undecided, immaterial facts as part of a court’s holding or the ratio decidendi of its opinion.

Statistics professors are excited that the United States Supreme Court issued an opinion that ostensibly addressed statistical significance. One example of this excitement is an article, in press, by Joseph B. Kadane, Professor in the Department of Statistics at Carnegie Mellon University, Pittsburgh, Pennsylvania. See Joseph B. Kadane, “Matrixx v. Siracusano: what do courts mean by ‘statistical significance’?” 11[x] Law, Probability and Risk 1 (2011).

Professor Kadane makes the sensible point that the allegations of adverse events did not admit of an analysis that would imply statistical significance or its absence. Id. at 5. See Schachtman, “The Matrixx – A Comedy of Errors” (April 6, 2011); David Kaye, “Trapped in the Matrixx: The U.S. Supreme Court and the Need for Statistical Significance,” BNA Product Safety and Liability Reporter 1007 (Sept. 12, 2011). Unfortunately, the excitement has clouded Professor Kadane’s interpretation of the Court’s holding and has led him astray in assessing the importance of the case.

Categories: Statistics | Tags: evidence based policy, Kadane, law and statistics, Matrixx Initiatives Inc. v. Siracusano, Nathan Schachtman, significance tests, Supreme Court

When Can Risk-Factor Epidemiology Provide Reliable Tests?

Posted on February 7, 2012 by Mayo

A commentator brings up risk factor epidemiology, and while I’m not sure the following very short commentary* by Aris Spanos and me directly addresses his query, Greenland happens to mention Popper, so it might be of interest: “When Can Risk-Factor Epidemiology Provide Reliable Tests?”

Here’s the abstract:

Can we obtain interesting and valuable knowledge from observed associations of the sort described by Greenland and colleagues in their paper on risk factor epidemiology? Greenland argues “yes,” and we agree. However, the really important and difficult questions are when and why. Answering these questions demands a clear understanding of the problems involved when going from observed associations of risk factors to causal hypotheses that account for them. Two main problems are that 1) the observed associations could fail to be genuine; and 2) even if they are genuine, there are many competing causal inferences that can account for them. Although Greenland’s focus is on the latter, both are equally important, and progress here hinges on disentangling the two to a much greater extent than is typically recognized.

* We were commenting on “The Value of Risk-Factor (“Black-Box”) Epidemiology” by Sander Greenland, Manuela Gago-Dominguez, and Jose Esteban Castelao; the full citation and abstract can be found at the link above.

Categories: Statistics | Tags: causal hypotheses, Greenland, risk assessment, risk factor epidemiology

No-Pain Philosophy (part 3): A more contemporary perspective

Posted on February 3, 2012 by Mayo

See (Part 2)

See (Part 1)

7. How the story turns out (not well)

This conception of testing, which Lakatos called “sophisticated methodological falsificationism,” takes us quite a distance from the more familiar if hackneyed conception of Popper as a simple falsificationist.[i] It called for warranting a host of different methodological rules for each of the steps along the way in order to either falsify or corroborate hypotheses. But it doesn’t end well.

Categories: No-Pain Philosophy, philosophy of science | Tags: Duhem's problem, Lakatos, methodological falsificationism, modus tollens, Popper

Senn Again (Gelman)

Posted on February 3, 2012 by Mayo

Senn will be glad to see that we haven’t forgotten him! (See this blog, Jan. 14, 15, 23, and 24, 2012.) He’s back on Gelman’s blog today.

http://andrewgelman.com/2012/02/philosophy-of-bayesian-statistics-my-reactions-to-senn/

I hope to hear some reflections this time around on the issue often noted but not discussed: updating and down dating (see this blog, Jan. 26, 2012).

Categories: Philosophy of Statistics, Statistics | Tags: Andrew Gelman, Stephen Senn

No-Pain Philosophy: Skepticism, Rationality, Popper and All That (part 2): Duhem’s problem & methodological falsification

Posted on February 1, 2012 by Mayo

(See Part 1)

5. Duhemian Problems of Falsification

Any interesting case of hypothesis falsification, or even a severe attempt to falsify, rests on both empirical and inductive hypotheses or claims. Consider the most simplistic form of deductive falsification (an instance of the valid form of modus tollens): “If H entails O, and not-O, then not-H.” (To infer “not-H” is to infer that H is false or, more often, that there is some discrepancy in what H claims regarding the phenomenon in question.)
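The valid argument form quoted above, and the way Duhem's problem complicates it, can be stated in a few lines of Lean 4. This is a sketch: the theorem names are hypothetical, and A stands for whatever auxiliary hypotheses are needed to derive O. The second theorem shows that when O follows only from H together with A, a failed prediction refutes only the conjunction, not H on its own.

```lean
-- Simple deductive falsification (modus tollens):
-- if H entails O and O fails, then H fails.
theorem modus_tollens {H O : Prop} (hO : H → O) (notO : ¬O) : ¬H :=
  fun h => notO (hO h)

-- Duhem's problem: when O follows only from H together with an
-- auxiliary assumption A, a failed prediction refutes the conjunction
-- H ∧ A; it does not by itself single out H as the culprit.
theorem duhem {H A O : Prop} (hO : H ∧ A → O) (notO : ¬O) : ¬(H ∧ A) :=
  fun hA => notO (hO hA)
```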

Categories: No-Pain Philosophy, philosophy of science | Tags: Duhem's problem, Imre Lakatos, induction, methodological falsificationism, modus tollens

Reposting from Jan 29: No-Pain Philosophy: Skepticism, Rationality, Popper, and All That: The First of 3 Parts

Posted on February 1, 2012 by Mayo

I want to shift to the arena of testing the adequacy of statistical models and misspecification testing (leading up to articles by Aris Spanos, Andrew Gelman, and David Hendry). But first, a couple of informal, philosophical mini-posts, if only to clarify terms we will need (each has a mini test at the end).

1. How do we obtain Knowledge, and how can we get more of it?

Few people doubt that science is successful and that it makes progress. This remains true for the philosopher of science, despite her tendency to skepticism. Likewise, most of us think we know a lot of things, and that science is one of our best ways of acquiring knowledge. But how do we justify our lack of skepticism?

Categories: philosophy of science | Tags: Popper, probabilism, severe testing, wedge between rationality and skepticism

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.