Author Archives: Mayo

A. Spanos: Egon Pearson’s Neglected Contributions to Statistics

Continuing with the discussion of E.S. Pearson:

Egon Pearson’s Neglected Contributions to Statistics

by Aris Spanos

    Egon Pearson (11 August 1895 – 12 June 1980) is widely known today for his contribution in recasting Fisher’s significance testing into the Neyman-Pearson (1933) theory of hypothesis testing. Occasionally, he is also credited with contributions in promoting statistical methods in industry and in the history of modern statistics; see Bartlett (1981). What is rarely mentioned is Egon’s early pioneering work on:

(i) specification: the need to state explicitly the inductive premises of one’s inferences,

(ii) robustness: evaluating the ‘sensitivity’ of inferential procedures to departures from the Normality assumption, as well as

(iii) Mis-Specification (M-S) testing: probing for potential departures from the Normality  assumption. Continue reading

Categories: Philosophy of Statistics, Statistics | 6 Comments

E.S. Pearson’s Statistical Philosophy

E.S. Pearson on the gate,
D. Mayo sketch

Egon Sharpe (E.S.) Pearson’s birthday was August 11.  This slightly belated birthday discussion is directly connected to the question of the uses to which frequentist methods may be put in inquiry.  Are they limited to supplying procedures which will not err too frequently in some vast long run? Or are these long run results of crucial importance for understanding and learning about the underlying causes in the case at hand?   I say no to the former and yes to the latter.  This was also the view of Egon Pearson (of Neyman and Pearson).

(i) Cases of Type A and Type B

“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)

Pearson considers the rationale that might be given to N-P tests in two types of cases, A and B:

“(A) At one extreme we have the case where repeated decisions must be made on results obtained from some routine procedure…

(B) At the other is the situation where statistical tools are applied to an isolated investigation of considerable importance…?” (ibid., 170)

In cases of type A, long-run results are clearly of interest, while in cases of type B, repetition is impossible and may be irrelevant:

“In other and, no doubt, more numerous cases there is no repetition of the same type of trial or experiment, but all the same we can and many of us do use the same test rules to guide our decision, following the analysis of an isolated set of numerical data. Why do we do this? What are the springs of decision? Is it because the formulation of the case in terms of hypothetical repetition helps to that clarity of view needed for sound judgment? Continue reading

Categories: Philosophy of Statistics, Statistics | 2 Comments

Good Scientist Badge of Approval?

In an attempt to fix the problem of “unreal” results in science, some have started a “reproducibility initiative”. Think of the incentive for being explicit about how the results were obtained the first time… But would researchers really pay to have their potential errors unearthed in this way? Even for a “good scientist” badge of approval?

August 14, 2012

Fixing Science’s Problem of ‘Unreal’ Results: “Good Scientist: You Get a Badge!”

Carl Zimmer, Slate

As a young biologist, Elizabeth Iorns did what all young biologists do: She looked around for something interesting to investigate. Having earned a Ph.D. in cancer biology in 2007, she was intrigued by a paper that appeared the following year in Nature. Biologists at the University of California-Berkeley linked a gene called SATB1 to cancer. They found that it becomes unusually active in cancer cells and that switching it on in ordinary cells made them cancerous. The flipside proved true, too: Shutting down SATB1 in cancer cells returned them to normal. The results raised the exciting possibility that SATB1 could open up a cure for cancer. So Iorns decided to build on the research.

There was just one problem. As her first step, Iorns tried to replicate the original study. She couldn’t. Boosting SATB1 didn’t make cells cancerous, and shutting it down didn’t make the cancer cells normal again.

For some years now, scientists have gotten increasingly worried about replication failures. In one recent example, NASA made a headline-grabbing announcement in 2010 that scientists had found bacteria that could live on arsenic—a finding that would require biology textbooks to be rewritten. At the time, many experts condemned the paper as a poor piece of science that shouldn’t have been published. This July, two teams of scientists reported that they couldn’t replicate the results. Continue reading

Categories: philosophy of science, Philosophy of Statistics | 12 Comments

U-Phil: (concluding the deconstruction) Wasserman / Mayo

It is traditional to end the U-Phil deconstruction discussion with the author’s remarks on the deconstruction itself.  I take this from Wasserman’s initial comment on 7/28/12, and my brief reply. I especially want to highlight the question of goals that arises.

Wasserman:

I thank Deborah Mayo for deconstructing me and Al Franken. (And for the record, I couldn’t be further from Franken politically; I just liked his joke.)

I have never been deconstructed before. I feel a bit like Humpty Dumpty. Anyway, I think I agree with everything Deborah wrote. I’ll just clarify two points.

First, my main point was just that the cutting edge of statistics today is dealing with complex, high-dimensional data. My essay was an invitation to Philosophers to turn their analytical skills towards the problems that arise in these modern statistical problems.

Deborah wonders whether these are technical rather than foundational issues. I don’t know. When physicists went from studying medium sized, slow-moving objects to studying the very small, the very fast and the very massive, they found a plethora of interesting questions, both technical and foundational. Perhaps inference for high-dimensional, complex data can also serve as a venue for both technical and foundational questions.

Second, I downplayed the Bayes-Frequentist debate perhaps more than I should have. Indeed, this debate still persists. But I also feel that only a small subset of statisticians care about the debate (because they do what they were taught to do, without questioning it) and those that do care will never be swayed by debate. The way I see it is that there are basically two goals:

  • Goal 1: Find ways to quantify your subjective degrees of belief.
  • Goal 2: Find procedures with good frequency properties. Continue reading
Categories: Statistics

U-PHIL: Wasserman Replies to Spanos and Hennig

Wasserman on Spanos and Hennig on  “Low Assumptions, High Dimensions” (2011)

(originating U-PHIL: “Deconstructing Larry Wasserman” by Mayo)

________

Thanks to Aris and others for comments.

Response to Aris Spanos:

1. You don’t prefer methods based on weak assumptions? Really? I suspect Aris is trying to be provocative. Yes such inferences can be less precise. Good. Accuracy is an illusion if it comes from assumptions, not from data.

2. I do not think I was promoting inferences based on “asymptotic grounds.” If I did, that was not my intent. I want finite sample, distribution free methods. As an example, consider the usual finite sample (order statistics based) confidence interval for the median. No regularity assumptions, no asymptotics, no approximations. What is there to object to?

3. Indeed, I do have to make some assumptions. For simplicity, and because it is often reasonable, I assumed iid in the paper (as I will here). Other than that, where am I making any untestable assumptions in the example of the median?

4. I gave a very terse and incomplete summary of Davies’ work. I urge readers to look at Davies’ papers; my summary does not do the work justice. He certainly did not advocate eyeballing the data. Continue reading
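To make the median example in point 2 concrete, here is a minimal Python sketch of an order-statistics interval. It is an illustration only, not code from the paper: the function name `median_ci` is hypothetical, and the coverage calculation assumes iid draws from a continuous distribution, so that the number of observations falling below the median is Binomial(n, 1/2).

```python
from math import comb


def median_ci(sample, conf=0.95):
    """Distribution-free confidence interval for the median (hypothetical helper).

    Assumes iid observations from a continuous distribution. Returns
    (lower, upper, coverage), where coverage is the exact probability that
    the interval between the k-th smallest and k-th largest observations
    contains the median.
    """
    x = sorted(sample)
    n = len(x)

    def binom_cdf(k):
        # P(B <= k) for B ~ Binomial(n, 1/2)
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    best = None
    for k in range(1, n // 2 + 1):
        # [x_(k), x_(n-k+1)] covers the median exactly when k <= B <= n - k
        coverage = 1 - 2 * binom_cdf(k - 1)
        if coverage < conf:
            break
        best = (x[k - 1], x[n - k], coverage)
    if best is None:
        raise ValueError("sample too small for the requested confidence level")
    return best


# Ten observations: the narrowest interval with at least 95% coverage
lower, upper, coverage = median_ci(range(1, 11))
print(lower, upper, coverage)
```

With ten observations the interval runs from the 2nd to the 9th order statistic, and its exact coverage is about 97.9%. Only the ranks of the data and the Binomial(n, 1/2) count enter the calculation; the shape of the underlying distribution does not.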

Categories: Philosophy of Statistics, Statistics, U-Phil | 3 Comments

E.S. Pearson Birthday

Egon Pearson on a Gate (by D. Mayo)

Today is Egon Pearson’s birthday, but I will postpone some discussion of his work for a few days. He is, as Erich Lehmann noted in his review of EGEK (1996)[i]*, “the hero of Mayo’s story” because one may find throughout his work, if only in side discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of Neyman-Pearson theory of statistics.  Pearson and Pearson statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect.[i]


[i] Mayo (1996), Error and the Growth of Experimental Knowledge.

*If you have items relating to E.S. Pearson you think might be relevant for this blog, please send them to: error@vt.edu until the end of August.

Categories: Statistics

U-PHIL: Hennig and Gelman on Wasserman (2011)

Two further contributions in relation to

“Low Assumptions, High Dimensions” (2011)

Please also see: “Deconstructing Larry Wasserman” by Mayo, and Comments by Spanos

Christian Hennig:  Some comments on Larry Wasserman, “Low Assumptions, High Dimensions”

I enjoyed reading this stimulating paper. These are very important issues indeed. I’ll comment on both main concepts in the text.

1) Low Assumptions. I think that the term “assumption” is routinely misused and misunderstood in statistics. In Wasserman’s paper I can’t see such misuse explicitly, but I think that the “message” of the paper may be easily misunderstood because Wasserman doesn’t do much to stop people from this kind of misunderstanding.

Here is what I mean. The arithmetic mean can be derived as optimal estimator under an i.i.d. Gaussian model, which is often interpreted as “model assumption” behind it. However, we don’t really need the Gaussian distribution to be true for the mean to do a good job. Sometimes the mean will do a bad job in a non-Gaussian situation (for example in presence of gross outliers), but sometimes not. The median has nice robustness properties and is seen as admissible for ordinal data. It is therefore usually associated with “weaker assumptions”. However, the median may be worse than the mean in a situation where the Gaussian “assumption” of the mean is grossly violated. At UCL we ask students on a -2/-1/0/1/2 Likert scale for their general opinion about our courses. The distributions that we get here are strongly discrete and the scale is usually interpreted as of ordinal type. Still, for ranking courses, the median is fairly useless (pretty much all courses end up with a median of 0 or 1); whereas, the arithmetic mean can still detect statistically significant meaningful differences between courses.

Why? Because it’s not only the “official” model assumptions that matter but also whether a statistic uses all the data in an appropriate manner for the given application. Here it’s fatal that the median ignores all differences among observations north and south of it. Continue reading
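The Likert point can be sketched with made-up numbers. The two rating vectors below are invented purely for illustration (they are not UCL data) and are chosen so that both courses share a median of 1 while their means differ clearly.

```python
from statistics import mean, median

# Hypothetical ratings on the -2/-1/0/1/2 scale, 100 students per course.
course_a = [-1] * 5 + [0] * 20 + [1] * 55 + [2] * 20
course_b = [-2] * 10 + [-1] * 15 + [0] * 20 + [1] * 50 + [2] * 5

# Summarise each course by its median and its arithmetic mean.
summary = {
    name: {"median": median(ratings), "mean": mean(ratings)}
    for name, ratings in (("A", course_a), ("B", course_b))
}
print(summary)
```

Both medians come out as 1, so the median cannot rank the two courses, while the means (0.9 against 0.25) separate them because they respond to how the ratings are spread on either side of the middle.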

Categories: Philosophy of Statistics, Statistics, U-Phil | 3 Comments

U-PHIL: Aris Spanos on Larry Wasserman

Our first outgrowth of “Deconstructing Larry Wasserman”. 

Aris Spanos – Comments on:

“Low Assumptions, High Dimensions” (2011)

by Larry Wasserman*

I’m happy to play devil’s advocate in commenting on Larry’s very interesting and provocative (in a good way) paper on ‘how recent developments in statistical modeling and inference have [a] changed the intended scope of data analysis, and [b] raised new foundational issues that rendered the ‘older’ foundational problems more or less irrelevant’.

The new intended scope, ‘low assumptions, high dimensions’, is delimited by three characteristics:

“1. The number of parameters is larger than the number of data points.

2. Data can be numbers, images, text, video, manifolds, geometric objects, etc.

3. The model is always wrong. We use models, and they lead to useful insights but the parameters in the model are not meaningful.” (p. 1)

In the discussion that follows I focus almost exclusively on the ‘low assumptions’ component of the new paradigm. The discussion by David F. Hendry (2011), “Empirical Economic Model Discovery and Theory Evaluation,” RMM, 2: 115-145,  is particularly relevant to some of the issues raised by the ‘high dimensions’ component in a way that complements the discussion that follows.

My immediate reaction to the demarcation based on 1-3 is that the new intended scope, although interesting in itself, excludes the overwhelming majority of scientific fields where restriction 3 seems unduly limiting. In my own field of economics the substantive information comes primarily in the form of substantively specified mechanisms (structural models), accompanied with theory-restricted and substantively meaningful parameters.

In addition, I consider the assertion “the model is always wrong” an unhelpful truism when ‘wrong’ is used in the sense that “the model is not an exact picture of the ‘reality’ it aims to capture”. Worse, if ‘wrong’ refers to ‘the data in question could not have been generated by the assumed model’, then any inference based on such a model will be dubious at best! Continue reading

Categories: Philosophy of Statistics, Statistics, U-Phil | 7 Comments

Bad news bears: Bayesian rejoinder

This continues yesterday’s post: I checked out the “xtranormal” http://www.xtranormal.com/ website. Turns out there are other figures aside from the bears that one may hire out, but they pronounce “Bayesian” as an unrecognizable, foreign-sounding word with around five syllables. Anyway, before taking the plunge, here is my first attempt, just off the top of my head. Please send corrections and additions.

Bear #1: Do you have the results of the study?

Bear #2: Yes. The good news is there is a .996 probability of a positive difference in the main comparison.

Bear #1: Great. So I can be well assured that there is just a .004 probability that such positive results would occur if they were merely due to chance.

Bear #2: Not really, that would be an incorrect interpretation.

Bear #1: Oh. I see. Then you must mean 99.6% of the time a smaller difference would have been observed if in fact the null hypothesis of “no effect” was true.

Bear #2: No, that would also be an incorrect interpretation.

Bear #1: Well then you must be saying it is rational to believe to degree .996 that there is a real difference?

Bear #2: It depends. That might be so if the prior probability distribution was a proper probabilistic distribution representing rational beliefs in the different possible parameter values independent of the data.

Bear #1: But I was assured that this would be a nonsubjective Bayesian analysis.

Bear #2: Yes, the prior would at most have had the more important parameters elicited from experts in the field, the remainder being a product of one of the default or conjugate priors.

Bear #1: Well which one was used in this study? Continue reading

Categories: Statistics | 20 Comments

A “Bayesian Bear” rejoinder practically writes itself…

These stilted bear figures and their voices are sufficiently obnoxious in their own right, even without the tedious lampooning of p-values and the feigned horror at learning they should not be reported as posterior probabilities. Coincidentally, I have been sent several different p-value U-Tube clips in the past two weeks, rehearsing essentially the same interpretive issues, but this one (“what the p-value”*) was created by some freebie outfit that will apparently set their irritating cartoon bear voices to your very own dialogue (I don’t know the website or outfit).

The presumption is that somehow there would be no questions or confusion of interpretation were the output in the form of a posterior probability. The problem of indicating the extent of discrepancies that are/are not warranted by a given p-value is genuine but easy enough to solve**. What I never understand is why it is presupposed that the most natural and unequivocal way to interpret and communicate evidence (in this case, leading to low p-values) is by means of a (posterior) probability assignment, when it seems clear that the more relevant question the testy-voiced (“just wait a tick”) bear would put to the know-it-all bear would be: how often would this method erroneously declare a genuine discrepancy? A corresponding “Bayesian bear” video practically writes itself, but I’ll let you watch this first. Share any narrative lines that come to mind.

*Reference: Blume, J. and J. F. Peipert (2003). “What your statistician never told you about P-values.” J Am Assoc Gynecol Laparosc 10(4): 439-444.

**See for example, Mayo & Spanos (2011) ERROR STATISTICS
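One way the error-statistical literature formalizes which discrepancies a given result does or does not warrant is a severity assessment. The sketch below is only a hedged illustration under stated assumptions, not the procedure of the cited paper: it takes a one-sided test of a Normal mean with known standard deviation, a result that rejects the null, and a hypothetical helper name `severity_greater`. All numbers are invented.

```python
from math import sqrt
from statistics import NormalDist


def severity_greater(xbar, mu1, sigma, n):
    """Severity for the claim mu > mu1 after observing sample mean xbar.

    Assumed setting: X_1..X_n iid Normal(mu, sigma^2) with sigma known, and
    a one-sided test whose result rejects the null. The value is the
    probability of a sample mean no larger than the one observed, computed
    as if mu were equal to mu1.
    """
    standard_error = sigma / sqrt(n)
    return NormalDist().cdf((xbar - mu1) / standard_error)


# Invented numbers: xbar = 0.4 with sigma = 1 and n = 100 (standard error 0.1).
for mu1 in (0.2, 0.4, 0.6):
    print(mu1, round(severity_greater(0.4, mu1, 1.0, 100), 3))
```

In this toy setting the claim that the mean exceeds 0.2 passes with high severity, the claim that it exceeds 0.4 gets exactly 0.5, and the claim that it exceeds 0.6 is poorly warranted, even though the same small p-value against a null of 0 underlies all three.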

Categories: Statistics | 6 Comments

Stephen Senn: Fooling the Patient: an Unethical Use of Placebo? (Phil/Stat/Med)

Senn in China

Stephen Senn
Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

I think the placebo gets a bad press with ethicists. Many do not seem to understand that the only purpose of a placebo as a control in a randomised clinical trial is to permit the trial to be run as double-blind. A common error is to assume that the giving of a placebo implies the withholding of a known effective treatment. In fact many placebo controlled trials are ‘add-on’ trials in which all patients get proven (partially) effective treatment. We can refer to such treatment as standard common background therapy.  In addition, one group gets an unproven experimental treatment and the other a placebo. Used in this way in a randomised clinical trial, the placebo can be a very useful way to increase the precision of our inferences.

A control group helps eliminate many biases: trend effects affecting the patients, local variations in illness, trend effects in assays and regression to the mean. But such biases could be eliminated by having a group given nothing (apart from the standard common background therapy). Only a placebo, however, can allow patients and physicians to be uncertain whether the experimental treatment is being given or not. And ‘blinding’ or ‘masking’ can play a valuable role in eliminating that bias which is due to either expectation of efficacy or fear of side-effects.

However, there is one use of placebo I consider unethical. In many clinical trials a so-called ‘placebo run-in’ is used. That is to say, there is a period after patients are enrolled in the trial and before they are randomised to one of the treatment groups when all of the patients are given a placebo.  The reasons can be to stabilise the patients or to screen out those who are poor compliers before the trial proper begins. Indeed, the FDA encourages this use of placebo and, for example, in a 2008 guideline on developing drugs for Diabetes advises:  ‘In addition, placebo run-in periods in phase 3 studies can help screen out noncompliant subjects’. Continue reading

Categories: Statistics | 10 Comments

What’s in a Name? (Gelman’s blog)

I just noticed Andrew Gelman’s blog today... too good to let pass without quick comment. He asks:

What is a Bayesian?

Deborah Mayo recommended that I consider coming up with a new name for the statistical methods that I used, given that the term “Bayesian” has all sorts of associations that I dislike (as discussed, for example, in section 1 of this article).

I replied that I agree on Bayesian, I never liked the term and always wanted something better, but I couldn’t think of any convenient alternative. Also, I was finding that Bayesians (even the Bayesians I disagreed with) were reading my research articles, while non-Bayesians were simply ignoring them. So I thought it was best to identify with, and communicate with, those people who were willing to engage with me.

More formally, I’m happy defining “Bayesian” as “using inference from the posterior distribution, p(theta|y)”. This says nothing about where the probability distributions come from (thus, no requirement to be “subjective” or “objective”) and it says nothing about the models (thus, no requirement to use the discrete models that have been favored by the Bayesian model selection crew). Based on my minimal definition, I’m as Bayesian as anyone else.

He may be “as Bayesian as anyone else,” but does he really want to be as Bayesian as anyone? (slight, deliberate equivocation). As a good Popperian, I concur (with Popper), that names should not matter, but Gelman’s remarks suggest he should distinguish himself, at least philosophically[i].

As in note [iv] of my Wasserman deconstruction: “Even where Bayesian methods are usefully applied, some say ‘most of the standard philosophy of Bayes is wrong’ (Gelman and Shalizi 2012, 2 n2)”.

In the paper Gelman today cites (from our RMM collection):

… we see science—and applied statistics—as resolving anomalies via the creation of improved models which often include their predecessors as special cases. This view corresponds closely to the error-statistics idea of Mayo (1996). (Gelman 2011, 70)

If the foundations for these methods are error statistical, then shouldn’t that come out in the description? (Error-statistical Bayes?) It seems sufficiently novel to warrant some greater gesture than ‘this too is Bayesian’.

In that spirit I ended my deconstruction with the passage:

Ironically many seem prepared to allow that Bayesianism still gets it right for epistemology, even as statistical practice calls for methods more closely aligned with frequentist principles. What I would like the reader to consider is that what is right for epistemology is also what is right for statistical learning in practice. That is, statistical inference in practice deserves its own epistemology. (Mayo 2011, p. 100)

What do people think?


[i] To Gelman’s credit, he is one of the few contemporary statisticians to (openly) recognize the potential value of philosophy of statistics for statistical practice!

Categories: Statistics | 2 Comments

U-PHIL: Deconstructing Larry Wasserman

Deconstructing [i] Larry Wasserman

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from Lies and the Lying Liars Who Tell Them: A Fair and Balanced Look at the Right), with which Larry Wasserman begins his paper “Low Assumptions, High Dimensions” (2011) in his contribution to Rationality, Markets and Morals (RMM) Special Topic: Statistical Science and Philosophy of Science:

Wasserman: There is a joke about media bias from the comedian Al Franken:
‘To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?’

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken meant if we follow up the quip with the next sentence in his text (which Wasserman omits): “The problem with al Qaeda is that they’re trying to kill us!” (p. 1). The rest of Franken’s opening chapter is not about al Qaeda but about bias in media. Conservatives, he says, decry what they claim is a liberal bias in mainstream media. Franken rejects their claim.

The mainstream media does not have a liberal bias. And for all their other biases . . . , the mainstream media . . . at least try to be fair. …There is, however, a right-wing media. . . . They are biased. And they have an agenda…The members of the right-wing media are not interested in conveying the truth… . They are an indispensable component of the right-wing machine that has taken over our country… .   We have to be vigilant.  And we have to be more than vigilant.  We have to fight back… . Let’s call them what they are: liars. Lying, lying, liars. (Franken, pp. 3-4)

When I read this in 2004 (when Bush was in office), I couldn’t have agreed more. How things change*. Now, of course, any argument that swerves from the politically correct is by definition unsound, irrelevant, and/or biased. [ii]

But what does this have to do with Bayesian-frequentist foundations? What is Wasserman, deep down, really trying to tell us by way of this analogy (if only subliminally)? Such are my ponderings—and thus this deconstruction.  (I will invite your “U-Phils” at the end.) I will allude to passages from my contribution to  RMM (2011) (in red).

A. What Is the Foundational Issue?

Wasserman: To me, the most pressing foundational question is: how do we reconcile the two most powerful needs in modern statistics: the need to make methods assumption free and the need to make methods work in high dimensions… . The Bayes-Frequentist debate is not irrelevant but it is not as central as it once was. (p. 201)

One may wonder why he calls this a foundational issue, as opposed to, say, a technical one. I will assume he means what he says and attempt to extract his meaning by looking through a foundational lens.

Let us examine the urgency of reconciling the need to make methods assumption-free and that of making them work in complex high dimensions. The problem of assumptions of course arises when they are made about unknowns that can introduce threats of error and/or misuse of methods. Continue reading

Categories: Philosophy of Statistics, Statistics, U-Phil | 21 Comments

P-values as Frequentist Measures

Working on the last two chapters of my book on philosophy of statistical inference, I’m revisiting such topics as weak conditioning, Birnbaum, likelihood principle, etc., and was reading from the Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (1985)[i]. In a paper I had not seen (or had forgotten), Jim Berger, in “The Frequentist Viewpoint and Conditioning,” writes that the quoting of a P-value “may be felt to be a frequentist procedure by some, since it involves an averaging over the sample space. The reporting of P-values can be given no long-run frequency interpretation [in any of the set-ups generally considered]. A P-value actually lies closer to conditional (Bayesian) measures than to frequentist measures.” (Berger 1985, 23). These views are echoed in Berger’s more recent “Could Fisher, Jeffreys and Neyman Have Agreed on Testing?” (2003). This is at odds with what Fisher, N-P, Cox, Lehmann, etc. have held, and if true, would also seem to entail that a severity assessment had no frequentist interpretation! The flaw lies in that all-too-common behavioristic, predesignated conception…
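For readers who want to see one standard frequency property of P-values in action, here is a small simulation sketch. It is an illustration under assumptions of my choosing, not an argument taken from Berger or from the post: data are iid standard Normal under the null, the test is a one-sided z-test with known variance, and `rejection_rate` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(alpha=0.05, n=10, trials=20000, seed=0):
    """Fraction of simulated null samples whose one-sided p-value is <= alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Draw a sample with the null hypothesis (mean 0, variance 1) true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * sqrt(n)
        p_value = 1.0 - std_normal.cdf(z)
        if p_value <= alpha:
            rejections += 1
    return rejections / trials


rate = rejection_rate()
print(rate)
```

In this set-up the P-value is uniformly distributed when the null is true, so the relative frequency of P-values at or below any threshold should settle near that threshold as the number of trials grows. The printed rate should therefore land close to 0.05.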

Among related posts:

https://errorstatistics.com/2012/04/28/3671/
https://errorstatistics.com/2012/05/10/excerpts-from-s-senns-letter-on-replication-p-values-and-evidence/

 


[i] Also because of Peter Gruenwald’s recent mention of Kiefer’s work, read long ago.

Categories: Statistics

Clark Glymour: The Theory of Search Is the Economics of Discovery (part 2)

“Some Thoughts Prompted by David Hendry’s Essay * (RMM) Special Topic: Statistical Science and Philosophy of Science,” by  Professor Clark Glymour

Part 2 (of 2) (Please begin with part 1)

The first thing one wants to know about a search method is what it is searching for, what would count as getting it right. One might want to estimate a probability distribution, or get correct forecasts of some probabilistic function of the distribution (e.g., out-of-sample means), or a causal structure, or some probabilistic function of the distribution resulting from some class of interventions. Secondly, one wants to know about what decision theorists call a loss function, but less precisely, what is the comparative importance of various errors of measurement, or, in other terms, what makes some approximations better than others. Third, one wants a limiting consistency proof: sufficient conditions for the search to reach the goal in the large sample limit. There are various kinds of consistency (pointwise versus uniform, for example) and one wants to know which of those, if any, hold for a search method under what assumptions about the hypothesis space and the sampling distribution. Fourth, one wants to know as much as possible about the behavior of the search method on finite samples. In simple cases of statistical estimation there are analytic results; more often for search methods only simulation results are possible, but if so, one wants them to explore the bounds of failure, not just easy cases. And, of course, one wants a rationale for limiting the search space, as well as some sense of how wrong the search can be if those limits are violated in various ways.

There are other important economic features of search procedures. Probability distributions (or likelihood functions) can instantiate any number of constraints—vanishing partial correlations for example, or inequalities of correlations. Suppose the hypothesis space delimits some big class of probability distributions. Suppose the search proceeds by testing constraints (the points that follow apply as well if the procedure computes posterior probabilities for particular hypotheses and applies a decision rule.) There is a natural partial ordering of classes of constraints: B is weaker than A if and only if every distribution that satisfies class A satisfies class B.  Other things equal, a weakest class might be preferred because it requires fewer tests.  But more important is what the test of a constraint does in efficiently guiding the search. A test that eliminates a particular hypothesis is not much help. A test that eliminates a big class of hypotheses is a lot of help.

Other factors: the power of the requisite tests; the numbers of tests (or posterior probability assessments) required; the computational requirements of individual tests (or posterior probability assessments.) And so on.  And, finally, search algorithms have varying degrees of generality. For example, there are general algorithms, such as the widely used PC search algorithm for graphical causal models, that are essentially search schema: stick in whatever decision procedure for conditional independence and PC becomes a search procedure using that conditional independence oracle. By contrast, some searches are so embedded in a particular hypothesis space that it is difficult to see the generality.
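The “search schema” idea can be sketched in a few lines. What follows is a simplified, hedged illustration of the adjacency (skeleton) phase of a PC-style search, not the published algorithm in full: edge orientation is omitted, and the names `pc_skeleton` and `chain_oracle` are hypothetical. The point is only that the independence decision procedure is passed in as a parameter.

```python
from itertools import combinations


def pc_skeleton(variables, independent):
    """Adjacency phase of a PC-style search with a pluggable independence oracle.

    `independent(x, y, cond)` may be any decision procedure (a true oracle or
    a statistical test) that returns True when x and y are judged independent
    given the set `cond`.
    """
    # Start from the complete undirected graph and remove edges.
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    size = 0
    # Continue while some adjacent pair has enough other neighbours to condition on.
    while any(len(adj[x] - {y}) >= size for x in variables for y in adj[x]):
        for x in variables:
            for y in list(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if independent(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1
    edges = {frozenset((x, y)) for x in variables for y in adj[x]}
    return edges, sepset


def chain_oracle(x, y, cond):
    # Independence facts of the chain A -> B -> C: only A and C, given B.
    return {x, y} == {"A", "C"} and "B" in cond


edges, sepset = pc_skeleton(["A", "B", "C"], chain_oracle)
print(sorted(tuple(sorted(e)) for e in edges))
```

Swapping `chain_oracle` for a partial-correlation test, or any other conditional independence test, changes the decision procedure without touching the search itself, which is the sense of “schema” in the passage above.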

I am sure I am not qualified to comment on the details of Hendry’s search procedure, and even if I were, for reasons of space his presentation is too compressed for that. Still, I can make some general remarks.  I do not know from his essay the answers to many of the questions pertinent to evaluating a search procedure that I raised above. For example, his success criterion is “congruence” and I have no idea what that is. That is likely my fault, since I have read only one of his books, and that long ago.

David Hendry dismisses “priors,” meaning, I think, Bayesian methods, with an argument from language acquisition. Kids don’t need priors to learn a language. I am not sure of Hendry’s logic.  Particular grammars within a parametric “universal grammar” could in principle be learned by a Bayesian procedure, although I have no reason to think they are. But one way or the other, that has no import for whether Bayesian procedures are the most advantageous for various search problems by any of the criteria I have noted above. Sometimes they may be, sometimes not, there is no uniform answer, in part because computational requirements vary. I could give examples, but space forbids.

Abstractly, one could think there are two possible ways of searching when the set of relationships to be uncovered may form a complex web: start by positing all possible relationships and eliminate from there, or start by positing no relationships and build up.  Hendry dismisses the latter, with what generality I do not know. What I do know is that the relations between “bottom-up” and “top-down” or “forward” and “backward” search can be intricate, and in some cases one may need both for consistency.  Sometimes either will do. Graphical models, for example can be searched starting with the assumption that every variable influences every other and eliminating, or starting with the assumption that no variable influences any other and adding.  There are pointwise consistent searches in both directions. The real difference is in complexity.

Continue reading

Categories: Philosophy of Statistics, Statistics, U-Phil | Tags: , , , , | 11 Comments

Clark Glymour: The Theory of Search Is the Economics of Discovery (part 1)

The Theory of Search Is the Economics of Discovery:
Some Thoughts Prompted by Sir David Hendry’s Essay  *
in Rationality, Markets and Morals (RMM) Special Topic:
Statistical Science and Philosophy of Science

Part 1 (of 2)

Professor Clark Glymour

Alumni University Professor
Department of Philosophy[i]
Carnegie Mellon University

Professor Hendry* endorses a distinction between the “context of discovery” and the “context of evaluation” which he attributes to Herschel and to Popper and could as well have attributed also to Reichenbach and to most contemporary methodological commentators in the social sciences. The “context” distinction codes two theses.

1. “Discovery” is a mysterious psychological process of generating hypotheses; “evaluation” is about the less mysterious process of warranting them.

2. Of the three possible relations with data that could conceivably warrant a hypothesis—how it was generated, its explanatory connections with the data used to generate it, and its predictions—only the last counts.

Einstein maintained the first but not the second. Popper maintained the first but held that nothing warrants a hypothesis.  Hendry seems to maintain neither–he has a method for discovery in econometrics, a search procedure briefly summarized in the second part of his essay, which is not evaluated by forecasts. Methods may be esoteric but they are not mysterious. And yet Hendry endorses the distinction. Let’s consider it.

As a general principle rather than a series of anecdotes, the distinction between discovery and justification or evaluation has never been clear, and what has been said in favor of its implied theses has not made much sense, ever. Let’s start with the father of one of Hendry’s endorsers, William Herschel. William Herschel discovered Uranus, or something. Actually, the discovery of the planet Uranus was a collective effort which, subject to vicissitudes of error and individual opinion, was a rational search strategy. On March 13, 1781, in the course of a sky survey for double stars, Herschel reports in his journal the observation of a “nebulous star or perhaps a comet.”  The object came to his notice because of how it appeared through the telescope, perhaps the appearance of a disc. Herschel changed the magnification of his telescope, and finding that the brightness of the object changed more than the brightness of fixed stars, concluded he had seen a comet or “nebulous star.”  Observations that, on later nights, it had moved eliminated the “nebulous star” alternative and Herschel concluded that he had seen a comet. Why not a planet? Because lots of comets had been hitherto observed (Edmund Halley computed orbits for half a dozen, including his eponymous comet) but never a planet.  A comet was much the more likely on frequency grounds. Further, Herschel had made a large error in his estimate of the distance of the body based on parallax values using his micrometer.  A planet could not be so close.

Continue reading

Categories: philosophy of science, Philosophy of Statistics, Statistics, U-Phil | Tags: , , , | 1 Comment

“Always the last place you look!”

This gets to a distinction I have tried to articulate, between explaining a known effect (like looking for a known object), and searching for an unknown effect (that may well not exist). In the latter, possible effects of “selection” or searching need to be taken account of. Of course, searching for the Higgs is akin to the latter, not the former, hence the joke in the recent New Yorker cartoon.

Categories: philosophy of science, Statistics | Tags: , , | 20 Comments

New Kvetch Posted 7/18/12

New Kvetch

Categories: Uncategorized | Tags: , , | 1 Comment

Peter Grünwald: Follow-up on Cherkassky’s Comments

Peter Grünwald

A comment from Professor Peter Grünwald

Head, Information-theoretic Learning Group, Centrum voor Wiskunde en Informatica (CWI)
Part-time full professor  at Leiden University.

This is a follow-up on Vladimir Cherkassky’s comments on Deborah’s blog. First of all let me thank Vladimir for taking the time to clarify his position. Still, there’s one issue where we disagree and which, at the same time, I think, needs clarification, so I decided to write this follow-up.[related posts 1]

The issue is about how central VC (Vapnik-Chervonenkis)-theory is to inductive inference.

I agree with Vladimir that VC-theory is one of the most important achievements in the field ever, and indeed, that it fundamentally changed our way of thinking about learning from data. Yet I also think that there are many problems of inductive inference to which it has no direct bearing. Some of these are concerned with hypothesis testing, but even when one is concerned with prediction accuracy – which Vladimir considers the basic goal – there are situations where I do not see how it plays a direct role. One of these is sequential prediction with log-loss or its generalization, Cover’s loss. This loss function plays a fundamental role in (1) language modeling, (2) on-line data compression, (3a) gambling and (3b) sequential investment on the stock market (here we need Cover’s loss). [A superquick intro to log-loss as well as some references are given below under [A]; see also my talk at the Ockham workshop (slides 16-26 about weather forecasting!).]
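
For readers who have not met log-loss, the following minimal sketch may help. It assumes the standard definition (the loss on each round is minus the logarithm of the probability the forecaster assigned to the outcome that occurred) and is not taken from the note under [A]. Base 2 is chosen here so that the cumulative loss reads as a code length in bits, which is the link to data compression. The function name and the rain/no-rain data are invented for illustration.

```python
import math


def cumulative_log_loss(rain_probs, outcomes):
    """Total log-loss (in bits) of a sequence of probabilistic forecasts.

    rain_probs[t] is the forecast probability of rain on day t;
    outcomes[t] is 1 if it rained and 0 otherwise.
    """
    total = 0.0
    for p_rain, rained in zip(rain_probs, outcomes):
        p_observed = p_rain if rained else 1.0 - p_rain
        total += -math.log2(p_observed)
    return total


outcomes = [1, 0, 0, 1]
uninformed = cumulative_log_loss([0.5, 0.5, 0.5, 0.5], outcomes)
sharper = cumulative_log_loss([0.75, 0.25, 0.25, 0.75], outcomes)
```

A forecaster who always says 50% pays exactly one bit per day, so `uninformed` is 4 bits here, while the forecaster who leans the right way each day accumulates a smaller loss.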

Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , , , | 16 Comments

Deconstructing Larry Wasserman–it starts like this…

In my July 8, 2012 post “Metablog: Up and Coming,” I wrote: “I will attempt a (daring) deconstruction of Professor Wasserman’s paper[i] and at that time will invite your “U-Phils” for posting around a week after (<1000 words).” These could reflect on Wasserman’s paper and/or my deconstruction of it. See an earlier post for the way we are using “deconstructing” here. For some guides, see “so you want to do a philosophical analysis”.

So my Wasserman deconstruction notes have been sitting in the “draft” version of this blog for several days as we focused on other things.  Here’s how it starts…

             Deconstructing Larry Wasserman–it starts like this…

1. Al Franken’s Joke

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from Lies and the Lying Liars Who Tell Them: A Fair and Balanced Look at the Right), with which Larry Wasserman begins his paper “Low Assumptions, High Dimensions” (2011):

To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken means if we follow up the quip with the next sentence in his text (which Wasserman omits): “The problem with al Qaeda is that they’re trying to kill us!” (p. 1) The rest of Franken’s opening chapter is not about al Qaeda but about bias in media.

But what does this have to do with the usual debates in the foundations of statistical inference? What is Wasserman, deep down, perhaps unconsciously, really, really, possibly implicitly, trying to tell us by way of this analogy? Such are the ponderings in my deconstruction of him…

Yet the footnote to my July 8 blog also said that my post assumed “I don’t chicken out.”  So I will put it aside until I get a chorus of encouragement to post it…

Categories: Philosophy of Statistics, Statistics, U-Phil | Tags: , , | 5 Comments
