An argument that assumes the very thing that was to have been argued for is guilty of begging the question; signing on to an argument whose conclusion you favor even though you cannot defend its premises is to argue unsoundly, and in bad faith. When a whirlpool of “reforms” subliminally alter the nature and goals of a method, falling into these sins can be quite inadvertent. Start with a simple point on defining the power of a statistical test.
I. Redefine Power?
Given that power is one of the most confused concepts from Neyman-Pearson (N-P) frequentist testing, it’s troubling that in “Redefine Statistical Significance”, power gets redefined too. “Power,” we’re told, is a Bayes Factor BF “obtained by defining H1 as putting ½ probability on μ = ± m for the value of m that gives 75% power for the test of size α = 0.05. This H1 represents an effect size typical of that which is implicitly assumed by researchers during experimental design.” (material under Figure 1). Continue reading
In preparation for a new post that takes up some of the recent battles on reforming or replacing p-values, I reblog an older post on power, one of the most misunderstood and abused notions in statistics. (I add a few “notes on howlers”.) The power of a test T in relation to a discrepancy from a test hypothesis H0 is the probability T will lead to rejecting H0 when that discrepancy is present. Power is sometimes misappropriated to mean something only distantly related to the probability a test leads to rejection; but I’m getting ahead of myself. This post is on a classic fallacy of rejection. Continue reading
“The subjective Bayesian theory as developed, for example, by Savage … cannot solve the deceptively simple but actually intractable old evidence problem, whence as a foundation for a logic of confirmation at any rate, it must be accounted a failure.” (Howson, (2017), p. 674)
What? Did the “old evidence” problem cause Colin Howson to recently abdicate his decades long position as a leading subjective Bayesian? It seems to have. I was so surprised to come across this in a recent perusal of Philosophy of Science that I wrote to him to check if it is really true. (It is.) I thought perhaps it was a different Colin Howson, or the son of the one who co-wrote 3 editions of Howson and Urbach: Scientific Reasoning: The Bayesian Approach espousing hard-line subjectivism since 1989. I am not sure which of the several paradigms of non-subjective or default Bayesianism Howson endorses (he’d argued for years, convincingly, against any one of them), nor how he handles various criticisms (Kass and Wasserman 1996), I put that aside. Nor have I worked through his, rather complex, paper to the extent necessary, yet. What about the “old evidence” problem, made famous by Clark Glymour 1980? What is it? Continue reading
Erich Lehmann 20 November 1917 – 12 September 2009
Erich Lehmann was born 100 years ago today! (20 November 1917 – 12 September 2009). Lehmann was Neyman’s first student at Berkeley (Ph.D 1942), and his framing of Neyman-Pearson (NP) methods has had an enormous influence on the way we typically view them.*
I got to know Erich in 1997, shortly after publication of EGEK (1996). One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that). He began by telling me that he was sitting in a very large room at an ASA (American Statistical Association) meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, wood table sat just one book, all alone, shiny red.
He said ” I wonder if it might be of interest to me!” So he walked up to it…. It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after. (What are the chances?) Some related posts on Lehmann’s letter are here and here.
3 years ago…
MONTHLY MEMORY LANE: 3 years ago: November 2014. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently, and in green 3- 4 others of general relevance to philosophy of statistics (in months where I’ve blogged a lot). Posts that are part of a “unit” or a group count as one (11/1/14 & 11/09/14 and 11/15/14 & 11/25/14 are grouped). The comments are worth checking out.
- 11/01 Philosophy of Science Assoc. (PSA) symposium on Philosophy of Statistics in the Higgs Experiments “How Many Sigmas to Discovery?”
- 11/09 “Statistical Flukes, the Higgs Discovery, and 5 Sigma” at the PSA
- 11/11 The Amazing Randi’s Million Dollar Challenge
- 11/12 A biased report of the probability of a statistical fluke: Is it cheating?
- 11/15 Why the Law of Likelihood is bankrupt–as an account of evidence
- 11/18 Lucien Le Cam: “The Bayesians Hold the Magic”
- 11/20 Erich Lehmann: Statistician and Poet
- 11/22 Msc Kvetch: “You are a Medical Statistic”, or “How Medical Care Is Being Corrupted”
- 11/25 How likelihoodists exaggerate evidence from statistical tests
- 11/30 3 YEARS AGO: MONTHLY (Nov.) MEMORY LANE
 Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.
 New Rule, July 30,2016, March 30,2017 -a very convenient way to allow data-dependent choices (note why it’s legit in selecting blog posts, on severity grounds).
However, it’s mandatory to adjust for selection effects, and Benjamini is one of the leaders in developing ways to carry out the adjustments. Even calling out the avenues for cherry-picking and multiple testing, long known to invalidate p-values, would make replication research more effective (and less open to criticism). Continue reading
.A world beyond p-values?
I was asked to write something explaining the background of my slides (posted here) in relation to the recent ASA “A World Beyond P-values” conference. I took advantage of some long flight delays on my return to jot down some thoughts:
The contrast between the closing session of the conference “A World Beyond P-values,” and the gist of the conference itself, shines a light on a pervasive tension within the “Beyond P-Values” movement. Two very different debates are taking place. First there’s the debate about how to promote better science. This includes welcome reminders of the timeless demands of rigor and integrity required to avoid deceiving ourselves and others–especially crucial in today’s world of high-powered searches and Big Data. That’s what the closing session was about.  Continue reading
Here are my slides from the ASA Symposium on Statistical Inference : “A World Beyond p < .05” in the session, “What are the best uses for P-values?”. (Aside from me,our session included Yoav Benjamini and David Robinson, with chair: Nalini Ravishanker.)
- Why use a tool that infers from a single (arbitrary) P-value that pertains to a statistical hypothesis H0 to a research claim H*?
- Why use an incompatible hybrid (of Fisher and N-P)?
- Why apply a method that uses error probabilities, the sampling distribution, researcher “intentions” and violates the likelihood principle (LP)? You should condition on the data.
- Why use methods that overstate evidence against a null hypothesis?
- Why do you use a method that presupposes the underlying statistical model?
- Why use a measure that doesn’t report effect sizes?
- Why do you use a method that doesn’t provide posterior probabilities (in hypotheses)?
I was part of something called “a brains blog roundtable” on the business of p-values earlier this week–I’m glad to see philosophers getting involved.
Next week I’ll be in a session that I think is intended to explain what’s right about P-values at an ASA Symposium on Statistical Inference : “A World Beyond p < .05”. Continue reading
Barnard 23 Sept.1915 – 9 Aug.20
With continued acknowledgement of Barnard’s birthday on Friday, Sept.23, I reblog an exchange on catchall probabilities from the “The Savage Forum” (pp 79-84 Savage, 1962) with some new remarks.[i]
BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important. Continue reading
G.A. Barnard: 23 Sept.1915 – 9 Aug.2002
Today is George Barnard’s birthday. I met him in the 1980s and we corresponded off and on until 1999. Here’s a snippet of his discussion with Savage (1962) (link below [i]) that connects to issues often taken up on this blog: stopping rules and the likelihood principle. (It’s a slightly revised reblog of an earlier post.) I’ll post some other items related to Barnard this week, in honor of his birthday.
Happy Birthday George!
Barnard: I have been made to think further about this issue of the stopping rule since I first suggested that the stopping rule was irrelevant (Barnard 1947a,b). This conclusion does not follow only from the subjective theory of probability; it seems to me that the stopping rule is irrelevant in certain circumstances. Since 1947 I have had the great benefit of a long correspondence—not many letters because they were not very frequent, but it went on over a long time—with Professor Bartlett, as a result of which I am considerably clearer than I was before. My feeling is that, as I indicated [on p. 42], we meet with two sorts of situation in applying statistics to data One is where we want to have a single hypothesis with which to confront the data. Do they agree with this hypothesis or do they not? Now in that situation you cannot apply Bayes’s theorem because you have not got any alternatives to think about and specify—not yet. I do not say they are not specifiable—they are not specified yet. And in that situation it seems to me the stopping rule is relevant. Continue reading
28 July 1902- 17 Sept. 1994
Karl Popper died on September 17 1994. One thing that gets revived in my new book (Statistical Inference as Severe Testing, 2018, CUP) is a Popperian demarcation of science vs pseudoscience Here’s a snippet from what I call a “live exhibit” (where the reader experiments with a subject) toward the end of a chapter on Popper:
Live Exhibit. Revisiting Popper’s Demarcation of Science: Here’s an experiment: Try shifting what Popper says about theories to a related claim about inquiries to find something out. To see what I have in mind, join me in watching a skit over the lunch break:
Physicist: “If mere logical falsifiability suffices for a theory to be scientific, then, we can’t properly oust astrology from the scientific pantheon. Plenty of nutty theories have been falsified, so by definition they’re scientific. Moreover, scientists aren’t always looking to subject well corroborated theories to “grave risk” of falsification.”
Fellow traveler: “I’ve been thinking about this. On your first point, Popper confuses things by making it sound as if he’s asking: When is a theory unscientific? What he is actually asking or should be asking is: When is an inquiry into a theory, or an appraisal of claim H unscientific? We want to distinguish meritorious modes of inquiry from those that are BENT. If the test methods enable ad hoc maneuvering, sneaky face-saving devices, then the inquiry–the handling and use of data–is unscientific. Despite being logically falsifiable, theories can be rendered immune from falsification by means of cavalier methods for their testing. Adhering to a falsified theory no matter what is poor science. On the other hand, some areas have so much noise that you can’t pinpoint what’s to blame for failed predictions. This is another way that inquiries become bad science.”
She continues: Continue reading
C. S. Peirce: 10 Sept, 1839-19 April, 1914
Sunday, September 10, was C.S. Peirce’s birthday. He’s one of my heroes. He’s a treasure chest on essentially any topic, and anticipated quite a lot in statistics and logic. (As Stephen Stigler (2016) notes, he’s to be credited with articulating and appling randomization .) I always find something that feels astoundingly new, even rereading him. He’s been a great resource as I complete my book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018) . I’m reblogging the main sections of a (2005) paper of mine. It’s written for a very general philosophical audience; the statistical parts are very informal. I first posted it in 2013. Happy (belated) Birthday Peirce.
Peircean Induction and the Error-Correcting Thesis
Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319
Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:
Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)
Inductive methods—understood as methods of experimental testing—are justified to the extent that they are error-correcting methods. We may call this Peirce’s error-correcting or self-correcting thesis (SCT): Continue reading
E.S. Pearson: 11 Aug 1895-12 June 1980.
Here’s one last entry in honor of Egon Pearson’s birthday: “Statistical Concepts in Their Relation to Reality” (Pearson 1955). I’ve posted it several times over the years (6!), but always find a new gem or two, despite its being so short. E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter E. Pearson wrote to Birnbaum (1974), he talks about N-P theory admitting of two interpretations: behavioral and evidential:
“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.
(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.) Continue reading
11 August 1895 – 12 June 1980
Continuing with my Egon Pearson posts in honor of his birthday, I reblog a post by Aris Spanos: “Egon Pearson’s Neglected Contributions to Statistics“.
Egon Pearson (11 August 1895 – 12 June 1980), is widely known today for his contribution in recasting of Fisher’s significance testing into the Neyman-Pearson (1933) theory of hypothesis testing. Occasionally, he is also credited with contributions in promoting statistical methods in industry and in the history of modern statistics; see Bartlett (1981). What is rarely mentioned is Egon’s early pioneering work on:
(i) specification: the need to state explicitly the inductive premises of one’s inferences,
(ii) robustness: evaluating the ‘sensitivity’ of inferential procedures to departures from the Normality assumption, as well as
(iii) Mis-Specification (M-S) testing: probing for potential departures from the Normality assumption.
Arguably, modern frequentist inference began with the development of various finite sample inference procedures, initially by William Gosset (1908) [of the Student’s t fame] and then Fisher (1915, 1921, 1922a-b). These inference procedures revolved around a particular statistical model, known today as the simple Normal model: Continue reading
E.S. Pearson (11 Aug, 1895-12 June, 1980)
This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ll blog some E. Pearson items this week, including, my latest reflection on a historical anecdote regarding Egon and the woman he wanted marry, and surely would have, were it not for his father Karl!
HAPPY BELATED BIRTHDAY EGON!
Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long run error properties are of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson. Continue reading