E.S. Pearson (11 August 1895 – 12 June 1980)
This is a belated birthday post for E.S. Pearson (11 August 1895 – 12 June 1980). It’s basically a post from 2012 that concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ve recently been scouring around the history and statistical philosophies of Neyman, Pearson and Fisher for purposes of a book soon to be completed. I recently discovered a little anecdote that calls for a correction in something I’ve been saying for years. While it’s little more than a point of trivia, it’s in relation to Pearson’s (1955) response to Fisher (1955)–the last entry in this post. I’ll wait until tomorrow or the next day to share it, to give you a chance to read the background.
Are methods based on error probabilities of use mainly to supply procedures that will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long-run error properties is of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson.
Cases of Type A and Type B
“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)
1. PhilSci and StatSci. I’m always glad to come across statistical practitioners who wax philosophical, particularly when Karl Popper is cited. Best of all is when they get the philosophy somewhere close to correct. So, I came across an article by Burnham and Anderson (2014) in Ecology:
“While the exact definition of the so-called ‘scientific method’ might be controversial, nearly everyone agrees that the concept of ‘falsifiability’ is a central tenant [sic] of empirical science (Popper 1959). It is critical to understand that historical statistical approaches (i.e., P values) leave no way to ‘test’ the alternative hypothesis. The alternative hypothesis is never tested, hence cannot be rejected or falsified!… Surely this fact alone makes the use of significance tests and P values bogus. Lacking a valid methodology to reject/falsify the alternative science hypotheses seems almost a scandal.” (Burnham and Anderson p. 629)
Well, I am (almost) scandalized by this easily falsifiable allegation! I can’t think of a single “alternative”, whether in a “pure” Fisherian or a Neyman-Pearson hypothesis test (whether explicit or implicit), that’s not falsifiable; nor do the authors provide any. I grant that understanding testability and falsifiability is far more complex than the kind of popularized accounts we hear about; granted as well, theirs is just a short paper. But then why make bold declarations on the topic of the “scientific method and statistical science,” on falsifiability and testability? Continue reading
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
The tweet read “Featured review: Only 10% people with tension-type headaches get a benefit from paracetamol” and immediately I thought, ‘how would they know?’ and almost as quickly decided, ‘of course they don’t know, they just think they know’. Sure enough, on following up the link to the Cochrane Review in the tweet it turned out that, yet again, the deadly mix of dichotomies and numbers needed to treat had infected the brains of researchers to the extent that they imagined that they had identified personal response. (See Responder Despondency for a previous post on this subject.)
The bare facts they established are the following:
The International Headache Society recommends the outcome of being pain free two hours after taking a medicine. The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo.
and the false conclusion they immediately asserted is the following:
This means that only 10 in 100 or 10% of people benefited because of paracetamol 1000 mg.
To understand the fallacy, look at the accompanying graph. Continue reading
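To make the fallacy concrete, here is a minimal sketch (my own illustration in Python, not from the Cochrane review or Senn’s graph): two joint distributions over a patient’s counterfactual outcomes, both reproducing the reported margins (59% pain-free on paracetamol, 49% on placebo), yet implying very different fractions of individual “benefiters.”

```python
# Two hypothetical joint distributions over counterfactual outcomes
# (pain-free on paracetamol?, pain-free on placebo?), both consistent
# with the reported margins: P(pain-free | paracetamol) = 0.59 and
# P(pain-free | placebo) = 0.49. The fraction who individually benefit
# (pain-free on treatment but not on placebo) is NOT identified by
# the margins alone.

def summarize(joint):
    """joint[(t, p)] = probability of the counterfactual pair;
    t, p are 1 (pain-free) or 0 (not), on treatment and placebo."""
    p_treat = joint[(1, 1)] + joint[(1, 0)]    # margin on paracetamol
    p_placebo = joint[(1, 1)] + joint[(0, 1)]  # margin on placebo
    print(f"P(pf|T)={p_treat:.2f}  P(pf|P)={p_placebo:.2f}  "
          f"benefit={joint[(1, 0)]:.2f}  harmed={joint[(0, 1)]:.2f}")

# Scenario A: monotone response -- exactly 10% benefit, nobody is harmed.
summarize({(1, 1): 0.49, (1, 0): 0.10, (0, 1): 0.00, (0, 0): 0.41})

# Scenario B: same margins, but 51% benefit and 41% are harmed.
summarize({(1, 1): 0.08, (1, 0): 0.51, (0, 1): 0.41, (0, 0): 0.00})
```

Both scenarios yield the trial’s 10-point difference between arms; the parallel-group data cannot distinguish them, so the “only 10% benefited” reading is unwarranted.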
3 years ago…
MONTHLY MEMORY LANE: 3 years ago: July 2013. I mark in red three posts that seem most apt for general background on key issues in this blog, excluding those reblogged recently, and in green up to 3 others I’d recommend. Posts that are part of a “unit” or a group of “U-Phils” (you [readers] philosophize) count as one.
- (7/3) Phil/Stat/Law: 50 Shades of gray between error and fraud
- (7/6) Bad news bears: ‘Bayesian bear’ rejoinder–reblog mashup
- (7/10) PhilStatLaw: Reference Manual on Scientific Evidence (3d ed) on Statistical Significance (Schachtman)
- (7/11) Is Particle Physics Bad Science? (memory lane)
- (7/13) Professor of Philosophy Resigns over Sexual Misconduct (rejected post)
- (7/14) Stephen Senn: Indefinite irrelevance
- (7/17) Phil/Stat/Law: What Bayesian prior should a jury have? (Schachtman)
- (7/19) Msc Kvetch: A question on the Martin-Zimmerman case we do not hear
- (7/20) Guest Post: Larry Laudan. Why Presuming Innocence is Not a Bayesian Prior
- (7/23) Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs
- (7/26) New Version: On the Birnbaum argument for the SLP: Slides for JSM talk
Monthly memory lanes began at the blog’s 3-year anniversary in September 2014.
New Rule, July 30, 2016.
28 July 1902 – 17 September 1994
Today is Karl Popper’s birthday. I’m linking to a reading from his Conjectures and Refutations[i] along with an undergraduate item I came across: Popper Self-Test Questions. It includes multiple choice questions, quotes to ponder, and thumbnail definitions at the end[ii].
Blog Readers who wish to send me their answers will have their papers graded (at least try the multiple choice; if you’re unsure, do the reading). [Use the comments or e-mail.]
[i] Popper reading from Conjectures and Refutations
[ii] I might note the “No-Pain philosophy” (3 part) Popper posts from this blog: parts 1, 2, and 3.
HAPPY BIRTHDAY POPPER!
Popper, K. (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. New York: Basic Books.
Seeing the world through overly rosy glasses
Taboos about power nearly always stem from misuse of power analysis. Sander Greenland (2012) has a paper called “Nonsignificance Plus High Power Does Not Imply Support for the Null Over the Alternative.” I’m not saying Greenland errs; the error would be made by anyone who interprets power analysis in a manner giving rise to Greenland’s objection. So what’s (ordinary) power analysis?
(I) Listen to Jacob Cohen (1988) introduce Power Analysis
“PROVING THE NULL HYPOTHESIS. Research reports in the literature are frequently flawed by conclusions that state or imply that the null hypothesis is true. For example, following the finding that the difference between two sample means is not statistically significant, instead of properly concluding from this failure to reject the null hypothesis that the data do not warrant the conclusion that the population means differ, the writer concludes, at least implicitly, that there is no difference. The latter conclusion is always strictly invalid, and is functionally invalid as well unless power is high. The high frequency of occurrence of this invalid interpretation can be laid squarely at the doorstep of the general neglect of attention to statistical power in the training of behavioral scientists. Continue reading
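For concreteness, here is a minimal sketch of an ordinary power calculation of the sort Cohen is urging (a one-sided one-sample z-test with known sigma; the numbers are illustrative, not Cohen’s):

```python
# Ordinary power analysis for a one-sided one-sample z-test of
# H0: mu = 0 vs H1: mu > 0, known sigma (illustrative numbers).
from scipy.stats import norm

alpha, sigma, n = 0.05, 1.0, 100
se = sigma / n ** 0.5                 # standard error of the sample mean
z_crit = norm.ppf(1 - alpha)          # reject H0 when xbar / se > z_crit

def power(mu_alt):
    """P(test rejects H0) when the true mean is mu_alt."""
    return 1 - norm.cdf(z_crit - mu_alt / se)

for mu in (0.1, 0.25, 0.5):
    print(f"power against mu = {mu}: {power(mu):.3f}")
# power against mu = 0.1: 0.260
# power against mu = 0.25: 0.804
# power against mu = 0.5: 1.000 (approx.)
```

On the power-analytic reasoning Cohen describes, a nonsignificant result gives good grounds for ruling out mu = 0.5 (power near 1), fair grounds against mu = 0.25 (power about 0.80), and almost none against mu = 0.1 (power about 0.26).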
3 years ago…
MONTHLY MEMORY LANE: 3 years ago: June 2013. I mark in red three posts that seem most apt for general background on key issues in this blog, excluding those reblogged recently. Posts that are part of a “unit” or a group of “U-Phils” (you [readers] philosophize) count as one. Here I grouped 6/5 and 6/6.
- (6/1) Winner of May Palindrome Contest
- (6/1) Some statistical dirty laundry* (recently reblogged)
- (6/5) Do CIs Avoid Fallacies of Tests? Reforming the Reformers (6/5 and 6/6 are paired as one)
- (6/6) PhilStock: Topsy-Turvy Game
- (6/6) Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)
- (6/8) Richard Gill: “Integrity or fraud… or just questionable research practices?”* (recently reblogged)
- (6/11) Mayo: comment on the repressed memory research [How a conceptual criticism, requiring no statistics, might go.]
- (6/14) P-values can’t be trusted except when used to argue that p-values can’t be trusted!
- (6/19) PhilStock: The Great Taper Caper
- (6/19) Stanley Young: better p-values through randomization in microarrays
- (6/22) What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri* (recently reblogged)
- (6/26) Why I am not a “dualist” in the sense of Sander Greenland
- (6/29) Palindrome “contest” contest
- (6/30) Blog Contents: mid-year
Monthly memory lanes began at the blog’s 3-year anniversary in September 2014.
Allan Birnbaum: May 27, 1923 – July 1, 1976
Allan Birnbaum died 40 years ago today. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to arrive at what he termed “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I have heartily endorsed. While known for the result that the (strong) Likelihood Principle follows from the sufficiency and conditionality principles (a result that Jimmy Savage deemed one of the greatest breakthroughs in statistics), a few years after publishing it he turned away from it, perhaps having discovered gaps in his argument. A post linking to a 2014 Statistical Science issue discussing Birnbaum’s result is here. The reference links to the Synthese 1977 volume dedicated to his memory. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. Ample weekend reading! Continue reading
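For readers unfamiliar with the result, here is a rough gloss (my paraphrase, not Birnbaum’s exact formulation):

```latex
\textbf{Strong Likelihood Principle (rough statement).}
If two experiments $E_1$ and $E_2$ concern the same parameter $\theta$,
and outcomes $x_1^*$ and $x_2^*$ have proportional likelihood functions,
\[
  f_1(x_1^* \mid \theta) \;=\; c \, f_2(x_2^* \mid \theta)
  \quad \text{for all } \theta,\ \text{with } c > 0 \text{ free of } \theta,
\]
then the two outcomes have the same evidential import:
$\mathrm{Ev}(E_1, x_1^*) = \mathrm{Ev}(E_2, x_2^*)$.
Birnbaum's theorem asserts that the sufficiency principle and the (weak)
conditionality principle together entail the SLP.
```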
Professor Richard Gill
It was statistician Richard Gill who first told me about Diederik Stapel (see an earlier post on Diederik). We were at a workshop on Error in the Sciences at Leiden in 2011. I was very lucky to have Gill assigned as my commentator/presenter—he was excellent! As I was explaining some data problems to him, he suddenly said, “Some people don’t bother to collect data at all!” That’s when I learned about Stapel.
Committees often turn to Gill when someone’s work is under scrutiny for bad statistics, fraud, or anything in between. Do you think he’s being too easy on researchers when he says, about a given case:
“data has been obtained by some combination of the usual ‘questionable research practices’ [QRPs] which are prevalent in the field in question. Everyone does it this way, in fact, if you don’t, you’d never get anything published. …People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them.”
Isn’t that the danger in relying on deeply felt background beliefs? Have our attitudes changed (toward QRPs) over the past 3 years (harsher or less harsh)? Here’s a talk of his I blogged 3 years ago (followed by a letter he allowed me to post). I reflect on the pseudoscientific nature of the ‘recovered memories’ program in one of the Geraerts et al. papers in a later post. Continue reading
For entertainment only
In a post 3 years ago (“What do these share in common: m&m’s, limbo stick, ovulation, Dale Carnegie? Sat night potpourri”), I expressed doubts about expending serious effort to debunk the statistical credentials of studies that most readers without any statistical training would regard as “for entertainment only,” dubious, or pseudoscientific quackery. It needn’t even be that the claim is implausible; what’s implausible is that it has been well probed in the experiment at hand. Given the attention being paid to such examples by some leading statisticians, and scores of replication researchers over the past 3 years–attention that has been mostly worthwhile–maybe the bar has been lowered. What do you think? Anyway, this is what I blogged 3 years ago. (Oh, I decided to put in a home-made cartoon!) Continue reading
Right after our session at the SPSP meeting last Friday, I chaired a symposium on replication that included Brian Earp–an active player in replication research in psychology (Replication and Evidence: A tenuous relationship p. 80). One of the first things he said, according to my notes, is that gambits such as cherry picking, p-hacking, hunting for significance, selective reporting, and other QRPs had been taught as acceptable and had become standard practice in psychology, without any special need to adjust p-values or alert the reader to their spuriousness [i]. (He will correct me if I’m wrong.) It shocked me to hear it, even though it shouldn’t have, given what I’ve learned about statistical practice in social science. It was the Report on Stapel that really pulled back the curtain on this attitude toward QRPs in social psychology–as discussed in this blogpost 3 years ago. (If you haven’t read Section 5 of the report on flawed science, you should.) Many of us assumed that QRPs, even if still committed, were at least recognized to be bad statistical practices since the time of Morrison and Henkel’s (1970) Significance Test Controversy. A question now is this: have all the confessions of dirty laundry, the fraudbusting of prominent researchers, the pledges to straighten up and fly right, the years of replication research, done anything to remove the stains? I leave the question open for now. Here’s my “statistical dirty laundry” post from 2013: Continue reading
Here are the slides from our talk at the Society for Philosophy of Science in Practice (SPSP) conference. I covered the first 27 slides, Parker the rest. The abstract is here:
I’m giving a joint presentation with Caitlin Parker on Friday (June 17) at the meeting of the Society for Philosophy of Science in Practice (SPSP): “Using Philosophy of Statistics to Make Progress in the Replication Crisis in Psychology” (Rowan University, Glassboro, N.J.) The Society grew out of a felt need to break out of the sterile straitjacket wherein philosophy of science is done divorced from practice. The topic of the relevance of PhilSci and PhilStat to Sci has often come up on this blog, so people might be interested in the SPSP mission statement below our abstract.
Using Philosophy of Statistics to Make Progress in the Replication Crisis in Psychology
Deborah Mayo, Virginia Tech, Department of Philosophy, United States
Caitlin Parker, Virginia Tech, Department of Philosophy, United States
I came across an excellent post on a blog kept by Daniel Lakens: “So you banned p-values, how’s that working out for you?” He refers to the journal that recently banned significance tests, confidence intervals, and a vague assortment of other statistical methods, on the grounds that all such statistical inference tools are “invalid” since they don’t provide posterior probabilities of some sort (see my post). The editors’ charge of “invalidity” could only hold water if these error statistical methods purported to provide posteriors based on priors, which they do not. The entire methodology is based on methods in which probabilities arise to qualify a method’s capability to detect and avoid erroneous interpretations of data. The logic is of the falsification variety found throughout science. Lakens, an experimental psychologist, does a great job delineating some of the untoward consequences of their inferential ban. I insert some remarks in black. Continue reading
Winner of the May 2016 Palindrome contest
Curtis Williams: Inventor, entrepreneur, and professional actor
The winning palindrome (a dialog):
“Disable preplan?… I, Mon Ami?”
“No! I tag. I vandalized Dezi, lad.”
“Navigational leftism lacks aim…a nominal perp: Elba’s id.”
The requirement: A palindrome using “navigate” or “navigation” (and Elba, of course).
Book choice: Error and Inference (D. Mayo & A. Spanos, Cambridge University Press, 2010)
Bio: Curtis Mark Williams is the co-founder of WavHello and the inventor of Bellybuds; he is also an occasional professional actor who has performed on Broadway and in several television shows and films.
He currently resides in Los Angeles with his lovely wife, two daughters, his dog, Newton, and his framed New Yorker Caption Contest winning cartoon. [He has been a finalist twice; the one he won is contest #329, by Joe Dator, inspired by his theatrical background. :)] Continue reading
Aris Spanos, my colleague (in economics) and co-author, came across this anonymous review of our Error and Inference (2010) [E & I]. Interestingly, the reviewer remarks that “The book gives a sense of security regarding the future of statistical science and its importance in many walks of life.” We’re not sure what the reviewer means–but it’s appreciated regardless. This post was from yesterday’s 3-year memory lane and was first posted here.
2010 American Statistical Association and the American Society for Quality
TECHNOMETRICS, AUGUST 2010, VOL. 52, NO. 3, Book Reviews, 52:3, pp. 362-370.
Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, edited by Deborah G. MAYO and Aris SPANOS, New York: Cambridge University Press, 2010, ISBN 978-0-521-88008-4, xvii+419 pp., $60.00.
This edited volume contemplates the interests of both scientists and philosophers regarding gathering reliable information about the problem/question at hand in the presence of error, uncertainty, and with limited data information.
The volume makes a significant contribution in bridging the gap between scientific practice and the philosophy of science. The main contribution of this volume pertains to issues of error and inference, and showcases intriguing discussions on statistical testing and providing alternative strategy to Bayesian inference. In words, it provides cumulative information towards the philosophical and methodological issues of scientific inquiry at large.
The target audience of this volume is quite general and open to a broad readership. With some reasonable knowledge of probability theory and statistical science, one can get the maximum benefit from most of the chapters of the volume. The volume contains original and fascinating articles by eminent scholars (nine, including the editors) who range from names in statistical science to philosophy, including D. R. Cox, a name well known to statisticians. Continue reading
3 years ago…
MONTHLY MEMORY LANE: 3 years ago: May 2013. I mark in red three posts that seem most apt for general background on key issues in this blog. Some of the May 2013 posts blog the conference we held earlier that month: “Ontology and Methodology”. I highlight in burgundy a post on Birnbaum that follows up on my last post in honor of his birthday. New questions or comments can be placed on this post.
- (5/3) Schedule for Ontology & Methodology, 2013
- (5/6) Professorships in Scandal?
- (5/9) If it’s called the “The High Quality Research Act,” then ….
- (5/13) ‘No-Shame’ Psychics Keep Their Predictions Vague: New Rejected post
- (5/14) “A sense of security regarding the future of statistical science…” Anon review of Error and Inference
- (5/18) Gandenberger on Ontology and Methodology (May 4) Conference: Virginia Tech
- (5/19) Mayo: Meanderings on the Onto-Methodology Conference
- (5/22) Mayo’s slides from the Onto-Meth conference
- (5/24) Gelman sides w/ Neyman over Fisher in relation to a famous blow-up
- (5/26) Schachtman: High, Higher, Highest Quality Research Act
- (5/27) A. Birnbaum: Statistical Methods in Scientific Inference
- (5/29) K. Staley: review of Error & Inference
Monthly memory lanes began at the blog’s 3-year anniversary in September 2014.
27 May 1923 – 1 July 1976
Today is Allan Birnbaum’s birthday. In honor of his birthday this year, I’m posting the articles in the Synthese volume that was dedicated to his memory in 1977. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. I paste a few snippets from the articles by Giere and Birnbaum. If you’re interested in statistical foundations, and are unfamiliar with Birnbaum, here’s a chance to catch up. (Even if you are, you may be unaware of some of these key papers.)
HAPPY BIRTHDAY ALLAN!
Synthese Volume 36, No. 1 Sept 1977: Foundations of Probability and Statistics, Part I
This special issue of Synthese on the foundations of probability and statistics is dedicated to the memory of Professor Allan Birnbaum. Professor Birnbaum’s essay ‘The Neyman-Pearson Theory as Decision Theory; and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory’ was received by the editors of Synthese in October, 1975, and a decision was made to publish a special symposium consisting of this paper together with several invited comments and related papers. The sad news about Professor Birnbaum’s death reached us in the summer of 1976, but the editorial project could nevertheless be completed according to the original plan. By publishing this special issue we wish to pay homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics. We are grateful to Professor Ronald Giere who wrote an introductory essay on Professor Birnbaum’s concept of statistical evidence and who compiled a list of Professor Birnbaum’s publications.
In their “Comment: A Simple Alternative to p-values” (on the ASA P-value document), Benjamin and Berger (2016) recommend researchers report a pre-data Rejection Ratio:
It is the probability of rejection when the alternative hypothesis is true, divided by the probability of rejection when the null hypothesis is true, i.e., the ratio of the power of the experiment to the Type I error of the experiment. The rejection ratio has a straightforward interpretation as quantifying the strength of evidence about the alternative hypothesis relative to the null hypothesis conveyed by the experimental result being statistically significant. (Benjamin and Berger 2016, p. 1)
The recommendation is much more fully fleshed out in a 2016 paper by Bayarri, Benjamin, Berger, and Sellke (BBBS 2016): Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses. Their recommendation is:
…that researchers should report the ‘pre-experimental rejection ratio’ when presenting their experimental design and researchers should report the ‘post-experimental rejection ratio’ (or Bayes factor) when presenting their experimental results. (BBBS 2016, p. 3)….
The (pre-experimental) ‘rejection ratio’ Rpre, the ratio of statistical power to significance threshold (i.e., the ratio of the probability of rejecting under H1 and H0 respectively), is shown to capture the strength of evidence in the experiment for H1 over H0. (ibid., p. 2)
But in fact it does no such thing! [See my post from the FUSION conference here.] J. Berger and his co-authors will tell you the rejection ratio (and a variety of other measures created over the years) are entirely frequentist because they are created out of frequentist error statistical measures. But a creation built on frequentist measures doesn’t mean the resulting animal captures frequentist error statistical reasoning. It might be a kind of Frequentstein monster! Continue reading
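For concreteness, here is a minimal sketch of the pre-experimental rejection ratio for a one-sided one-sample z-test (my own illustration; the numbers are not from BBBS):

```python
# Pre-experimental rejection ratio R_pre = power / alpha for a
# one-sided one-sample z-test of H0: mu = 0 vs H1: mu = 0.25
# (illustrative numbers, not from BBBS 2016).
from scipy.stats import norm

alpha, sigma, n, mu_alt = 0.05, 1.0, 100, 0.25
se = sigma / n ** 0.5
z_crit = norm.ppf(1 - alpha)
power = 1 - norm.cdf(z_crit - mu_alt / se)  # P(reject | H1)
r_pre = power / alpha                       # P(reject | H1) / P(reject | H0)
print(f"power = {power:.3f}, R_pre = {r_pre:.1f}")
# power = 0.804, R_pre = 16.1
```

Note what the computation makes plain: Rpre is fixed before any data are observed, so it is the same whether the eventual result barely crosses the threshold or lies far beyond it, which is one way of seeing why it cannot capture the strength of evidence in the particular result.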