#### 2016 UK-EU Foundations of Physics Conference

#### Start Date:16 July 2016

- End Date:18 July 2016

Location: London School of Economics, London, UK

Website: http://www.lse.ac.uk/foundations2016/ Continue reading

- End Date:18 July 2016

Location: London School of Economics, London, UK

Website: http://www.lse.ac.uk/foundations2016/ Continue reading

Categories: Announcement
Leave a comment

**MONTHLY MEMORY LANE: 3 years ago: June 2013. **I mark in red **three** posts that seem most apt for general background on key issues in this blog, excluding those reblogged recently **[1]**.** **Posts that are part of a “unit” or a group of “U-Phils”(you [readers] philosophize) count as one. Here I grouped 6/5 and 6/6.

**June 2013
**

- (6/1) Winner of May Palindrome Contest
- (6/1) Some statistical dirty laundry
***(recently reblogged)** **(6/5) Do CIs Avoid Fallacies of Tests? Reforming the Reformers :(6/5 and6/6 are paired as one)**- (6/6) PhilStock: Topsy-Turvy Game
**(6/6) Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)**- (6/8) Richard Gill: “Integrity or fraud… or just questionable research practices?”*
**(recently reblogged)** - (6/11) Mayo: comment on the repressed memory research
**[How a conceptual criticism, requiring no statistics, might go.]** **(6/14) P-values can’t be trusted except when used to argue that p-values can’t be trusted!**- (6/19) PhilStock: The Great Taper Caper
**(6/19) Stanley Young: better p-values through randomization in microarrays**- (6/22) What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri*
**(recently reblogged)** - (6/26) Why I am not a “dualist” in the sense of Sander Greenland
- (6/29) Palindrome “contest” contest
- (6/30) Blog Contents: mid-year

**[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.**

Categories: 3-year memory lane, Error Statistics, Statistics
Leave a comment

Allan Birnbaum died 40 years ago today. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to arrive at what he termed “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I have heartily endorsed. While known for a result that the (strong) Likelihood Principle followed from sufficiency and conditionality principles (a result that Jimmy Savage deemed one of the greatest breakthroughs in statistics), a few years after publishing it, he turned away from it, perhaps discovering gaps in his argument. A post linking to a 2014 *Statistical Science* issue discussing Birnbaum’s result is here. Reference [5] links to the *Synthese* 1977 volume dedicated to his memory. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. Ample weekend reading! Continue reading

Categories: Birnbaum, Likelihood Principle, phil/history of stat, Statistics
Tags: Birnbaum
62 Comments

Professor Richard Gill

Statistics Group

Mathematical Institute

Leiden University

It was statistician Richard Gill who first told me about Diederik Stapel (see an earlier post on Diederik). We were at a workshop on *Error in the Sciences* at Leiden in 2011. I was very lucky to have Gill be assigned as my commentator/presenter—he was excellent! As I was explaining some data problems to him, he suddenly said, “Some people don’t bother to collect data at all!” That’s when I learned about Stapel.

Committees often turn to Gill when someone’s work is up for scrutiny of bad statistics or fraud, or anything in between. Do you think he’s being too easy on researchers when he says, about a given case:

“data has been obtained by some combination of the usual ‘questionable research practices’ [QRPs] which are prevalent in the field in question. Everyone does it this way, in fact, if you don’t, you’d never get anything published. …People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them.”

Isn’t that the danger in relying on deeply felt background beliefs? * Have our attitudes changed (toward QRPs) over the past 3 years (harsher or less harsh)?* Here’s a talk of his I blogged 3 years ago (followed by a letter he allowed me to post). I reflect on the pseudoscientific nature of the ‘recovered memories’ program in one of the Geraerts et al. papers in a later post. Continue reading

Categories: 3-year memory lane, junk science, Statistical fraudbusting, Statistics
4 Comments

In a post 3 years ago (“What do these share in common: m&m’s, limbo stick, ovulation, Dale Carnegie? Sat night potpourri”), I expressed doubts about expending serious effort to debunk the statistical credentials of studies that most readers without any statistical training would regard as “for entertainment only,” dubious, or pseudoscientific quackery. It needn’t even be that the claim is implausible, what’s implausible is that it has been well probed in the experiment at hand. Given the attention being paid to such examples by some leading statisticians, and scores of replication researchers over the past 3 years–attention that has been mostly worthwhile–maybe the bar has been lowered. What do you think? Anyway, this is what I blogged 3 years ago. (Oh, I decided to put in a home-made cartoon!) Continue reading

Categories: junk science, replication research, Statistics
2 Comments

Right after our session at the SPSP meeting last Friday, I chaired a symposium on replication that included Brian Earp–an active player in replication research in psychology *(*Replication and Evidence: A tenuous relationship p. 80). One of the first things he said, according to my notes, is that gambits such as cherry picking, p-hacking, hunting for significance, selective reporting, and other QRPs, had ~~been taught as acceptable~~ become standard practice in psychology, without any special need to adjust p-values or alert the reader to their spuriousness [i]. (He will correct me if I’m wrong[2].) It shocked me to hear it, even though it shouldn’t have, given what I’ve learned about statistical practice in social science. It was the Report on Stapel that really pulled back the curtain on this attitude toward QRPs in social psychology–as discussed in this blogpost 3 years ago. (If you haven’t read Section 5 of the report on flawed science, you should.) Many of us assumed that QRPs, even if still committed, were at least recognized to be bad statistical practices since the time of Morrison and Henkel’s (1970) *Significance Test Controversy*. A question now is this: have all the confessions of dirty laundry, the fraudbusting of prominent researchers, the pledges to straighten up and fly right, the years of replication research, done anything to remove the stains? I leave the question open for now. Here’s my* “statistical dirty laundry” post from 2013:* Continue reading

Categories: junk science, reproducibility, spurious p values, Statistics
4 Comments

Here are the slides from our talk at the Society for Philosophy of Science in Practice (SPSP) conference. I covered the first 27, Parker the rest. The abstract is here:

I’m giving a joint presentation with Caitlin Parker[1] on Friday (June 17) at the meeting of the Society for Philosophy of Science in Practice (SPSP): “Using Philosophy of Statistics to Make Progress in the Replication Crisis in Psychology” (Rowan University, Glassboro, N.J.)[2] The Society grew out of a felt need to break out of the sterile straightjacket wherein philosophy of science occurs divorced from practice. The topic of the relevance of PhilSci and PhilStat to Sci has often come up on this blog, so people might be interested in the SPSP mission statement below our abstract.

**Using Philosophy of Statistics to Make Progress in the Replication Crisis in Psychology**

Deborah Mayo Virginia Tech, Department of Philosophy United States

Caitlin Parker Virginia Tech, Department of Philosophy United States

Categories: Announcement, replication research, reproducibility
8 Comments

I came across an excellent post on a blog kept by Daniel Lakens: “So you banned p-values, how’s that working out for you?” He refers to the journal that recently banned significance tests, confidence intervals, and a vague assortment of other statistical methods, on the grounds that all such statistical inference tools are “invalid” since they don’t provide posterior probabilities of some sort (see my post). The editors’ charge of “invalidity” could only hold water if these error statistical methods purport to provide posteriors based on priors, which is false. The entire methodology is based on methods in which probabilities arise to qualify the method’s capabilities to detect and avoid erroneous interpretations of data [0]. The logic is of the falsification variety found throughout science. Lakens, an experimental psychologist, does a great job delineating some of the untoward consequences of their inferential ban. I insert some remarks in * black*. Continue reading

**Winner of the May 2016 Palindrome contest**

**Curtis Williams: **Inventor, entrepreneur, and professional actor

**The winning palindrome (a dialog): **

“Disable preplan?… I, Mon Ami?”

“Ask!”

“Calm…Sit, fella.”

“No! I tag. I vandalized Dezi, lad.”

“Navigational leftism lacks aim…a nominal perp: Elba’s id.”

**The requirement: **A palindrome using “navigate” or “navigation” (and Elba, of course).

**Book choice**: *Error and Inference *(D. Mayo & A. Spanos, Cambridge University Press, 2010)

**Bio: **Curtis Mark Williams is the co-founder of **WavHello **and the inventor of **Bellybuds**, who also counts himself as an occasional professional actor who has performed on Broadway [1] and in several television shows and films.

He currently resides in Los Angeles with his lovely wife, two daughters, his dog, Newton, and his framed New Yorker Caption Contest winning cartoon. [He has been a finalist twice and the one he won is contest #329, by Joe Dator (inspired by his theatrical background. :)] Continue reading

Categories: Palindrome
Leave a comment

Aris Spanos, my colleague (in economics) and co-author, came across this anonymous review of our *Error and Inference* (2010) [E & I]. Interestingly, the reviewer remarks that “The book gives a sense of security regarding the future of statistical science and its importance in many walks of life.” We’re not sure what the reviewer means–but it’s appreciated regardless. This post was from yesterday’s 3-year memory lane and was first posted here.

2010

American Statistical Association and the American Society for QualityTECHNOMETRICS, AUGUST 2010, VOL. 52, NO. 3, Book Reviews, 52:3, pp. 362-370.

Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, edited by Deborah G. MAYO and Aris SPANOS, New York: Cambridge University Press, 2010, ISBN 978-0-521-88008-4, xvii+419 pp., $60.00.This edited volume contemplates the interests of both scientists and philosophers regarding gathering reliable information about the problem/question at hand in the presence of error, uncertainty, and with limited data information.

The volume makes a signiﬁcant contribution in bridging the gap between scientiﬁc practice and the philosophy of science. The main contribution of this volume pertains to issues of error and inference, and showcases intriguing discussions on statistical testing and providing alternative strategy to Bayesian inference. In words, it provides cumulative information towards the philosophical and methodological issues of scientiﬁc inquiry at large.

The target audience of this volume is quite general and open to a broad readership. With some reasonable knowledge of probability theory and statistical science, one can get the maximum beneﬁt from most of the chapters of the volume. The volume contains original and fascinating articles by eminent scholars (nine, including the editors) who range from names in statistical science to philosophy, including D. R. Cox, a name well known to statisticians. Continue reading

Categories: 3-year memory lane, Review of Error and Inference, Statistics
3 Comments

**MONTHLY MEMORY LANE: 3 years ago: May 2013. **I mark in **red** **three** posts that seem most apt for general background on key issues in this blog **[1]**.** ** Some of the May 2013 posts blog the conference we held earlier that month: “Ontology and Methodology”. I highlight in **burgundy** a post on Birnbaum that follows up on my last post in honor of his birthday. New questions or comments can be placed on this post.

**May 2013**

- (5/3) Schedule for Ontology & Methodology, 2013
- (5/6) Professorships in Scandal?
- (5/9) If it’s called the “The High Quality Research Act,” then ….
- (5/13) ‘No-Shame’ Psychics Keep Their Predictions Vague: New Rejected post
**(5/14) “A sense of security regarding the future of statistical science…” Anon review of Error and Inference**- (5/18) Gandenberger on Ontology and Methodology (May 4) Conference: Virginia Tech
- (5/19) Mayo: Meanderings on the Onto-Methodology Conference
- (5/22) Mayo’s slides from the Onto-Meth conference
**(5/24) Gelman sides w/ Neyman over Fisher in relation to a famous blow-up**- (5/26) Schachtman: High, Higher, Highest Quality Research Act
**(5/27) A.Birnbaum: Statistical Methods in Scientific Inference****(5/29) K. Staley: review of Error & Inference**

** ****[1]Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.**

Categories: 3-year memory lane, Statistics
Leave a comment

*Today is Allan Birnbaum’s birthday. In honor of his birthday this year, I’m posting the articles in the *Synthese* volume that was dedicated to his memory in 1977. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. I paste a few snippets from the articles by Giere and Birnbaum. If you’re interested in statistical foundations, and are unfamiliar with Birnbaum, here’s a chance to catch up.(Even if you are,you may be unaware of some of these key papers.)*

**HAPPY BIRTHDAY ALLAN!**

*Synthese* Volume 36, No. 1 Sept 1977: *Foundations of Probability and Statistics*, Part I

**Editorial Introduction:**

This special issue of

Syntheseon the foundations of probability and statistics is dedicated to the memory of Professor Allan Birnbaum. Professor Birnbaum’s essay ‘The Neyman-Pearson Theory as Decision Theory; and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory’ was received by the editors ofSynthesein October, 1975, and a decision was made to publish a special symposium consisting of this paper together with several invited comments and related papers. The sad news about Professor Birnbaum’s death reached us in the summer of 1976, but the editorial project could nevertheless be completed according to the original plan. By publishing this special issue we wish to pay homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics. We are grateful to Professor Ronald Giere who wrote an introductory essay on Professor Birnbaum’s concept of statistical evidence and who compiled a list of Professor Birnbaum’s publications.THE EDITORS

In their “Comment: A Simple Alternative to p-values,” (on the ASA P-value document), Benjamin and Berger (2016) recommend researchers report a pre-data Rejection Ratio:

It is the probability of rejection when the alternative hypothesis is true, divided by the probability of rejection when the null hypothesis is true, i.e., the ratio of the power of the experiment to the Type I error of the experiment. The rejection ratio has a straightforward interpretation as quantifying the strength of evidence about the alternative hypothesis relative to the null hypothesis conveyed by the experimental result being statistically significant. (Benjamin and Berger 2016, p. 1)

The recommendation is much more fully fleshed out in a 2016 paper by Bayarri, Benjamin, Berger, and Sellke (BBBS 2016): Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses. Their recommendation is:

…that researchers should report the ‘pre-experimental rejection ratio’ when presenting their experimental design and researchers should report the ‘post-experimental rejection ratio’ (or Bayes factor) when presenting their experimental results. (BBBS 2016, p. 3)….

The (pre-experimental) ‘rejection ratio’ R

_{pre}, the ratio of statistical power to significance threshold (i.e., the ratio of the probability of rejecting underH_{1}andH_{0}respectively), is shown to capture the strength of evidence in the experiment for H_{1 }over H_{0}. (ibid., p. 2)

*But in fact it does no such thing!* [See my post from the FUSION conference here.] J. Berger, and his co-authors, will tell you the rejection ratio (and a variety of other measures created over the years) are entirely frequentist because they are created out of frequentist error statistical measures. But a creation built on frequentist measures doesn’t mean the resulting animal captures frequentist error statistical reasoning. It might be a kind of Frequentstein monster! [1] Continue reading

Categories: J. Berger, power, reforming the reformers, S. Senn, Statistical power, Statistics
36 Comments

Whenever I’m in London, my criminologist friend Katrin H. and I go in search of stand-up comedy. Since it’s Saturday night (and I’m in London), we’re setting out in search of a good comedy club (I’ll complete this post upon return). A few years ago we heard Jackie Mason do his shtick, a one-man show billed as his swan song to England. It was like a repertoire of his “Greatest Hits” without a new or updated joke in the mix. Still, hearing his rants for the n^{th} time was often quite hilarious. It turns out that he has already been back doing another “final shtick tour” in England, but not tonight.

A sample: If you want to eat nothing, eat nouvelle cuisine. Do you know what it means? No food. The smaller the portion the more impressed people are, so long as the food’s got a fancy French name, haute cuisine. An empty plate with sauce!

As one critic wrote, Mason’s jokes “offer a window to a different era,” one whose caricatures and biases one can only hope we’ve moved beyond:

But it’s one thing for Jackie Mason to scowl at a seat in the front row and yell to the shocked audience member in his imagination, “These are jokes! They are just jokes!” and another to reprise statistical howlers, which are not jokes, to me. This blog found its reason for being partly as a place to expose, understand, and avoid them. I had earlier used this Jackie Mason opening to launch into a well-known fallacy of rejection using statistical significance tests. I’m going to go further this time around. I began by needling some leading philosophers of statistics: Continue reading

I first blogged this letter here. Below the references are some more recent blog links of relevance to this issue.

* Dear Reader: I am typing in some excerpts from a letter Stephen Senn shared with me in relation to my April 28, 2012 blogpost. It is a letter to the editor of Statistics in Medicine in response to S. Goodman. It contains several important points that get to the issues we’ve been discussing. You can read the full letter here. Sincerely, D. G. Mayo*

STATISTICS IN MEDICINE, LETTER TO THE EDITOR

From: Stephen Senn*

Some years ago, in the pages of this journal, Goodman gave an interesting analysis of ‘replication probabilities’ of p-values. Specifically, he considered the possibility that a given experiment had produced a p-value that indicated ‘significance’ or near significance (he considered the range p=0.10 to 0.001) and then calculated the probability that a study with equal power would produce a significant result at the conventional level of significance of 0.05. He showed, for example, that given an uninformative prior, and (subsequently) a resulting p-value that was exactly 0.05 from the first experiment, the probability of significance in the second experiment was 50 per cent. A more general form of this result is as follows. If the first trial yields p=α then the probability that a second trial will be significant at significance level α (and in the same direction as the first trial) is 0.5. Continue reading

Categories: 4 years ago!, reproducibility, S. Senn, Statistics
Tags: Evidence-based medicine, p-value vs posterior, significance tests, Stephen Senn
3 Comments

Below are the slides from my Popper talk at the LSE today (up to slide 70): (post any questions in the comments)

Categories: P-values, replication research, reproducibility, Statistics
11 Comments

I’m giving a Popper talk at the London School of Economics next Tuesday (10 May). If you’re in the neighborhood, I hope you’ll stop by.

A somewhat accurate blurb is here. I say “somewhat” because it doesn’t mention that I’ll talk a bit about the replication crisis in psychology, and the issues that crop up (or ought to) in connecting statistical results and the causal claim of interest.

Categories: Announcement
6 Comments

**Who should apply:**

**This workshop is open to graduate students in master’s or PhD programs in philosophy or the sciences, including computer science.**

*For additional information or to apply online, visit thinkandcode.vtlibraries.org, or contact Dr. Benjamin Jantzen at bjantzen@vt.edu*

Categories: Announcement
Leave a comment