Below are the slides from my Popper talk at the LSE today (up to slide 70): (post any questions in the comments)



Excerpts and links may be used, provided that full and clear credit is given to Deborah G. Mayo and Error Statistics Philosophy with appropriate and specific direction to the original content.

I found this a helpful look at this important issue, clarifying the debate and pointing forward to solutions (i.e., new best practices).

Question: what do you mean by “severe testing” as used by Popper? The only citation I recall is from “Conjectures and Refutations: The Growth of Scientific Knowledge” (1963).

“Confirmations should count only if they are the result of risky predictions; that is to say, if, unenlightened by the theory in question, we should have expected an event which was incompatible with the theory — an event which would have refuted the theory.”

But that does not appear to be what you mean.

I hadn’t seen your comment. It’s true that Popper didn’t adequately define severity. I think I give a better definition using error statistics.

H passes test T with severity (given data x) when x accords with H*

and

Pr(T would have produced a worse accordance with H; H is false) = high

* for an adequate accordance measure
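The two conditions can be made concrete in one textbook case. The following is a minimal Python sketch, assuming a one-sided test of a Normal mean with known σ (H0: μ ≤ μ0 vs. H1: μ > μ0) and the sample mean as the accordance measure. Under those assumptions, the severity with which the claim μ > μ1 passes, given an observed mean x̄, is Pr(X̄ ≤ x̄; μ = μ1): the probability of a result according less well with the claim if μ were no greater than μ1. The function names and the numbers in the usage loop are hypothetical, chosen only for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard Normal CDF, computed from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def severity_mu_greater(x_bar: float, mu1: float, sigma: float, n: int) -> float:
    """Severity for the claim mu > mu1 after observing sample mean x_bar.

    Assumes i.i.d. Normal data with known sigma. A 'worse accordance' with
    mu > mu1 is a sample mean no larger than x_bar, so severity is
    Pr(X_bar <= x_bar; mu = mu1). Among all mu <= mu1 (claim false), this
    probability is smallest at mu = mu1, so that boundary value is reported.
    """
    std_err = sigma / math.sqrt(n)
    return normal_cdf((x_bar - mu1) / std_err)


if __name__ == "__main__":
    # Hypothetical numbers: x_bar = 0.4, sigma = 1, n = 100 (standard error 0.1).
    for mu1 in (0.0, 0.2, 0.3, 0.4, 0.5):
        sev = severity_mu_greater(0.4, mu1, sigma=1.0, n=100)
        print(f"SEV(mu > {mu1}) = {sev:.3f}")
```

Severity drops as the claim becomes more ambitious: with these numbers, μ > 0.2 passes with severity of about 0.977, whereas μ > 0.4 gets only 0.5, so the same data warrant the weaker claim far better than the stronger one.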

Professor Mayo,

Thanks for the alternative perspective on severity, a quantitative rather than qualitative one.

I’ve long used Popper’s definition, finding it analytically useful to sort wheat from chaff.

I’d love to see a post contrasting the two approaches to testing theories, with the relative strengths and weaknesses of each. I suspect that when the stakes are high both are required.

Check my Error and the Growth of Experimental Knowledge. His severity is satisfied by the first theory to explain a fact, to give just one example of its weakness. (He changed his views, but he affirmed “theoretical novelty” as what he intended by severity, i.e., T entails or fits a novel fact x, in the sense that x hasn’t already been explained.)

Professor Mayo,

Thanks for the pointer. I’ll check it out.

While I have you here, any thoughts — broadly speaking — on successful predictions as a “gold standard” (an anachronistic economic concept, but vivid) for testing theories?

No matter how sophisticated the math, backtesting models remains problematic, especially when the stakes are high.

Why lump Stapel (outright fraud) with garden-variety sloppy statistical methods (negligence)?

I don’t. What the slide notes is that, in investigating him, a whole culture of verification bias came to light as routine.

The slide (read literally) does do this, but perhaps there is a connection: maybe the general sloppiness of statistical methods made it easier for fakers to engage in such reckless data fabrication.

Yes, but I was also pointing out that the investigators (the Levelt committee) were seeking to find out about Stapel and his coworkers (to see, for example, whether co-authors were guilty) and, to their shock, found themselves in a culture where leaving out results you don’t like, reporting just what looks good, and mixing and matching control and treated groups from different experiments (with the defense that it’s all random) were not only commonplace; the researchers claimed that’s what they were taught to do. I will link to one of my posts on Stapel. Interestingly, the audience yesterday was unfamiliar with this case.

https://errorstatistics.com/2015/06/14/some-statistical-dirty-laundry-the-tilberg-stapel-report-on-flawed-science/

Is it possible to distinguish a proposition as being believable but not well-tested? It seems somewhat plausible that the optimum way to test a proposition assumes everything we know, or believe with some degree of certainty, and nothing else other than the logical connections between those beliefs. Furthermore, it seems reasonable that our test should, on these assumptions, output some ideally quantitative indication of the validity or error of our proposition. But then it seems like a well-tested proposition is simply one for which we have calculated P(Proposition | Beliefs). Or perhaps, a well-tested proposition is, if we accept the proposition, one with very high probability, and if we reject the proposition, one with very low probability, and a proposition with middling probability is yet to be well-tested. Either way, we end up with some version of Bayesianism, collapsing the notions of believability and well-tested.

I’m playing devil’s advocate above, by the way. You might be interested that, as part of a report I wrote on Simonsohn et al.’s p-curves, I conducted a p-curve analysis of the work of Joshua Knobe (a prominent experimental philosopher), and was able to reject the null of no evidential value with a minute p-value (10^-5 or something). Unfortunately, in my analysis of p-curve theory I found that p-values which fail to account for multiple comparisons can bias the test for evidential value massively in favour of rejecting, and I believe some of Knobe’s p-values failed to account for multiplicity.
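The direction of that bias can be illustrated with a minimal Python sketch under strong simplifying assumptions; it is not Simonsohn et al.’s procedure itself. Suppose there is no true effect, a researcher runs m independent tests, and reports only the smallest p-value, and only when it is below 0.05. With a single honestly reported test, a significant null p-value is uniform on (0, 0.05), so half of them fall below 0.025 and the p-curve is flat. With selection of the minimum, Pr(p_min ≤ t | p_min ≤ 0.05) = (1 − (1 − t)^m) / (1 − 0.95^m), which exceeds t/0.05 for m > 1, so significant p-values pile up near zero: the right skew that p-curve reads as evidential value. The function name `share_below` is hypothetical.

```python
def share_below(t: float, m: int, alpha: float = 0.05) -> float:
    """Pr(p_min <= t | p_min <= alpha) when no effect is present.

    Hypothetical setup: m independent tests of true null hypotheses, each
    p-value Uniform(0, 1); only the smallest is reported, and only if it is
    below alpha. Requires 0 <= t <= alpha.
    """
    if not 0.0 <= t <= alpha:
        raise ValueError("t must lie in [0, alpha]")
    return (1.0 - (1.0 - t) ** m) / (1.0 - (1.0 - alpha) ** m)


if __name__ == "__main__":
    # m = 1 gives a flat p-curve (share 0.5); uncorrected selection of the
    # smallest of m p-values pushes the share of very small p-values upward.
    for m in (1, 2, 5, 20):
        print(f"m = {m:2d}: Pr(p < .025 | p < .05) = {share_below(0.025, m):.3f}")
```

Whether this amounts to a “massive” bias in a given literature depends on how many comparisons go uncorrected and how correlated they are, neither of which this sketch models.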

Kim: No. In appraising whether test T (with data x) did a good job probing claim C, I wouldn’t consider everything I knew from other tests of C (although I would obviously use the background needed to assess test T). So I might say the deflection of light, with thus and so properties, was well tested in 1960, say, but not well tested in the famous 1919 eclipse experiment. The 1919 experimental test remains as imprecise as it was in 1919, even though in later years radio astronomy was capable of discerning errors not distinguishable in 1919. Of course, time doesn’t matter: some of the 1919 experiments were decent, while one, from Sobral, provided no evidence at all. One needn’t consider anything so highfalutin. We have good evidence x for mad cow, cloning, or what have you, but you wouldn’t say that tea-leaf reading supplies such evidence. For any well-tested empirical claim C, I can find a method/data that does a lousy job of substantiating C. (Of course, I can ask a question about the overall evidence for C, which is different.)