philosophy of science

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Understanding Reproducibility & Error Correction in Science

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2016–2017
57th Annual Program

Download the 57th Annual Program

The Alfred I. Taub forum:

UNDERSTANDING REPRODUCIBILITY & ERROR CORRECTION IN SCIENCE

Cosponsored by GMS and BU’s BEST at Boston University.
Friday, March 17, 2017
1:00 p.m. – 5:00 p.m.
The Terrace Lounge, George Sherman Union
775 Commonwealth Avenue

  • Reputation, Variation, & Control: Historical Perspectives
    Jutta Schickore History and Philosophy of Science & Medicine, Indiana University, Bloomington.
  • Crisis in Science: Time for Reform?
    Arturo Casadevall Molecular Microbiology & Immunology, Johns Hopkins
  • Severe Testing: The Key to Error Correction
    Deborah Mayo Philosophy, Virginia Tech
  • Replicate That…. Maintaining a Healthy Failure Rate in Science
    Stuart Firestein Biological Sciences, Columbia

 


Categories: Announcement, philosophy of science, Philosophy of Statistics, Statistical fraudbusting, Statistics

Heads I win, tails you lose? Meehl and many Popperians get this wrong (about severe tests)!


bending of starlight.

“[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation—in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where]…it was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories.” (Popper, CR, p. 36)

 

Popper lauds Einstein’s General Theory of Relativity (GTR) as sticking its neck out, bravely being ready to admit its falsity were the deflection effect not found. The truth is that even if no deflection effect had been found in the 1919 experiments, it would have been blamed on the sheer difficulty in discerning so small an effect (the results that were found were quite imprecise). This would have been entirely correct! Yet many Popperians, perhaps Popper himself, get this wrong.[i] Listen to Popperian Paul Meehl (with whom I generally agree).

“The stipulation beforehand that one will be pleased about substantive theory T when the numerical results come out as forecast, but will not necessarily abandon it when they do not, seems on the face of it to be about as blatant a violation of the Popperian commandment as you could commit. For the investigator, in a way, is doing…what astrologers and Marxists and psychoanalysts allegedly do, playing heads I win, tails you lose.” (Meehl 1978, 821)

No, there is a confusion of logic here. A successful result may rightly be taken as evidence for a real effect H, even though failing to find the effect need not be taken to refute the effect, or even as evidence against H. This makes perfect sense if one keeps in mind that a test might have had little chance to detect the effect, even if it existed. The point really reflects the asymmetry of falsification and corroboration. Popperian Alan Chalmers, whose book What is this Thing Called Science? (1999) had at first criticized severity on this point, wrote an appendix to a chapter conceding it once I made my case.[i] Continue reading
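To see the logical point in numbers, here is a minimal sketch (my own toy figures, not anything from Popper or Meehl) of how a test with little capability of detecting a real effect makes a negative result nearly worthless as evidence of absence:

```python
from scipy.stats import norm

# A one-sided test of "no effect" at the .05 level. If the true effect
# is only 1 standard error, the test detects it barely a quarter of the time.
alpha = 0.05
z_alpha = norm.ppf(1 - alpha)  # 1.645

def power(effect_in_se):
    """Probability of detecting a true effect of the given size (in SE units)."""
    return norm.sf(z_alpha - effect_in_se)

print(f"power at 1 SE: {power(1):.2f}")  # ~0.26
# So P(no detection | real effect) is ~0.74 here: failing to find the effect
# is terrible evidence of its absence, while finding it can still count for it.
```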

Categories: fallacy of non-significance, philosophy of science, Popper, Severity, Statistics | Tags:

The Unexpected Way Philosophy Majors Are Changing The World Of Business

 


“Philosophy majors rule” according to this recent article. We philosophers should be getting the word out. Admittedly, the type of people inclined to do well in philosophy are already likely to succeed in analytic areas. Coupled with the chutzpah of taking up an “outmoded and impractical” major like philosophy in the first place, innovative tendencies are not surprising. But can the study of philosophy also promote these capacities? I think it can and does; yet it could be far more effective than it is if it were less hermetic and more engaged with problem-solving across the landscape of science, statistics, law, medicine, and evidence-based policy. Here’s the article: Continue reading

Categories: philosophy of science, Philosophy of Statistics, Statistics

Is it true that all epistemic principles can only be defended circularly? A Popperian puzzle

Current-day Popperians, the “critical rationalists”, espouse the following epistemic principle CR:[i]

(CR) it is reasonable to adopt or believe a claim or theory P which best survives serious criticism.

What justifies CR?  To merely declare it is a reasonable epistemic principle without giving evidence that following it advances any epistemic goals is entirely unsatisfactory, and decidedly un-Popperian in spirit.

Alan Musgrave (1999), a leading critical rationalist, mounts a defense of CR that he openly concedes is circular, admitting, as he does, that such circular defenses could likewise be used to argue for principles he himself regards as ‘crazy’.
However, he also gives a subtle and clever argument that it’s impossible to do better, that such a circular defense is the only kind possible. So since we’re reading Popper this week (some of us), and since an analogous argument arises in defending principles of statistical inference, try your hand at this conundrum. Continue reading

Categories: philosophy of science, Popper, Statistics

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2013–2014
54th Annual Program

Download the 54th Annual Program

REVISITING THE FOUNDATIONS OF STATISTICS IN THE ERA OF BIG DATA: SCALING UP TO MEET THE CHALLENGE

Cosponsored by the Department of Mathematics & Statistics at Boston University.
Friday, February 21, 2014
10 a.m. – 5:30 p.m.
Photonics Center, 9th Floor Colloquium Room (Rm 906)
8 St. Mary’s Street

10 a.m.–noon

  • Computational Challenges in Genomic Medicine
    Jill Mesirov Computational Biology and Bioinformatics, Broad Institute
  • Selection, Significance, and Signification: Issues in High Energy Physics
    Kent Staley Philosophy, Saint Louis University

1:30–5:30 p.m.

  • Multi-Resolution Inference: An Engineering (Engineered?) Foundation of Statistical Inference
    Xiao-Li Meng Statistics, Harvard University
  • Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
    Deborah Mayo Philosophy, Virginia Tech
  • Targeted Learning from Big Data
    Mark van der Laan Biostatistics and Statistics, UC Berkeley

Panel Discussion


Categories: Announcement, philosophy of science, Philosophy of Statistics, Statistical fraudbusting, Statistics

Surprising Facts about Surprising Facts


A paper of mine on “double-counting” and novel evidence just came out: “Some surprising facts about (the problem of) surprising facts” in Studies in History and Philosophy of Science (2013), http://dx.doi.org/10.1016/j.shpsa.2013.10.005

ABSTRACT: A common intuition about evidence is that if data x have been used to construct a hypothesis H, then x should not be used again in support of H. It is no surprise that x fits H, if H was deliberately constructed to accord with x. The question of when and why we should avoid such ‘‘double-counting’’ continues to be debated in philosophy and statistics. It arises as a prohibition against data mining, hunting for significance, tuning on the signal, and ad hoc hypotheses, and as a preference for predesignated hypotheses and ‘‘surprising’’ predictions. I have argued that it is the severity or probativeness of the test—or lack of it—that should determine whether a double-use of data is admissible. I examine a number of surprising ambiguities and unexpected facts that continue to bedevil this debate.

Categories: double-counting, Error Statistics, philosophy of science, Statistics

Erich Lehmann: Statistician and Poet

Erich Lehmann, 20 November 1917 – 12 September 2009

Today is Erich Lehmann’s birthday. The last time I saw him was at the Second Lehmann conference in 2004, at which I organized a session on philosophical foundations of statistics (including David Freedman and D.R. Cox).

I got to know Lehmann, Neyman’s first student, in 1997.  One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that).  He told me he was sitting in a very large room at an ASA meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, dark table sat just one book, all alone, shiny red.  He said he wondered if it might be of interest to him!  So he walked up to it….  It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after. Some related posts on Lehmann’s letter are here and here.

That same year I remember having a last-minute phone call with Erich to ask how best to respond to a “funny Bayesian example” raised by Colin Howson. It is essentially the case of Mary’s positive result for a disease, where Mary is selected randomly from a population where the disease is very rare. See for example here. (It’s just like the case of our high school student Isaac). His recommendations were extremely illuminating, and with them he sent me a poem he’d written (which you can read in my published response here*). Aside from being a leading statistician, Erich had a (serious) literary bent.
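For readers new to the example, here is a minimal sketch of the base-rate arithmetic behind such “funny Bayesian examples,” with purely illustrative numbers of my own (not Howson’s or Erich’s): a rare condition plus random selection yields a low post-test probability even for a fairly accurate test.

```python
# Illustrative numbers only (hypothetical, not from the published exchange):
prevalence = 0.001       # P(disease): 1 in 1000 in the population Mary is drawn from
sensitivity = 0.95       # P(positive | disease)
false_positive = 0.05    # P(positive | no disease)

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {p_disease:.3f}")  # ~0.019, despite the positive result
```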

Juliet Shafer, Erich Lehmann, D. Mayo

The picture on the right was taken in 2003 (by A. Spanos).

Mayo, D. G. (1997a), “Response to Howson and Laudan,” Philosophy of Science 64: 323–333.

(Selected) Books

  • Testing Statistical Hypotheses, 1959
  • Basic Concepts of Probability and Statistics, 1964, co-author J. L. Hodges
  • Elements of Finite Probability, 1965, co-author J. L. Hodges
  • Nonparametrics: Statistical Methods Based on Ranks, 1975 (revised 1988; reprinted by Springer, 2006), with the special assistance of H. J. M. D’Abrera
  • Theory of Point Estimation, 1983
  • Elements of Large-Sample Theory (1988). New York: Springer Verlag.
  • Reminiscences of a Statistician, 2007, ISBN 978-0-387-71596-4
  • Fisher, Neyman, and the Creation of Classical Statistics, 2011, ISBN 978-1-4419-9499-8 [published posthumously]

Articles (3 of very many)

Categories: philosophy of science, Statistics | Tags: ,

Is Particle Physics Bad Science? (memory lane)

Memory Lane: reblog July 11, 2012 (+ updates at the end). 

I suppose[d] this was somewhat of a joke from the ISBA, prompted by Dennis Lindley, but as I [now] accord the actual extent of jokiness to be only ~10%, I’m sharing it on the blog [i]. Lindley (according to O’Hagan) wonders why scientists require so high a level of statistical significance before claiming to have evidence of a Higgs boson. It is asked: “Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?”

Bad science?   I’d really like to understand what these representatives from the ISBA would recommend, if there is even a shred of seriousness here (or is Lindley just peeved that significance levels are getting so much press in connection with so important a discovery in particle physics?)

Well, read the letter and see what you think.

On Jul 10, 2012, at 9:46 PM, ISBA Webmaster wrote:

Dear Bayesians,

A question from Dennis Lindley prompts me to consult this list in search of answers.

We’ve heard a lot about the Higgs boson.  The news reports say that the LHC needed convincing evidence before they would announce that a particle had been found that looks like (in the sense of having some of the right characteristics of) the elusive Higgs boson.  Specifically, the news referred to a confidence interval with 5-sigma limits.

Now this appears to correspond to a frequentist significance test with an extreme significance level.  Five standard deviations, assuming normality, means a p-value of around 0.0000005.  A number of questions spring to mind.

1.  Why such an extreme evidence requirement?  We know from a Bayesian perspective that this only makes sense if (a) the existence of the Higgs boson (or some other particle sharing some of its properties) has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme.  Neither seems to be the case, so why 5-sigma?

2.  Rather than ad hoc justification of a p-value, it is of course better to do a proper Bayesian analysis.  Are the particle physics community completely wedded to frequentist analysis?  If so, has anyone tried to explain what bad science that is? Continue reading
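As an aside of my own (not part of the letter): the 5-sigma arithmetic is easy to check under the normality assumption mentioned above. A quick sketch:

```python
from scipy.stats import norm

# Tail area beyond 5 standard deviations under a normal distribution
p_one_sided = norm.sf(5)       # ~2.9e-07
p_two_sided = 2 * norm.sf(5)   # ~5.7e-07, roughly the 0.0000005 quoted above

print(f"one-sided: {p_one_sided:.1e}, two-sided: {p_two_sided:.1e}")
```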

Categories: philosophy of science, Statistics | Tags: , , , , ,

Severity as a ‘Metastatistical’ Assessment

Some weeks ago I discovered an error* in the upper severity bounds for the one-sided Normal test in section 5 of: “Statistical Science Meets Philosophy of Science Part 2” SS & POS 2. The published article has been corrected. The error was in section 5.3, but I am blogging all of 5.

(*μ0 was written where x̄0 should have been!)

5. The Error-Statistical Philosophy

I recommend moving away, once and for all, from the idea that frequentists must ‘sign up’ for either Neyman and Pearson, or Fisherian paradigms. As a philosopher of statistics I am prepared to admit to supplying the tools with an interpretation and an associated philosophy of inference. I am not concerned to prove this is what any of the founders ‘really meant’.

Fisherian simple-significance tests, with their single null hypothesis and at most an idea of  a directional alternative (and a corresponding notion of the ‘sensitivity’ of a test), are commonly distinguished from Neyman and Pearson tests, where the null and alternative exhaust the parameter space, and the corresponding notion of power is explicit. On the interpretation of tests that I am proposing, these are just two of the various types of testing contexts appropriate for different questions of interest. My use of a distinct term, ‘error statistics’, frees us from the bogeymen and bogeywomen often associated with ‘classical’ statistics, and it is to be hoped that that term is shelved. (Even ‘sampling theory’, technically correct, does not seem to represent the key point: the sampling distribution matters in order to evaluate error probabilities, and thereby assess corroboration or severity associated with claims of interest.) Nor do I see that my comments turn on whether one replaces frequencies with ‘propensities’ (whatever they are). Continue reading
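Since the correction concerns the severity computation itself, a minimal sketch of how severity comes out for the one-sided Normal test T+ may help; the numbers are my own toy example, not the paper’s. (Note that it is the observed mean x̄0, not μ0, that enters the computation: the very slip the correction flags.)

```python
from math import sqrt
from scipy.stats import norm

def severity(x_bar, mu1, sigma, n):
    """SEV(mu > mu1) given observed mean x_bar in test T+ (H0: mu <= mu0):
    the probability of a result no larger than x_bar, were mu only mu1."""
    return norm.cdf((x_bar - mu1) / (sigma / sqrt(n)))

# Toy example: sigma = 2, n = 100 (SE = 0.2), observed x_bar = 0.4
for mu1 in (0.0, 0.2, 0.4):
    print(f"SEV(mu > {mu1}) = {severity(0.4, mu1, 2, 100):.3f}")
# 0.977, 0.841, 0.500: the claims that pass with high severity
# are the less ambitious ones.
```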

Categories: Error Statistics, philosophy of science, Philosophy of Statistics, Severity, Statistics

Seminars at the London School of Economics: Contemporary Problems in Philosophy of Statistics

As a visitor at the Centre for Philosophy of Natural and Social Science (CPNSS) at the London School of Economics and Political Science, I am leading 3 seminars in the Department of Philosophy, Logic and Scientific Method on Wednesdays from Nov. 28 – Dec. 12 on Contemporary Philosophy of Statistics under the PH500 rubric, Room: Lak 2.06 (Lakatos Building). Interested individuals who have not yet contacted me should write: error@vt.edu.*
The Autumn seminars will also feature discussions with distinguished guest statisticians: Sir David Cox (Oxford); Dr. Stephen Senn (Competence Center for Methodology and Statistics, Luxembourg); and Dr. Christian Hennig (University College London):
  • 28 November (10 – 12 noon): Mayo: On Birnbaum’s argument for the Likelihood Principle: A 50-year-old error and its influence on statistical foundations (See my blog and links within.)

5 December and 12 December: Statistical Science meets philosophy of science: Mayo and guests:

  • 5 Dec (12 noon – 2 p.m.): Sir David Cox
  • 12 Dec (10 – 12): Dr. Stephen Senn;
    Dr. Christian Hennig: TBA

Topics, activities, readings: TBA (Two 2012 Summer Seminars may be found here).

Blurb: Debates over the philosophical foundations of statistical science have a long and fascinating history, marked by deep and passionate controversies that intertwine with fundamental notions of the nature of statistical inference and the role of probabilistic concepts in inductive learning. Progress in resolving decades-old controversies that still shake the foundations of statistics demands both philosophical and technical acumen, but gaining entry into the current state of play requires a roadmap that zeroes in on core themes and current standpoints. While the seminar will attempt to minimize technical details, it will be important to clarify key notions in order to fully contribute to the debates. Relevance for general philosophical problems will be emphasized. Because the contexts in which statistical methods are most needed are ones that compel us to be most aware of strategies scientists use to cope with threats to reliability, considering the nature of statistical method in the collection, modeling, and analysis of data is an effective way to articulate and warrant general principles of evidence and inference.
Room 2.06, Lakatos Building
Centre for Philosophy of Natural and Social Science
London School of Economics
Houghton Street
London WC2A 2AE
Administrator: T. R. Chivers@lse.ac.uk

For updates, details, and associated readings, please check the LSE Ph500 page on my blog or write to me.
*It is not necessary to have attended the 2 sessions held during the summer of 2012.

Categories: Announcement, philosophy of science, Statistics | Tags: ,

PhilStat: So you’re looking for a Ph.D. dissertation topic?

Maybe you’ve already heard Hal Varian, Google’s chief economist: “The next sexy job in the next ten years will be statisticians.” Even Larry Wasserman declares that “statistics is sexy.” In that case, philosophy of statistics must be doubly so!

Thus one wonders at the decline of late in the lively and long-standing exchange between philosophers of science and statisticians. If you are a graduate student wondering how you might make your mark in a philosophy of science area, philosophy of statistical science, fairly brimming over with rich and open philosophical problems, may be the thing for you!* Surprising, pressing, intriguing, and novel philosophical twists on both traditional and cutting-edge controversies are going begging for analysis—they not only bear on many areas of popular philosophy but also may offer you ways of getting out in front of them.

I came across a spotty blog by Pitt graduate student Gregory Gandenberger awhile back (not like his new, frequently updated one) where he was wrestling with a topic for his masters thesis, and some years later, wrangling over dissertation topics in philosophy of statistics. After I started this blog, I looked for it again, and now I’ve invited him to post, on the topic of his choice, as he did here, and I invite other graduate students through the U-Phil call. Continue reading

Categories: Error Statistics, philosophy of science, Philosophy of Statistics

Mayo: (section 6) “StatSci and PhilSci: part 2”

Here is section 6 of my paper: “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. Section 5 is in my last post.

6. Some Knock-Down Criticisms of Frequentist Error Statistics

With the error-statistical philosophy of inference under our belts, it is easy to run through the classic and allegedly damning criticisms of frequentist error-statistical methods. Open up Bayesian textbooks and you will find, endlessly reprised, the handful of ‘counterexamples’ and ‘paradoxes’ that make up the charges leveled against frequentist statistics, after which the Bayesian account is proffered as coming to the rescue. There is nothing about how frequentists have responded to these charges; nor evidence that frequentist theory endorses the applications or interpretations around which these ‘chestnuts’ revolve.

If frequentist and Bayesian philosophies are to find common ground, this should stop. The value of a generous interpretation of rival views should cut both ways. A key purpose of the forum out of which this paper arises is to encourage reciprocity.

Continue reading

Categories: Error Statistics, philosophy of science, Philosophy of Statistics

Mayo: (section 5) “StatSci and PhilSci: part 2”

Here is section 5 of my new paper: “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. Sections 1 and 2 are in my last post.*

5. The Error-Statistical Philosophy

I recommend moving away, once and for all, from the idea that frequentists must ‘sign up’ for either Neyman and Pearson, or Fisherian paradigms. As a philosopher of statistics I am prepared to admit to supplying the tools with an interpretation and an associated philosophy of inference. I am not concerned to prove this is what any of the founders ‘really meant’.

Fisherian simple-significance tests, with their single null hypothesis and at most an idea of  a directional alternative (and a corresponding notion of the ‘sensitivity’ of a test), are commonly distinguished from Neyman and Pearson tests, where the null and alternative exhaust the parameter space, and the corresponding notion of power is explicit. On the interpretation of tests that I am proposing, these are just two of the various types of testing contexts appropriate for different questions of interest. My use of a distinct term, ‘error statistics’, frees us from the bogeymen and bogeywomen often associated with ‘classical’ statistics, and it is to be hoped that that term is shelved. (Even ‘sampling theory’, technically correct, does not seem to represent the key point: the sampling distribution matters in order to evaluate error probabilities, and thereby assess corroboration or severity associated with claims of interest.) Nor do I see that my comments turn on whether one replaces frequencies with ‘propensities’ (whatever they are). Continue reading

Categories: Error Statistics, philosophy of science, Philosophy of Statistics, Severity

Insevere tests and pseudoscience

Against the PSI skeptics of this period (discussed in my last post), defenders of PSI would often erect means to take experimental results as success stories (e.g., if the subject failed to correctly predict the next card, maybe he was aiming at the second or third card). If the data could not be made to fit some ESP claim or other (e.g., through multiple end points), it might, as a last resort, be explained away as due to the negative energy of nonbelievers (or being on the Carson show). They manage to get their ESP hypothesis H to “pass,” but the “test” had little or no capability of finding (uncovering, admitting) the falsity of H, even if H is false. (This is the basis for my term “Gellerization”.) In such cases, I would deny that the results afford any evidence for H. They are terrible evidence for H. Now any domain will have some terrible tests, but a field that routinely passes off terrible tests as success stories I would deem pseudoscientific. (A toy simulation after the notes below illustrates.)

We get a kind of minimal requirement for a test result to afford any evidence for an assertion H, however partial and approximate H may be: if a hypothesis H is assured of having* “passed” a test T, even if H is false, then test T is a terrible test or no test at all.**

Far from trying to reveal flaws, it masks them or prevents them from being uncovered. No one would be impressed to learn their bank had passed a “stress test” if it turns out that the test had little or no chance of giving a failing score to any bank, regardless of its ability to survive a stressed economy. (Would they?)

There are a million different ways to flesh out the idea, and I welcome hearing others. Now you might say that no one would disagree with this. Great. Because a core requirement for an adequate account of inquiry, as I see it, is that it be able to capture this rationale for deeming such evidence pretty terrible and such inquiry fairly pseudoscientific, and it should do so in such a way that affords a starting point for not-so-awful tests, and rather reliable learning.

* or very probably would have passed.

**QUESTION: I seek your input: which sounds better, or is more accurate: saying a test T passes a hypothesis H, or that a hypothesis H passes a test T? I’ve used both and want to settle on one.
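Here is the promised toy simulation, a sketch with made-up numbers of my own: pure guessing at Zener-style cards, scored at many end points (direct hits, one-ahead, two-behind, and so on). Some “success” is all but guaranteed even though ESP is absent by construction, so “passing” affords no evidence at all:

```python
import random

random.seed(1)
CARDS = range(5)  # five equally likely Zener-style symbols

def gellerized_pass(n_trials=25, lags=range(-5, 6), hits_needed=9):
    """One 'experiment' with no ESP by construction: random targets, random
    guesses. 'Pass' if ANY displaced end point clears the nominal bar."""
    targets = [random.choice(CARDS) for _ in range(n_trials)]
    guesses = [random.choice(CARDS) for _ in range(n_trials)]
    for lag in lags:
        hits = sum(g == targets[i + lag]
                   for i, g in enumerate(guesses)
                   if 0 <= i + lag < n_trials)
        if hits >= hits_needed:  # any single end point alone: only ~.05 chance
            return True
    return False

sims = 20_000
rate = sum(gellerized_pass() for _ in range(sims)) / sims
print(f"P(some end point 'passes' though H is false) = {rate:.2f}")  # well above .05
```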

Categories: Error Statistics, philosophy of science

Statistics and ESP research (Diaconis)

In the early ’80s, fresh out of graduate school, I persuaded Persi Diaconis, Jack Good, and Patrick Suppes to participate in a session I wanted to organize on ESP and statistics. It seems remarkable to me now—not only that they agreed to participate*, but the extent to which PSI research was taken seriously at the time. It wasn’t much later that all the recurring errors and loopholes, and the persistent cheating and self-delusion—despite earnest attempts to trigger and analyze the phenomena—would lead nearly everyone to label PSI research a “degenerating programme” (in the Popperian-Lakatosian sense).

(Though I’d have to check names and dates, I seem to recall that the last straw was when some of the Stanford researchers were found guilty of (unconscious) fraud. Jack Good continued to be interested in the area, but less so, I think. I do not know about the others.)

It is interesting to see how background information enters into inquiry here. So, even though it’s late on a Saturday night, here’s a snippet from one of the papers that caught my interest in graduate school: Diaconis’s (1978) “Statistical Problems in ESP Research” in Science, along with some critical “letters”.

Summary. In search of repeatable ESP experiments, modern investigators are using more complex targets, richer and freer responses, feedback, and more naturalistic conditions. This makes tractable statistical models less applicable. Moreover, controls often are so loose that no valid statistical analysis is possible. Some common problems are multiple end points, subject cheating, and unconscious sensory cueing. Unfortunately, such problems are hard to recognize from published records of the experiments in which they occur; rather, these problems are often uncovered by reports of independent skilled observers who were present during the experiment. This suggests that magicians and psychologists be regularly used as observers. New statistical ideas have been developed for some of the new experiments. For example, many modern ESP studies provide subjects with feedback—partial information about previous guesses—to reward the subjects for correct guesses in hope of inducing ESP learning. Some feedback experiments can be analyzed with the use of skill-scoring, a statistical procedure that depends on the information available and the way the guessing subject uses this information. (p. 131) Continue reading

Categories: philosophy of science, Philosophy of Statistics, Statistics

More on using background info

For the second* bit of background on the use of background info (for the new U-Phil for 9/25/12), I’ll reblog:

Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs

…I am discovering that one of the biggest sources of confusion about the foundations of statistics has to do with what it means or should mean to use “background knowledge” and “judgment” in making statistical and scientific inferences. David Cox and I address this in our “Conversation” in RMM (2011)….

Insofar as humans conduct science and draw inferences, and insofar as learning about the world is not reducible to a priori deductions, it is obvious that “human judgments” are involved. True enough, but too trivial an observation to help us distinguish among the very different ways judgments should enter according to contrasting inferential accounts. When Bayesians claim that frequentists do not use or are barred from using background information, what they really mean is that frequentists do not use prior probabilities of hypotheses, at least when those hypotheses are regarded as correct or incorrect, if only approximately. So, for example, we would not assign relative frequencies to the truth of hypotheses such as (1) prion transmission is via protein folding without nucleic acid, or (2) the deflection of light is approximately 1.75” (as if, as Peirce puts it, “universes were as plenty as blackberries”). How odd it would be to try to model these hypotheses as themselves having distributions: to us, statistical hypotheses assign probabilities to outcomes or values of a random variable. Continue reading

Categories: Background knowledge, philosophy of science, Philosophy of Statistics, Statistics | Tags: ,

After dinner Bayesian comedy hour….

Given it’s the first anniversary of this blog, which opened with the howlers in “Overheard at the comedy hour…”, let’s listen in as a Bayesian holds forth on one of the most famous howlers of the lot: the mysterious role that psychological intentions are said to play in frequentist methods such as statistical significance tests. Here it is, essentially as I remember it (though shortened), in the comedy hour that unfolded at my dinner table at an academic conference:

 Did you hear the one about the researcher who gets a phone call from the guy analyzing his data? First the guy congratulates him and says, “The results show a statistically significant difference at the .05 level—p-value .048.” But then, an hour later, the phone rings again. It’s the same guy, but now he’s apologizing. It turns out that the experimenter intended to keep sampling until the result was 1.96 standard deviations away from the 0 null—in either direction—so they had to reanalyze the data (n=169), and the results were no longer statistically significant at the .05 level.

 Much laughter.

 So the researcher is tearing his hair out when the same guy calls back again. “Congratulations!” the guy says. “I just found out that the experimenter actually had planned to take n=169 all along, so the results are statistically significant.”

 Howls of laughter.

 But then the guy calls back with the bad news . . .

It turns out that, failing to score a sufficiently impressive effect after n′ trials, the experimenter went on to n′′ trials, and so on and so forth until finally, say, on trial number 169, he obtained a result 1.96 standard deviations from the null.

It continues this way, and every time the guy calls in and reports a shift in the p-value, the table erupts in howls of laughter! From everyone except me, sitting in stunned silence, staring straight ahead. The hilarity ensues from the idea that the experimenter’s reported psychological intentions about when to stop sampling are altering the statistical results. Continue reading
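For anyone tempted to laugh along: a minimal simulation sketch (my own illustration, using the story’s n = 169) shows what the error statistician sees. Try-and-try-again sampling rejects a true null far more often than the nominal .05, while the fixed-sample test does not:

```python
import math
import random

random.seed(7)

def try_and_try_again(max_n=169):
    """Sample from a true 0 null (standard normal), checking after every
    observation whether the mean is 1.96 standard errors from 0."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += random.gauss(0, 1)
        if abs(total / n) >= 1.96 / math.sqrt(n):
            return True  # 'significance' reached somewhere along the way
    return False

sims = 20_000
rate = sum(try_and_try_again() for _ in range(sims)) / sims
print(f"P(reject true null by n = 169) = {rate:.2f}")  # roughly 0.4, not .05
```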

Categories: Comedy, philosophy of science, Philosophy of Statistics, Statistics | Tags: , , ,

knowledge/evidence not captured by mathematical prob.

Mayo mirror

Equivocations between informal and formal uses of “probability” (as well as “likelihood” and “confidence”) are responsible for much confusion in statistical foundations, as is remarked in a famous paper by Allan Birnbaum that I was rereading today:

“It is of course common nontechnical usage to call any proposition probable or likely if it is supported by strong evidence of some kind.… However such usage is to be avoided as misleading in this problem-area, because each of the terms probability, likelihood and confidence coefficient is given a distinct mathematical and extramathematical usage.” (1969, 139, Note 4)

For my part, I find that I never use probabilities to express degrees of evidence (either in mathematical or extramathematical uses), but I realize others might. Even so, I agree with Birnbaum that “such usage is to be avoided as misleading in” foundational discussions of evidence. We know, infer, accept, and detach all kinds of claims from evidence, without any inclination to add an additional quantity such as a degree of probability or belief arrived at via, and obeying, the formal probability calculus.

It is interesting, as a little exercise, to examine scientific descriptions of the state of knowledge in a field. A few days ago, I posted something from Weinberg on the Higgs particle. Here are some statements, with some terms emphasized:

The general features of the electroweak theory have been well tested; their validity is not what has been at stake in the recent experiments at CERN and Fermilab, and would not be seriously in doubt even if no Higgs particle had been discovered.

I see no suggestion of a formal application of Bayesian probability notions. Continue reading

Categories: philosophy of science, Philosophy of Statistics | Tags: , , ,

“Did Higgs Physicists Miss an Opportunity by Not Consulting More With Statisticians?”

On August 20 I posted the start of “Discussion and Digest” by Bayesian statistician Tony O’Hagan, an overview of responses to his letter (ISBA website) on the use of p-values in analyzing the Higgs data, prompted, in turn, by a query from subjective Bayesian Dennis Lindley. I now post the final section, in which he discusses his own view. I think it raises many questions of interest, both as regards this case and more generally about statistics and science. My initial July 11 post is here.

“Higgs Boson – Digest and Discussion” By Tony O’Hagan

Discussion

So here are some of my own views on this.

There are good reasons for being cautious and demanding a very high standard of evidence before announcing something as momentous as H. It is acknowledged by those who use it that the 5-sigma standard is a fudge, though. They would surely be willing to make such an announcement if they were, for instance, 99.99% certain of H’s existence, as long as that 99.99% were rigorously justified. 5-sigma is used because they don’t feel able to quantify the probability of H rigorously. So they use the best statistical analysis that they know how to do, but because they also know there are numerous factors not taken into account by this analysis – the multiple testing, the likelihood of unrecognised or unquantified deficiencies in the data, experiment or statistics, and the possibility of other explanations – they ask for what on the face of it is an absurdly high level of significance from that analysis. Continue reading

Categories: philosophy of science, Philosophy of Statistics, Statistics | Tags: ,

Scalar or Technicolor? S. Weinberg, “Why the Higgs?”

CERN’s Large Hadron Collider under construction, 2007

My colleague in philosophy at Va Tech, Ben Jantzen*, sent me this piece by Steven Weinberg on the Higgs. Even though it does not deal with the statistics, it explains some of the general theorizing more clearly than most of the other things I’ve read. (See also my previous post.)

Why the Higgs?
August 16, 2012
Steven Weinberg

The New York Review of Books

The following is part of an introduction to James Baggott’s new book Higgs: The Invention and Discovery of the “God Particle,” which will be published in August by Oxford University Press. Baggott wrote his book anticipating the recent announcement of the discovery at CERN near Geneva—with some corroboration from Fermilab—of a new particle that seems to be the long-sought Higgs particle. Much further research on its exact identity is to come.

It is often said that what was at stake in the search for the Higgs particle was the origin of mass. True enough, but this explanation needs some sharpening.

By the 1980s we had a good comprehensive theory of all observed elementary particles and the forces (other than gravitation) that they exert on one another. One of the essential elements of this theory is a symmetry, like a family relationship, between two of these forces, the electromagnetic force and the weak nuclear force. Electromagnetism is responsible for light; the weak nuclear force allows particles inside atomic nuclei to change their identity through processes of radioactive decay. The symmetry between the two forces brings them together in a single “electroweak” structure. The general features of the electroweak theory have been well tested; their validity is not what has been at stake in the recent experiments at CERN and Fermilab, and would not be seriously in doubt even if no Higgs particle had been discovered. Continue reading

Categories: philosophy of science | Tags: ,
