To Quarantine or not to Quarantine?: Science & Policy in the time of Ebola

Bioethicist Arthur Caplan gives “7 Reasons Ebola Quarantine Is a Bad, Bad Idea”. I’m interested to know what readers think (I claim no expertise in this area). My occasional comments are in red.

“Bioethicist: 7 Reasons Ebola Quarantine Is a Bad, Bad Idea”

In the fight against Ebola some government officials in the U.S. are now managing fear, not the virus. Quarantines have been declared in New York, New Jersey and Illinois. In Connecticut, nine people are in quarantine: two students at Yale; a worker from AmeriCARES; and a West African family.

Many others are or soon will be.

Quarantining those who do not have symptoms is not the way to combat Ebola. In fact it will only make matters worse. Far worse. Why?

  1. Quarantining people without symptoms makes no scientific sense.

They are not infectious. The only way to get Ebola is to have someone vomit on you, bleed on you, share spit with you, have sex with you or get fecal matter on you when they have a high viral load.

How do we know this?

Because there is data going back to 1975 from outbreaks in the Congo, Uganda, Sudan, Gabon, Ivory Coast, South Africa, not to mention current experience in the United States, Spain and other nations.

The list of “the only way to get Ebola” does not suggest it is so extraordinarily difficult to transmit as to imply the policy “makes no scientific sense”. That there is “data going back to 1975” doesn’t tell us how it was analyzed. They may not be infectious today, but…

  2. Quarantine is next to impossible to enforce.

If you don’t want to stay in your home or wherever you are supposed to stay for three weeks, then what? Do we shoot you, Taser you, drag you back into your house in a protective suit, or what?

And who is responsible for watching you 24-7? Quarantine relies on the honor system. That essentially is what we count on when we tell people with symptoms to call 911 or the health department.

It does appear that this hasn’t been well thought through yet. NY Governor Cuomo said that “Doctors Without Borders”, the group that sponsors many of the volunteers, already requires volunteers to “decompress” for three weeks upon return from Africa, and they compensate their doctors during this time (see the above link). The state of NY would fill in for those sponsoring groups that do not offer compensation (at least in NY). Is the existing 3-week decompression period already a clue that they want people cleared before they return to work?

  3. Quarantine is likely to be challenged in court.

And those challenging it are likely to win. Science does not support it.

Maybe so. I’m fairly sure that future volunteers will have to agree in advance. For those who haven’t agreed, I have no clue about legal precedence.

  4. Large-scale quarantine has not been thought through, in terms of making it bearable for those confined.

If government does not make it tolerable — and they show no signs of doing so, other than succeeding in stigmatizing people who are not dangerous — then people will not honor quarantine.

It isn’t clear it couldn’t be made tolerable. I think people should be compensated if they are required to be quarantined.

  5. Health care workers who take care of those who really do have Ebola at big hospitals, such as Bellevue or Emory, are at the greatest risk.

If you quarantine them you are taking your best professionals offline for three weeks — and there are not a lot of replacements.

  6. Who will volunteer to go to West Africa to stamp out the epidemic, if they know they face three weeks of confinement upon their return?

Those who go are heroes who face hell on earth. Can’t they be trusted to do the right thing and self-monitor when they get back?

A serious worry is, indeed, that the new policy will kill off volunteers.

  7. When officials respond to panic with quarantine, they basically say they can’t trust public health officials, science and the ethics of doctors and nurses.

There is no substitute for that trust. None. If state and city officials undermine trust out of panic or politics, then they destroy the best weapon we have to control Ebola — good science implemented by heroes.

I don’t see why elected officials imposing a quarantine means they can’t trust public health officials, although obviously it would be best if health officials are behind it.

It may be too late to reverse the leap to quarantine for those politicians deem at risk. But it is a leap into an unknown that we are likely to come to regret.

First published October 26th 2014, 6:11 pm

ARTHUR CAPLAN

Arthur L. Caplan, Ph.D., is the Drs. William F. and Virginia Connolly Mitty Professor and founding head.

Share comments and relevant links.

Categories: science communication | Tags: | 35 Comments

3 YEARS AGO: MONTHLY MEMORY LANE

MONTHLY MEMORY LANE: 3 years ago: October 2011 (I mark in red 3 posts that seem most apt for general background on key issues in this blog*)

*I indicated I’d begin this new, once-a-month feature at the 3-year anniversary. I will repost and comment on one each month. For newcomers, here’s your chance to catch up; for old timers, this is philosophy: rereading is essential!

Categories: 3-year memory lane, blog contents, Statistics | Leave a comment

September 2014: Blog Contents

September 2014: Error Statistics Philosophy
Blog Table of Contents 

Compiled by Jean A. Miller

  • (9/30) Letter from George (Barnard)
  • (9/27) Should a “Fictionfactory” peepshow be barred from a festival on “Truth and Reality”? Diederik Stapel says no (rejected post)
  • (9/23) G.A. Barnard: The Bayesian “catch-all” factor: probability vs likelihood
  • (9/21) Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”
  • (9/18) Uncle Sam wants YOU to help with scientific reproducibility!
  • (9/15) A crucial missing piece in the Pistorius trial? (2): my answer (Rejected Post)
  • (9/12) “The Supernal Powers Withhold Their Hands And Let Me Alone”: C.S. Peirce
  • (9/6) Statistical Science: The Likelihood Principle issue is out…!
  • (9/4) All She Wrote (so far): Error Statistics Philosophy Contents-3 years on
  • (9/3) 3 in blog years: Sept 3 is 3rd anniversary of errorstatistics.com


Categories: Announcement, blog contents, Statistics | Leave a comment

PhilStat/Law: Nathan Schachtman: Acknowledging Multiple Comparisons in Statistical Analysis: Courts Can and Must

The following is from Nathan Schachtman’s legal blog, with various comments and added emphases (by me, in this color). He will try to reply to comments/queries.

“Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses”

Nathan Schachtman, Esq., PC * October 14th, 2014

In excluding the proffered testimony of Dr. Anick Bérard, a Canadian perinatal epidemiologist in the Université de Montréal, the Zoloft MDL trial court discussed several methodological shortcomings and failures, including Bérard’s reliance upon claims of statistical significance from studies that conducted dozens and hundreds of multiple comparisons.[i] The Zoloft MDL court was not the first court to recognize the problem of over-interpreting the putative statistical significance of results that were one among many statistical tests in a single study. The court was, however, among a fairly small group of judges who have shown the needed statistical acumen in looking beyond the reported p-value or confidence interval to the actual methods used in a study[1].

A complete and fair evaluation of the evidence in situations as occurred in the Zoloft birth defects epidemiology required more than the presentation of the size of the random error, or the width of the 95 percent confidence interval.  When the sample estimate arises from a study with multiple testing, presenting the sample estimate with the confidence interval, or p-value, can be highly misleading if the p-value is used for hypothesis testing.  The fact of multiple testing will inflate the false-positive error rate. Dr. Bérard ignored the context of the studies she relied upon. What was noteworthy is that Bérard encountered a federal judge who adhered to the assigned task of evaluating methodology and its relationship with conclusions.

*   *   *   *   *   *   *

There is no unique solution to the problem of multiple comparisons. Some researchers use Bonferroni or other quantitative adjustments to p-values or confidence intervals, whereas others reject adjustments in favor of qualitative assessments of the data in the full context of the study and its methods. See, e.g., Kenneth J. Rothman, “No Adjustments Are Needed For Multiple Comparisons,” 1 Epidemiology 43 (1990) (arguing that adjustments mechanize and trivialize the problem of interpreting multiple comparisons). Two things are clear from Professor Rothman’s analysis. First, for someone intent upon strict statistical significance testing, the presence of multiple comparisons means that the rejection of the null hypothesis cannot be done without further consideration of the nature and extent of both the disclosed and undisclosed statistical testing. Rothman, of course, has inveighed against strict significance testing under any circumstance, but the multiple testing would only compound the problem.

Second, although failure to adjust p-values or intervals quantitatively may be acceptable, failure to acknowledge the multiple testing is poor statistical practice. The practice is, alas, too prevalent for anyone to say that ignoring multiple testing is fraudulent, and the Zoloft MDL court certainly did not condemn Dr. Bérard as a fraudfeasor[2]. [emphasis mine]

I’m perplexed by this mixture of stances. If you don’t mention the multiple testing for which it is acceptable not to adjust, then you’re guilty of poor statistical practice; but it’s “too prevalent for anyone to say that ignoring multiple testing is fraudulent”. This appears to claim it’s poor statistical practice if you fail to mention that your results are due to multiple testing, but “ignoring multiple testing” (which could mean failing to adjust or, more likely, failing to mention it) is not fraudulent. Perhaps it’s a questionable research practice (QRP). It’s back to “50 shades of grey between QRPs and fraud.”

  […read his full blogpost here]

Previous cases have also acknowledged the multiple testing problem. In litigation claims for compensation for brain tumors for cell phone use, plaintiffs’ expert witness relied upon subgroup analysis, which added to the number of tests conducted within the epidemiologic study at issue. Newman v. Motorola, Inc., 218 F. Supp. 2d 769, 779 (D. Md. 2002), aff’d, 78 Fed. App’x 292 (4th Cir. 2003). The trial court explained:

“[Plaintiff’s expert] puts undue emphasis on the positive findings for isolated subgroups of tumors. As Dr. Stampfer explained, it is not good scientific methodology to highlight certain elevated subgroups as significant findings without having earlier enunciated a hypothesis to look for or explain particular patterns, such as dose-response effect. In addition, when there is a high number of subgroup comparisons, at least some will show a statistical significance by chance alone.”

I’m going to require, as part of its meaning, that a statistically significant difference not be one due to “chance variability” alone. Then to avoid self-contradiction, this last sentence might be put as follows: “when there is a high number of subgroup comparisons, at least some will show purported or nominal or unaudited statistical significance by chance alone. [Which term do readers prefer?] If one hunts down one’s hypothesized comparison in the data, then the actual p-value will not equal, and will generally be greater than, the nominal or unaudited p-value.”

So, I will insert “nominal” where needed below (in red).

Texas Sharpshooter fallacy

Id. And shortly after the Supreme Court decided Daubert, the Tenth Circuit faced the reality of data dredging in litigation, and its effect on the meaning of “significance”:

“Even if the elevated levels of lung cancer for men had been [nominally] statistically significant a court might well take account of the statistical “Texas Sharpshooter” fallacy in which a person shoots bullets at the side of a barn, then, after the fact, finds a cluster of holes and draws a circle around it to show how accurate his aim was. With eight kinds of cancer for each sex there would be sixteen potential categories here around which to “draw a circle” to show a [nominally] statistically significant level of cancer. With independent variables one would expect one statistically significant reading in every twenty categories at a 95% confidence level purely by random chance.”

The Texas sharpshooter fallacy is one of my all time favorites. One purports to be testing the accuracy of his aim, when in fact that is not the process that gave rise to the impressive-looking (nominal) cluster of hits. The results do not warrant inferences about his ability to accurately hit a target, since that hasn’t been well-probed.
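
As a quick numerical check on the quoted passage, here is a minimal Python sketch (my own illustration, not from the opinion or from Schachtman’s post) of the chance of at least one nominally significant result among 16 independent tests of true nulls at the 0.05 level, by the exact formula and by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
k, alpha, reps = 16, 0.05, 100_000

# Exact: P(at least one nominal "hit" among k independent tests of true nulls).
print(f"Exact chance of at least one nominal hit in {k} tests: {1 - (1 - alpha) ** k:.3f}")

# Simulation: under a true null, a valid p-value is uniform on [0, 1].
p_vals = rng.uniform(size=(reps, k))
print(f"Simulated frequency: {(p_vals < alpha).any(axis=1).mean():.3f}")
```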

  [...read his full blogpost here]

The notorious Wells[4] case was cited by the Supreme Court in Matrixx Initiatives[5] for the proposition that statistical significance was unnecessary. Ironically, at least one of the studies relied upon by the plaintiffs’ expert witnesses in Wells had some outcomes with p-values below five percent. The problem, addressed by defense expert witnesses and ignored by the plaintiffs’ witnesses and Judge Shoob, was that there were over 20 reported outcomes, and probably many more outcomes analyzed but not reported. Accordingly, some qualitative or quantitative adjustment was required in Wells. See Hans Zeisel & David Kaye, Prove It With Figures: Empirical Methods in Law and Litigation 93 (1997)[6].

Maybe Schachtman will be willing to explain the first sentence of the above para. We’ve discussed the Matrixx case several times on this blog, but I don’t know the notorious Wells case.

Reference Manual on Scientific Evidence

David Kaye’s and the late David Freedman’s chapter on statistics in the third, most recent, edition of the Reference Manual offers some helpful insights into the problem of multiple testing:

4. How many tests have been done?

Repeated testing complicates the interpretation of significance levels. If enough comparisons are made, random error almost guarantees that some will yield ‘significant’ findings, even when there is no real effect. To illustrate the point, consider the problem of deciding whether a coin is biased. The probability that a fair coin will produce 10 heads when tossed 10 times is (1/2)^10 = 1/1024. Observing 10 heads in the first 10 tosses, therefore, would be strong evidence that the coin is biased. Nonetheless, if a fair coin is tossed a few thousand times, it is likely that at least one string of ten consecutive heads will appear. Ten heads in the first ten tosses means one thing; a run of ten heads somewhere along the way to a few thousand tosses of a coin means quite another. A test—looking for a run of ten heads—can be repeated too often.

Artifacts from multiple testing are commonplace. Because research that fails to uncover significance often is not published, reviews of the literature may produce an unduly large number of studies finding statistical significance.111 Even a single researcher may examine so many different relationships that a few will achieve [nominal] statistical significance by mere happenstance. Almost any large dataset—even pages from a table of random digits—will contain some unusual pattern that can be uncovered by diligent search. Having detected the pattern, the analyst can perform a statistical test for it, blandly ignoring the search effort. [Nominal] statistical significance is bound to follow.

There are statistical methods for dealing with multiple looks at the data, which permit the calculation of meaningful p-values in certain cases.112 However, no general solution is available… . In these situations, courts should not be overly impressed with claims that estimates are [nominally] significant. …”

Reference Manual on Scientific Evidence at 256-57 (3d ed. 2011).
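
The coin illustration is easy to check numerically. Below is a minimal Python sketch (mine, not the Reference Manual’s; the 3,000-toss figure is an arbitrary stand-in for “a few thousand”) comparing the 1/1024 chance of ten heads in the first ten tosses with the chance of some run of ten heads appearing along the way.

```python
import numpy as np

rng = np.random.default_rng(2)
n_tosses, run_len, reps = 3000, 10, 2000

def has_run(flips, length):
    """True if the 0/1 array `flips` contains `length` consecutive 1s (heads)."""
    count = 0
    for f in flips:
        count = count + 1 if f else 0
        if count >= length:
            return True
    return False

hits = sum(has_run(rng.integers(0, 2, size=n_tosses), run_len) for _ in range(reps))
print(f"P(10 heads in the first 10 tosses) = {0.5 ** 10:.6f}")        # = 1/1024
print(f"P(a run of 10 heads somewhere in {n_tosses} tosses) ~ {hits / reps:.2f}")
```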

When a lawyer asks a witness whether a sample statistic is “statistically significant,” there is the danger that the answer will be interpreted or argued as a Type I error rate, or worse yet, as a posterior probability for the null hypothesis.  When the sample statistic has a p-value below 0.05, in the context of multiple testing, completeness requires the presentation of the information about the number of tests and the distorting effect of multiple testing on preserving a pre-specified Type I error rate.  Even a [nominally] statistically significant finding must be understood in the full context of the study. [emphasis mine]

I don’t understand the danger of its being reported as a Type I error, especially when the next sentence correctly notes “the distorting effect of multiple testing on preserving a pre-specified Type I error rate.” The only danger could be reporting the Type I error probability that would have held under the assumption of a predesignated hypothesis and no selection effects, when in fact multiple testing occurred. Knowing there was going to be multiple testing, the person could report, pre-data: “Since we are going to be hunting and searching for nominal significance among k factors, the Type I error rate is quite high”.  Or, the predesignated error rate could be low, if each of the k tests is adjusted.
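
Before returning to the quoted post, here is a minimal sketch of that pre-data calculation (my own illustration, assuming k independent tests of true nulls): the family-wise Type I error rate with no adjustment versus with a Bonferroni adjustment of each test to level alpha/k.

```python
# Family-wise Type I error rate across k independent tests of true nulls.
alpha = 0.05
for k in (1, 5, 10, 20, 50):
    unadjusted = 1 - (1 - alpha) ** k        # each test run at level alpha
    bonferroni = 1 - (1 - alpha / k) ** k    # each test run at level alpha / k
    print(f"k={k:2d}: unadjusted = {unadjusted:.3f}, Bonferroni-adjusted = {bonferroni:.3f}")
```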

Some texts and journals recommend that the Type I error rate not be modified in the paper, as long as readers can observe the number of multiple comparisons that took place and make the adjustment for themselves. [emphasis mine]  Most jurors and judges are not sufficiently knowledgeable to make the adjustment without expert assistance, and so the fact of multiple testing, and its implication, are additional examples of how the rule of completeness may require the presentation of appropriate qualifications and explanations at the same time as the information about “statistical significance.”

This suggestion that readers “make the adjustment for themselves” reminds me of the recommendation that came up in a recent post about taking the stopping rule into account “later on”. If it influences the evidential warrant of the data,  then it makes no sense to say, “here’s the evidence but I engaged in various shenanigans, so now you go figure out what the real evidence is.”

*     *     *     *     *

Despite the guidance provided by the Reference Manual, some courts have remained resistant to the need to consider multiple comparison issues. Statistical issues arise frequently in securities fraud cases against pharmaceutical companies, involving the need to evaluate and interpret clinical trial data for the benefit of shareholders. In a typical case, joint venturers Aeterna Zentaris Inc. and Keryx Biopharmaceuticals, Inc., were both targeted by investors for alleged Rule 10(b)(5) violations involving statements of clinical trial results, made in SEC filings, press releases, investor presentations and investor conference calls from 2009 to 2012.[ii] The clinical trial at issue tested perifosine in conjunction with, and without, other therapies, in multiple arms, which examined efficacy for seven different types of cancer. After a preliminary phase II trial yielded promising results for metastatic colon cancer, the colon cancer arm proceeded. According to plaintiffs, the defendants repeatedly claimed that perifosine had demonstrated “statistically significant positive results.” In re Keryx at *2, 3.

The plaintiffs alleged that defendants’ statements omitted material facts, including the full extent of multiple testing in the design and conduct of the phase II trial, without adjustments supposedly “required” by regulatory guidance and generally accepted statistical principles. The plaintiffs asserted that the multiple comparisons involved in testing perifosine in so many different kinds of cancer patients, at various doses, with and against so many different types of other cancer therapies, compounded by multiple interim analyses, inflated the risk of Type I errors such that some statistical adjustment should have been applied before claiming that a statistically significant survival benefit had been found in one arm, with colorectal cancer patients. In re Keryx at *2-3, *10.

The trial court dismissed these allegations given that the trial protocol had been published, although over two years after the initial press release that started the class period, a release which failed to disclose the full extent of multiple testing and the lack of statistical correction…. The trial court was loath to allow securities fraud claims over allegations of improper statistical methodology, which:

“would be equivalent to a determination that if a researcher leaves any of its methodology out of its public statements — how it did what it did or was planning to do — it could amount to an actionable false statement or omission. This is not what the law anticipates or requires.” [emphasis mine]

Talk about an illicit slippery slope. Requiring information on the source of erroneous interpretations of statistical evidence is not “equivalent” to requiring the researcher report every detail about what it was planning to do.

In re Keryx at *10[7]. According to the trial court, providing p-values for comparisons between therapies, without disclosing the extent of unplanned interim analyses or the number of multiple comparisons is “not falsity; it is less disclosure than plaintiffs would have liked.” Id. at *11.

  [...read his full blogpost here]

The court’s characterization of the fraud claims as a challenge to trial methodology rather than data interpretation and communication decidedly evaded the thrust of the plaintiffs’ fraud complaint. Data interpretation will often be part of the methodology outlined in a protocol. The Keryx case also confused criticism of the design and execution of a clinical trial with criticism of the communication of the trial results.

Exactly!

I’m not sure I understand at this point what the “Reference Manual”, or Daubert, or its current manifestation, are really requiring (on multiplicity); and as would be expected of any sharp lawyer, Schachtman makes some intricate gradations.

Please see the full blogpost and his extended footnotes here.

One clever gambit I often come across by way of excuse (for QRPs along the lines of selection effects) is that it’s a “philosophical issue”. How can you hold someone accountable for favoring one of rival philosophical positions?  If it’s not put as a “free speech” issue, it’s a “freedom of philosophy” issue.  How con-veenient!

[i] See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.).

[ii] Abely v. Aeterna Zentaris Inc., No. 12 Civ. 4711(PKC), 2013 WL 2399869 (S.D.N.Y. May 29, 2013); In re Keryx Biopharms, Inc., Sec. Litig., 1307(KBF), 2014 WL 585658 (S.D.N.Y. Feb. 14, 2014).

*Schachtman’s legal practice focuses on the defense of product liability suits, with an emphasis on the scientific and medico-legal issues.  He teaches a course in statistics in the law at the Columbia Law School, NYC. 

Categories: P-values, PhilStat Law, Statistics | 12 Comments

Gelman recognizes his error-statistical (Bayesian) foundations

From Gelman’s blog:

“In one of life’s horrible ironies, I wrote a paper “Why we (usually) don’t have to worry about multiple comparisons” but now I spend lots of time worrying about multiple comparisons”

Exhibit A: [2012] Why we (usually) don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness 5, 189-211. (Andrew Gelman, Jennifer Hill, and Masanao Yajima)

Exhibit B: The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time, in press. (Andrew Gelman and Eric Loken) (Shortened version is here.)

The “forking paths” paper, in my reading, basically argues that mere hypothetical possibilities about what you would or might have done had the data been different (in order to secure a desired interpretation) suffice to alter the characteristics of the analysis you actually did. That’s an error statistical argument–maybe even stronger than what some error statisticians would say. What’s really being condemned are overly flexible ways to move from statistical results to substantive claims. The p-values are illicit when taken to provide evidence for those claims because an actual p-value requires Prob(P < p; H0) = p (and the actual p-value has become much greater by design). The criticism makes perfect sense if you’re scrutinizing inferences according to how well or severely tested they are. Actual error probabilities are accordingly altered or unable to be calculated. However, if one is going to scrutinize inferences according to severity, then the same problematic flexibility would apply to Bayesian analyses, whether or not they have a way to pick up on it. (It’s problematic if they don’t.) I don’t see the magic by which a concern for multiple testing disappears in Bayesian analysis (e.g., in the first paper) except by assuming some prior takes care of it.
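
To see the point numerically, here is a small Python simulation (my own sketch, not Gelman and Loken’s example) in which a true null is tested but the analyst is free to report whichever of several data-dependent analysis choices looks best; the rate of nominal p < 0.05 then well exceeds 0.05, so the reported p no longer satisfies Prob(P < p; H0) = p.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
reps, n = 5000, 40
nominal_hits = 0

for _ in range(reps):
    # Two outcome measures and a grouping variable, all pure noise: the null is true.
    y1, y2 = rng.normal(size=n), rng.normal(size=n)
    group = rng.integers(0, 2, size=n).astype(bool)

    # "Forking paths": the analyst may test either outcome, on the full sample or
    # within either subgroup, and report whichever result looks best.
    candidates = [y1, y2, y1[group], y1[~group], y2[group], y2[~group]]
    best_p = min(stats.ttest_1samp(y, 0.0).pvalue for y in candidates)
    nominal_hits += best_p < 0.05

print(f"Rate of nominal p < 0.05 under a true null: {nominal_hits / reps:.3f}")  # well above 0.05
```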

Categories: Error Statistics, Gelman | 15 Comments

BREAKING THE (Royall) LAW! (of likelihood) (C)

With this post, I finally get back to the promised sequel to “Breaking the Law! (of likelihood) (A) and (B)” from a few weeks ago. You might wish to read that one first.* A relevant paper by Royall is here.

Richard Royall is a statistician1 who has had a deep impact on recent philosophy of statistics by giving a neat proposal that appears to settle disagreements about statistical philosophy! He distinguishes three questions:

  • What should I believe?
  • How should I act?
  • Is this data evidence of some claim? (or How should I interpret this body of observations as evidence?)

It all sounds quite sensible– at first–and, impressively, many statisticians and philosophers of different persuasions have bought into it. At least they appear willing to go this far with him on the 3 questions.

How is each question to be answered? According to Royall’s commandments writings, what to believe is captured by Bayesian posteriors; how to act, by a behavioristic, N-P long-run performance. And what method answers the evidential question? A comparative likelihood approach. You may want to reject all of them (as I do),2 but just focus on the last.

Remember with likelihoods, the data x are fixed, the hypotheses vary. A great many critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p- values, power, etc.) start with “the law”. But I fail to see why we should obey it.

To begin with, a report of comparative likelihoods isn’t very useful: H might be less likely than H’, given x, but so what? What do I do with that information? It doesn’t tell me I have evidence against or for either.3 Recall, as well, Hacking’s points here about the variability in the meanings of a likelihood ratio across problems.

Royall: “the likelihood view is that observations [like x and y]…have no valid interpretation as evidence in relation to the single hypothesis H.” (2004, p. 149). In his view, all attempts to say whether x is good evidence for H or even if x is better evidence for H than is y are utterly futile. Only comparing a fixed x to H versus some alternative H’ can work, according to Royall’s likelihoodist.

Which alternative to use in the comparison? Should it be a specific alternative? A vague catchall hypothesis? (See Barnard post.) A maximally likely alternative? An alternative against which a test has high power? The answer differs greatly based on the choice. Moreover, an account restricted to comparisons cannot answer our fundamental question: is x good evidence for H or is it a case of BENT evidence (bad evidence no test)? His likelihood account obeys the Likelihood Principle (LP) or, as he puts it, the “irrelevance of the sample space”. That means ignoring the impact of stopping rules on error probabilities. A 2 s.d. difference from “trying and trying again” (using the two-sided Normal tests in the links) or a fixed sample size registers exactly the same, because the likelihoods are proportional. (On stopping rules, see this post, Mayo and Kruse (2001), EGEK (1996, chapter 10); on the LP see Mayo 2014, and search this blog for quite a lot under SLP).
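
Here is a minimal simulation of “trying and trying again” (my own sketch, using a two-sided Normal test with known standard deviation and an arbitrary cap of 1,000 observations): sampling continues until a 2 standard deviation difference appears or the cap is reached. The actual Type I error rate far exceeds the nominal 5%, even though at the stopping point the likelihood function is proportional to that of a fixed-sample experiment with the same data.

```python
import numpy as np

rng = np.random.default_rng(4)
reps, n_max, n_min = 2000, 1000, 10
rejections = 0

for _ in range(reps):
    x = rng.normal(size=n_max)                            # H0: mean = 0 is true, sigma = 1 known
    z = np.cumsum(x) / np.sqrt(np.arange(1, n_max + 1))   # z-statistic after each observation
    if np.any(np.abs(z[n_min - 1:]) >= 2):                # stop and "reject" at the first 2 s.d. difference
        rejections += 1

print(f"Actual Type I error rate with optional stopping: {rejections / reps:.2f}")
print("Nominal rate for a fixed-n two-sided 2 s.d. test: about 0.05")
```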

When I challenged Royall with the optional stopping case at the ecology conference (that gave rise to the Taper and Lele volume), he looked surprised at first, and responded (in a booming voice): “But it’s a law!” (My contribution to the Taper and Lele volume is here.) Philosopher Roger Rosenkrantz remarks:

“The likelihood principle implies…the irrelevance of predesignation, of whether an hypothesis was thought of beforehand or was introduced to explain known effects.” (Rosenkrantz, p. 122)

[What a blissful life these likelihoodists live, in the face of today's data plundering.]

Nor does Royall object to the point Barnard made in criticizing Hacking when he was a likelihoodist:

Turning over the top card of a shuffled deck of playing cards, I find an ace of diamonds:

“According to the law of likelihood, the hypothesis that the deck consists of 52 aces of diamonds (H1) is better supported than the hypothesis that the deck is normal (HN) [by the factor 52]…Some find this disturbing.”

But not Royall.

“Furthermore, it seems unfair; no matter what card is drawn, the law implies that the corresponding trick-deck hypothesis (52 cards just like the one drawn) is better supported than the normal-deck hypothesis. Thus even if the deck is normal we will always claim to have found strong evidence that it is not.”

To Royall, it only shows a confusion between evidence and belief. If you’re not convinced the deck has 52 aces of diamonds “it does not mean that the observation is not strong evidence in favor of H1 versus HN.” It just wasn’t strong enough to overcome your prior beliefs. Now Royall is no Bayesian, at least he doesn’t think a Bayesian computation gives us answers about evidence. (Actually, he alludes to this as a frequentist attempt, at least in Taper and Lele). In his view, evidence comes solely from these (deductively given) comparative likelihoods (1997, 14). (I don’t know if he ever discusses model checking.)  An appeal to beliefs enters only to explain any disagreements with his “law”.

Consider Royall’s treatment of the familiar example where a positive diagnostic result is more probable under “disease” than “no disease”. Then, even if the prior probability of disease is sufficiently small to result in a low posterior for disease, “to interpret the positive test result as evidence that the subject does not have the disease is never appropriate––it is simply and unequivocally wrong. Why is it wrong?” (2004, 122).

Well you already know the answer: it violates “the law”.

“[I]t violates the fundamental principle of statistical reasoning. That principle, the basic rule for interpreting statistical evidence, is what Hacking (1965, 70) named the law of likelihood. It states:

If hypothesis A implies that the probability that a random variable X takes the value x is pA(x), while hypothesis B implies that the probability is pB(x), then the observation X = x is evidence supporting A over B if and only if pA(x) > pB(x), and the likelihood ratio, pA(x)/ pB(x), measures the strength of that evidence.” (Royall, 2004, p. 122)

“This says simply that if an event is more probable under hypothesis A than hypothesis B, then the occurrence of that event is evidence supporting A over B––the hypothesis that did the better job of predicting the event is better supported by its occurrence.” Moreover, “the likelihood ratio, is the exact factor by which the probability ratio [ratio of priors in A and B] is changed”. (ibid. 123)
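
A small numerical sketch of the diagnostic example (my own, with made-up sensitivity, specificity and prevalence) shows how the two notions come apart: the likelihood ratio “supports” disease over no disease, while the posterior probability of disease remains low because the prior is small.

```python
# Hypothetical diagnostic test; the numbers are illustrative assumptions, not Royall's.
sensitivity = 0.95      # P(positive | disease)
specificity = 0.98      # P(negative | no disease)
prevalence  = 0.001     # prior P(disease)

# Law of likelihood: compare the probabilities each hypothesis gives the positive result.
likelihood_ratio = sensitivity / (1 - specificity)        # = 47.5, "favors" disease

# Bayes: posterior probability of disease given the positive result.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive

print(f"Likelihood ratio, disease vs no disease: {likelihood_ratio:.1f}")
print(f"Posterior P(disease | positive result):  {posterior:.3f}")    # about 0.045
```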

There are basically two ways to supplement comparative likelihoods: introduce other possible hypotheses (e.g., prior probabilities) or other possible outcomes (e.g., sampling distributions, error probabilities).

NOTES
*Like everyone else, I’m incredibly pressed at the moment. It’s either unpolished blog posts, or no posts. So please send corrections. If I update this, I’ll mark it as part (C, 2nd).

1 Royall, retired from Johns Hopkins, now serves as Chairman of Advisory Board of Analytical Edge Inc. “Prof. Royall is internationally recognized as the father of modern Likelihood methodology, having largely formulated its foundation and demonstrating its viability for representing, interpreting and communicating statistical evidence via the likelihood function.” (Link is here).

[Incidentally, I always attempt to contact people I post on; but the last time I tried to contact Royall, I didn't succeed.]

2 I consider that the proper way to answer questions of evidence is by means of an error statistical account used to assess and control the severity of tests. Comparative likelihoodist accounts fail to provide this.

3 Do not confuse an account’s having a rival–which certainly N-P tests and CIs do–with the account being merely comparative. With the latter, you do not detach an inference, it’s always on the order of x “favors” H over H’ or the like. And remember, that’s ALL statistical evidence is in this account.

REFERENCES

Mayo, D. G. (2004). “An Error-Statistical Philosophy of Evidence,” 79-118, in M. Taper and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press.

Mayo, D. G. (2014) On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29, no. 2, 227-266.

Mayo, D. G. and Kruse, M. (2001). “Principles of Inference and Their Consequences,” 381-403, in D. Cornfield and J. Williamson (eds.) Foundations of Bayesianism. Dordrecht: Kluwer.

Rosenkrantz, R. (1977) Inference, Method and Decision. Dordrecht: D. Reidel.

Royall, R. (1997) Statistical Evidence: A likelihood paradigm, Chapman and Hall, CRC Press.

Royall, R. (2004), “The Likelihood Paradigm for Statistical Evidence” 119-138; Rejoinder 145-151, in M. Taper, and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press.

Categories: law of likelihood, Richard Royall, Statistics | 41 Comments

A (Jan 14, 2014) interview with Sir David Cox by “Statistics Views”

Sir David Cox

The original Statistics Views interview is here:

“I would like to think of myself as a scientist, who happens largely to specialise in the use of statistics”– An interview with Sir David Cox

FEATURES

  • Author: Statistics Views
  • Date: 24 Jan 2014
  • Copyright: Image appears courtesy of Sir David Cox

Sir David Cox is arguably one of the world’s leading living statisticians. He has made pioneering and important contributions to numerous areas of statistics and applied probability over the years, of which perhaps the best known is the proportional hazards model, which is widely used in the analysis of survival data. The Cox point process was named after him.

Sir David studied mathematics at St John’s College, Cambridge and obtained his PhD from the University of Leeds in 1949. He was employed from 1944 to 1946 at the Royal Aircraft Establishment, from 1946 to 1950 at the Wool Industries Research Association in Leeds, and from 1950 to 1955 worked at the Statistical Laboratory at the University of Cambridge. From 1956 to 1966 he was Reader and then Professor of Statistics at Birkbeck College, London. In 1966, he took up the Chair position in Statistics at Imperial College London, where he later became Head of the Department of Mathematics for a period. In 1988 he became Warden of Nuffield College and was a member of the Department of Statistics at Oxford University. He formally retired from these positions in 1994 but continues to work in Oxford.

Sir David has received numerous awards and honours over the years. He has been awarded the Guy Medals in Silver (1961) and Gold (1973) by the Royal Statistical Society. He was elected Fellow of the Royal Society of London in 1973, was knighted in 1985 and became an Honorary Fellow of the British Academy in 2000. He is a Foreign Associate of the US National Academy of Sciences and a foreign member of the Royal Danish Academy of Sciences and Letters. In 1990 he won the Kettering Prize and Gold Medal for Cancer Research for “the development of the Proportional Hazard Regression Model” and in 2010 he was awarded the Copley Medal by the Royal Society.

He has supervised and collaborated with many students over the years, many of whom are now successful in statistics in their own right, such as David Hinkley and Past President of the Royal Statistical Society, Valerie Isham. Sir David has served as President of the Bernoulli Society, the Royal Statistical Society, and the International Statistical Institute.

This year, Sir David is to turn 90*. Here Statistics Views talks to Sir David about his prestigious career in statistics, working with the late Professor Lindley, his thoughts on Jeffreys and Fisher, being President of the Royal Statistical Society during the Thatcher Years, Big Data and the best time of day to think of statistical methods.

1. With an educational background in mathematics at St Johns College, Cambridge and the University of Leeds, when and how did you first become aware of statistics as a discipline?

I was studying at Cambridge during the Second World War and after two years, one was sent either into the Forces or into some kind of military research establishment. There were very few statisticians then, although it was realised there was a need for statisticians. It was assumed that anybody who was doing reasonably well at mathematics could pick up statistics in a week or so! So, aged 20, I went to the Royal Aircraft Establishment in Farnborough, which is enormous and still there to this day if in a different form, and I worked in the Department of Structural and Mechanical Engineering, doing statistical work. So statistics was forced upon me, so to speak, as was the case for many mathematicians at the time because, aside from UCL, there had been very little teaching of statistics in British universities before the Second World War. Afterwards, it all started to expand.

2. From 1944 to 1946 you worked at the Royal Aircraft Establishment and then from 1946 to 1950 at the Wool Industries Research Association in Leeds. Did statistics have any role to play in your first roles out of university?

Totally. In Leeds, it was largely statistics but also to some extent, applied mathematics because there were all sorts of problems connected with the wool and textile industry in terms of the physics, chemistry and biology of the wool and some of these problems were mathematical but the great majority had a statistical component to them. That experience was not totally uncommon at the time and many who became academic statisticians had, in fact, spent several years working in a research institute first.

3. From 1950 to 1955, you worked at the Statistical Laboratory at Cambridge and would have been there at the same time as Fisher and Jeffreys. The late Professor Dennis Lindley, who was also there at that time, told me that the best people working on statistics were not in the statistics department at that time. What are your memories when you look back on that time and what do you feel were your main achievements?

Lindley was exactly right about Jeffreys and Fisher. They were two great scientists outside statistics – Jeffreys founded modern geophysics and Fisher was a major figure in genetics. Dennis was a contemporary and very impressive and effective. We were colleagues for five years and our children even played together.

The first lectures on statistics I attended as a student consisted of a short course by Harold Jeffreys, who had at the time a massive reputation as virtually the inventor of modern geophysics. His Theory of Probability, first published as a monograph in physics, was and remains of great importance but, amongst other things, his nervousness limited the appeal of his lectures, to put it gently. I met him personally a couple of times – he was friendly but uncommunicative. When I was later at the Statistical Laboratory in Cambridge, relations between the Director, Dr Wishart, and R.A. Fisher had been at a very low ebb for 20 years and contact between the Lab and Fisher was minimal. I heard him speak on three or four occasions, interesting if often rambunctious occasions. To some, Fisher showed great generosity but not to the Statistics Lab, which was sad in view of the towering importance of his work.

“To some, Fisher showed great generosity but not to the Statistics Lab, which was sad in view of the towering importance of his work.”

4. You have also taught at many institutions over the years including Princeton, Berkeley, Cambridge, Birkbeck College and Imperial College London before joining Nuffield College here at Oxford. Over the years, how did the teaching of statistics evolve and adapt to meet the changing needs of students?

As I said, when I was a student, there was very little teaching of statistics in British universities. It has evolved over the years and was first primarily a postgraduate subject, taken after reading mathematics if you wished to be a scientific statistician, rather than an economic statistician. You took at least a diploma, or a one-year MA or a doctorate. Then statistics came into mathematics degrees, partly to make them more appealing to a wider audience and that has changed, so nowadays, most statisticians start fairly intensively in an undergraduate course, which has some advantages and some disadvantages.

5. How did your teaching and research motivate and influence each other? Did you get research ideas from statistics and incorporate them into your teaching?

Much of my research has come from talking to scientists. Sometimes ideas come from lecturing because the way to really understand a subject is to give a course of lectures on the subject, and sometimes that throws up more theoretical issues that you might not otherwise have thought of. The overwhelming majority of my work comes either directly or indirectly from some physical, biological or medical problem, but in many different ways – casual conversation sometimes.

6. You have taught many who have gone on to make their own important contributions to statistics, such as David Hinkley, who is now renowned for his work on bootstrap methods, and Valerie Isham, who recently served as the President of the Royal Statistical Society. The late Professor Dennis Lindley told me that “One of the joys of life is teaching a really good graduate.” Would you be in agreement?

I would say that one of the joys of life is learning from a good graduate. The first duty of a doctoral student is clearly to educate their supervisor, which my own doctoral students have done. Hopefully, they’ve learnt a bit from me occasionally! I am absolutely certain that I learnt a lot from Valerie, for instance, as we’ve worked together on and off for around forty years. Having such students is fantastic. I have been fortunate and happy: at Birkbeck, I had largely evening students. They were highly motivated and very able. Many of the graduate students at Imperial came from other places, or were international students of high standard. Also rather importantly, my students have been very nice people!

7. You are best known for your innovative work on the proportional hazards model, which is now widely used in the analysis of survival data. What research led to this discovery? What set you on the right path?

Two different things – first of all, I had been interested in reliability in an industrial context since I worked in the textile industry and to some extent, when I was at the Royal Aircraft Establishment, when strength of materials was important. I had a long interest in testing strength and reliability, which is also related to looking at the duration of life. Then the more specific thing was that at least four or five people from different areas in the US and the UK said that they had a certain kind of data with people’s survival times under various treatments and all sorts of further aspects with regards to the patient, but they did not know how to analyse this data. The work led to one paper but the reason it is so popular is totally accidental. Other people wrote easily usable software with which to implement the method, which is not my speciality at all. I had software to implement it but it was not suitable for general use. In a sense, it became almost too easy and so people just started to use the method because it was painless! The proportion of my life that I spent working on the proportional hazards model is, in fact, very small. I had an idea of how to solve it but I could not complete the argument and so it took me about four years on and off, often thinking about it before I went to bed.

(Editor’s note: I tell Sir David that I now had a picture in my head of him pacing the house in his pyjamas at four o’clock in the morning with a hot chocolate in one hand, thinking statistical thoughts and he laughs).

Not quite! It was right before going to bed. There is a well-established literature in mathematics of people who think about a problem they do not know how to solve, go to bed thinking about it, and wake up the next morning with a solution. It’s not easily explicable, but if you’re wide awake, you perhaps argue down the conventional lines of argument, whereas what you need to do is something a bit crazy, which you’re more likely to do if you’re half-awake or asleep. Presumably that’s the explanation!

“The proportion of my life that I spent working on the proportional hazards model is, in fact, very small. I had an idea of how to solve it but I could not complete the argument and so it took me about four years on and off…”

8. The getstats campaign by the Royal Statistical Society focuses on improving the public’s understanding of statistics in every-day life. Would you have any advice for them and what areas should they focus on that you feel there should be more awareness of in statistics?

Of course, to some extent, the notion that some very simple and non-technical ideas about collecting data and analysing it are taught to children is very good but then at the other end, there are people who are highly educated but have no sense of statistical arguments, such as many lawyers and senior civil servants. The RSS has done an excellent job in trying to interest MPs in statistical ideas. Both these extremes are important. Sending a very general message to people as far as possible helps, but also sending very focussed messages to key groups of people is more important in the short term. You do see on TV, for instance, that basic principles are being ignored in collecting and analysing evidence. Of course, it’s easy for me to stand on the sidelines and criticise.

9. You have served as the President for several societies over the years including the Royal Statistical Society, the Bernoulli Society and the International Statistical Institute. What are your memories of your time at the RSS for instance and how you helped the society adapt to the changing needs of the statistical community?

It was a bit different in my time. I was the President of the RSS at the time when Margaret Thatcher was PM and massacring the civil service and in particular, the government’s statistical service and there was a lot of activity going on about that. But it was done more by going to see people, talking to them and trying to influence them than writing them formal letters. While, of course, openness is a good thing, it is not always the best way to get results. People can take up inflexible attitudes but if you talk to them quietly in private, they are perhaps then more open to new ideas.

10. You have received numerous awards from the Guy Medal both in Silver and Gold to the Marvin Zelen Leadership Award. Is there a particular award that you were most proud of being awarded?

It would be the Copley Medal from the Royal Society as it was for general science. It is very nice to receive these awards, of course, perhaps particularly because they represent the fact that your friends have put in efforts on your behalf. Therefore, what I really value is not the award but the appreciation of friends and colleagues. That is what is important but the award and degrees are certainly an honour. If you overvalue an award, that can be dangerous.

11. You have written many papers and books. What are the ones that you are most proud of?

The one I’m going to write next, of course! I have flitted about all sorts of different topics, different fields of application, different parts of the subject, and so on. Really, I don’t look back very much.

At the moment, I have just finished a book with a colleague, called Case-Control Studies, which is mainly about epidemiological investigation.

“…what I really value is not the award but the appreciation of friends and colleagues.”

12. What has been the best book on statistics that you have ever read?

I honestly don’t know. The position of books is interesting as when I first started in my career, there were hardly any books at all that were treating statistics in a modern way. Then they very slowly began to appear and now there is a flood of them. The standard on the whole published now is very high but there is too much to keep up with!

13. What has been the most exciting development that you have worked on in statistics during your career?

I’m not sure about exciting (!) but one of the most demanding was being involved in the issues about Bovine TB in badgers, which went on for about ten years. It involved a great deal of work, which was very interesting and instructive in all sorts of ways, and not just in statistics.

I’ve been involved in other government-based topics, such as the group which made the first predictions for the AIDS epidemic, which was also very interesting.

14. At the recent Future of Statistical Sciences workshop, there was much talk about Big Data and a concern that many ‘hot areas’ such as big data/data analytics, which have close connections with statistics and the statistical sciences, are being monopolised by computer scientists and/or engineers. What do statisticians need to do to ensure their work and their profession gets noticed?

Do better quality work, which I don’t mean as a criticism as to what is done at the moment but rather, do high quality work that is important in some sense, either intellectually or practically in particular fields. Part of the problem is that relatively speaking, there are not that many statisticians who are trained to the level needed.

15. What do you think the most important recent developments in the field have been? What do you think will be the most exciting and productive areas of research in statistics during the next few years?

The most immediately important is as you said – Big Data, which will bring forward new ideas, but it does not mean that old ideas from the more traditional part of the subject are useless. It is the most obvious and biggest challenge.

Ideally, we should be looking at very important practical problems in a number of different fields and see some sort of common element and build the ideas that are necessary in order to tackle any issues that arise. You should not tackle just one issue successfully but tackle a collection of issues – the Big Data aspect is one common theme undoubtedly. It goes beyond statistics – to what extent Big Data can replace small, carefully planned investigations which are much more sharply focussed on a very specific issue.

My intrinsic feeling is that more fundamental progress is more likely to be made by very focused, relatively small scale, intensive investigations than collecting millions of bits of information on millions of people, for example. It’s possible to collect such large data now, but it depends on the quality, which may be very high or not, and if it is not, what do you do about it?

16. Do you think over the years too much research has focussed on less important areas of statistics? Should the gap between research and applications get reduced? How so and by whom?

In British statistics at the moment, the gap between theory and applications is difficult. Theory has almost disappeared. Almost everyone is working on applications. The issue is whether this has gone a bit too far. Everyone has to find their best way of working in principle but if you are a theoretician, then to have really serious contact with applications is for most people, extremely fruitful and indeed almost essential. Some individuals will think that is not true and that it may be better that they sit at their desk and think great thoughts, so to speak! That is another way of working but the danger then is that the great thoughts may have no bearing on the real world. But for most people, it is the interplay which is crucial. Maybe I am not imaginative enough to just sit there and think to myself of abstract problems which are really important enough to spend time on! Others are undoubtedly much better at that which may be their better way of thinking.

“My intrinsic feeling is that more fundamental progress is more likely to be made by very focused, relatively small scale, intensive investigations than collecting millions of bits of information on millions of people, for example. It’s possible to collect such large data now, but it depends on the quality, which may be very high or not, and if it is not, what do you do about it?”

17. What do you see as the greatest challenges facing the profession of statisticians in the coming years?

I know the term ‘the profession of statistics’ is widely used but I am not that keen on it. I would like to think of myself as a scientist, who happens largely to specialise in the use of statistics. That is a question of words to some extent. One answer would be that the challenge, preferably for an academic statistician, is to be involved in several fields of application in a non-trivial sense and combine the stimulus and the contribution you can make that way with theoretical contributions that those contacts will suggest. As I said before, I don’t think you can lay down a rule as to what is most productive for everyone.

18. Are there people or events that have been influential in your career? Also, given that you are one of the most well respected statisticians of your generation and many statisticians look up to you, whose work do you admire (it can be someone working now, or someone whose work you admired greatly earlier on in your career?).

The person who influenced me by far the greatest was Henry Daniels. I went to work with him at the Wool Research Association and then he went to Cambridge and from there, Birmingham. He was both a very clever mathematician and a very good statistician. He was also actually a very skilful experimental physicist, which is interesting. At the Wool Research Association, he was a statistician but he also ran a measurement lab (what they called a fibre-measurement lab where he developed all sorts of clever measurement techniques).

Maurice Bartlett, who was at Manchester, UCL and then here in Oxford was another major influence and then in the background were people like R.A. Fisher and Jeffreys. I met Jeffreys a few times and went to his lectures – although he wrote beautifully, his lectures were really rather impossible, which was sad.

Otherwise, I have learnt from almost everybody that I’ve had contact with and I certainly include students amongst them.

19. If you had not got involved in the field of statistics, what do you think you would have done? (Is there another field that you could have seen yourself making an impact on?)

I thought I would go into either theoretical physics or pure mathematics but I’m very glad I didn’t. I’m not clever enough for either of those fields. They are both fascinating subjects but statistics is a much more easily satisfying life because there are so many different directions in which to go. Whereas in pure mathematics, you are possibly doing things that only two other people in the world may understand and that requires a certain austerity of spirit in order to do that, which I do not possess! I also find quantum mechanics absolutely fascinating but I am not original enough to do striking things in that field.

*He turned 90 in July.

Please share your comments.

Categories: Sir David Cox | 3 Comments

Diederik Stapel hired to teach “social philosophy” because students got tired of success stories… or something (rejected post)

Oh My*.

(“But I can succeed as a social philosopher”)

The following is from Retraction Watch. UPDATE: OCT 10, 2014**

Diederik Stapel, the Dutch social psychologist and admitted data fabricator — and owner of 54 retraction notices — is now teaching at a college in the town of Tilburg [i].

According to Omroep Brabant, Stapel was offered the job as a kind of adjunct at Fontys Academy for Creative Industries to teach social philosophy. The site quotes a Nick Welman explaining the rationale for hiring Stapel (per Google Translate):

“It came about because students one after another success story were told from the entertainment industry, the industry which we educate them .”

The students wanted something different.

“They wanted to also focus on careers that have failed. On people who have fallen into a black hole, acquainted with the dark side of fame and success.”

Last month, organizers of a drama festival in The Netherlands cancelled a play co-written by Stapel.

I really think Dean Bon puts the rationale most clearly of all.

…A letter from the school’s dean, Pieter Bon, adds:

We like to be entertained and the length of our lives increases. We seek new ways in which to improve our health and we constantly look for new ways to fill our free time. Fashion and looks are important to us; we prefer sustainable products and we like to play games using smart gadgets. This is why Fontys Academy for Creative Industries exists. We train people to create beautiful concepts, exciting concepts, touching concepts, concepts to improve our quality of life. We train them for an industry in which creativity is of the highest value to a product or service. We educate young people who feel at home in the (digital) world of entertainment and lifestyle, and understand that creativity can also mean business. Creativity can be marketed, it’s as simple as that.

We’re sure Prof. Stapel would agree.

[i] Fontys describes itself thusly: Fontys Academy for Creative Industries (Fontys ACI) in Tilburg has 2500 students working towards a bachelor of Business Administration (International Event, Music & Entertainment Studies and Digital Publishing Studies), a bachelor of Communication (International Event, Music & Entertainment Studies) or a bachelor of Lifestyle (International Lifestyle Studies). Fontys ACI hosts a staff of approximately one hundred (teachers plus support staff) as well as about fifty regular visiting lecturers.

 *I wonder if “social philosophy” is being construed as “extreme postmodernist social epistemology”?  

I guess the students are keen to watch that Fictionfactory Peephole.

**Turns out to have been short-lived. Also admits to sockpuppeting at Retraction Watch. Frankly I thought it was more fun to guess who “Paul” was, but they have rules. http://retractionwatch.com/2014/10/08/diederik-stapel-loses-teaching-post-admits-he-was-sockpuppeting-on-retraction-watch/#comments

[ii] One of my April Fool’s Day posts is turning from part fiction to fact.

Categories: Rejected Posts, Statistics | 9 Comments

Oy Faye! What are the odds of not conflating simple conditional probability and likelihood with Bayesian success stories?


Faye Flam

Congratulations to Faye Flam for finally getting her article, “The odds, continually updated,” published in the Science Times section of the New York Times after months of reworking and editing, interviewing and reinterviewing. I’m grateful, too, that one remark from me remained. Seriously I am. A few comments: The Monty Hall example is simple probability, not statistics, and finding that fisherman who floated on his boots at best used likelihoods. I might note, too, that critiquing that ultra-silly example about ovulation and voting–a study so bad they actually had to pull it at CNN due to reader complaints[i]–scarcely required more than noticing the researchers didn’t even know the women were ovulating[ii]. Experimental design is an old area of statistics developed by frequentists; on the other hand, these ovulation researchers really believe their theory, so the posterior checks out.

The article says, Bayesian methods can “crosscheck work done with the more traditional or ‘classical’ approach.” Yes, but on traditional frequentist grounds. What many would like to know is how to cross check Bayesian methods—how do I test your beliefs? Anyway, I should stop kvetching and thank Faye and the NYT for doing the article at all[iii]. Here are some excerpts:

Statistics may not sound like the most heroic of pursuits. But if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer.

The man owes his life to a once obscure field known as Bayesian statistics — a set of mathematical rules for using new data to continuously update beliefs or existing knowledge.

The method was invented in the 18th century by an English Presbyterian minister named Thomas Bayes — by some accounts to calculate the probability of God’s existence. In this century, Bayesian statistics has grown vastly more useful because of the kind of advanced computing power that didn’t exist even 20 years ago.

It is proving especially useful in approaching complex problems, including searches like the one the Coast Guard used in 2013 to find the missing fisherman, John Aldridge (though not, so far, in the hunt for Malaysia Airlines Flight 370).

Now Bayesian statistics are rippling through everything from physics to cancer research, ecology to psychology. Enthusiasts say they are allowing scientists to solve problems that would have been considered impossible just 20 years ago. And lately, they have been thrust into an intense debate over the reliability of research results.

When people think of statistics, they may imagine lists of numbers — batting averages or life-insurance tables. But the current debate is about how scientists turn data into knowledge, evidence and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference. In 2012, for example, a team at the biotech company Amgen announced that they’d analyzed 53 cancer studies and found it could not replicate 47 of them.

Similar follow-up analyses have cast doubt on so many findings in fields such as neuroscience and social science that researchers talk about a “replication crisis”

Some statisticians and scientists are optimistic that Bayesian methods can improve the reliability of research by allowing scientists to crosscheck work done with the more traditional or “classical” approach, known as frequentist statistics. The two methods approach the same problems from different angles.

The essence of the frequentist technique is to apply probability to data. If you suspect your friend has a weighted coin, for example, and you observe that it came up heads nine times out of 10, a frequentist would calculate the probability of getting such a result with an unweighted coin. The answer (about 1 percent) is not a direct measure of the probability that the coin is weighted; it’s a measure of how improbable the nine-in-10 result is — a piece of information that can be useful in investigating your suspicion.

By contrast, Bayesian calculations go straight for the probability of the hypothesis, factoring in not just the data from the coin-toss experiment but any other relevant information — including whether you’ve previously seen your friend use a weighted coin.
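[An aside from me, not part of Faye’s article: here’s a minimal sketch in Python of the two calculations the quoted coin example gestures at. The “weighted” alternative (heads probability 0.75) and the 50-50 prior are made-up numbers, for illustration only.]

```python
# Illustration only (not from the NYT article): the frequentist tail probability
# vs. a toy Bayesian posterior for the 9-heads-in-10 coin example.
# The "weighted" alternative p = 0.75 and the 0.5 prior are assumed numbers.
from scipy.stats import binom

n, heads = 10, 9

# Frequentist: how improbable is a result at least this extreme under a fair coin?
p_value = 1 - binom.cdf(heads - 1, n, 0.5)          # P(X >= 9 | p = 0.5)
print(f"P(9 or more heads in 10 | fair coin) = {p_value:.4f}")   # about 0.011

# Toy Bayesian: two simple hypotheses, fair (p = 0.5) vs. weighted (p = 0.75),
# with prior probability 0.5 on each (both assumptions, not data).
prior_weighted = 0.5
lik_fair = binom.pmf(heads, n, 0.5)
lik_weighted = binom.pmf(heads, n, 0.75)
posterior_weighted = (lik_weighted * prior_weighted) / (
    lik_weighted * prior_weighted + lik_fair * (1 - prior_weighted))
print(f"P(coin is weighted | data) = {posterior_weighted:.3f}")  # about 0.95
```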

Scientists who have learned Bayesian statistics often marvel that it propels them through a different kind of scientific reasoning than they’d experienced using classical methods.

“Statistics sounds like this dry, technical subject, but it draws on deep philosophical debates about the nature of reality,” said the Princeton University astrophysicist Edwin Turner, who has witnessed a widespread conversion to Bayesian thinking in his field over the last 15 years.

Countering Pure Objectivity

Frequentist statistics became the standard of the 20th century by promising just-the-facts objectivity, unsullied by beliefs or biases. In the 2003 statistics primer “Dicing With Death,” Stephen Senn traces the technique’s roots to 18th-century England, when a physician named John Arbuthnot set out to calculate the ratio of male to female births.

…..But there’s a danger in this tradition, said Andrew Gelman, a statistics professor at Columbia. Even if scientists always did the calculations correctly — and they don’t, he argues — accepting everything with a p-value of 5 percent means that one in 20 “statistically significant” results are nothing but random noise.

The proportion of wrong results published in prominent journals is probably even higher, he said, because such findings are often surprising and appealingly counterintuitive, said Dr. Gelman, an occasional contributor to Science Times.

Looking at Other Factors

Take, for instance, a study concluding that single women who were ovulating were 20 percent more likely to vote for President Obama in 2012 than those who were not. (In married women, the effect was reversed.)

Dr. Gelman re-evaluated the study using Bayesian statistics. That allowed him to look at probability not simply as a matter of results and sample sizes, but in the light of other information that could affect those results.

He factored in data showing that people rarely change their voting preference over an election cycle, let alone a menstrual cycle. When he did, the study’s statistical significance evaporated. (The paper’s lead author, Kristina M. Durante of the University of Texas, San Antonio, said she stood by the finding.)

Dr. Gelman said the results would not have been considered statistically significant had the researchers used the frequentist method properly. He suggests using Bayesian calculations not necessarily to replace classical statistics but to flag spurious results.
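[Again an aside from me: a minimal sketch, with entirely hypothetical numbers, of the kind of prior-information crosscheck being described. Suppose the study’s estimated swing were 20 percentage points with a standard error of 8 points, and prior information on the stability of vote preference put true swings near 0 with a standard deviation of about 2 points; a normal-normal update then shrinks the estimate toward 0. None of these numbers are Durante’s or Gelman’s.]

```python
# Hypothetical numbers only: a normal-normal Bayesian update showing how an
# informative prior (vote preferences rarely swing) shrinks a noisy estimate.
import math

est, se = 20.0, 8.0              # assumed study estimate and standard error (points)
prior_mean, prior_sd = 0.0, 2.0  # assumed prior on the true swing

post_var = 1.0 / (1.0 / se**2 + 1.0 / prior_sd**2)
post_mean = post_var * (est / se**2 + prior_mean / prior_sd**2)
post_sd = math.sqrt(post_var)

print(f"posterior mean = {post_mean:.2f}, posterior sd = {post_sd:.2f}")
# The 20-point estimate shrinks to roughly 1.2 points, well within the noise.
```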

…..Bayesian reasoning combined with advanced computing power has also revolutionized the search for planets orbiting distant stars, said Dr. Turner, the Princeton astrophysicist.

One downside of Bayesian statistics is that it requires prior information — and often scientists need to start with a guess or estimate. Assigning numbers to subjective judgments is “like fingernails on a chalkboard,” said physicist Kyle Cranmer, who helped develop a frequentist technique to identify the latest new subatomic particle — the Higgs boson.

Others say that in confronting the so-called replication crisis, the best cure for misleading findings is not Bayesian statistics, but good frequentist ones. It was frequentist statistics that allowed people to uncover all the problems with irreproducible research in the first place, said Deborah Mayo, a philosopher of science at Virginia Tech. The technique was developed to distinguish real effects from chance, and to prevent scientists from fooling themselves.

Uri Simonsohn, a psychologist at the University of Pennsylvania, agrees. Several years ago, he published a paper that exposed common statistical shenanigans in his field — logical leaps, unjustified conclusions, and various forms of unconscious and conscious cheating.

He said he had looked into Bayesian statistics and concluded that if people misused or misunderstood one system, they would do just as badly with the other. Bayesian statistics, in short, can’t save us from bad science. …

 

Categories: Bayesian/frequentist, Statistics | 47 Comments

Letter from George (Barnard)

Memory Lane: (2 yrs) Sept. 30, 2012. George Barnard sent me this note on hearing of my Lakatos Prize. He was to have been at my Lakatos dinner at the LSE (March 17, 1999)—winners are permitted to invite ~2-3 guests*—but he called me at the LSE at the last minute to say he was too ill to come to London. Instead we had a long talk on the phone the next day, which I can discuss at some point. The main topics were likelihoods and error probabilities. (I have 3 other Barnard letters, with real content, that I might dig out some time.)

*My others were Donald Gillies and Hasok Chang

Categories: Barnard, phil/history of stat | Tags: , | Leave a comment

Should a “Fictionfactory” peepshow be barred from a festival on “Truth and Reality”? Diederik Stapel says no (rejected post)

So I hear that Diederik Stapel is the co-author of a book Fictionfactory (in Dutch, with a novelist, Dautzenberg)[i], and of what they call their “Fictionfactory peepshow”, only it’s been disinvited at the last minute from a Dutch festival on “truth and reality” (due to have run 9/26/14), and all because of Stapel’s involvement. Here’s an excerpt from an article in last week’s Retraction Watch (article is here):*

Here’s a case of art imitating science.

The organizers of a Dutch drama festival have put a halt to a play about the disgraced social psychologist Diederik Stapel, prompting protests from the authors of the skit — one of whom is Stapel himself.

According to an article in NRC Handelsblad:

The Amsterdam Discovery Festival on science and art has canceled, at the last minute, the play written by Anton Dautzenberg and former professor Diederik Stapel. Co-sponsor, The Royal Netherlands Academy of Arts and Sciences (KNAW), doesn’t want Stapel, who committed science fraud, to perform at a festival that’s associated with the KNAW.

FICTION FACTORY

The management of the festival, planned for September 26th at the Tolhuistuin in Amsterdam, contacted Stapel and Dautzenberg 4 months ago with the request to organize a performance of their book and lecture project ‘The Fictionfactory’. Especially for this festival they [Stapel and Dautzenberg] created a ‘Fictionfactory-peepshow’.

“Last Friday I received a call [from the management of the festival] that our performance has been canceled at the last minute because the KNAW will withdraw their subsidy if Stapel is on the festival program”, says Dautzenberg. “This looks like censorship, and by an institution that also wants to represent arts and experiments”.

Well this is curious, as things with Stapel always are. What’s the “Fictionfactory Peepshow”? If you go to Stapel’s homepage, it’s all in Dutch, but Google translation isn’t too bad, and I have a pretty good description of the basic idea. So since it’s Saturday night, let’s take a peek, or peep (at what it might have been)…

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we are at the “Truth and Reality” Festival: first stop (after some cotton candy): the Stapel Fictionfactory Peepshow! It’s all dark, I can’t see a thing. What? It says I have to put some coins in a slot if I want to turn him on (but that they also take credit cards). So I’m to look down in this tiny window. The curtains are opening!…I see a stage with two funky looking guys– one of them is Stapel. They’re reading, or reciting from some magazine with big letters: “Fact, Fiction and the Frictions we Hide”.

Stapel and Dautzenberg


STAPEL: Welkom. You can ask us any questions! In response, you will always be given an option: ‘Do you want to know the truth or do you want to be comforted with fictions and feel-good fantasy?’

“Well I’ve brought some data with me from a study in social psychology. My question is this: “Is there a statistically significant effect here?”

STAPEL: Do you want to know the truth or do you want to be comforted with fictions and feel-good fantasy?

“Fiction please”.

STAPEL: I can massage your data, manipulate your numbers, reveal the taboos normally kept under wraps. For a few more coins I will let you see the secrets behind unreplicable results, and for a large bill will manufacture for you a sexy statistical story to turn on the editors.

(Then after the dirty business is all done [ii].)

STAPEL: Do you have more questions for me?

“Will it be published (fiction please)?”

STAPEL: “yes”

“will anyone find out about this (fiction please)?”

STAPEL: “No, I mean yes, I mean no.”

 

“I’d like to change to hearing the truth now. I have three questions”.

STAPEL: No problem, we take credit cards. Dank u. What are your questions?

“Will Uri Simonsohn be able to fraudbust my results using the kind of tests he used on others? And if so, how long will it take him?” (truth, please)

STAPEL: “Yes. But not for at least 6 months to one year.”

“Here’s my final question. Are these data really statistically significant and at what level?” (truth please)

Nothing. Blank screen suddenly! With an acrid smelling puff of smoke, ew. But I’d already given the credit card! (Tricked by the master trickster).

 

What if he either always lies or always tells the truth? Then what would you ask him if you want to know the truth about your data? (Liar’s paradox variant)

Feel free to share your queries/comments.

* I thank Caitlin Parker for sending me the article

[i] Diederik Stapel was found guilty of science fraud in psychology in 2011; he made up data out of whole cloth and has retracted over 50 papers. http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html?pagewanted=all&_r=0

Bookjacket:


[ii] Perhaps they then ask you how much you’ll pay for a bar of soap (because you’d sullied yourself). Why let potential priming data go to waste?  Oh wait, he doesn’t use real data…. Perhaps the peepshow was supposed to be a kind of novel introduction to research ethics.

 

Some previous posts on Stapel:

 

Categories: Comedy, junk science, rejected post, Statistics | 5 Comments

G.A. Barnard: The Bayesian “catch-all” factor: probability vs likelihood


G. A. Barnard: 23 Sept 1915-30 July, 2002

Today is George Barnard’s birthday. In honor of this, I have typed in an exchange between Barnard, Savage (and others) on an important issue that we’d never gotten around to discussing explicitly (on likelihood vs probability). Please share your thoughts.

The exchange is from pp 79-84 (of what I call) “The Savage Forum” (Savage, 1962)[i]

 ♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠

BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important.

SAVAGE: Surely, as you say, we cannot always enumerate hypotheses so completely as we like to think. The list can, however, always be completed by tacking on a catch-all ‘something else’. In principle, a person will have probabilities given ‘something else’ just as he has probabilities given other hypotheses. In practice, the probability of a specified datum given ‘something else’ is likely to be particularly vague–an unpleasant reality. The probability of ‘something else’ is also meaningful of course, and usually, though perhaps poorly defined, it is definitely very small. Looking at things this way, I do not find probabilities unnormalizable, certainly not altogether unnormalizable.

Whether probability has an advantage over likelihood seems to me like the question whether volts have an advantage over amperes. The meaninglessness of a norm for likelihood is for me a symptom of the great difference between likelihood and probability. Since you question that symptom, I shall mention one or two others. …

On the more general aspect of the enumeration of all possible hypotheses, I certainly agree that the danger of losing serendipity by binding oneself to an over-rigid model is one against which we cannot be too alert. We must not pretend to have enumerated all the hypotheses in some simple and artificial enumeration that actually excludes some of them. The list can however be completed, as I have said, by adding a general ‘something else’ hypothesis, and this will be quite workable, provided you can tell yourself in good faith that ‘something else’ is rather improbable. The ‘something else’ hypothesis does not seem to make it any more meaningful to use likelihood for probability than to use volts for amperes.

Let us consider an example. Off hand, one might think it quite an acceptable scientific question to ask, ‘What is the melting point of californium?’ Such a question is, in effect, a list of alternatives that pretends to be exhaustive. But, even specifying which isotope of californium is referred to and the pressure at which the melting point is wanted, there are alternatives that the question tends to hide. It is possible that californium sublimates without melting or that it behaves like glass. Who dare say what other alternatives might obtain? An attempt to measure the melting point of californium might, if we are serendipitous, lead to more or less evidence that the concept of melting point is not directly applicable to it. Whether this happens or not, Bayes’s theorem will yield a posterior probability distribution for the melting point given that there really is one, based on the corresponding prior conditional probability and on the likelihood of the observed reading of the thermometer as a function of each possible melting point. Neither the prior probability that there is no melting point, nor the likelihood for the observed reading as a function of hypotheses alternative to that of the existence of a melting point enter the calculation. The distinction between likelihood and probability seems clear in this problem, as in any other.

BARNARD: Professor Savage says in effect, ‘add at the bottom of list H1, H2,…”something else”’. But what is the probability that a penny comes up heads given the hypothesis ‘something else’. We do not know. What one requires for this purpose is not just that there should be some hypotheses, but that they should enable you to compute probabilities for the data, and that requires very well defined hypotheses. For the purpose of applications, I do not think it is enough to consider only the conditional posterior distributions mentioned by Professor Savage.

LINDLEY: I am surprised at what seems to me an obvious red herring that Professor Barnard has drawn across the discussion of hypotheses. I would have thought that when one says this posterior distribution is such and such, all it means is that among the hypotheses that have been suggested the relevant probabilities are such and such; conditionally on the fact that there is nothing new, here is the posterior distribution. If somebody comes along tomorrow with a brilliant new hypothesis, well of course we bring it in.

BARTLETT: But you would be inconsistent because your prior probability would be zero one day and non-zero another.

LINDLEY: No, it is not zero. My prior probability for other hypotheses may be ε. All I am saying is that conditionally on the other 1 – ε, the distribution is as it is.

BARNARD: Yes, but your normalization factor is now determined by ε. Of course ε may be anything up to 1. Choice of letter has an emotional significance.

LINDLEY: I do not care what it is as long as it is not one.

BARNARD: In that event two things happen. One is that the normalization has gone west, and hence also this alleged advantage over likelihood. Secondly, you are not in a position to say that the posterior probability which you attach to an hypothesis from an experiment with these unspecified alternatives is in any way comparable with another probability attached to another hypothesis from another experiment with another set of possibly unspecified alternatives. This is the difficulty over likelihood. Likelihood in one class of experiments may not be comparable to likelihood from another class of experiments, because of differences of metric and all sorts of other differences. But I think that you are in exactly the same difficulty with conditional probabilities just because they are conditional on your having thought of a certain set of alternatives. It is not rational in other words. Suppose I come out with a probability of a third that the penny is unbiased, having considered a certain set of alternatives. Now I do another experiment on another penny and I come out of that case with the probability one third that it is unbiased, having considered yet another set of alternatives. There is no reason why I should agree or disagree in my final action or inference in the two cases. I can do one thing in one case and other in another, because they represent conditional probabilities leaving aside possibly different events.

LINDLEY: All probabilities are conditional.

BARNARD: I agree.

LINDLEY: If there are only conditional ones, what is the point at issue?

PROFESSOR E.S. PEARSON: I suggest that you start by knowing perfectly well that they are conditional and when you come to the answer you forget about it.

BARNARD: The difficulty is that you are suggesting the use of probability for inference, and this makes us able to compare different sets of evidence. Now you can only compare probabilities on different sets of evidence if those probabilities are conditional on the same set of assumptions. If they are not conditional on the same set of assumptions they are not necessarily in any way comparable.

LINDLEY: Yes, if this probability is a third conditional on that, and if a second probability is a third, conditional on something else, a third still means the same thing. I would be prepared to take my bets at 2 to 1.

BARNARD: Only if you knew that the condition was true, but you do not.

GOOD: Make a conditional bet.

BARNARD: You can make a conditional bet, but that is not what we are aiming at.

WINSTEN: You are making a cross comparison where you do not really want to, if you have got different sets of initial experiments. One does not want to be driven into a situation where one has to say that everything with a probability of a third has an equal degree of credence. I think this is what Professor Barnard has really said.

BARNARD: It seems to me that likelihood would tell you that you lay 2 to 1 in favour of H1 against H2, and the conditional probabilities would be exactly the same. Likelihood will not tell you what odds you should lay in favour of H1 as against the rest of the universe. Probability claims to do that, and it is the only thing that probability can do that likelihood cannot.

You can read the rest of pages 78-103 of the Savage Forum here.

 HAPPY BIRTHDAY GEORGE!

References

[i] Savage, L. (1962), “Discussion”, in The Foundations of Statistical Inference: A Discussion, (G. A. Barnard and D. R. Cox eds.), London: Methuen, 76.
 
 

 

Categories: Barnard, highly probable vs highly probed, phil/history of stat, Statistics | 26 Comments

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”

Memory lane: Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize—only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.) Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

The first one came to me in autumn 2008 while I was giving a series of seminars on philosophy of statistics at the LSE. Modeled on a disappointing (to me) performance of The Woman in Black, “A Funny Thing Happened at the [1959] Savage Forum” relates Savage’s horror at George Barnard’s announcement of having rejected the Likelihood Principle!

The current piece also features George Barnard and since Monday (9/23) is Barnard’s birthday, I’m digging it out of “rejected posts” to reblog it. It recalls our first meeting in London in 1986. I’d sent him a draft of my paper “Why Pearson Rejected the Neyman-Pearson Theory of Statistics” (later adapted as chapter 11 of EGEK) to see whether I’d gotten Pearson right. He’d traveled quite a ways, from Colchester, I think. It was June and hot, and we were up on some kind of a semi-enclosed rooftop. Barnard was sitting across from me looking rather bemused.

The curtain opens with Barnard and Mayo on the roof, lit by a spot mid-stage. He’s drinking (hot) tea; she, a Diet Coke. The dialogue is what I recall from the time[i]:

 Barnard: I read your paper. I think it is quite good.  Did you know that it was I who told Fisher that Neyman-Pearson statistics had turned his significance tests into little more than acceptance procedures?

Mayo:  Thank you so much for reading my paper.  I recall a reference to you in Pearson’s response to Fisher, but I didn’t know the full extent.

Barnard: I was the one who told Fisher that Neyman was largely to blame. He shouldn’t be too hard on Egon.  His statistical philosophy, you are aware, was different from Neyman’s.

Mayo:  That’s interesting.  I did quote Pearson, at the end of his response to Fisher, as saying that inductive behavior was “Neyman’s field, not mine”.  I didn’t know your role in his laying the blame on Neyman!

Fade to black. The lights go up on Fisher, stage left, flashing back some 30 years earlier . . . ….

Fisher: Now, acceptance procedures are of great importance in the modern world.  When a large concern like the Royal Navy receives material from an engineering firm it is, I suppose, subjected to sufficiently careful inspection and testing to reduce the frequency of the acceptance of faulty or defective consignments. . . . I am casting no contempt on acceptance procedures, and I am thankful, whenever I travel by air, that the high level of precision and reliability required can really be achieved by such means.  But the logical differences between such an operation and the work of scientific discovery by physical or biological experimentation seem to me so wide that the analogy between them is not helpful . . . . [Advocates of behavioristic statistics are like]

Russians [who] are made familiar with the ideal that research in pure science can and should be geared to technological performance, in the comprehensive organized effort of a five-year plan for the nation. . . .

In the U.S. also the great importance of organized technology has I think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed rather at, let us say, speeding production, or saving money. (Fisher 1955, 69-70)

Fade to black.  The lights go up on Egon Pearson stage right (who looks like he does in my sketch [frontispiece] from EGEK 1996, a bit like a young C. S. Peirce):

Pearson: There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. . . . Indeed, to dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot . . . . To the best of my ability I was searching for a way of expressing in mathematical terms what appeared to me to be the requirements of the scientist in applying statistical tests to his data.  (Pearson 1955, 204)

Fade to black. The spotlight returns to Barnard and Mayo, but brighter. It looks as if it’s gotten hotter.  Barnard wipes his brow with a white handkerchief.  Mayo drinks her Diet Coke.

Barnard (ever so slightly angry): You have made one blunder in your paper. Fisher would never have made that remark about Russia.

There is a tense silence.

Mayo: But—it was a quote.

End of Act 1.

Given this was pre-internet, we couldn’t go to the source then and there, so we agreed to search for the paper in the library. Well, you get the idea. Maybe I could call the piece “Stat on a Hot Tin Roof.”

If you go see it, don’t say I didn’t warn you.

I’ve gotten various new speculations over the years as to why he had this reaction to the mention of Russia (check discussions in earlier posts with this play). Feel free to share yours. Some new (to me) information on Barnard is in George Box’s recent autobiography.


[i] We had also discussed this many years later, in 1999.

 

Categories: Barnard, phil/history of stat, rejected post, Statistics | Tags: , , , , | 3 Comments

Uncle Sam wants YOU to help with scientific reproducibility!

You still have a few days to respond to the call of your country to solve problems of scientific reproducibility!

The following passages come from Retraction Watch, with my own recommendations at the end.

“White House takes notice of reproducibility in science, and wants your opinion”

The White House’s Office of Science and Technology Policy (OSTP) is taking a look at innovation and scientific research, and issues of reproducibility have made it onto its radar.

Here’s the description of the project from the Federal Register:

The Office of Science and Technology Policy and the National Economic Council request public comments to provide input into an upcoming update of the Strategy for American Innovation, which helps to guide the Administration’s efforts to promote lasting economic growth and competitiveness through policies that support transformative American innovation in products, processes, and services and spur new fundamental discoveries that in the long run lead to growing economic prosperity and rising living standards.

I wonder what Steven Pinker would say about some of the above verbiage?

And here’s what’s catching the eye of people interested in scientific reproducibility:

(11) Given recent evidence of the irreproducibility of a surprising number of published scientific findings, how can the Federal Government leverage its role as a significant funder of scientific research to most effectively address the problem?

The OSTP is the same office that, in 2013, took what Nature called “a long-awaited leap forward for open access” when it said “that publications from taxpayer-funded research should be made free to read after a year’s delay.” That OSTP memo came after more than 65,000 people “signed a We the People petition asking for expanded public access to the results of taxpayer-funded research.”

Have ideas on improving reproducibility? Emails to innovationstrategy@ostp.gov are preferred, according to the notice, which also explains how to fax or mail comments. The deadline is September 23.

Off the top of my head, how about:

Promote the use of methodologies that:

  • control and assess the capabilities of methods to avoid mistaken inferences from data;
  • require demonstrated self-criticism all the way from data collection through modelling and interpretation (statistical and substantive);
  • describe what is especially shaky or poorly probed thus far (and spell out how subsequent studies are most likely to locate those flaws)[i]

Institute penalties for QRPs and fraud?

Please offer your suggestions in the comments, or directly to Uncle Sam.

[i] It may require a certain courage on the part of researchers, journalists, referees.

Categories: Announcement, reproducibility | 18 Comments

A crucial missing piece in the Pistorius trial? (2): my answer (Rejected Post)


Time for a break with a “Rejected Post”[i]

There’s one crucial point that Prosecutor Nel overlooked and failed to employ in the Oscar Pistorius trial–or so it appears. In fact I haven’t heard anyone mention it—so maybe it’s not as critical as I think it is. Before revealing (what I regard as) an important missing piece, I ask readers and legal beagles out there for their informal take.

Here are some items from the announced verdict (which do not directly give away the missing piece, but may be enough to deduce it). (A general article is here.)

Oscar Pistorius ‘not guilty’ of girlfriend’s murder, rules judge Thokozile Masipa

AP | September 11, 2014, 17.09 pm IST

Before the break, Judge Masipa ruled out “dolus eventualis”[ii], saying Mr Pistorius could not have foreseen he would kill the person behind the toilet door.

“How could the accused have reasonably foreseen the shot he fired would have killed the deceased? Clearly he did not subjectively foresee this, that he would have killed the person behind the door, let alone the deceased,” said Judge Masipa.

The judge said the defence argues it is highly improbable the accused could have made this up so quickly and consistently, even in his bail application, [really?]

….Evidence shows that at time he fired shots at toilet door, Mr Pistorius believed the deceased was in the bedroom, the judge says. This belief was communicated to a number of people shortly after the incident, she added.

The judge said there is “nothing in the evidence to suggest that Mr Pistorius’ belief was not genuinely entertained”. She cites reasons including the bathroom window being open, and the toilet door being shut.

… He… [said] he genuinely, though erroneously, believed that his life and that of the deceased was in danger,” the judge said….

The starting point is “whether accused had intention to kill person behind toilet door,” the judge said. Continue reading

Categories: rejected post | 8 Comments

“The Supernal Powers Withhold Their Hands And Let Me Alone” : C.S. Peirce

C. S. Peirce: 10 Sept, 1839-19 April, 1914


Memory Lane* in Honor of C.S. Peirce’s Birthday:
(Part 3) of “Peircean Induction and the Error-Correcting Thesis”

Deborah G. Mayo
Transactions of the Charles S. Peirce Society 41(2) 2005: 299-319

(9/10) Peircean Induction and the Error-Correcting Thesis (Part I)

(9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis

8. Random sampling and the uniformity of nature

We are now at the point to address the final move in warranting Peirce’s [self-correcting thesis] SCT. The severity or trustworthiness assessment, on which the error correcting capacity depends, requires an appropriate link (qualitative or quantitative) between the data and the data generating phenomenon, e.g., a reliable calibration of a scale in a qualitative case, or a probabilistic connection between the data and the population in a quantitative case. Establishing such a link, however, is regarded as assuming observed regularities will persist, or making some “uniformity of nature” assumption—the bugbear of attempts to justify induction.

But Peirce contrasts his position with those favored by followers of Mill, and “almost all logicians” of his day, who “commonly teach that the inductive conclusion approximates to the truth because of the uniformity of nature” (2.775). Inductive inference, as Peirce conceives it (i.e., severe testing) does not use the uniformity of nature as a premise. Rather, the justification is sought in the manner of obtaining data. Justifying induction is a matter of showing that there exist methods with good error probabilities. For this it suffices that randomness be met only approximately, that inductive methods check their own assumptions, and that they can often detect and correct departures from randomness.

… It has been objected that the sampling cannot be random in this sense. But this is an idea which flies far away from the plain facts. Thirty throws of a die constitute an approximately random sample of all the throws of that die; and that the randomness should be approximate is all that is required. (1.94)

Peirce backs up his defense with robustness arguments. For example, in an (attempted) Binomial induction, Peirce asks, “what will be the effect upon inductive inference of an imperfection in the strictly random character of the sampling” (2.728). What if, for example, a certain proportion of the population had twice the probability of being selected? He shows that “an imperfection of that kind in the random character of the sampling will only weaken the inductive conclusion, and render the concluded ratio less determinate, but will not necessarily destroy the force of the argument completely” (2.728). This is particularly so if the sample mean is near 0 or 1. In other words, violating experimental assumptions may be shown to weaken the trustworthiness or severity of the proceeding, but this may only mean we learn a little less.
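As a small aside (mine, not part of the original article or Peirce’s own calculation), here is a toy simulation of the kind of imperfection Peirce considers: one half of the population is, unknown to the sampler, twice as likely to be drawn as the other half. The estimated ratio is pulled off the truth and made less determinate, but the induction is weakened rather than destroyed.

```python
# Toy simulation (an illustration only, not Peirce's): the effect of an
# imperfectly random sample in which one subpopulation has twice the
# probability of selection.
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
group = rng.integers(0, 2, N)                 # two equal-sized subpopulations
trait_prob = np.where(group == 0, 0.8, 0.4)   # trait frequencies 0.8 and 0.4
trait = rng.random(N) < trait_prob            # true overall ratio is about 0.6

# Biased sampling: members of group 0 are twice as likely to be selected.
weights = np.where(group == 0, 2.0, 1.0)
sample = rng.choice(N, size=1000, p=weights / weights.sum())

print(f"true ratio      = {trait.mean():.3f}")          # about 0.60
print(f"biased estimate = {trait[sample].mean():.3f}")  # about 0.67: off, but not worthless
```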

Yet a further safeguard is at hand:

Nor must we lose sight of the constant tendency of the inductive process to correct itself. This is of its essence. This is the marvel of it. …even though doubts may be entertained whether one selection of instances is a random one, yet a different selection, made by a different method, will be likely to vary from the normal in a different way, and if the ratios derived from such different selections are nearly equal, they may be presumed to be near the truth. (2.729)

Here, the marvel is an inductive method’s ability to correct the attempt at random sampling. Still, Peirce cautions, we should not depend so much on the self-correcting virtue that we relax our efforts to get a random and independent sample. But if our effort is not successful, and neither is our method robust, we will probably discover it. “This consideration makes it extremely advantageous in all ampliative reasoning to fortify one method of investigation by another” (ibid.). Continue reading

Categories: C.S. Peirce, Error Statistics, phil/history of stat | 11 Comments

Statistical Science: The Likelihood Principle issue is out…!

Abbreviated Table of Contents:

Here are some items for your Saturday-Sunday reading.

Link to complete discussion: 

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29 (2014), no. 2, 227-266.

Links to individual papers:

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle. Statistical Science 29 (2014), no. 2, 227-239.

Dawid, A. P. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 240-241.

Evans, Michael. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 242-246.

Martin, Ryan; Liu, Chuanhai. Discussion: Foundations of Statistical Inference, Revisited. Statistical Science 29 (2014), no. 2, 247-251.

Fraser, D. A. S. Discussion: On Arguments Concerning Statistical Principles. Statistical Science 29 (2014), no. 2, 252-253.

Hannig, Jan. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 254-258.

Bjørnstad, Jan F. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 259-260.

Mayo, Deborah G. Rejoinder: “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 261-266.

Abstract: An essential component of inference based on familiar frequentist notions, such as p-values, significance and confidence levels, is the relevant sampling distribution. This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes x and y from experiments E1 and E2 (both with unknown parameter θ) have different probability models f1(·), f2(·), then even though f1(x|θ) = cf2(y|θ) for all θ, outcomes x and y may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox [Ann. Math. Statist. 29 (1958) 357–372] proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of Ei. The surprising upshot of Allan Birnbaum’s [J. Amer. Statist. Assoc. 57 (1962) 269–306] argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP) entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases also refute [WCP entails SLP].

Key words: Birnbaumization, likelihood principle (weak and strong), sampling theory, sufficiency, weak conditionality
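A standard textbook illustration (mine, not an example from the paper) of the sort of SLP violation at issue: 3 successes in 12 Bernoulli trials give proportional likelihood functions whether the design fixed n = 12 (binomial) or sampled until the 3rd success (negative binomial), yet the two sampling distributions yield different p-values for testing θ = 0.5 against θ < 0.5.

```python
# Binomial vs. negative binomial sketch (illustration only, not taken from the
# Statistical Science paper): proportional likelihoods, different p-values.
import numpy as np
from scipy.stats import binom, nbinom

successes, trials = 3, 12

# The likelihoods are proportional in theta (their ratio is a constant):
thetas = np.array([0.2, 0.4, 0.6, 0.8])
lik_binom = binom.pmf(successes, trials, thetas)                # C(12,3) th^3 (1-th)^9
lik_nbinom = nbinom.pmf(trials - successes, successes, thetas)  # C(11,2) th^3 (1-th)^9
print("likelihood ratios:", lik_binom / lik_nbinom)             # all equal to 4.0

# But the p-values for testing theta = 0.5 against theta < 0.5 differ:
p_fixed_n = binom.cdf(successes, trials, 0.5)           # P(X <= 3 in 12 trials)   ~ 0.073
p_inverse = binom.cdf(successes - 1, trials - 1, 0.5)   # P(<= 2 successes in 11)  ~ 0.033
print(f"p-value, fixed-n design:          {p_fixed_n:.3f}")
print(f"p-value, inverse-sampling design: {p_inverse:.3f}")
```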

Regular readers of this blog know that the topic of the “Strong Likelihood Principle (SLP)” has come up quite frequently. Numerous informal discussions of earlier attempts to clarify where Birnbaum’s argument for the SLP goes wrong may be found on this blog. [SEE PARTIAL LIST BELOW.[i]] These mostly stem from my initial paper Mayo (2010) [ii]. I’m grateful for the feedback.

In the months since this paper has been accepted for publication, I’ve been asked, from time to time, to reflect informally on the overall journey: (1) Why was/is the Birnbaum argument so convincing for so long? (Are there points being overlooked, even now?) (2) What would Birnbaum have thought? (3) What is the likely upshot for the future of statistical foundations (if any)?

I’ll try to share some responses over the next week. (Naturally, additional questions are welcome.)

[i] A quick take on the argument may be found in the appendix to: “A Statistical Scientist Meets a Philosopher of Science: A conversation between David Cox and Deborah Mayo (as recorded, June 2011)”

 UPhils and responses

 

 

Categories: Birnbaum, Birnbaum Brakes, frequentist/Bayesian, Likelihood Principle, phil/history of stat, Statistics | 40 Comments

All She Wrote (so far): Error Statistics Philosophy Contents-3 years on

 


Error Statistics Philosophy: Blog Contents
By: D. G. Mayo[i]

Each month, I will mark (in red) 3 relevant posts (from that month 3 yrs ago) for readers wanting to catch-up or review central themes and discussions.

September 2011

October 2011

November 2011

December 2011

January 2012

February 2012

March 2012

April 2012

May 2012

June 2012

July 2012

August 2012

September 2012

October 2012

November 2012

December 2012

January 2013

  • (1/2) Severity as a ‘Metastatistical’ Assessment
  • (1/4) Severity Calculator
  • (1/6) Guest post: Bad Pharma? (S. Senn)
  • (1/9) RCTs, skeptics, and evidence-based policy
  • (1/10) James M. Buchanan
  • (1/11) Aris Spanos: James M. Buchanan: a scholar, teacher and friend
  • (1/12) Error Statistics Blog: Table of Contents
  • (1/15) Ontology & Methodology: Second call for Abstracts, Papers
  • (1/18) New Kvetch/PhilStock
  • (1/19) Saturday Night Brainstorming and Task Forces: (2013) TFSI on NHST
  • (1/22) New PhilStock
  • (1/23) P-values as posterior odds?
  • (1/26) Coming up: December U-Phil Contributions….
  • (1/27) U-Phil: S. Fletcher & N.Jinn
  • (1/30) U-Phil: J. A. Miller: Blogging the SLP

February 2013

  • (2/2) U-Phil: Ton o’ Bricks
  • (2/4) January Palindrome Winner
  • (2/6) Mark Chang (now) gets it right about circularity
  • (2/8) From Gelman’s blog: philosophy and the practice of Bayesian statistics
  • (2/9) New kvetch: Filly Fury
  • (2/10) U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof
  • (2/11) U-Phil: Mayo’s response to Hennig and Gandenberger
  • (2/13) Statistics as a Counter to Heavyweights…who wrote this?
  • (2/16) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/20) Fisher: from ‘Two New Properties of Mathematical Likelihood’
  • (2/23) Stephen Senn: Also Smith and Jones
  • (2/26) PhilStock: DO < $70
  • (2/26) Statistically speaking…

March 2013

  • (3/1) capitalizing on chance
  • (3/4) Big Data or Pig Data?
  • (3/7) Stephen Senn: Casting Stones
  • (3/10) Blog Contents 2013 (Jan & Feb)
  • (3/11) S. Stanley Young: Scientific Integrity and Transparency
  • (3/13) Risk-Based Security: Knives and Axes
  • (3/15) Normal Deviate: Double Misunderstandings About p-values
  • (3/17) Update on Higgs data analysis: statistical flukes (1)
  • (3/21) Telling the public why the Higgs particle matters
  • (3/23) Is NASA suspending public education and outreach?
  • (3/27) Higgs analysis and statistical flukes (part 2)
  • (3/31) possible progress on the comedy hour circuit?

April 2013

  • (4/1) Flawed Science and Stapel: Priming for a Backlash?
  • (4/4) Guest Post. Kent Staley: On the Five Sigma Standard in Particle Physics
  • (4/6) Who is allowed to cheat? I.J. Good and that after dinner comedy hour….
  • (4/10) Statistical flukes (3): triggering the switch to throw out 99.99% of the data
  • (4/11) O & M Conference (upcoming) and a bit more on triggering from a participant…..
  • (4/14) Does statistics have an ontology? Does it need one? (draft 2)
  • (4/19) Stephen Senn: When relevance is irrelevant
  • (4/22) Majority say no to inflight cell phone use, knives, toy bats, bow and arrows, according to survey
  • (4/23) PhilStock: Applectomy? (rejected post)
  • (4/25) Blog Contents 2013 (March)
  • (4/27) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill, comedy hour)
  • (4/29) What should philosophers of science do? (falsification, Higgs, statistics, Marilyn)

May 2013

  • (5/3) Schedule for Ontology & Methodology, 2013
  • (5/6) Professorships in Scandal?
  • (5/9) If it’s called the “The High Quality Research Act,” then ….
  • (5/13) ‘No-Shame’ Psychics Keep Their Predictions Vague: New Rejected post
  • (5/14) “A sense of security regarding the future of statistical science…” Anon review of Error and Inference
  • (5/18) Gandenberger on Ontology and Methodology (May 4) Conference: Virginia Tech
  • (5/19) Mayo: Meanderings on the Onto-Methodology Conference
  • (5/22) Mayo’s slides from the Onto-Meth conference
  • (5/24) Gelman sides w/ Neyman over Fisher in relation to a famous blow-up
  • (5/26) Schachtman: High, Higher, Highest Quality Research Act
  • (5/27) A.Birnbaum: Statistical Methods in Scientific Inference
  • (5/29) K. Staley: review of Error & Inference

June 2013

  • (6/1) Winner of May Palindrome Contest
  • (6/1) Some statistical dirty laundry
  • (6/5) Do CIs Avoid Fallacies of Tests? Reforming the Reformers (Reblog 5/17/12):
  • (6/6) PhilStock: Topsy-Turvy Game
  • (6/6) Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)
  • (6/8) Richard Gill: “Integrity or fraud… or just questionable research practices?”
  • (6/11) Mayo: comment on the repressed memory research
  • (6/14) P-values can’t be trusted except when used to argue that p-values can’t be trusted!
  • (6/19) PhilStock: The Great Taper Caper
  • (6/19) Stanley Young: better p-values through randomization in microarrays
  • (6/22) What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri
  • (6/26) Why I am not a “dualist” in the sense of Sander Greenland
  • (6/29) Palindrome “contest” contest
  • (6/30) Blog Contents: mid-year

July 2013

  • (7/3) Phil/Stat/Law: 50 Shades of gray between error and fraud
  • (7/6) Bad news bears: ‘Bayesian bear’ rejoinder–reblog mashup
  • (7/10) PhilStatLaw: Reference Manual on Scientific Evidence (3d ed) on Statistical Significance (Schachtman)
  • (7/11) Is Particle Physics Bad Science? (memory lane)
  • (7/13) Professor of Philosophy Resigns over Sexual Misconduct (rejected post)
  • (7/14) Stephen Senn: Indefinite irrelevance
  • (7/17) Phil/Stat/Law: What Bayesian prior should a jury have? (Schachtman)
  • (7/19) Msc Kvetch: A question on the Martin-Zimmerman case we do not hear
  • (7/20) Guest Post: Larry Laudan. Why Presuming Innocence is Not a Bayesian Prior
  • (7/23) Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs
  • (7/26) New Version: On the Birnbaum argument for the SLP: Slides for JSM talk

August 2013

  • (8/1) Blogging (flogging?) the SLP: Response to Reply- Xi’an Robert
  • (8/5) At the JSM: 2013 International Year of Statistics
  • (8/6) What did Nate Silver just say? Blogging the JSM
  • (8/9) 11th bullet, multiple choice question, and last thoughts on the JSM
  • (8/11) E.S. Pearson: “Ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot”
  • (8/13) Blogging E.S. Pearson’s Statistical Philosophy
  • (8/15) A. Spanos: Egon Pearson’s Neglected Contributions to Statistics
  • (8/17) Gandenberger: How to Do Philosophy That Matters (guest post)
  • (8/21) Blog contents: July, 2013
  • (8/22) PhilStock: Flash Freeze
  • (8/22) A critical look at “critical thinking”: deduction and induction
  • (8/28) Is being lonely unnatural for slim particles? A statistical argument
  • (8/31) Overheard at the comedy hour at the Bayesian retreat-2 years on

September 2013

  • (9/2) Is Bayesian Inference a Religion?
  • (9/3) Gelman’s response to my comment on Jaynes
  • (9/5) Stephen Senn: Open Season (guest post)
  • (9/7) First blog: “Did you hear the one about the frequentist…”? and “Frequentists in Exile”
  • (9/10) Peircean Induction and the Error-Correcting Thesis (Part I)
  • (9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis
  • (9/12) (Part 3) Peircean Induction and the Error-Correcting Thesis
  • (9/14) “When Bayesian Inference Shatters” Owhadi, Scovel, and Sullivan (guest post)
  • (9/18) PhilStock: Bad news is good news on Wall St.
  • (9/18) How to hire a fraudster chauffeur
  • (9/22) Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”
  • (9/23) Barnard’s Birthday: background, likelihood principle, intentions
  • (9/24) Gelman est efffectivement une erreur statistician
  • (9/26) Blog Contents: August 2013
  • (9/29) Highly probable vs highly probed: Bayesian/ error statistical differences

October 2013

  • (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
  • (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
  • (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
  • (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?”
  • (10/19) Blog Contents: September 2013
  • (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
  • (10/25) Bayesian confirmation theory: example from last post…
  • (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)
  • (10/31) WHIPPING BOYS AND WITCH HUNTERS

November 2013

  • (11/2) Oxford Gaol: Statistical Bogeymen
  • (11/4) Forthcoming paper on the strong likelihood principle
  • (11/9) Null Effects and Replication
  • (11/9) Beware of questionable front page articles warning you to beware of questionable front page articles (iii)
  • (11/13) T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)
  • (11/16) PhilStock: No-pain bull
  • (11/16) S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)
  • (11/18) Lucien Le Cam: “The Bayesians hold the Magic”
  • (11/20) Erich Lehmann: Statistician and Poet
  • (11/23) Probability that it is a statistical fluke [i]
  • (11/27) “The probability that it be a statistical fluke” [iia]
  • (11/30) Saturday night comedy at the “Bayesian Boy” diary (rejected post*)

December 2013

  • (12/3) Stephen Senn: Dawid’s Selection Paradox (guest post)
  • (12/7) FDA’s New Pharmacovigilance
  • (12/9) Why ecologists might want to read more philosophy of science (UPDATED)
  • (12/11) Blog Contents for Oct and Nov 2013
  • (12/14) The error statistician has a complex, messy, subtle, ingenious piece-meal approach
  • (12/15) Surprising Facts about Surprising Facts
  • (12/19) A. Spanos lecture on “Frequentist Hypothesis Testing”
  • (12/24) U-Phil: Deconstructions [of J. Berger]: Irony & Bad Faith 3
  • (12/25) “Bad Arguments” (a book by Ali Almossawi)
  • (12/26) Mascots of Bayesneon statistics (rejected post)
  • (12/27) Deconstructing Larry Wasserman
  • (12/28) More on deconstructing Larry Wasserman (Aris Spanos)
  • (12/28) Wasserman on Wasserman: Update! December 28, 2013
  • (12/31) Midnight With Birnbaum (Happy New Year)

January 2014

  • (1/2) Winner of the December 2013 Palindrome Book Contest (Rejected Post)
  • (1/3) Error Statistics Philosophy: 2013
  • (1/4) Your 2014 wishing well. …
  • (1/7) “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)
  • (1/11) Two Severities? (PhilSci and PhilStat)
  • (1/14) Statistical Science meets Philosophy of Science: blog beginnings
  • (1/16) Objective/subjective, dirty hands and all that: Gelman/Wasserman blogolog (ii)
  • (1/18) Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]
  • (1/22) Phil6334: “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos (Virginia Tech) UPDATE: JAN 21
  • (1/24) Phil 6334: Slides from Day #1: Four Waves in Philosophy of Statistics
  • (1/25) U-Phil (Phil 6334) How should “prior information” enter in statistical inference?
  • (1/27) Winner of the January 2014 palindrome contest (rejected post)
  • (1/29) BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics
  • (1/31) Phil 6334: Day #2 Slides

February 2014

  • (2/1) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs B-boosts)
  • (2/3) PhilStock: Bad news is bad news on Wall St. (rejected post)
  • (2/5) “Probabilism as an Obstacle to Statistical Fraud-Busting” (draft iii)
  • (2/9) Phil6334: Day #3: Feb 6, 2014
  • (2/10) Is it true that all epistemic principles can only be defended circularly? A Popperian puzzle
  • (2/12) Phil6334: Popper self-test
  • (2/13) Phil 6334 Statistical Snow Sculpture
  • (2/14) January Blog Table of Contents
  • (2/15) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/18) Aris Spanos: The Enduring Legacy of R. A. Fisher
  • (2/20) R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
  • (2/21) STEPHEN SENN: Fisher’s alternative to the alternative
  • (2/22) Sir Harold Jeffreys’ (tail-area) one-liner: Sat night comedy [draft ii]
  • (2/24) Phil6334: February 20, 2014 (Spanos): Day #5
  • (2/26) Winner of the February 2014 palindrome contest (rejected post)
  • (2/26) Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)

March 2014

  • (3/1) Cosma Shalizi gets tenure (at last!) (metastat announcement)
  • (3/2) Significance tests and frequentist principles of evidence: Phil6334 Day #6
  • (3/3) Capitalizing on Chance (ii)
  • (3/4) Power, power everywhere–(it) may not be what you think! [illustration]
  • (3/8) Msc kvetch: You are fully dressed (even under your clothes)?
  • (3/8) Fallacy of Rejection and the Fallacy of Nouvelle Cuisine
  • (3/11) Phil6334 Day #7: Selection effects, the Higgs and 5 sigma, Power
  • (3/12) Get empowered to detect power howlers
  • (3/15) New SEV calculator (guest app: Durvasula)
  • (3/17) Stephen Senn: “Delta Force: To what extent is clinical relevance relevant?” (Guest Post)
  • (3/19) Power taboos: Statue of Liberty, Senn, Neyman, Carnap, Severity
  • (3/22) Fallacies of statistics & statistics journalism, and how to avoid them: Summary & Slides Day #8 (Phil 6334)
  • (3/25) The Unexpected Way Philosophy Majors Are Changing The World Of Business
  • (3/26) Phil6334:Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)
  • (3/28) Severe osteometric probing of skeletal remains: John Byrd
  • (3/29) Winner of the March 2014 palindrome contest (rejected post)
  • (3/30) Phil6334: March 26, philosophy of misspecification testing (Day #9 slides)

April 2014

  • (4/1) Skeptical and enthusiastic Bayesian priors for beliefs about insane asylum renovations at Dept of Homeland Security: I’m skeptical and unenthusiastic
  • (4/3) Self-referential blogpost (conditionally accepted*)
  • (4/5) Who is allowed to cheat? I.J. Good and that after dinner comedy hour. . ..
  • (4/6) Phil6334: Duhem’s Problem, highly probable vs highly probed; Day #9 Slides
  • (4/8) “Out Damned Pseudoscience: Non-significant results are the new ‘Significant’ results!” (update)
  • (4/12) “Murder or Coincidence?” Statistical Error in Court: Richard Gill (TEDx video)
  • (4/14) Phil6334: Notes on Bayesian Inference: Day #11 Slides
  • (4/16) A. Spanos: Jerzy Neyman and his Enduring Legacy
  • (4/17) Duality: Confidence intervals and the severity of tests
  • (4/19) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill)
  • (4/21) Phil 6334: Foundations of statistics and its consequences: Day#12
  • (4/23) Phil 6334 Visitor: S. Stanley Young, “Statistics and Scientific Integrity”
  • (4/26) Reliability and Reproducibility: Fraudulent p-values through multiple testing (and other biases): S. Stanley Young (Phil 6334: Day #13)
  • (4/30) Able Stats Elba: 3 Palindrome nominees for April! (rejected post)

May 2014

  • (5/1) Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle
  • (5/3) You can only become coherent by ‘converting’ non-Bayesianly
  • (5/6) Winner of April Palindrome contest: Lori Wike
  • (5/7) A. Spanos: Talking back to the critics using error statistics (Phil6334)
  • (5/10) Who ya gonna call for statistical Fraudbusting? R.A. Fisher, P-values, and error statistics (again)
  • (5/15) Scientism and Statisticism: a conference* (i)
  • (5/17) Deconstructing Andrew Gelman: “A Bayesian wants everybody else to be a non-Bayesian.”
  • (5/20) The Science Wars & the Statistics Wars: More from the Scientism workshop
  • (5/25) Blog Table of Contents: March and April 2014
  • (5/27) Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976
  • (5/31) What have we learned from the Anil Potti training and test data frameworks? Part 1 (draft 2)

June 2014

  • (6/5) Stephen Senn: Blood Simple? The complicated and controversial world of bioequivalence (guest post)
  • (6/9) “The medical press must become irrelevant to publication of clinical trials.”
  • (6/11) A. Spanos: “Recurring controversies about P values and confidence intervals revisited”
  • (6/14) “Statistical Science and Philosophy of Science: where should they meet?”
  • (6/21) Big Bayes Stories? (draft ii)
  • (6/25) Blog Contents: May 2014
  • (6/28) Sir David Hendry Gets Lifetime Achievement Award
  • (6/30) Some ironies in the ‘replication crisis’ in social psychology (4th and final installment)

July 2014

  • (7/7) Winner of June Palindrome Contest: Lori Wike
  • (7/8) Higgs Discovery 2 years on (1: “Is particle physics bad science?”)
  • (7/10) Higgs Discovery 2 years on (2: Higgs analysis and statistical flukes)
  • (7/14) “P-values overstate the evidence against the null”: legit or fallacious? (revised)
  • (7/23) Continued:”P-values overstate the evidence against the null”: legit or fallacious?
  • (7/26) S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)
  • (7/31) Roger Berger on Stephen Senn’s “Blood Simple” with a response by Senn (Guest Posts)

August 2014

  • (08/03) Blogging Boston JSM2014?
  • (08/05) Neyman, Power, and Severity
  • (08/06) What did Nate Silver just say? Blogging the JSM 2013
  • (08/09) Winner of July Palindrome: Manan Shah
  • (08/09) Blog Contents: June and July 2014
  • (08/11) Egon Pearson’s Heresy
  • (08/17) Are P Values Error Probabilities? Or, “It’s the methods, stupid!” (2nd install)
  • (08/23) Has Philosophical Superficiality Harmed Science?
  • (08/29) BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)

(Table of Contents compiled by N. Jinn & J. Miller)*

*I thank Jean Miller for her assiduous work on the blog, and all contributors and readers for helping “frequentists in exile” to feel (and truly become) less exiled–wherever they may be!

Categories: blog contents, Metablog, Statistics | Leave a comment

3 in blog years: Sept 3 is 3rd anniversary of errorstatistics.com

Where did you hear this?  “Join me, if you will, for a little deep-water drilling, as I cast about on my isle of Elba.” Remember this and this? And this philosophical treatise on “moving blog day”? Oy, did I really write all this stuff?

http://errorstatistics.blogspot.com/2011/09/overheard-at-comedy-hour-at-bayesian_03.html

cake baked by blog staff for 3 year anniversary of errorstatistics.com

I still see this as my rag-tag amateur blog. I never learned html and don’t have time to now. But the blog enterprise was more jocund and easy-going then–just an experiment, really, and a place to discuss our RMM papers. (And, of course, a home for error statistical philosophers-in-exile).

A blog table of contents for all three years will appear tomorrow.

Anyway, 2 representatives from Elba flew into NYC and baked this cake in my never-used chef’s oven (based on the cover/table of contents of EGEK 1996). We’ll be celebrating at A Different Place tonight[i]–so if you’re in the neighborhood, stop by after 8pm for an Elba Grease (on me).

Do you want a free signed copy of EGEK? Say why in 25 words or less (to error@vt.edu), and the Fund for E.R.R.O.R.* will send them to the top 3 submissions (by 9/10/14).**

Acknowledgments: I want to thank the many commentators for their frequent insights and for keeping things interesting and lively. Among the regulars and semi-regulars (but with impact), off the top of my head and in no order: Senn, Yanofsky, Byrd, Gelman, Schachtman, Kepler, McKinney, S. Young, Matloff, O’Rourke, Gandenberger, Wasserman, E. Berk, Spanos, Glymour, Rohde, Greenland, Omaclaren, someone named Mark, assorted guests, original guests, anons, mysterious visitors, and related twitterers (who would rather tweet from afar). I’m sure I’ve left some people out. Thanks to students and participants in the spring 2014 seminar with Aris Spanos (slides and lecture notes are still up).

I’m especially grateful to my regular guest bloggers: Stephen Senn and Aris Spanos, and to those who were subjected to deconstructions and to U-Phils in years past. (I may return to that some time.) Other guest posters for 2014 will be acknowledged in the year round up.

I thank the blog compilers, Jean Miller and Nicole Jinn, and give special thanks for the tireless efforts of Jean Miller, who has slogged through html (or whatever it is) when necessary, scanned and put up dozens of articles to make them easy for readers to access, taken slow ferries back and forth to the island of Elba, and fixed gazillions of glitches on a daily basis. Last, but not least, thanks to the palindromists who have been winning lots of books recently (1 day left for August submissions).

*Experimental Reasoning, Reliability, Objectivity and Rationality.

** Accompany submissions with an e-mail address and regular address. All submissions remain private. Elba judges’ decisions are final. Void in any place where prohibited by law, be it laws of likelihood or Napoleonic laws-in-exile. But seriously, we’re giving away 3 books.

[i]email for directions.

Categories: Announcement, Statistics | 12 Comments

BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)


1. An Assumed Law of Statistical Evidence (law of likelihood)

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data x are better evidence for hypothesis H1 than for H0 if x are more probable under H1 than under H0.

Ian Hacking (1965) called this the logic of support: x supports hypothesis H1 more than H0 if H1 is more likely, given x, than is H0:

Pr(x; H1) > Pr(x; H0).

[With likelihoods, the data x are fixed, the hypotheses vary.]*

Or,

x is evidence for H1 over H0 if the likelihood ratio LR (H1 over H0) is greater than 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support, others leave it qualitative.)
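Putting the two formulations together, the same condition can be written compactly as follows (my own restatement of the condition above, not a formula from Hacking):

```latex
% The law of likelihood, stated compactly:
% x supports H1 over H0 just in case the likelihood ratio exceeds 1.
\[
  \mathrm{LR}(H_1, H_0;\, x) \;=\; \frac{\Pr(x;\, H_1)}{\Pr(x;\, H_0)} \;>\; 1 .
\]
```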

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)

2. Barnard (British Journal for the Philosophy of Science)

But this “law” will immediately be seen to fail on our minimal severity requirement. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis H1 much better “supported” than H0 even when H0 is true. Or, as Barnard (1972) puts it, “there always is such a rival hypothesis, viz. that things just had to turn out the way they actually did” (1972, p. 129). H0: the coin is fair, gets a small likelihood, (.5)^k, given k tosses of a coin, while H1: the probability of heads is 1 just on those tosses that yield a head, renders the sequence of k outcomes maximally likely. This is an example of Barnard’s “things just had to turn out as they did”. Or, to use an example with P-values: a statistically significant difference, being improbable under the null H0, will afford high likelihood to any number of explanations that fit the data well.
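To make the coin example concrete, here is a minimal sketch (my own illustration, not code from the post): generate k tosses from a fair coin, so H0 is true by construction, and compare its likelihood for the exact observed sequence with that of the data-dredged rival that assigns the observed sequence probability 1. The likelihood ratio in favor of the rival is 2^k no matter which sequence occurs.

```python
# Minimal sketch of Barnard's point: the tailored rival always "wins" on likelihood.
import random

k = 20
random.seed(1)  # arbitrary seed, just so the sketch is reproducible
tosses = [random.choice("HT") for _ in range(k)]  # data generated with H0 true (fair coin)

lik_H0 = 0.5 ** k  # probability of this exact sequence under H0: the coin is fair
lik_H1 = 1.0       # rival "heads was certain exactly where heads occurred" fits perfectly

print("sequence:", "".join(tosses))
print("LR(H1 over H0) =", lik_H1 / lik_H0)  # 2**k = 1048576 for k = 20, whatever the sequence
```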

3. Breaking the law (of likelihood) by going to the “second,” error statistical level:

How does it fail our severity requirement? First, look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that purports to provide evidence for a claim. She goes to a higher level, or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic d(X). To put it informally, she asks:

What’s the probability that the method would yield an LR disfavoring H0 compared to some alternative H1, even if H0 is true?
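As a rough sketch of how that error probability could be checked by simulation (again my own illustration, with an arbitrary cutoff, not anything from the post): generate many data sets with H0 true and record how often the hunted, maximally fitting rival is favored over H0 by a large likelihood ratio. Because the rival is tailored to the data after the fact, the answer is 100%: the method is guaranteed to produce an LR disfavoring H0 even though H0 is true.

```python
# Sketch: the probability of finding an LR that strongly disfavors a true H0,
# when the rival hypothesis is hunted after seeing the data, is 1.
import random

def lr_dredged_rival(k: int) -> float:
    """Toss a fair coin k times (H0 true) and return the likelihood ratio of the
    tailored rival (which gives the observed sequence probability 1) over H0."""
    _ = [random.choice("HT") for _ in range(k)]  # outcome doesn't matter: the rival always fits
    return 1.0 / (0.5 ** k)

random.seed(2)
k, cutoff, n_sim = 20, 8, 10_000  # cutoff 8 is an arbitrary "strong evidence" benchmark
hits = sum(lr_dredged_rival(k) > cutoff for _ in range(n_sim))
print(f"P(LR > {cutoff} favoring the dredged rival | H0 true) ≈ {hits / n_sim:.2f}")  # prints 1.00
```

On this reading, an LR that disfavors H0 no matter what the data turned out to be has passed a “test” it could not have failed, which is precisely what the severity requirement is designed to flag.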

Continue reading

Categories: highly probable vs highly probed, law of likelihood, Likelihood Principle, Statistics | 72 Comments
