S. Stanley Young: Are there mortality co-benefits to the Clean Power Plan? It depends. (Guest Post)




S. Stanley Young, PhD
Assistant Director
Bioinformatics, National Institute of Statistical Sciences, Research Triangle Park, NC

Are there mortality co-benefits to the Clean Power Plan? It depends.

Some years ago, I listened to a series of lectures on finance. The professor would ask a rhetorical question, pause to give you some time to think, and then, more often than not, answer his question with, “It depends.” Are there mortality co-benefits to the Clean Power Plan? Is mercury coming from power plants leading to deaths? Well, it depends.

So, rhetorically, is an increase in CO2 a bad thing? There is good and bad in everything. Well, for plants an increase in CO2 is a good thing. They grow faster. They convert CO2 into more food and fiber. They give off more oxygen, which is good for humans. Plants appear to be CO2 starved.

It is argued that CO2 is a greenhouse gas, so an increase in CO2 will raise temperatures, ice will melt, sea levels will rise, coastal areas will flood, and so on. It depends. In theory yes; in reality, maybe. A lot of other events must be orchestrated simultaneously. Clearly that scenario depends on other things, for over the last 18 years CO2 has continued to go up and temperatures have not. So it depends on other factors (solar irradiance, water vapor, El Niño, sunspots, cosmic rays, the Earth's precession, etc.), just as the professor said.


So suppose ambient temperatures do go up a few degrees. On balance, is that bad for humans? The evidence is overwhelming that warmer is better for humans. One or two examples are instructive. First, Cox et al. (2013), with the title “Warmer is healthier: Effects on mortality rates of changes in average fine particulate matter (PM2.5) concentrations and temperatures in 100 U.S. cities.” To quote from the abstract of that paper: “Increases in average daily temperatures appear to significantly reduce average daily mortality rates, as expected from previous research.” Here is their plot of daily mortality rate versus maximum temperature. It is clear that as the maximum temperature in a city goes up, mortality goes down. So if the net effect of increasing CO2 is increasing temperature, there should be a reduction in deaths.

I have a very large California data set. The data cover eight air basins and the years 2000 to 2012. There are over 37,000 exposure days and over two million deaths. The data for Los Angeles for the year 2007 are typical.

[Figure: daily heart and lung deaths, Los Angeles, 2007]

The number of heart or lung deaths for people 65 and older is given on the left y-axis. The moving 21-day median number of deaths is plotted with blue diamonds as time marches to the right. Deaths are high during the winter, when temperatures are lower; the number of deaths is lower during the summer, when temperatures are higher. These plots are typical. It is known that higher temperatures are associated with lower deaths.

A purported co-benefit of lower CO2 is that there will be lower levels of PM2.5. (PM2.5 is not chemically defined, but is partially made up of combustion products.) It is widely believed that lower levels of PM2.5 will lead to fewer deaths. Here is what Cox et al. (2013) have to say, “Unexpectedly, reductions in PM2.5 do not appear to cause any reductions in mortality rates.” And here is their supporting figure below.
[Figure: supporting figure from Cox et al. (2013)]
Chay et al. (2003) looked at a reduction in air pollution due to the Clean Air Act. Counties out of compliance were given stricter air pollution reduction goals. This action by the EPA created a so-called natural experiment (Craig et al., 2012). The EPA-selected counties did reduce air pollution levels, but there was no reduction in deaths after adjustment for covariates. Chay et al. (2003) say, “We find that regulatory status is associated with large reductions in TSPs pollution but has little association with reductions in either adult or elderly mortality.” So Cox et al. (2013) confirm the finding of Chay et al. (2003) that a reduction in PM2.5 does not lead to a reduction in deaths. Young and Xia (2013) found no association of PM2.5 with longevity in the western US. Enstrom (2005) and many others have found no association of chronic deaths with PM2.5 in California.

Many claim an association of air pollution with deaths, acute and chronic. How can the two sets of claims be reconciled? Well, it depends. Greven et al. (2011) say in their abstract, “… we derive a Poisson regression model and estimate two regression coefficients: the “global” coefficient that measures the association between national trends in pollution and mortality; and the “local” coefficient, derived from space by time variation, that measures the association between location-specific trends in pollution and mortality adjusted by the national trends. … Results based on the global coefficient indicate a large increase in the national life expectancy for reductions in the yearly national average of PM2.5. However, this coefficient based on national trends in PM2.5 and mortality is likely to be confounded by other variables trending on the national level. Confounding of the local coefficient by unmeasured factors is less likely, although it cannot be ruled out. Based on the local coefficient alone, we are not able to demonstrate any change in life expectancy for a reduction in PM2.5.” (Italics mine.)

In plain words, associations measured from location to location, which are likely to be affected by differences in covariates, show an effect. Examination of trends within locations, which are less likely to be affected by covariates, shows no effect. In short, the claims made depend on how well covariates are taken into account. When they are taken into account (Chay et al. 2003; Greven et al. 2011; Cox et al. 2013; Young 2014), there is no association of air pollution with deaths. Chay controls for multiple economic factors, Greven for location, Cox for temperature, and Young for time and geography.
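To see concretely how a "global" association can arise without any causal effect, here is a small simulation of my own (not Greven et al.'s model or data; all numbers are made up for illustration). Mortality is generated with no dependence on PM2.5, but both series share a declining national trend, so a regression on national averages manufactures an association that disappears once each location is adjusted by the national trend.

```python
# Illustrative simulation: no causal PM2.5 effect on mortality, yet the
# "global" (national-trend) regression shows a strong association while the
# "local" (within-location, detrended) regressions show essentially none.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(2000, 2013)
n_loc = 50
trend = -0.5 * (years - 2000)                # shared national decline

# PM2.5 and mortality both follow the trend; mortality ignores PM2.5 itself.
pm25 = 15 + trend + rng.normal(0, 1, (n_loc, years.size))
mort = 900 + 4 * trend + rng.normal(0, 5, (n_loc, years.size))

def slope(x, y):
    """Least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

# "Global" coefficient: regress national mean mortality on national mean PM2.5.
global_slope = slope(pm25.mean(axis=0), mort.mean(axis=0))

# "Local" coefficient: remove the national yearly average (the shared trend),
# then regress within each location and average the slopes.
pm_dev = pm25 - pm25.mean(axis=0)
mo_dev = mort - mort.mean(axis=0)
local_slope = np.mean([slope(pm_dev[i], mo_dev[i]) for i in range(n_loc)])

print(global_slope)   # large and positive: confounded by the shared trend
print(local_slope)    # near zero: no within-location association
```

The point of the sketch is only that a shared national trend acts as a confounder for the global coefficient, exactly the possibility Greven et al. flag in their abstract.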

Note well: The analysis of Young (2014) uses a moving median within a location (air basin). This analysis is much less likely to be affected by covariates, and it finds no association of air pollution (PM2.5 or ozone) with deaths. Several figures are instructive. The figures are for LA, but are typical of the other California air basins. First, ozone:

[Figure: death deviations vs. ozone deviations at lags of 0, 1, and 2 days, Los Angeles]

The figures were constructed as follows. From the daily death total, a 21-day moving median was subtracted. This calculation corrects for the time trend in the data. From the daily air pollution level, the 21-day moving median of the air pollution was subtracted. The daily death “deviation” was plotted against the pollution “deviation”. If air pollution were causing deaths, then the density in these three figures should run from lower left to upper right. To examine whether previous air pollution, e.g. yesterday or the day before, was associated with current deaths, lags of 0, 1, and 2 days were used, hence the three figures. Plots like these were computed for all eight air basins; the figures for LA are typical. Next we give the same sort of figures, but for PM2.5. Again, LA.
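The steps above can be sketched in a few lines. This is my own reconstruction in Python, not the author's actual code, run here on synthetic data (a seasonal death series and an unrelated pollution series) purely to show the mechanics.

```python
# Deviation analysis: subtract a centered 21-day moving median from daily
# deaths and from daily pollution, then compare the two deviation series
# at lags of 0, 1, and 2 days.
import numpy as np

def moving_median(x, window=21):
    """Centered moving median; edges use the available partial window."""
    half = window // 2
    n = len(x)
    return np.array([np.median(x[max(0, i - half):min(n, i + half + 1)])
                     for i in range(n)])

def lagged_deviation_corr(deaths, pollution, lags=(0, 1, 2), window=21):
    """Correlation of death deviations with pollution deviations at each lag."""
    d_dev = deaths - moving_median(deaths, window)
    p_dev = pollution - moving_median(pollution, window)
    out = {}
    for lag in lags:
        if lag == 0:
            d, p = d_dev, p_dev
        else:
            # pollution on day t - lag paired with deaths on day t
            d, p = d_dev[lag:], p_dev[:-lag]
        out[lag] = float(np.corrcoef(d, p)[0, 1])
    return out

# Synthetic example: seasonal deaths, pollution unrelated to them.
rng = np.random.default_rng(0)
t = np.arange(365)
deaths = 100 + 15 * np.cos(2 * np.pi * t / 365) + rng.normal(0, 5, 365)
pollution = 20 + rng.normal(0, 4, 365)
print(lagged_deviation_corr(deaths, pollution))
```

If pollution were driving deaths, the lag-0 (or lagged) correlations of the deviations would be clearly positive; a tilt in the plotted density corresponds to such a correlation.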

[Figure: death deviations vs. PM2.5 deviations at lags of 0, 1, and 2 days, Los Angeles]

Again, the density is concentrated at zero PM2.5 and zero deaths, and, the important point, there is no tilt of the density from lower left to upper right. And again the plots for LA are typical of the other seven air basins.

Can we say more? Many authors have noted “geographic heterogeneity”: the measured effect of air pollution is not the same in different locations. There is overwhelming evidence for the existence of geographic heterogeneity; see for example Krewski et al. (2000), Smith et al. (2009), Greven et al. (2011), and Young and Xia (2013). Multiple authors have found no association of air pollution with acute deaths in California: Krewski et al. (2000), Smith et al. (2009), Young and Xia (2013), and Jerrett et al. (2013). Enstrom (2005) found no association with chronic deaths in California. A careful consideration of this “geographic heterogeneity” is a key to understanding why it is unlikely that air pollution is causing deaths. Given that geographic heterogeneity exists, how should it be interpreted? First, statistical practice says that if interaction exists, then average effects are often misleading. Any recommendations should be for specific situations. In the words of the finance professor, it depends. In this case it makes no sense to regulate air pollution in California more severely than current regulations do.

We can consider the question of interactions of air pollution with geography more deeply. Greven et al. (2011) state in their abstract, “Based on the local coefficient alone, we are not able to demonstrate any change in life expectancy for a reduction in PM2.5,” and they go on to say that differences among locations (geographic heterogeneity) are most likely due to differences in covariates, e.g. age distributions, income, smoking. Indeed, when Chay et al. (2003) corrected their analysis for an extensive list of covariates, they found no effect of the EPA intervention to reduce air pollution.

There is empirical evidence and a logical case that air pollution is (most likely) not causally related to acute deaths. Heart attacks and stroke were recently removed as a possible etiology (Milojevic et al., 2014).

Economics on the back of an envelope

The EPA claims the CPP will avert 6,600 deaths per year. It values each death at nine million dollars, giving a co-benefit of $59.4B. But analyses that take covariates into consideration find no excess deaths due to ozone or PM2.5; the $59.4B co-benefit is the result of flawed analysis. And what is the cost of the regulation? The EPA says the CPP is the most costly regulation it has considered and puts the cost at up to $90B/yr. The National Association of Manufacturers puts the cost at $270B/yr, or $900/person/year in 2020.
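The arithmetic behind these quoted figures fits on the back of an envelope, or in a few lines. The 300-million population divisor is my assumption; it is roughly the US population and reproduces the $900/person/year figure quoted above.

```python
# Back-of-envelope check of the dollar figures quoted in the text.
deaths_averted = 6_600            # EPA's claimed annual mortality co-benefit
value_of_life = 9_000_000         # dollars assigned per statistical life
co_benefit = deaths_averted * value_of_life
print(co_benefit)                 # the claimed $59.4B co-benefit, in dollars

nam_cost = 270_000_000_000        # NAM cost estimate, dollars per year
us_population = 300_000_000       # assumed rough US population
print(nam_cost / us_population)   # cost per person per year
```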

Consider Figure 4b of Young and Xia (2013). The data used in this figure are those used in Pope, Ezzati, and Dockery (2009) and were kindly provided by Arden Pope III. Changes in income and air pollution from ~1980 to ~2000 were used. Income, in thousands of dollars, increased over that time period, but the increase differed in magnitude from city to city (the x-axis). Life expectancy increased as well (the y-axis). The general trend is very clear: increased income is associated with increased life expectancy. The income-life expectancy relationship is well known; see the dramatic video by Hans Rosling (2010). To the extent that regulations are expensive, they should move people down and to the left in this figure, with life expectancy less than it would have been. For example, $900 less income is expected to reduce life expectancy by two months.

So, do you want the EPA CPP regulations to extend your life not at all, costing you $900/yr, or do you want to have the use of your own money and save two months of your life? It depends. The EPA decides, or you decide.


  1. Increased CO2 is good for plants, as plants grow better with increased CO2.
  2. Increases in temperature, however caused, are good for humans, as they are then less likely to die.
  3. The science literature, when covariates are controlled, is on the side that increased ozone and PM2.5 are not associated with increased deaths.
  4. On balance, the costs of reducing CO2, PM2.5, and ozone are expected to lead to reduced life expectancy.


Chay K, Dobkin C, Greenstone M. (2003) The Clean Air Act of 1970 and adult mortality. J Risk Uncertainty 27, 279-300.

Cox LA Jr, Popken DA, Ricci PF. (2013) Warmer is healthier: effects on mortality rates of changes in average fine particulate matter (PM2.5) concentrations and temperatures in 100 U.S. cities. Regul Toxicol Pharmacol 66, 336-346.

Craig P, Cooper C, Gunnell D, Haw S, Lawson K, Macintyre S, Ogilvie D, Petticrew M, Reeves B, Sutton M, Thompson S. (2012) Using natural experiments to evaluate population health interventions: new Medical Research Council guidance. J Epi Community Health. 66,1182-1186.

Enstrom JE. (2005) Fine particulate air pollution and total mortality among elderly Californians, 1973–2002. Inhalation Toxicology 17, 803-816.

Greven S, Dominici F, Zeger S. (2011) An approach to the estimation of chronic air pollution effects using spatio-temporal information. J Amer Stat Assoc. 106, 396-406.

Jerrett M. (2010) California-specific studies on the PM2.5 mortality association. See slides 12 and 13: no increase in “all causes” death rate.

Krewski D, Burnett RT, Goldberg MS, Hoover K, Siemiatycki J, Jerrett M, Abrahamowicz M, White WH. (2000) Reanalysis of the Harvard Six Cities Study and the American Cancer Society Study of Particulate Air Pollution and Mortality. Part II: Sensitivity Analysis. HEI Publications. See Figure 21 in particular.

Milojevic A, Wilkinson P, Armstrong B, Bhaskaran K, Smeeth L, Hajat S. (2014) Short-term effects of air pollution on a range of cardiovascular events in England and Wales: case-crossover analysis of the MINAP database, hospital admissions and mortality. Heart 100, 1093-1098.

Pope III CA, Ezzati M, Dockery DW. (2009) Fine particulate air pollution and life expectancy in the United States. N Engl J Med 360, 376-386.

Rosling H. (2010) 200 countries, 200 years, 4 minutes.

Smith RL, Xu B, Switzer PP. (2009) Reassessing the relationship between ozone and short-term mortality in U.S. urban communities. Inhal Toxicol 29(S2), 37-61.

Young SS, Xia JQ. (2013) Assessing geographic heterogeneity and variable importance in an air pollution data set. Statistical Analysis and Data Mining 6, 375-386.

Young SS. (2014) Air pollution and daily deaths in California. Proceedings, 2014 Discovery Summit.

NOTE FROM MAYO: I invited Stan Young for an update on his work on this. I let guest posters defend their own arguments, if they wish. A little challenge for a Saturday night.




Categories: evidence-based policy, junk science, Statistics | Tags: | 34 Comments

Msc. Kvetch: What does it mean for a battle to be “lost by the media”?

1. What does it mean for a debate to be “media driven” or a battle to be “lost by the media”? In my last post, I noted that until a few weeks ago, I’d never heard of a “power morcellator.” Nor had I heard of the AAGL, the American Association of Gynecologic Laparoscopists. In an article, “Battle over morcellation lost ‘in the media’” (Nov 26, 2014), Susan London reports on a recent meeting of the AAGL[i]:

The media played a major role in determining the fate of uterine morcellation, suggested a study reported at a meeting sponsored by AAGL.

“How did we lose this battle of uterine morcellation? We lost it in the media,” asserted lead investigator Dr. Adrian C. Balica, director of the minimally invasive gynecologic surgery program at the Robert Wood Johnson Medical School in New Brunswick, N.J.

The “investigation” Balica led consisted of collecting Internet search data using something called the Google AdWords Keyword Planner:

Results showed that the average monthly number of Google searches for the term ‘morcellation’ held steady throughout most of 2013 at about 250 per month, reported Dr. Balica. There was, however, a sharp uptick in December 2013 to more than 2,000 per month, and the number continued to rise to a peak of about 18,000 per month in July 2014. A similar pattern was seen for the terms ‘morcellator,’ ‘fibroids in uterus,’ and ‘morcellation of uterine fibroid.’

The “vitals” of the study are summarized at the start of the article:

Key clinical point: Relevant Google searches rose sharply as the debate unfolded.

Major finding: The mean monthly number of searches for “morcellation” rose from about 250 in July 2013 to 18,000 in July 2014.

Data source: An analysis of Google searches for terms related to the power morcellator debate.

Disclosures: Dr. Balica disclosed that he had no relevant conflicts of interest.

2. Here’s my question: does a high correlation between Google searches and debate-related terms signify that the debate is “media driven”? I suppose you could call it that, but Dr. Balica is clearly suggesting that something not quite kosher, or not fully factual, was responsible for losing “this battle of uterine morcellation,” downplaying the substantial data and real events that drove people (like me) to search the terms upon hearing the FDA announcement in November.

This interval spanned events that included the first report of the issue in the mainstream media (December 2013), the Food and Drug Administration’s initial statement discouraging use of power morcellation for uterine fibroids (April 2014), and the issuance of analyses and rebuttals by several medical professional associations (May 2014 and thereafter).

Subsequent to the AAGL meeting, the FDA issued a new warning on Nov. 24, 2014, not to use power morcellation in the majority of women undergoing hysterectomy or myomectomy for uterine fibroids because “there is no reliable method for predicting whether a woman with fibroids may have a uterine sarcoma” that morcellation could spread. The agency estimated that about 1 in 350 fibroid patients actually have an occult sarcoma.

So it isn’t as if there was some mass PR campaign without substance. It could have been so charged if, say, the statistics were vastly off.

3. Let’s define “media driven” policies. I suggest that a legitimate and useful sense of an issue or decision being “media driven” is that people’s positions and actions in relation to it were unduly influenced by an exaggerated or pervasive media onslaught, marked by heavily pushing one side of a debate to the exclusion of other reasonable, alternative positions. Certainly I can think of a number of media-driven issues and actions of late, going by this definition. Finding an issue or decision to be “media driven,” then, is a basis for disparaging the evidential basis for the decision, as when we say people were just caught up in a “media frenzy.”

I find it interesting that the ability to track word look-ups has suddenly given a new basis for disparaging evidence. My criticism of Balica’s use of search data as a test of a “mere media-driven” effect is that one would fully expect such a correlation in cases where genuine evidence was driving both the searches and the positions reached about the issue. Thus it is not a severe test, and is in fact a lousy one, for showing “media driven” effects. What would need to be shown in this case is that the people making the policy decisions (the FDA, Johnson & Johnson, various hospitals, etc.) were unduly influenced by an exaggeration of the facts, out of proportion to the real situation.

Says Balica:

“Medical and surgical practice is going to be changed by the media,” he predicted. “Thus, studying how the morcellator controversy unfolded in this venue can help inform strategies for addressing public perceptions.”

“This is just the battle. Hopefully, we aren’t going to lose the war,” he concluded.

I guess they’ll be ready next time with their own PR. What do you think?

[i] You can read the full article at

I just came across an excellent and complete discussion of the case at least up to its date. I wanted to record it here:


Categories: msc kvetch, PhilStat Law, science communication, Statistics | 11 Comments

How power morcellators inadvertently spread uterine cancer

Until a few weeks ago, I’d never even heard of a “power morcellator.” Nor was I aware of the controversy that has pitted defenders of a woman’s right to choose a minimally invasive laparoscopic procedure for removing fibroids (enabled by the power morcellator) against those who decry the danger it poses in spreading an undetected uterine cancer throughout a woman’s abdomen. The most outspoken member of the anti-morcellation group is surgeon Hooman Noorchashm. His wife, Dr. Amy Reed, had a laparoscopic hysterectomy in which a hidden cancer was morcellated, advancing it to Stage IV sarcoma. Below is their video (link is here), followed by a recent FDA warning. I may write this in stages or parts. (I will withhold my view for now; I’d like to know what you think.)

Morcellation: (The full Article is here.)


FDA Safety Communication:

UPDATED Laparoscopic Uterine Power Morcellation in Hysterectomy and Myomectomy: FDA Safety Communication

The following information updates our April 17, 2014 communication.

Date Issued: Nov. 24, 2014

Laparoscopic power morcellators are medical devices used during different types of laparoscopic (minimally invasive) surgeries. These can include certain procedures to treat uterine fibroids, such as removing the uterus (hysterectomy) or removing the uterine fibroids (myomectomy). Morcellation refers to the division of tissue into smaller pieces or fragments and is often used during laparoscopic surgeries to facilitate the removal of tissue through small incision sites.

When used for hysterectomy or myomectomy in women with uterine fibroids, laparoscopic power morcellation poses a risk of spreading unsuspected cancerous tissue, notably uterine sarcomas, beyond the uterus. The FDA is warning against using laparoscopic power morcellators in the majority of women undergoing hysterectomy or myomectomy for uterine fibroids. Health care providers and patients should carefully consider available alternative treatment options for the removal of symptomatic uterine fibroids.

Summary of Problem and Scope: 
Uterine fibroids are noncancerous growths that develop from the muscular tissue of the uterus. Most women will develop uterine fibroids (also called leiomyomas) at some point in their lives, although most cause no symptoms. In some cases, however, fibroids can cause symptoms, including heavy or prolonged menstrual bleeding, pelvic pressure or pain, and/or frequent urination, requiring medical or surgical therapy.

Many women choose to undergo laparoscopic hysterectomy or myomectomy because these procedures are associated with benefits such as a shorter post-operative recovery time and a reduced risk of infection compared to abdominal hysterectomy and myomectomy. Many of these laparoscopic procedures are performed using a power morcellator.

Based on an FDA analysis of currently available data, we estimate that approximately 1 in 350 women undergoing hysterectomy or myomectomy for the treatment of fibroids is found to have an unsuspected uterine sarcoma, a type of uterine cancer that includes leiomyosarcoma. At this time, there is no reliable method for predicting or testing whether a woman with fibroids may have a uterine sarcoma.

If laparoscopic power morcellation is performed in women with unsuspected uterine sarcoma, there is a risk that the procedure will spread the cancerous tissue within the abdomen and pelvis, significantly worsening the patient’s long-term survival. While the specific estimate of this risk may not be known with certainty, the FDA believes that the risk is higher than previously understood.

Because of this risk and the availability of alternative surgical options for most women, the FDA is warning against the use of laparoscopic power morcellators in the majority of women undergoing myomectomy or hysterectomy for treatment of fibroids.

Limiting the patients for whom laparoscopic morcellators are indicated, the strong warning on the risk of spreading unsuspected cancer, and the recommendation that doctors share this information directly with their patients, are part of FDA guidance to manufacturers of morcellators. The guidance strongly urges these manufacturers to include this new information in their product labels.

Recommendations for Health Care Providers:

  • Be aware of the following new contraindications recommended by the FDA:
    1. Laparoscopic power morcellators are contraindicated for removal of uterine tissue containing suspected fibroids in patients who are peri- or post-menopausal, or are candidates for en bloc tissue removal, for example through the vagina or mini-laparotomy incision. (Note: These groups of women represent the majority of women with fibroids who undergo hysterectomy and myomectomy.)
    2. Laparoscopic power morcellators are contraindicated in gynecologic surgery in which the tissue to be morcellated is known or suspected to contain malignancy.
  • Be aware of the following new boxed warning recommended by the FDA:
The FDA warns that uterine tissue may contain unsuspected cancer. The use of laparoscopic power morcellators during fibroid surgery may spread cancer, and decrease the long-term survival of patients. This information should be shared with patients when considering surgery with the use of these devices.
  • Carefully consider all the available treatment options for women with uterine fibroids.
  • Thoroughly discuss the benefits and risks of all treatments with patients. Be certain to inform the small group of patients for whom laparoscopic power morcellation may be an acceptable therapeutic option that their fibroid(s) may contain unexpected cancerous tissue and that laparoscopic power morcellation may spread the cancer, significantly worsening their prognosis. This population might include some younger women who want to maintain their fertility or women not yet peri-menopausal who wish to keep their uterus after being informed of the risks.

Recommendations for Women:

  • Ask your health care provider to discuss all the options available to treat your condition. There are risks and benefits associated with all medical devices and procedures and you should be aware of them.
  • If your doctor recommends laparoscopic hysterectomy or myomectomy, ask him/her if power morcellation will be performed during your procedure, and to explain why he or she believes it is an appropriate treatment option for you.
  • If you have already undergone a hysterectomy or myomectomy for fibroids, tissue removed during the procedure is typically tested for the presence of cancer. If you were informed these tests were normal and you have no symptoms, routine follow-up with your physician is recommended. Patients with persistent or recurrent symptoms or questions should consult their health care provider.
  • A number of additional surgical treatment options are available for women with symptomatic uterine fibroids including traditional surgical hysterectomy (performed either vaginally or abdominally) and myomectomy, laparoscopic hysterectomy and myomectomy without morcellation, and laparotomy using a smaller incision (minilaparotomy). All treatments carry risk, and you should discuss them thoroughly with your health care provider.

FDA Actions:

The FDA has taken the following actions in light of scientific information that suggests that the use of laparoscopic power morcellators may contribute to the spread and upstaging of unsuspected uterine cancer in women undergoing hysterectomy and myomectomy for fibroids:

  • The FDA conducted a review of published and unpublished scientific literature, including patients operated on from 1980 to 2011 to estimate the prevalence of unsuspected uterine sarcoma and uterine leiomyosarcoma in patients undergoing hysterectomy or myomectomy for presumed benign fibroids (leiomyoma). This analysis led us to believe that the prevalence of unsuspected uterine sarcoma in patients undergoing hysterectomy or myomectomy for presumed benign leiomyoma is 1 in 352 and the prevalence of unsuspected uterine leiomyosarcoma is 1 in 498. Both of these estimates are higher than the clinical community previously understood.
  • Convened a meeting of the Obstetrics and Gynecological Medical Device Advisory Panel in July 2014. The panel discussed patient populations in which laparoscopic power morcellators should not be used, mentioning specifically patients with known or suspected malignancy. The panel also discussed mitigation strategies such as labeling, and suggested that a boxed warning related to the risk of disseminating unsuspected malignancy would be useful.
  • Issued an Immediately In Effect (IIE) guidance that asks manufacturers of new and existing laparoscopic power morcellators to include two contraindications and a boxed warning in their product labeling. This information warns against using laparoscopic power morcellators in the majority of women undergoing myomectomy or hysterectomy and recommends doctors share this information with their patients.
  • Published safety information related to these devices and alternative treatment options for the treatment of fibroids available on its website to help people better understand the risks of laparoscopic power morcellators.

In addition to the most recent contraindications and boxed warning, the FDA continues to consider other steps that may further reduce such risk—such as encouraging innovative ways to better detect uterine cancer and containment systems designed specifically for gynecological surgery.

The FDA will continue to review adverse event reports, peer-reviewed scientific literature, and information from patients, health care providers, gynecologic and surgical professional societies, and medical device manufacturers.



Categories: morcellation: FDA warning, Statistics | Tags: | 6 Comments

“Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance” (Dec 3 Seminar slides)

Below are the slides from my Rutgers seminar for the Department of Statistics and Biostatistics yesterday, since some people have been asking me for them. The abstract is here. I don’t know how explanatory a bare outline like this can be, but I’d be glad to try to answer questions[i]. I am impressed at how interested in foundational matters I found the statisticians (both faculty and students) to be. (There were even a few philosophers in attendance.) It was especially interesting to explore, prior to the seminar, possible connections between severity assessments and confidence distributions, the latter along the lines of Min-ge Xie (some recent papers of his may be found here).

“Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance”

[i]They had requested a general overview of some issues in philosophical foundations of statistics. Much of this will be familiar to readers of this blog.



Categories: Bayesian/frequentist, Error Statistics, Statistics | 11 Comments

My Rutgers Seminar: tomorrow, December 3, on philosophy of statistics

I’ll be talking about philosophy of statistics tomorrow afternoon at Rutgers University, in the Statistics and Biostatistics Department, if you happen to be in the vicinity and are interested.


Seminar Speaker:     Professor Deborah Mayo, Virginia Tech

Title:           Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance

Time:          3:20 – 4:20pm, Wednesday, December 3, 2014

Place:         552 Hill Center


Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance

Getting beyond today’s most pressing controversies revolving around statistical methods, I argue, requires scrutinizing their underlying statistical philosophies. Two main philosophies about the roles of probability in statistical inference are probabilism and performance (in the long run). The first assumes that we need a method of assigning probabilities to hypotheses; the second assumes that the main function of statistical method is to control long-run performance. I offer a third goal: controlling and evaluating the probativeness of methods. An inductive inference, in this conception, takes the form of inferring hypotheses to the extent that they have been well or severely tested. A report of poorly tested claims must also be part of an adequate inference. I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. I then show how the “severe testing” philosophy clarifies and avoids familiar criticisms and abuses of significance tests and cognate methods (e.g., confidence intervals). Severity may be threatened in three main ways: fallacies of statistical tests, unwarranted links between statistical and substantive claims, and violations of model assumptions.

Categories: Announcement, Statistics | 4 Comments


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: November 2011. I mark in red 3 posts that seem most apt for general background on key issues in this blog.*

  • (11/1) RMM-4:“Foundational Issues in Statistical Modeling: Statistical Model Specification and Validation*” by Aris Spanos, in Rationality, Markets, and Morals (Special Topic: Statistical Science and Philosophy of Science: Where Do/Should They Meet?”)
  • (11/3) Who is Really Doing the Work?*
  • (11/5) Skeleton Key and Skeletal Points for (Esteemed) Ghost Guest
  • (11/9) Neyman’s Nursery 2: Power and Severity [Continuation of Oct. 22 Post]
  • (11/12) Neyman’s Nursery (NN) 3: SHPOWER vs POWER
  • (11/15) Logic Takes a Bit of a Hit!: (NN 4) Continuing: Shpower (“observed” power) vs Power
  • (11/18) Neyman’s Nursery (NN5): Final Post
  • (11/21) RMM-5: “Low Assumptions, High Dimensions” by Larry Wasserman, in Rationality, Markets, and Morals (Special Topic: Statistical Science and Philosophy of Science: Where Do/Should They Meet?”) See also my deconstruction of Larry Wasserman.
  • (11/23) Elbar Grease: Return to the Comedy Hour at the Bayesian Retreat
  • (11/28) The UN Charter: double-counting and data snooping
  • (11/29) If you try sometime, you find you get what you need!

*I announced this new, once-a-month feature at the blog’s 3-year anniversary. I will repost and comment on one of the 3-year old posts from time to time. [I’ve yet to repost and comment on the one from Oct. 2011, but will shortly.] For newcomers, here’s your chance to catch-up; for old timers,this is philosophy: rereading is essential!


 Oct. 2011

Sept. 2011 (Within “All She Wrote (so far))












Categories: 3-year memory lane, Bayesian/frequentist, Statistics | Leave a comment

How likelihoodists exaggerate evidence from statistical tests


I insist on point against point, no matter how much it hurts

Have you ever noticed that some leading advocates of a statistical account, say a testing account A, upon discovering account A is unable to handle a certain kind of important testing problem that a rival testing account, account B, has no trouble at all with, will mount an argument that being able to handle that kind of problem is actually a bad thing? In fact, they might argue that testing account B is not a  “real” testing account because it can handle such a problem? You have? Sure you have, if you read this blog. But that’s only a subliminal point of this post.

I’ve had three posts recently on the Law of Likelihood (LL): Breaking the [LL](a)(b)[c], and [LL] is bankrupt. Please read at least one of them for background. All deal with Royall’s comparative likelihoodist account, which some will say only a few people even use, but I promise you that these same points come up again and again in foundational criticisms from entirely other quarters.[i]

An example from Royall is typical: He makes it clear that an account based on the (LL) is unable to handle composite tests, even simple one-sided tests for which account B supplies uniformly most powerful (UMP) tests. He concludes, not that his test comes up short, but that any genuine test or ‘rule of rejection’ must have a point alternative!  Here’s the case (Royall, 1997, pp. 19-20):

[M]edical researchers are interested in the success probability, θ, associated with a new treatment. They are particularly interested in how θ relates to the old treatment’s success probability, believed to be about 0.2. They have reason to hope θ is considerably greater, perhaps 0.8 or even greater. To obtain evidence about θ, they carry out a study in which the new treatment is given to 17 subjects, and find that it is successful in nine.

Let me interject at this point that of all of Stephen Senn’s posts on this blog, my favorite is the one where he zeroes in on the proper way to think about the discrepancy we hope to find (the .8 in this example). (See note [ii])

A standard statistical analysis of their observations would use a Bernouilli (θ) statistical model and test the composite hypotheses H1: θ ≤ 0.2 versus H2: θ > 0.2. That analysis would show that H1 can be rejected in favor of Hat any significance level greater than 0.003, a result that is conventionally taken to mean that the observations are very strong evidence supporting H2 over H1. (Royall, ibid.)

Following Royall’s numbers, the observed success rate is:

m0 = 9/17 = .53, exceeding H1: θ ≤ 0.2 by ~3 standard deviations, as σ / √17 ~ 0.1, yielding significance level ~.003.

So, the observed success rate m0 = .53, “is conventionally taken to mean that the observations are very strong evidence supporting H2 over H1.” (ibid. p. 20) [For a link to an article by Royall, see the references.]

And indeed it is altogether warranted to regard the data as very strong evidence that θ > 0.2—which is precisely what H2 asserts (not fussing with his rather small sample size). In fact, m0 warrants inferring even larger discrepancies, but let’s first see where Royall has stopped in his tracks.[iii]

Royall claims he is unable to allow that m0 = .53 is evidence against the null in the one sided-test we are considering:  H1: θ ≤ 0.2 versus H2: θ > 0.2.

He tells us why in the next paragraph (ibid., p. 20):

But because H1 contains some simple hypotheses that are better supported than some hypotheses in H2 (e.g.,θ = 0.2 is better supported than θ= 0.9 by a likelihood ratio of LR = (0.2/0.9)9(0.8/0.1)8 = 22.2), the law of likelihood does not allow the characterization of these observations as strong evidence for H2 over H1(my emphasis; note I didn’t check his numbers since they hardly matter.)

It appears that Royall views rejecting H1: θ ≤ 0.2 and inferring H2: θ > 0.2 as asserting every parameter point within H2 is more likely than every point in H1! (That strikes me as a highly idiosyncratic meaning.) Whereas, the significance tester just takes it to mean what it says:

to reject H1: θ ≤ 0.2 is to infer some positive discrepancy from .2.

We, who go further, either via severity assessments or confidence intervals, would give discrepancies that were reasonably warranted, as well as those that were tantamount to making great whales out of little guppies (fallacy of rejection)! Conversely, for any discrepancy of interest, we can tell you how well or poorly warranted it is by the data. (The confidence interval theorist would need to supplement the one-sided lower limit which is, strictly speaking, all she gets from the one-sided test. I put this to one side here.)

But Royall is blocked! He’s got to invoke point alternatives, and then give a comparative likelihood ratio (to a point null). Note, too, the point against point requirement is always required (with a couple of exceptions, maybe) for Royall’s comparative likelihoodist; it’s not just in this example where he imagines a far away alternative point of .8. The ordinary significance test is clearly at a great advantage over the point against point hypotheses, given the stated goal here is to probe discrepancies from the null. (See Senn’s point in note [ii] below.)

Not only is the law of likelihood unable to tackle simple one-sided tests, what it allows us to say is rather misleading:

What does it allow us to say? One statement that we can make is that the observations are only weak evidence in favor of θ = 0.8 versus θ = 0.2 (LR = 4). We can also say that they are rather strong evidence supporting θ = 0.5 over any of the values under H1: θ ≤ 0.2 (LR > 89), and at least moderately strong evidence for θ = 0.5 over any value θ > 0.8 (LR) > 22). …Thus we can say that the observation of nine successes in 17 trials is rather strong evidence supporting success rates of about 0.5 over the rate 0.2 that is associated with the old treatment, and at least moderately strong evidence for the intermediate rates versus the rates of 0.8 or greater that we were hoping to achieve. (Royall 1997, p. 20, emphasis is mine)

But this is scarcely “rather strong evidence supporting success rates of about 0.5” over the old treatment. What confidence level would you be using if you inferred m0 is evidence that θ > 0.5? Approximately .5. (It’s the typical comparative likelihood move of favoring the claim that the population value equals the observed value. (*See comments.)

Royall”s “weak evidence in favor of θ = 0.8 versus θ = 0.2 (LR = 4)” fails to convey that there is rather horrible warrant for inferring θ = 0.8–associated with something like 99% error probability! (It’s outside the 2-standard deviation confidence interval, is it not?)

We significance testers do find strong evidence for discrepancies in excess of .3 (~.97 severity or lower confidence level) and decent evidence of excesses of .4 (~.84 severity or lower confidence level).  And notice that all of these assertions are claims of evidence of positive discrepancies from the null H1: θ ≤ 0.2. In short, at best (if we are generous in our reading, and insist on confidence levels at least .5), Royall is rediscovering what the significance tester automatically says in rejecting the null with the data!

His entire analysis is limited to giving a series of reports as to which parameter values the data are comparatively closer to. As I already argued, I regard such an account as bankrupt as an account of inference. It fails to control probabilities of misleading interpretations of data in general, and precludes comparing the warrant for a single H by two data sets x, y. In this post, my aim is different. It is Royall, and some fellow likelihoodists, who lodge the criticism because we significance testers operate with composite alternatives. My position is that dealing with composite alternatives is crucial, and that we succeed swimmingly, while Royall is barely treading water. He will allow much stronger evidence than is warranted in favor of members of H2. Ironically, an analogous move is advocated by those who raise the riot act against P-values for exaggerating evidence against a null! [iv]

Elliott Sober, reporting on the Royall road of likelihoodism, remarks:

The fact that significance tests don’t contrast the null hypothesis with alternatives suffices to show that they do not provide a good rule for rejection. (Sober 2008, 56) 

But there is an alternative, it’s just not limited to a point, the highly artificial case we rarely are involved in testing. Perhaps they are more common in biology. I will assume here that Elliott Sober is mainly setting out some of Royall’s criticisms for the reader, rather than agreeing with them.slide11

According to the law of likelihood, as Sober observes, whether the data are evidence against the null hypothesis depends on which point alternative hypothesis you consider. Does he really want to say that so long as you can identify an alternative that is less likely given the data than is the null, then the data are “evidence in favor of the null hypothesis, not evidence against it” (Sober, 56). Is this a good thing? What about all the points in between?  The significance test above exhausts the parameter space, as do all N-P tests.[v]


[i] I know because, remember, I’m writing a book that’s close to being done.

[ii] “It would be ludicrous to maintain that [the treatment] cannot have an effect which, while greater than nothing, is less than the clinically relevant difference.” (Senn 2008, p. 201)

[iii] Note: a rejection at the 2-standard deviation cut-off would be ~M* = .2 + 2(.1) = .4.

[iv] That is, they allow the low P-value to count as evidence for alternatives we would regard as unwarranted. But I’ll come back to that another time.

[v] In this connection, do we really want to say, about a null with teeny tiny likelihood, that there’s evidence for it, so long as there is a rival, miles away, in the other direction? (Do I feel the J-G-L Paradox coming on? Yes! It’s the next topic in Sober p.56)


Royall, R. (2004), “The Likelihood Paradigm for Statistical Evidence” 119-138; Rejoinder 145-151, in M. Taper, and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press.

Royall, R.(1997)  Statistical Evidence, A Likelihood Paradigm. Chapman and Hall.

Senn, S. (2007), Statistical Issues in Drug Development, Wiley.

Sober, E. (2008). Evidence and Evolution. CUP.

Categories: law of likelihood, Richard Royall, Statistics | Tags: | 18 Comments

Msc Kvetch: “You are a Medical Statistic”, or “How Medical Care Is Being Corrupted”

1119OPEDmerto-master495A NYT op-ed the other day,”How Medical Care Is Being Corrupted” (by Pamela Hartzband and Jerome Groopman, physicians on the faculty of Harvard Medical School), gives a good sum-up of what I fear is becoming the new normal, even under so-called “personalized medicine”. 

WHEN we are patients, we want our doctors to make recommendations that are in our best interests as individuals. As physicians, we strive to do the same for our patients.

But financial forces largely hidden from the public are beginning to corrupt care and undermine the bond of trust between doctors and patients. Insurers, hospital networks and regulatory groups have put in place both rewards and punishments that can powerfully influence your doctor’s decisions.

Contracts for medical care that incorporate “pay for performance” direct physicians to meet strict metrics for testing and treatment. These metrics are population-based and generic, and do not take into account the individual characteristics and preferences of the patient or differing expert opinions on optimal practice.

For example, doctors are rewarded for keeping their patients’ cholesterol and blood pressure below certain target levels. For some patients, this is good medicine, but for others the benefits may not outweigh the risks. Treatment with drugs such as statins can cause significant side effects, including muscle pain and increased risk of diabetes. Blood-pressure therapy to meet an imposed target may lead to increased falls and fractures in older patients.

Physicians who meet their designated targets are not only rewarded with a bonus from the insurer but are also given high ratings on insurer websites. Physicians who deviate from such metrics are financially penalized through lower payments and are publicly shamed, listed on insurer websites in a lower tier. Further, their patients may be required to pay higher co-payments.

These measures are clearly designed to coerce physicians to comply with the metrics. Thus doctors may feel pressured to withhold treatment that they feel is required or feel forced to recommend treatment whose risks may outweigh benefits.

It is not just treatment targets but also the particular medications to be used that are now often dictated by insurers. Commonly this is done by assigning a larger co-payment to certain drugs, a negative incentive for patients to choose higher-cost medications. But now some insurers are offering a positive financial incentive directly to physicians to use specific medications. For example, WellPoint, one of the largest private payers for health care, recently outlined designated treatment pathways for cancer and announced that it would pay physicians an incentive of $350 per month per patient treated on the designated pathway.

This has raised concern in the oncology community because there is considerable debate among experts about what is optimal. Dr. Margaret A. Tempero of the National Comprehensive Cancer Network observed that every day oncologists saw patients for whom deviation from treatment guidelines made sense: “Will oncologists be reluctant to make these decisions because of an adverse effects on payments?” Further, some health care networks limit the ability of a patient to get a second opinion by going outside the network. The patient is financially penalized with large co-payments or no coverage at all. Additionally, the physician who refers the patient out of network risks censure from the network administration.

When a patient asks “Is this treatment right for me?” the doctor faces a potential moral dilemma. How should he answer if the response is to his personal detriment? Some health policy experts suggest that there is no moral dilemma. They argue that it is obsolete for the doctor to approach each patient strictly as an individual; medical decisions should be made on the basis of what is best for the population as a whole.

Medicine has been appropriately criticized for its past paternalism, where doctors imposed their views on the patient. In recent years, however, the balance of power has shifted away from the physician to the patient, in large part because of access to clinical information on the web.

In truth, the power belongs to the insurers and regulators that control payment. There is now a new paternalism, largely invisible to the public, diminishing the autonomy of both doctor and patient.

In 2010, Congress passed the Physician Payments Sunshine Act to address potential conflicts of interest by making physician financial ties to pharmaceutical and device companies public on a federal website. We propose a similar public website to reveal the hidden coercive forces that may specify treatments and limit choices through pressures on the doctor.

Medical care is not just another marketplace commodity. Physicians should never have an incentive to override the best interests of their patients.

Categories: PhilStat/Med, Statistics | Tags: | 8 Comments

Erich Lehmann: Statistician and Poet

Erich Lehmann 20 November 1917 – 12 September 2009

Erich Lehmann                       20 November 1917 –              12 September 2009

Memory Lane 1 Year (with update): Today is Erich Lehmann’s birthday. The last time I saw him was at the Second Lehmann conference in 2004, at which I organized a session on philosophical foundations of statistics (including David Freedman and D.R. Cox).

I got to know Lehmann, Neyman’s first student, in 1997.  One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that).  He told me he was sitting in a very large room at an ASA meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, dark table sat just one book, all alone, shiny red.  He said he wondered if it might be of interest to him!  So he walked up to it….  It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after. Some related posts on Lehmann’s letter are here and here.

That same year I remember having a last-minute phone call with Erich to ask how best to respond to a “funny Bayesian example” raised by Colin Howson. It is essentially the case of Mary’s positive result for a disease, where Mary is selected randomly from a population where the disease is very rare. See for example here. (It’s just like the case of our high school student Isaac). His recommendations were extremely illuminating, and with them he sent me a poem he’d written (which you can read in my published response here*). Aside from being a leading statistician, Erich had a (serious) literary bent.

Juliet Shafer, Erich Lehmann, D. Mayo

Juliet Shafer, Erich Lehmann, D. Mayo

The picture on the right was taken in 2003 (by A. Spanos).

(2014 update): It was at this meeting that I proposed organizing a session for the 2004 Erich Lehmann Conference that would focus on “Philosophy of Statistics”. He encouraged me to do so. I invited David Freedman (who accepted), and then had the wild idea of inviting Sir David Cox. He too accepted! (Cox and I later combined our contributions into Mayo and Cox 2006).

Mayo, D. G (1997a), “Response to Howson and Laudan,” Philosophy of Science 64: 323-333.

Mayo, D.G. and Cox, D. R. (2006) “Frequentists Statistics as a Theory of Inductive Inference,” Optimality: The Second Erich L. Lehmann Symposium (ed. J. Rojo), Lecture Notes-Monograph series, Institute of Mathematical Statistics (IMS), Vol. 49: 77-97.


(Selected) Books by Lehmann)

  • Testing Statistical Hypotheses, 1959
  • Basic Concepts of Probability and Statistics, 1964, co-author J. L. Hodges
  • Elements of Finite Probability, 1965, co-author J. L. Hodges
  • Lehmann, Erich L.; With the special assistance of H. J. M. D’Abrera (2006). Nonparametrics: Statistical methods based on ranks (Reprinting of 1988 revision of 1975 Holden-Day ed.). New York: Springer. pp. xvi+463. ISBN 978-0-387-35212-1. MR 2279708.
  • Theory of Point Estimation, 1983
  • Elements of Large-Sample Theory (1988). New York: Springer Verlag.
  • Reminiscences of a Statistician, 2007, ISBN 978-0-387-71596-4
  • Fisher, Neyman, and the Creation of Classical Statistics, 2011, ISBN 978-1-4419-9499-8 [published posthumously]

Articles (3 of very many)

Categories: highly probable vs highly probed, phil/history of stat, Sir David Cox, Spanos, Statistics | Tags: , | Leave a comment

Lucien Le Cam: “The Bayesians Hold the Magic”

lecamToday is the birthday of Lucien Le Cam (Nov. 18, 1924-April 25,2000): Please see my updated 2013 post on him.


Categories: Bayesian/frequentist, Statistics | Leave a comment

Why the Law of Likelihood is bankrupt–as an account of evidence



There was a session at the Philosophy of Science Association meeting last week where two of the speakers, Greg Gandenberger and Jiji Zhang had insightful things to say about the “Law of Likelihood” (LL)[i]. Recall from recent posts here and here that the (LL) regards data x as evidence supporting H1 over H0   iff

Pr(x; H1) > Pr(x; H0).

On many accounts, the likelihood ratio also measures the strength of that comparative evidence. (Royall 1997, p.3). [ii]

H0 and H1 are statistical hypothesis that assign probabilities to the random variable X taking value x.  As I recall, the speakers limited  H1 and H0  to simple statistical hypotheses (as Richard Royall generally does)–already restricting the account to rather artificial cases, but I put that to one side. Remember, with likelihoods, the data x are fixed, the hypotheses vary.

1. Maximally likely alternatives. I didn’t really disagree with anything the speakers said. I welcomed their recognition that a central problem facing the (LL) is the ease of constructing maximally likely alternatives: so long as Pr(x; H0) < 1, a maximum likely alternative H1 would be evidentially “favored”. There is no onus on the likelihoodist to predesignate the rival, you are free to search, hunt, post-designate and construct a best (or better) fitting rival. If you’re bothered by this, says Royall, then this just means the evidence disagrees with your prior beliefs.

After all, Royall famously distinguishes between evidence and belief (recall the evidence-belief-action distinction), and these problematic cases, he thinks, do not vitiate his account as an account of evidence. But I think they do! In fact, I think they render the (LL) utterly bankrupt as an account of evidence. Here are a few reasons. (Let me be clear that I am not pinning Royall’s defense on the speakers[iii], so much as saying it came up in the general discussion[iv].)

2. Appealing to prior beliefs to avoid the problem of maximally likely alternatives. Recall Royall’s treatment of maximally likely alternatives in the case of turning over the top card of a shuffled deck, and finding an ace of diamonds:

According to the law of likelihood, the hypothesis that the deck consists of 52 aces of diamonds (H1) is better supported than the hypothesis that the deck is normal (HN) [by the factor 52]…Some find this disturbing.

Not Royall.

Furthermore, it seems unfair; no matter what card is drawn, the law implies that the corresponding trick-deck hypothesis (52 cards just like the one drawn) is better supported than the normal-deck hypothesis. Thus even if the deck is normal we will always claim to have found strong evidence that it is not. (Royall 1997, pp. 13-14)

To Royall, it only shows a confusion between evidence and belief. If you’re not convinced the deck has 52 aces of diamonds “it does not mean that the observation is not strong evidence in favor of H1 versus HN.” It just wasn’t strong enough to overcome your prior beliefs.

The relation to Bayesian inference, as Royall notes, is that the likelihood ratio “that the law [LL] uses to measure the strength of the evidence, is precisely the factor by which the observation X = x would change the probability ratio” Pr(H0) /Pr(H1). (Royall 1997, p. 6). So, if you don’t think the maximally likely alternative is palatable, you can get around it by giving it a suitably low prior degree of probability. But the more likely hypothesis is still favored on grounds of evidence, according to this view. (Do Bayesians agree?)

When this “appeal to beliefs” solution came up in the discussion at this session, some suggested that you should simply refrain from proposing implausible maximally likely alternatives! I think this misses the crucial issues.

3. What’s wrong with the “appeal to beliefs” solution to the (LL) problem: First, there are many cases where we want to distinguish the warrant for one and the same hypothesis according to whether it was constructed post hoc to fit the data or predesignated. The “use constructed” hypothesis H could well be plausible, but we’d still want to distinguish the evidential credit H deserves in the two cases, and appealing to priors does not help.

Second, to suppose one can be saved from the unpleasant consequences of the (LL) by the deus ex machina of a prior is to misidentify what the problem really is—at least when there is a problem (and not all data-dependent alternatives are problematic—see my double-counting papers, e.g., here). In the problem cases, the problem is due to the error probing capability of the overall testing procedure being diminished. You are not “sincerely trying”, as Popper puts it, to find flaws with claims, but instead you are happily finding evidence in favor of a well-fitting hypothesis that you deliberately construct— unless your intuitions tell you it is unbelievable. So now the task that was supposed to be performed by an account of statistical evidence is not being performed by it at all. It has to be performed by you, and you are the most likely one to follow your preconceived opinions and pet theories.You are the one in danger of confirmation bias. If your account of statistical evidence won’t supply tools to help you honestly criticize yourself (let alone allow the rest of us to fraud-bust your inference), then it comes up short in an essential way.

4. The role of statistical philosophy in philosophy of science. I recall having lunch with Royall when we first met (at an ecology conference around 1998) and trying to explain, “You see, in philosophy, we look to statistical accounts in order to address general problems about scientific evidence, inductive inference, and hypothesis testing. And one of the classic problems we wrestle with is that data underdetermine hypotheses; there are many hypotheses we can dream up to “fit” the data. We look to statistical philosophy to get insights into warranted inductive inference, to distinguish ad hoc hypotheses, confirmation biases, etc. We want to bring out the problem with that Texas “sharpshooter” who fires some shots into the side of a barn and then cleverly paints a target so that most of his hits are in the bull’s eye, and then takes this as evidence of his marksmanship. So, the problem with the (LL) is that it appears to license rather than condemn some of these pseudoscientific practices.”

His answer, as near as I can recall, was that he was doing statistics and didn’t know about these philosophical issues. Had it been current times, perhaps I could have been more effective in pointing up the “reproducibility crisis,” “big data,” and “fraud-busting”. Anyway, he wouldn’t relent, even on stopping rules.

But his general stance is one I often hear: We can take into account those tricky moves later on in our belief assignments. The (LL) just gives a measure of the evidence in the data. But this IS later on. Since these gambits can completely destroy your having any respectable evidence whatsoever, you can’t say “the evidence is fine, I’ll correct things with beliefs later on”.

Besides, the influence of the selection effects is not on the believability of H but rather on the capability of the test to have unearthed errors. Their influence is on the error probabilities of the test procedure, and yet the (LL) is conditional on the actual outcome.

5. Why does the likelihoodist not appeal to error probabilities to solve his problem? The answer is that he is convinced that such an appeal is necessarily limited to controlling erroneous actions in the long run. That is why Royall rejects it (claiming it is only relevant for “action”), and only a few of us here in exile have come around to mounting a serious challenge to this extreme behavioristic rationale for error statistical methods. Fisher, E. Pearson, and even Neyman some of the time, rejected such a crass behavioristic rational, as have Birnbaum, Cox, Kempthorne and many other frequentists.(See this post on Pearson.) 

Yet, I have just shown that the criticisms based on error probabilities have scarcely anything to do with the long run, but have everything to do with whether you have done a good job providing evidence for your favored hypothesis right now.

“A likelihood ratio may be a criterion of relative fit but it “is still necessary to determine its sampling distribution in order to control the error involved in rejecting a true hypothesis, because a knowledge of [likelihoods] alone is not adequate to insure control of this error (Pearson and Neyman, 1930, p. 106).

Pearson and Neyman should have been explicit as to how this error control is essential for a strong argument from coincidence in the case at hand.

Ironically, a great many critical discussions of frequentist error statistical inference (significance tests, confidence intervals, P-values, power, etc.) start with assuming “the law (LL)”, when in fact attention to the probativeness of tests by means of the relevant sampling distribution is just the cure the likelihoodist needs.

6. Is it true that all attempts to say whether x is good or terrible evidence for H are utterly futile? Royall says they are, that only comparing a fixed x to H versus some alternative H’ can work.

[T]he likelihood view is that observations [like x and y]…have no valid interpretation as evidence in relation to the single hypothesis H.” (Royall 2004, p. 149).

But we should disagree. We most certainly can say that x is quite lousy evidence for H, if nothing (or very little) has been done to find flaws in H, or if I constructed an H to agree swimmingly with x, but by means that make it extremely easy to achieve, even if H is false.

Finding a non-statistically significant difference on the tested factor, I find a subgroup or post-data endpoint that gives “nominal” statistical significance. Whether Hwas pre-designated or post-designated makes no difference to the likelihood ratio, and the prior given to Hwould be the same whether it was pre- or post-designated. The post-designated alternative might be highly plausible, but I would still want to say that selection effects, cherry-picking, and generally “trying and trying again” alter the stringency of the test. This altered capacity in the test’s picking up on sources of bias and unreliability has no home in the (LL) account of evidence. That is why I say it fails in an essential way, as an account of evidence.

7. So what does the Bayesian say about the (LL)? I take it the Bayesian would deny that the comparative evidence account given by the (LL) is adequate. LRs are important, of course, but there are also prior probability assignments to hypotheses. Yet that would seem to get us right back to Royall’s problem that we have been discussing here.

In this connection, ponder (v).

8. Background. You may wish to review “Breaking the Law! (of likelihood) (A) and (B)”, and “Breaking the Royall Law of Likelihood (C)”. A relevant paper by Royall is here.



[i] The PSA program is here: Program.pdf. Zhang and Gandenberger are both excellent young philosophers of science who engage with real statistical methods.

[ii] For a full statement of the [LL] according to Royall: “If hypothesis A implies that the probability that a random variable X takes the value x is pA(x), while hypothesis B implies that the probability is pB(x), then the observation X = x is evidence supporting A over B if and only if pA(x) > pB(x), and the likelihood ratio, pA(x)/pB(x), measures the strength of that evidence.” (Royall, 2004, p. 122)

“This says simply that if an event is more probable under hypothesis A than hypothesis B, then the occurrence of that event is evidence supporting A over B––the hypothesis that did the better job of predicting the event is better supported by its occurrence.” Moreover, “the likelihood ratio, is the exact factor by which the probability ratio [ratio of priors in A and B] is changed.” (ibid., 123)
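Royall’s [LL] is easy to illustrate with a toy computation (a sketch; the hypotheses and data are invented for illustration). Suppose x is 9 heads in 10 tosses; hypothesis A (θ = 0.9) assigns x a higher probability than hypothesis B (θ = 0.5), so by the [LL] the observation supports A over B, with the likelihood ratio measuring the strength:

```python
from math import comb

def binom_pmf(k, n, theta):
    """Probability of k successes in n Bernoulli(theta) trials."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

k, n = 9, 10
pA = binom_pmf(k, n, 0.9)   # hypothesis A: theta = 0.9
pB = binom_pmf(k, n, 0.5)   # hypothesis B: theta = 0.5
print(f"LR = pA/pB = {pA / pB:.1f}")   # about 39.7, favoring A over B per the [LL]
```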

Aside from denying the underlined sentence, can a Bayesian violate the [LL]? In comments to this first post, it was argued that they can.

[iii] In fact, Gandenberger’s paper was about why he is not a “methodological likelihoodist,” and Zhang was only dealing with a specific criticism of (LL) by Forster. [See Gandenberger’s blog.]

[iv] Granted, the speakers did not declare Royall’s way out of the problem leads to bankruptcy, as I would have wanted them to.

[v] I’m placing this here for possible input later on. Royall considers the familiar example where a positive diagnostic result is more probable under “disease” than “no disease”. If the prior probability for disease is sufficiently small, it can result in a low posterior for disease. For Royall, “to interpret the positive test result as evidence that the subject does not have the disease is never appropriate––it is simply and unequivocally wrong. Why is it wrong?” (2004, 122). Because it violates the (LL). This gets to the contrast between “Bayes boosts” and high posterior again. I take it the Bayesian response would be to agree, but still deny there is evidence for disease. Yes? [This is like our example of Isaac, who passes many tests of high school readiness, so the LR in favor of his being ready is positive. However, having been randomly selected from “Fewready” town, the posterior for his readiness is still low (despite its having increased).] Severity here seems to be in sync with the B-boosters, at least in direction of evidence.
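The arithmetic behind this diagnostic example can be made explicit (a sketch with invented numbers: sensitivity and specificity of 0.95, prior of 0.001). The likelihood ratio strongly favors “disease,” and the posterior is indeed boosted, yet it remains small:

```python
sens, spec, prior = 0.95, 0.95, 0.001    # illustrative values, not from any study

lr_positive = sens / (1 - spec)          # LR in favor of "disease": 19
posterior = (sens * prior) / (sens * prior + (1 - spec) * (1 - prior))

print(f"LR in favor of disease:   {lr_positive:.0f}")
print(f"Posterior P(disease | +): {posterior:.3f}")   # boosted above 0.001, still tiny
```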



Mayo, D. G. (2014) On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29, no. 2, 227-266.

Mayo, D. G. (2004). “An Error-Statistical Philosophy of Evidence,” 79-118, in M. Taper and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press.

Pearson, E.S. & Neyman, J. (1930). On the problem of two samples. In J. Neyman and E.S. Pearson, 1967, Joint Statistical Papers, (99-115). Cambridge: CUP.

Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. London: Chapman and Hall/CRC Press.

Royall, R. (2004), “The Likelihood Paradigm for Statistical Evidence” 119-138; Rejoinder 145-151, in M. Taper, and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press.




Categories: highly probable vs highly probed, law of likelihood, Richard Royall, Statistics | 62 Comments

A biased report of the probability of a statistical fluke: Is it cheating?

One year ago I reblogged a post from Matt Strassler, “Nature is Full of Surprises” (2011). In it he claims that

[Statistical debate] “often boils down to this: is the question that you have asked in applying your statistical method the most even-handed, the most open-minded, the most unbiased question that you could possibly ask?

It’s not asking whether someone made a mathematical mistake. It is asking whether they cheated — whether they adjusted the rules unfairly — and biased the answer through the question they chose…”

(Nov. 2014): I am impressed (i.e., struck by the fact) that he goes so far as to call it “cheating”. Anyway, here is the rest of the reblog from Strassler, which bears on a number of recent discussions:

“…If there are 23 people in a room, the chance that two of them have the same birthday is 50 percent, while the chance that two of them were born on a particular day, say, January 1st, is quite low, a small fraction of a percent. The more you specify the coincidence, the rarer it is; the broader the range of coincidences at which you are ready to express surprise, the more likely it is that one will turn up.

Humans are notoriously incompetent at estimating these types of probabilities… which is why scientists (including particle physicists), when they see something unusual in their data, always try to quantify the probability that it is a statistical fluke — a pure chance event. You would not want to be wrong, and celebrate your future Nobel prize only to receive instead a booby prize. (And nature gives out lots and lots of booby prizes.) So scientists, grabbing their statistics textbooks and appealing to the latest advances in statistical techniques, compute these probabilities as best they can. Armed with these numbers, they then try to infer whether it is likely that they have actually discovered something new or not.

And on the whole, it doesn’t work. Unless the answer is so obvious that no statistical argument is needed, the numbers typically do not settle the question.

Despite this remark, you mustn’t think I am arguing against doing statistics. One has to do something better than guessing. But there is a reason for the old saw: “There are three types of falsehoods: lies, damned lies, and statistics.” It’s not that statistics themselves lie, but that to some extent, unless the case is virtually airtight, you can almost always choose to ask a question in such a way as to get any answer you want. … [For instance, in 1991 the volcano Pinatubo in the Philippines had its titanic eruption while a hurricane (or `typhoon’ as it is called in that region) happened to be underway. Oh, and the collapse of Lehman Brothers on Sept 15, 2008 was followed within three days by the breakdown of the Large Hadron Collider (LHC) during its first week of running… Coincidence?  I-think-so.] One can draw completely different conclusions, both of them statistically sensible, by looking at the same data from two different points of view, and asking for the statistical answer to two different questions.” (my emphasis) Continue reading
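Strassler’s birthday figures are easy to verify; a minimal check in Python:

```python
def p_shared_birthday(n):
    """Probability that at least two of n people share a birthday (365-day year)."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

def p_specific_day(n):
    """Probability that at least two of n people were born on one
    particular day (say, January 1st), assuming a 365-day year."""
    p0 = (364 / 365) ** n                        # nobody born that day
    p1 = n * (1 / 365) * (364 / 365) ** (n - 1)  # exactly one person born that day
    return 1 - p0 - p1

print(f"Any shared birthday, n=23:  {p_shared_birthday(23):.3f}")   # about 0.507
print(f"Two+ born on Jan 1st, n=23: {p_specific_day(23):.5f}")      # a fraction of a percent
```

As he says: the more you specify the coincidence, the rarer it is; 50 percent versus roughly 0.2 percent.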

Categories: Higgs, spurious p values, Statistics | 7 Comments

“Statistical Flukes, the Higgs Discovery, and 5 Sigma” at the PSA

We had an excellent discussion at our symposium yesterday: “How Many Sigmas to Discovery? Philosophy and Statistics in the Higgs Experiments” with Robert Cousins, Allan Franklin and Kent Staley. Slides from my presentation, “Statistical Flukes, the Higgs Discovery, and 5 Sigma” are posted below (we each had only 20 minutes, so this is clipped, but much came out in the discussion). Even the challenge I read about this morning as to what exactly the Higgs researchers discovered (and I’ve no clue if there’s anything to the idea of a “techni-higgs particle”) would not invalidate* the knowledge of the experimental effects severely tested.


*Although, as always, there may be a reinterpretation of the results. But I think the article is an isolated bit of speculation. I’ll update if I hear more.

Categories: Higgs, highly probable vs highly probed, Statistics | 26 Comments

Oxford Gaol: Statistical Bogeymen

Memory Lane: 3 years ago. Oxford Jail (also called Oxford Castle) is an entirely fitting place to be on (and around) Halloween! Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba! (It is now a boutique hotel, though many of the rooms are still too jail-like for me.)  My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should, I think, be shelved, not that names should matter.

In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended. But for (most) Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).

Criticisms then follow readily, taking the form of one or both:

  • Error probabilities do not supply posterior probabilities in hypotheses; interpreted as if they do (and some say we just can’t help it), they lead to inconsistencies.
  • Methods with good long-run error rates can give rise to counterintuitive inferences in particular cases.

I have proposed an alternative philosophy that replaces these tenets with different ones:

  • The role of probability in inference is to quantify how reliably or severely claims (or discrepancies from claims) have been tested.
  • The severity goal directs us to the relevant error probabilities, avoiding the oft-repeated statistical fallacies due to tests that are overly sensitive, as well as those insufficiently sensitive to particular errors.
  • Control of long-run error probabilities, while necessary, is not sufficient for good tests or warranted inferences.

Continue reading

Categories: 3-year memory lane, Bayesian/frequentist, Philosophy of Statistics, Statistics | Tags: , | 30 Comments


Hand writing a letter with a goose feather

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: October 2011 (I mark in red 3 posts that seem most apt for general background on key issues in this blog*)

*I indicated I’d begin this new, once-a-month feature at the 3-year anniversary. I will repost and comment on one each month. (I might repost others that I do not comment on, as on Oct. 31, 2014.) For newcomers, here’s your chance to catch up; for old timers, this is philosophy: rereading is essential!

Categories: 3-year memory lane, blog contents, Statistics | Leave a comment

September 2014: Blog Contents

September 2014: Error Statistics Philosophy
Blog Table of Contents 

Compiled by Jean A. Miller

  • (9/30) Letter from George (Barnard)
  • (9/27) Should a “Fictionfactory” peepshow be barred from a festival on “Truth and Reality”? Diederik Stapel says no (rejected post)
  • (9/23) G.A. Barnard: The Bayesian “catch-all” factor: probability vs likelihood
  • (9/21) Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”
  • (9/18) Uncle Sam wants YOU to help with scientific reproducibility!
  • (9/15) A crucial missing piece in the Pistorius trial? (2): my answer (Rejected Post)
  • (9/12) “The Supernal Powers Withhold Their Hands And Let Me Alone”: C.S. Peirce
  • (9/6) Statistical Science: The Likelihood Principle issue is out…!
  • (9/4) All She Wrote (so far): Error Statistics Philosophy Contents-3 years on
  • (9/3) 3 in blog years: Sept 3 is 3rd anniversary of





Categories: Announcement, blog contents, Statistics | Leave a comment

PhilStat/Law: Nathan Schachtman: Acknowledging Multiple Comparisons in Statistical Analysis: Courts Can and Must



The following is from Nathan Schachtman’s legal blog, with various comments and added emphases (by me, in this color). He will try to reply to comments/queries.

“Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses”

Nathan Schachtman, Esq., PC * October 14th, 2014

In excluding the proffered testimony of Dr. Anick Bérard, a Canadian perinatal epidemiologist at the Université de Montréal, the Zoloft MDL trial court discussed several methodological shortcomings and failures, including Bérard’s reliance upon claims of statistical significance from studies that conducted dozens and hundreds of multiple comparisons.[i] The Zoloft MDL court was not the first court to recognize the problem of over-interpreting the putative statistical significance of results that were one among many statistical tests in a single study. The court was, however, among a fairly small group of judges who have shown the needed statistical acumen in looking beyond the reported p-value or confidence interval to the actual methods used in a study.[1]



A complete and fair evaluation of the evidence in situations such as the Zoloft birth defects epidemiology required more than the presentation of the size of the random error, or the width of the 95 percent confidence interval. When the sample estimate arises from a study with multiple testing, presenting it with the confidence interval, or p-value, can be highly misleading if the p-value is used for hypothesis testing. The fact of multiple testing will inflate the false-positive error rate. Dr. Bérard ignored the context of the studies she relied upon. What was noteworthy is that Bérard encountered a federal judge who adhered to the assigned task of evaluating methodology and its relationship with conclusions.

*   *   *   *   *   *   *

There is no unique solution to the problem of multiple comparisons. Some researchers use Bonferroni or other quantitative adjustments to p-values or confidence intervals, whereas others reject adjustments in favor of qualitative assessments of the data in the full context of the study and its methods. See, e.g., Kenneth J. Rothman, “No Adjustments Are Needed For Multiple Comparisons,” 1 Epidemiology 43 (1990) (arguing that adjustments mechanize and trivialize the problem of interpreting multiple comparisons). Two things are clear from Professor Rothman’s analysis. First, for someone intent upon strict statistical significance testing, the presence of multiple comparisons means that the rejection of the null hypothesis cannot be done without further consideration of the nature and extent of both the disclosed and undisclosed statistical testing. Rothman, of course, has inveighed against strict significance testing under any circumstance, but the multiple testing would only compound the problem.
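For concreteness, here is what the Bonferroni adjustment Rothman argues against looks like in a few lines of Python (the p-values are invented for illustration): each of m p-values is multiplied by m, which is equivalent to testing each against α/m.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of
    comparisons m (capped at 1), equivalent to testing each at alpha/m."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]

# Invented p-values from a hypothetical study running 20 comparisons:
raw = [0.001, 0.04, 0.20, 0.51] + [0.60] * 16
adjusted = bonferroni(raw)
print(adjusted[:4])   # only the first comparison survives adjustment at 0.05
```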

Second, although failure to adjust p-values or intervals quantitatively may be acceptable, failure to acknowledge the multiple testing is poor statistical practice. The practice is, alas, too prevalent for anyone to say that ignoring multiple testing is fraudulent, and the Zoloft MDL court certainly did not condemn Dr. Bérard as a fraudfeasor[2]. [emphasis mine]

I’m perplexed by this mixture of stances. If you don’t mention the multiple testing for which it is acceptable not to adjust, then you’re guilty of poor statistical practice; but it’s “too prevalent for anyone to say that ignoring multiple testing is fraudulent”. This appears to claim it’s poor statistical practice if you fail to mention that your results are due to multiple testing, but “ignoring multiple testing” (which could mean failing to adjust or, more likely, failing to mention it) is not fraudulent. Perhaps it’s a questionable research practice (QRP). It’s back to “50 shades of grey between QRPs and fraud.”

  […read his full blogpost here]

Previous cases have also acknowledged the multiple testing problem. In litigation claims for compensation for brain tumors for cell phone use, plaintiffs’ expert witness relied upon subgroup analysis, which added to the number of tests conducted within the epidemiologic study at issue. Newman v. Motorola, Inc., 218 F. Supp. 2d 769, 779 (D. Md. 2002), aff’d, 78 Fed. App’x 292 (4th Cir. 2003). The trial court explained:

“[Plaintiff’s expert] puts overdue emphasis on the positive findings for isolated subgroups of tumors. As Dr. Stampfer explained, it is not good scientific methodology to highlight certain elevated subgroups as significant findings without having earlier enunciated a hypothesis to look for or explain particular patterns, such as dose-response effect. In addition, when there is a high number of subgroup comparisons, at least some will show a statistical significance by chance alone.”

I’m going to require, as part of its meaning, that a statistically significant difference not be one due to “chance variability” alone. Then, to avoid self-contradiction, this last sentence might be put as follows: “when there is a high number of subgroup comparisons, at least some will show purported or nominal or unaudited statistical significance by chance alone. [Which term do readers prefer?] If one hunts down one’s hypothesized comparison in the data, then the actual p-value will not equal, and will generally be greater than, the nominal or unaudited p-value.”
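The gap between nominal and actual p-values has a simple closed form when the comparisons are independent; a sketch (the numbers are illustrative):

```python
def actual_p(nominal_p, k):
    """If one reports the smallest of k independent null p-values,
    the actual p-value of that 'finding' is 1 - (1 - p)^k, not p."""
    return 1 - (1 - nominal_p) ** k

p_min, k = 0.03, 16   # illustrative: best of 16 independent comparisons
print(f"nominal p-value: {p_min}")
print(f"actual p-value after searching {k} comparisons: {actual_p(p_min, k):.2f}")
```

A hunted-down p of 0.03 corresponds to an actual p near 0.39: far from significant, just as the court passages below insist.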

So, I will insert “nominal” where needed below (in red).

Texas Sharpshooter fallacy

Id. And shortly after the Supreme Court decided Daubert, the Tenth Circuit faced the reality of data dredging in litigation, and its effect on the meaning of “significance”:

“Even if the elevated levels of lung cancer for men had been [nominally] statistically significant a court might well take account of the statistical “Texas Sharpshooter” fallacy in which a person shoots bullets at the side of a barn, then, after the fact, finds a cluster of holes and draws a circle around it to show how accurate his aim was. With eight kinds of cancer for each sex there would be sixteen potential categories here around which to “draw a circle” to show a [nominally] statistically significant level of cancer. With independent variables one would expect one statistically significant reading in every twenty categories at a 95% confidence level purely by random chance.”

The Texas sharpshooter fallacy is one of my all-time favorites. One purports to be testing the accuracy of one’s aim, when in fact that is not the process that gave rise to the impressive-looking (nominal) cluster of hits. The results do not warrant inferences about the shooter’s ability to hit a target accurately, since that hasn’t been well probed. Continue reading

Categories: P-values, PhilStat Law, Statistics | 12 Comments

BREAKING THE (Royall) LAW! (of likelihood) (C)



With this post, I finally get back to the promised sequel to “Breaking the Law! (of likelihood) (A) and (B)” from a few weeks ago. You might wish to read that one first.* A relevant paper by Royall is here.

Richard Royall is a statistician1 who has had a deep impact on recent philosophy of statistics by giving a neat proposal that appears to settle disagreements about statistical philosophy! He distinguishes three questions:

  • What should I believe?
  • How should I act?
  • Is this data evidence of some claim? (or How should I interpret this body of observations as evidence?)

It all sounds quite sensible––at first––and, impressively, many statisticians and philosophers of different persuasions have bought into it. At least they appear willing to go this far with him on the 3 questions.

How is each question to be answered? According to Royall’s writings, what to believe is captured by Bayesian posteriors; how to act, by behavioristic, N-P long-run performance. And what method answers the evidential question? A comparative likelihood approach. You may want to reject all of them (as I do),2 but just focus on the last.

Remember: with likelihoods, the data x are fixed and the hypotheses vary. A great many critical discussions of frequentist error statistical inference (significance tests, confidence intervals, P-values, power, etc.) start with “the law”. But I fail to see why we should obey it.

To begin with, a report of comparative likelihoods isn’t very useful: H might be less likely than H’, given x, but so what? What do I do with that information? It doesn’t tell me I have evidence against or for either.3 Recall, as well, Hacking’s points here about the variability in the meanings of a likelihood ratio across problems. Continue reading

Categories: law of likelihood, Richard Royall, Statistics | 41 Comments

Diederik Stapel hired to teach “social philosophy” because students got tired of success stories… or something (rejected post)

Oh My*.images-16

(“But I can succeed as a social philosopher”)

The following is from Retraction Watch. UPDATE: OCT 10, 2014**

Diederik Stapel, the Dutch social psychologist and admitted data fabricator — and owner of 54 retraction notices — is now teaching at a college in the town of Tilburg [i].

According to Omroep Brabant, Stapel was offered the job as a kind of adjunct at Fontys Academy for Creative Industries to teach social philosophy. The site quotes a Nick Welman explaining the rationale for hiring Stapel (per Google Translate):

“It came about because students one after another success story were told from the entertainment industry, the industry which we educate them.”

The students wanted something different.

“They wanted to also focus on careers that have failed. On people who have fallen into a black hole, acquainted with the dark side of fame and success.”

Last month, organizers of a drama festival in The Netherlands cancelled a play co-written by Stapel.

I really think Dean Bon puts the rationale most clearly of all.

…A letter from the school’s dean, Pieter Bon, adds:

We like to be entertained and the length of our lives increases. We seek new ways in which to improve our health and we constantly look for new ways to fill our free time. Fashion and looks are important to us; we prefer sustainable products and we like to play games using smart gadgets. This is why Fontys Academy for Creative Industries exists. We train people to create beautiful concepts, exciting concepts, touching concepts, concepts to improve our quality of life. We train them for an industry in which creativity is of the highest value to a product or service. We educate young people who feel at home in the (digital) world of entertainment and lifestyle, and understand that creativity can also mean business. Creativity can be marketed, it’s as simple as that.

We’re sure Prof. Stapel would agree.

[i] Fontys describes itself thusly: Fontys Academy for Creative Industries (Fontys ACI) in Tilburg has 2500 students working towards a bachelor of Business Administration (International Event, Music & Entertainment Studies and Digital Publishing Studies), a bachelor of Communication (International Event, Music & Entertainment Studies) or a bachelor of Lifestyle (International Lifestyle Studies). Fontys ACI hosts a staff of approximately one hundred (teachers plus support staff) as well as about fifty regular visiting lecturers.

 *I wonder if “social philosophy” is being construed as “extreme postmodernist social epistemology”?  

I guess the students are keen to watch that Fictionfactory Peephole.

**Turns out to have been short-lived. He also admits to sockpuppeting at Retraction Watch. Frankly, I thought it was more fun to guess who “Paul” was, but they have rules.

[ii] One of my April Fool’s Day posts is turning from part fiction to fact.

Categories: Rejected Posts, Statistics | 9 Comments

Oy Faye! What are the odds of not conflating simple conditional probability and likelihood with Bayesian success stories?


Faye Flam

Congratulations to Faye Flam for finally getting her article, “The odds, continually updated,” published in the Science Times section of the New York Times after months of reworking and editing, interviewing and reinterviewing. I’m grateful, too, that one remark from me remained. Seriously, I am. A few comments: The Monty Hall example is simple probability, not statistics, and finding that fisherman who floated on his boots at best used likelihoods. I might note, too, that critiquing that ultra-silly example about ovulation and voting––a study so bad that CNN actually had to pull it due to reader complaints[i]––scarcely required more than noticing that the researchers didn’t even know the women were ovulating[ii]. Experimental design is an old area of statistics developed by frequentists; on the other hand, these ovulation researchers really believe their theory, so the posterior checks out.

The article says Bayesian methods can “crosscheck work done with the more traditional or ‘classical’ approach.” Yes, but on traditional frequentist grounds. What many would like to know is how to cross-check Bayesian methods: how do I test your beliefs? Anyway, I should stop kvetching and thank Faye and the NYT for doing the article at all[iii]. Here are some excerpts:

Statistics may not sound like the most heroic of pursuits. But if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer.

Continue reading

Categories: Bayesian/frequentist, Statistics | 47 Comments
