Author Archives: Mayo

PhilStock: Applectomy? (rejected post)

Apple (AAPL) stock is a perfect example of how psychology, fear and superstition enter into stock prices as much as do measures of valuation. Any predictions for this afternoon’s earnings? In general, here’s a field where, regardless of what happens, “experts” never have to say they were wrong–especially about Tech. So, certainly, we don’t. Thus, a wild guess: AAPL (currently down some 300 points from its high) goes up with earnings, but not massively (~5-10 points). Still, there’s such a fear of its being “RIMMED” (i.e., dramatically losing its status as top tech, as did Research in Motion) that it may be beaten down some more.

(To be placed in rejected posts blog)

Categories: Rejected Posts | 6 Comments

Majority say no to inflight cell phone use, knives, toy bats, bow and arrows, according to survey

The Transportation Security Administration (TSA) has just announced it is backing off its decision to permit, beginning Thursday, 25 April, pocket knives, toy bats, golf clubs (limit 2), lacrosse sticks, billiard cues, ski poles, fishing reels, and other assorted sports equipment, at least for the time being. See my post on “risk based security.” Apparently, Pistole (the TSA chief) could not entirely ignore, after all, the vociferous objections of numerous stakeholders, whom he had not even bothered to consult. Recall that the former TSA chief, Hawley, had actually wanted to go further, saying:

 “They ought to let everything on that is sharp and pointy. Battle axes, machetes … you will not be able to take over the plane. It is as simple as that,” he said. (Link is here.)

I don’t have a strong feeling about blades, but I am very much in sync with the survey that influenced Pistole’s about-face as regards cell phones (against) and liquids in carry-ons (for).

Vast majority of Americans say no to cell phone use and pocket knives inflight according to new survey

In a new, nationwide survey, Travel Leaders Group asked Americans across the country if they are in favor of the change and 73% of those polled do not want pocket knives allowed in airplane cabins. Also, a vast majority (nearly 80%) indicate they do not want fellow airline passengers to have the ability to make cell phone calls inflight. The survey includes responses from 1,788 consumers throughout the United States and was conducted by Travel Leaders Group – an $18 billion powerhouse in the travel industry – from March 15 to April 8, 2013.

“The results are very clear. Most Americans would prefer the status quo with regard to cell phone use inflight. Because so many planes are flying at near capacity and many passengers already feel a lack of personal space within the airplane cabin, it’s understandable that they want to continue to have some amount of peace and quiet whether they are on a short commuter flight or a flight that lasts several hours,” stated Travel Leaders Group CEO Barry Liben.

I’m really heartened to see that people are flouting the knee-jerk expectation that they’d want as much high tech as possible, and are weighing in against cell phones on planes. Recall my post on cell phones (now in rejected posts). Here are some of the statistics from the survey:

When asked, “Are you in favor of this change or against it?” 73% of those polled said they are not in favor of allowing pocket knives on planes:

  • I’m OK with it: 23.6%
  • I’m OK with everything except pocket knives: 18.2%
  • I don’t think these items should be allowed: 54.8%
  • I don’t know: 3.5%


Cell Phone Use Inflight

Studies are underway to determine if full cell phone use is safe while inflight and a decision on whether to allow such use (not just “airplane mode”) is expected this summer.  In Travel Leaders Group’s survey, nearly 80% of those polled are against allowing passengers to make cell phone calls during flight.  Here are the detailed responses:

  • I am opposed to it: 47.9%
  • I am in favor as long as it is not used for conversations: 31.3%
  • I am in favor of it: 10.7%
  • I don’t know: 10.1%

Additional Statistics and Findings:

  • Eliminate One TSA Security Measure: With regard to TSA security screening at the airport, when asked, “Which of the following TSA security measures would you most like to eliminate?” the top responses were: “removing of shoes” (27.9%), “limits on liquids in carry-on baggage” (24.1%), and “none, do not eliminate any security measures” (19.8%).

  • Airport Security Satisfaction: When asked, “What is your level of satisfaction with airport security today?” 82.0% indicate they are satisfied or neutral with today’s security measures (62.2% indicate they are “satisfied,” 19.8% are “neither satisfied nor unsatisfied,” and 18.0% are “unsatisfied”).

  • Coach Class Flyers: When asked, “Do you ever fly in Coach Class?” over 94% of those polled said “Yes.” And of those who indicate they fly in Coach Class, when asked what makes flying in Coach most uncomfortable, the top responses were: “Lack of leg room” (49.5%); “seat size” (17.2%) and “pitch of the seat – person in front of me reclines too much” (15.0%).

  • This is the fifth consecutive year for this travel survey.  American consumers were engaged predominantly through social media channels such as Facebook and Twitter, as well as through direct contact with travel clients for the following Travel Leaders Group companies: Nexion, Results! Travel, Travel Leaders, Tzell Travel Group and Vacation.com.  (www.travelleadersgroup.com)

 So a tiny bit of good news among the forced air traffic control reductions and FAA cuts that began yesterday: See
http://rejectedpostsofdmayo.com/2013/04/22/msc-kvetch-air-traffic-control-cuts/

Categories: Uncategorized | 6 Comments

Stephen Senn: When relevance is irrelevant

When Relevance is Irrelevant, by Stephen Senn (guest post)

Head of Competence Center for Methodology and Statistics (CCMS)

Applied statisticians tend to perform analyses on additive scales, and additivity is an important aspect of an analysis to try to check. Consider survival analysis. The most important model used, the default in many cases, is the proportional hazards model introduced by David Cox in 1972[1], sometimes referred to as Cox regression. In fact, from one point of view, the analysis takes place on the log-hazard scale, so the model could equally be referred to by the rather clumsier title of additive log-hazards model; there is quite a literature on how the proportionality (or, equivalently, additivity) assumption can be checked.

Words have a definite power on the mind, and you sometimes encounter the nonsensical claim that if the proportionality assumption does not apply you should consider a log-rank test instead. In fact, when testing the null hypothesis that two treatments are identical, neither the log-rank test nor the score test using the proportional hazards model requires the assumption of proportionality: the assumption is trivially satisfied by the fact of the two treatments being identical. Furthermore, the log-rank test is just a special case of proportional hazards: the score test for a proportional hazards model without any covariates is the log-rank test. Finally, it is easy to produce examples where proportional hazards would apply in a model with covariates but not in the model without covariates, but very difficult to produce the converse.
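To make the log-rank test concrete, here is a minimal two-sample version in plain Python (a sketch of the standard statistic, not code from Senn; the function name and data layout are mine, and all observations are assumed uncensored with no ties). Under the null of identical treatments, the statistic is approximately standard normal, with no proportionality assumption needed:

```python
import math
import random

def logrank_z(times1, times2):
    """Two-sample log-rank statistic (uncensored data, no ties assumed):
    sum over event times of observed-minus-expected deaths in sample 1,
    standardized; approximately N(0,1) when the two survival
    distributions are identical."""
    events = sorted((t, g) for g, ts in enumerate((times1, times2)) for t in ts)
    n1, n = len(times1), len(times1) + len(times2)
    o_minus_e, var = 0.0, 0.0
    for t, g in events:                  # one death at each event time
        o_minus_e += (1 if g == 0 else 0) - n1 / n
        var += (n1 / n) * (1 - n1 / n)   # hypergeometric variance, d = 1
        n -= 1                           # the death leaves the risk set
        if g == 0:
            n1 -= 1
    return o_minus_e / math.sqrt(var)

# Two identical exponential arms: the null is trivially satisfied,
# so the statistic should behave like a standard normal draw.
random.seed(0)
a = [random.expovariate(1.0) for _ in range(100)]
b = [random.expovariate(1.0) for _ in range(100)]
z = logrank_z(a, b)
```

Fitting a covariate-free Cox model to the same pooled data and computing its score test would reproduce this statistic, which is the sense in which the log-rank test is a special case of proportional hazards.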

An objection often made regarding such models is that they are very difficult for physicians to understand. My reply is to ask what is preferable: a difficult truth or an easy lie? Ah yes, it is sometimes countered, but surely I agree on the importance of clinical relevance. It is surely far more useful to express the results of a proportional hazards analysis in clinically relevant terms that can be understood, such as difference in median length of survival or the difference in the event rate up to a particular census point (say one year after treatment).

A disturbing paper by Snapinn and Jiang[2] points to a problem, however, and to explain it I can do no better than to cite the abstract:

The standard analysis of a time-to-event variable often involves the calculation of a hazard ratio based on a survival model such as Cox regression; however, many people consider such relative measures of effect to be poor expressions of clinical meaningfulness. Two absolute measures of effect are often used to assess clinical meaningfulness: (1) many disease areas frequently use the absolute difference in event rates (or its inverse, the number-needed-to-treat) and (2) oncology frequently uses the difference between the median survival times in the two groups. While both of these measures appear reasonable, they directly contradict each other. This paper describes the basic mathematics leading to the two measures and shows examples. The contradiction described here raises questions about the concept of clinical meaningfulness. (p2341)

To see the problem, consider the following. The more serious the disease, the less a given difference in the rate at which people die will impact on the time survived and hence on differences in median survival. However, generally, the higher the baseline mortality rate the greater the difference in survival at a given time point that will be conveyed by a given treatment benefit.
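The tension can be seen in a toy calculation under a simple exponential survival model, where S(t) = exp(-ht) and the median is ln(2)/h (the hazard values and the 0.7 hazard ratio below are invented for illustration, not taken from Snapinn and Jiang):

```python
import math

def gains(h0, hazard_ratio=0.7, t=1.0):
    """Exponential survival: control hazard h0, treated hazard h1 = 0.7*h0.
    Returns (difference in median survival times,
             difference in survival rates at census time t)."""
    h1 = hazard_ratio * h0
    median_gain = math.log(2) * (1 / h1 - 1 / h0)
    rate_gain = math.exp(-h1 * t) - math.exp(-h0 * t)
    return median_gain, rate_gain

# As the baseline hazard h0 rises (more serious disease), the gain in
# median survival shrinks steadily, while the gain in survival rate at
# t = 1 is larger at moderate h0 than at low h0.
for h0 in (0.2, 1.0, 5.0):
    print(h0, gains(h0))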

If you find this less than clear, you have my sympathy. The only solution I can offer is to suggest that you read the paper by Snapinn and Jiang[2]. However, in that case also consider the following point. If the point is so subtle, how many physicians who cannot understand proportional hazards can understand numbers needed to treat or differences in median survival? My opinion is that they can be counted on the fingers of one foot. Continue reading

Categories: Statistics | 10 Comments

Does statistics have an ontology? Does it need one? (draft 2)

Chance, rational beliefs, decision, uncertainty, probability, error probabilities, truth, random sampling, resampling, opinion, expectations. These are some of the concepts we bandy about by giving various interpretations to mathematical statistics, to statistical theory, and to probabilistic models. But are they real? The question of “ontology” asks about such things, and given the “Ontology and Methodology” conference here at Virginia Tech (May 4, 5), I’d like to get your thoughts (for possible inclusion in a Mayo-Spanos presentation).*  Also, please consider attending**.

Interestingly, I noticed the posts that have garnered the most comments have touched on philosophical questions of the nature of entities and processes behind statistical idealizations (e.g., https://errorstatistics.com/2012/10/18/query/).

1. When an interpretation is supplied for a formal statistical account, its theorems may well turn out to express approximately true claims, and the interpretation may be deemed useful, but this does not mean the concepts give correct descriptions of reality. The interpreted axioms, and inference principles, are chosen to reflect a given philosophy, or set of intended aims: roughly, to use probabilistic ideas (i) to control error probabilities of methods (Neyman-Pearson, Fisher), or (ii) to assign and update degrees of belief, actual or rational (Bayesian). But this does not mean its adherents have to take seriously the realism of all the concepts generated. In fact, we often (on this blog) see supporters of various stripes of frequentist and Bayesian accounts running far away from taking their accounts literally, even as those interpretations are, or at least were, the basis and motivation for the development of the formal edifice (“we never meant this literally”). But are these caveats on the same order? Or do some threaten the entire edifice of the account?

Starting with the error statistical account, recall Egon Pearson, in his “Statistical Concepts in Their Relation to Reality,” making it clear to Fisher that the business of controlling erroneous actions in the long run–acceptance sampling in industry and 5-year plans–only arose with Wald and was never really part of the original Neyman-Pearson tests (declaring that the behaviorist philosophy was Neyman’s, not his). The paper itself may be found here. I was interested to hear (Mayo 2005) Neyman’s arch opponent, Bruno de Finetti, remark (quite correctly) that the expression “inductive behavior…that was for Neyman simply a slogan underlining and explaining the difference between his, the Bayesian and the Fisherian formulations” became with Abraham Wald’s work, “something much more substantial” (de Finetti 1972, 176).

Granted, it has not been obvious to people just how to interpret N-P tests “evidentially” or “inferentially”–the subject of my work over many years. But there always seemed to me to be enough hints and examples to see what was intended: A statistical hypothesis H assigns probabilities to possible outcomes, and the warrant for accepting H as adequate–for an error statistician–is in terms of how well corroborated H is: how well H has stood up to tests that would have detected flaws in H, at least with very high probability. So the grounds for holding or using H are error statistical. The control and assessment of error probabilities may be used inferentially to determine the capabilities of methods to detect the adequacy/inadequacy of models, and to express the extent of the discrepancies that have been identified. We also employ these ideas to detect gambits that make it too easy to find evidence for claims, even if the claims have been subjected to weak tests and biased procedures. A recent post is here.

The account has never professed to supply a unified logic, or any kind of logic for inference. The idea that there was a single rational way to make inferences was ridiculed by Neyman (whose birthday is April 16). Continue reading

Categories: Bayesian/frequentist, Error Statistics, Statistics | 61 Comments

O & M Conference (upcoming) and a bit more on triggering from a participant…..

I notice that one of the contributed speakers, Koray Karaca*, at the upcoming Ontology and Methodology Conference at Virginia Tech (May 4-5) focuses his paper on triggering! I entirely agree with the emphasis on the need to distinguish different questions at multiple stages of an inquiry or research endeavor, from the design, collection and modeling of data to a series of hypotheses, questions, problems, and threats of error. I do note a couple of queries below that I hope will be discussed at some point. Here’s part of his abstract, which may be found on the just-created O & M Conference Blog (the link is also at the O&M page on this blog). Recent posts on the Higgs data analysis are here, here, and here. Kent Staley had a recent post on the Higgs as well. (For earlier Higgs discussions search this blog.)

Koray Karaca
The method of robustness analysis and the problem of data-selection at the ATLAS experiment

In the first part, I characterize and distinguish between two problems of “methodological justification” that arise in the context of scientific experimentation. What I shall call the “problem of validation” concerns the accuracy and reliability of experimental procedures through which a particular set of experimental data is first acquired and later transformed into an experimental result. Therefore, the problem of validation can be phrased as follows: how to justify that a particular set of data as well as the procedures that transform it into an experimental result are accurate and reliable, so that the experimental result obtained at the end of the experiment can be taken as valid.  On the other hand, what I shall call the “problem of exploration” is concerned with the methodological question of whether an experiment is able, either or both, (1) to provide a genuine test of the conclusions of a scientific theory or hypothesis if the theory in question has not been previously (experimentally) tested, or to provide a novel test if the theory or hypothesis in question has already been tested, and (2) to discover completely novel phenomena; i.e., phenomena which have not been predicted by present theories and detected in previous theories. Even though the problem of validation and the ways it is dealt with in scientific practice has been thoroughly discussed in the literature of scientific experimentation, the significance of the problem of exploration has not yet been fully appreciated. In this work, I shall address this problem and examine the way it is handled in the present-day high collision-rate particle physics experiments. To this end, I shall consider the ATLAS experiment, which is one of the Large Hadron Collider (LHC) experiments currently running at CERN. 
…What are called “interesting events” are those collision events that are taken to serve to test the as-yet-untested predictions of the Standard Model of particle physics (SM) and its possible extensions, as well as to discover completely novel phenomena not predicted before by any theories or theoretical models.

To read the rest of the abstract, go to our just-made-public O & M conference blog.

First let me say that I’m delighted this case will be discussed at the O&M conference, and look forward to doing so. Here are a couple of reflections on the abstract, partly on terminology. First, I find it interesting that he places “triggering” (what I alluded to in my last post as a behavioristic, pre-data task) under “exploratory”. He may be focussed more on what occurs (in relation to this one episode anyhow) when data are later used to check for indications of anomalies for the Standard Model Higgs–having been “parked” for later analysis. I thought the exploratory stage is usually a stage of informal or semi-formal data analysis to find interesting patterns and potential ingredients (variables, functions) for models, model building, and possible theory development. When Strassler heard there would be “parked data” for probing anomalies, I take it his theories kicked in to program those exotic indicators. Second, it seems to me that philosophers of science and “confirmation theorists” of various sorts have focussed on when “data,” all neat and tidied up, count as supporting, confirming, or falsifying hypotheses and theories. I wouldn’t have thought the problem of data collection, modeling, or justifying data was “thoroughly discussed”–it absolutely should be–just that it seems all-too-rare. I may be wrong (I’d be glad to see references).

*Koray is a postdoctoral research fellow at the University of Wuppertal, and he knows I’m mentioning him here.

Categories: experiment & modeling | 7 Comments

Statistical flukes (3): triggering the switch to throw out 99.99% of the data

This is the last of my 3 parts on “statistical flukes” in the Higgs data analysis. The others are here and here. Kent Staley had a recent post on the Higgs as well.

Many preliminary steps in the Higgs data generation and analysis fall under an aim that I call “behavioristic” and performance oriented: the goal being to control error rates on the way toward finding out something else–here, excess events or bumps of interest.

(a) Triggering. First of all, 99.99% of the data must be thrown away! So there needs to be a trigger to “accept or reject” collision data for analysis–whether for immediate processing or for later on, as in so-called “data parking”.

With triggering we are not far off the idea that a result of a “test”, or single piece of data analysis, is to take one “action” or another:

reject the null -> retain the data;

do not reject -> discard the data.

(Here the null might, in effect, hypothesize that the data are not interesting.) It is an automatic classification scheme, given limits of processing and storing; the goal of controlling the rates of retaining uninteresting and discarding potentially interesting data is paramount.[i] It is common for performance oriented tasks to enter, especially in getting the data for analysis, and they too are very much under the error statistical umbrella.
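The two error rates being controlled–retaining uninteresting events and losing interesting ones–can be sketched in a toy Monte Carlo. All the numbers below (threshold, signal rate, score distributions) are invented for illustration, not actual LHC trigger parameters:

```python
import random

random.seed(0)
THRESHOLD = 3.0          # hypothetical trigger cut on an event "score"
SIGNAL_RATE = 1e-3       # hypothetical fraction of interesting collisions
N = 100_000

kept = kept_boring = lost_interesting = 0
for _ in range(N):
    interesting = random.random() < SIGNAL_RATE
    # Interesting events tend to score higher; boring ones cluster near 0.
    score = random.gauss(4.0 if interesting else 0.0, 1.0)
    if score > THRESHOLD:        # "reject the null" -> retain the data
        kept += 1
        kept_boring += not interesting
    elif interesting:            # "do not reject" -> discard the data
        lost_interesting += 1

print(f"retained {kept / N:.2%} of events; "
      f"{kept_boring} boring kept, {lost_interesting} interesting lost")
```

The trigger’s job is performance oriented in exactly the sense above: keep the retained fraction tiny (storage and processing limits) while keeping the count of lost interesting events small; tightening THRESHOLD trades one error rate against the other.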

Particle physicist Matt Strassler has excellent discussions of triggering and parking on his blog “Of Particular Significance”. Here’s just one passage:

Data Parking at CMS (and the Delayed Data Stream at ATLAS) takes advantage of the fact that the computing bottleneck for dealing with all this data is not data storage, but data processing. The experiments only have enough computing power to process about 300 – 400 bunch-crossings per second. But at some point the experimenters concluded that they could afford to store more than this, as long as they had time to process it later. That would never happen if the LHC were running continuously, because all the computers needed to process the stored data from the previous year would instead be needed to process the new data from the current year. But the 2013-2014 shutdown of the LHC, for repairs and for upgrading the energy from 8 TeV toward 14 TeV, allows for the following possibility: record and store extra data in 2012, but don’t process it until 2013, when there won’t be additional data coming in. It’s like catching more fish faster than you can possibly clean and cook them — a complete waste of effort — until you realize that summer’s coming to an end, and there’s a huge freezer next door in which you can store the extra fish until winter, when you won’t be fishing and will have time to process them.

(b) Bump indication. Then there are rules for identifying bumps, excesses more than 2 or 3 standard deviations above what is expected or predicted. This may be the typical single significance test serving as more of an indicator rule.  Observed signals are classified as either rejecting, or failing to reject, a null hypothesis of “mere background”; non-null indications are bumps, deemed potentially interesting. Estimates of the magnitude of any departures are reported and graphically displayed. They are not merely searching for discrepancies with the “no Higgs particle” hypothesis, they are looking for discrepancies with the simplest type, the simple Standard Model Higgs. I discussed this in my first flukes post. Continue reading
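For a rough sense of the indicator rule: with a large expected background count b, the Poisson background fluctuates with standard deviation √b, so an excess is commonly summarized in sigma units via a Gaussian approximation. A sketch with invented counts (these are not actual Higgs-search numbers):

```python
import math

def bump_sigma(observed, expected_background):
    """Gaussian approximation to the Poisson background: size of the
    excess over background, in standard-deviation (sigma) units."""
    return (observed - expected_background) / math.sqrt(expected_background)

# Invented counts: 580 events seen where background alone predicts 500
print(bump_sigma(580, 500))   # about 3.6 sigma: a bump worth flagging
```

A bump crossing the 2-3 sigma line is flagged as potentially interesting in this scheme; it is nowhere near the 5 sigma demanded for a discovery claim.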

Categories: Error Statistics | Tags: , | 1 Comment

Who is allowed to cheat? I.J. Good and that after dinner comedy hour….

It was from my Virginia Tech colleague I.J. Good (in statistics), who died four years ago (April 5, 2009), at 93, that I learned most of what I call “howlers” on this blog. His favorites were based on the “paradoxes” of stopping rules.

“In conversation I have emphasized to other statisticians, starting in 1950, that, in virtue of the ‘law of the iterated logarithm,’ by optional stopping an arbitrarily high sigmage, and therefore an arbitrarily small tail-area probability, can be attained even when the null hypothesis is true. In other words if a Fisherian is prepared to use optional stopping (which usually he is not) he can be sure of rejecting a true null hypothesis provided that he is prepared to go on sampling for a long time. The way I usually express this ‘paradox’ is that a Fisherian [but not a Bayesian] can cheat by pretending he has a plane to catch like a gambler who leaves the table when he is ahead” (Good 1983, 135) [*]
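Good’s claim is easy to check by simulation: test a true null after every new observation and stop the first time the nominal 5% threshold is crossed, and you reject far more often than 5% of the time. A quick sketch (the sample-size cap and replication count are arbitrary choices of mine):

```python
import math
import random

random.seed(1)

def rejects_with_optional_stopping(max_n=5000, z_crit=1.96):
    """H0 is true: data are N(0,1). Test mean = 0 (sigma known) after
    each observation; stop as soon as |z| exceeds the nominal 5% cutoff."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += random.gauss(0.0, 1.0)
        if abs(total / math.sqrt(n)) > z_crit:
            return True          # "significant" despite H0 being true
    return False

trials = 200
rate = sum(rejects_with_optional_stopping() for _ in range(trials)) / trials
print(f"rejection rate under a true null: {rate:.0%}")  # far above 5%
```

By the law of the iterated logarithm this rate tends to 100% as the cap on n grows; the 5% figure was computed as if n were fixed, which is exactly why error statisticians insist the overall, not the “computed,” significance level be reported.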

This paper came from a conference where we both presented, and he was extremely critical of my error statistical defense on this point. (I was a year out of grad school, and he a University Distinguished Professor.) 

One time, years later, after hearing Jack give this howler for the nth time, “a Fisherian [but not a Bayesian] can cheat, etc.,” I was driving him to his office, and suddenly blurted out what I really thought:

“You know Jack, as many times as I have heard you tell this, I’ve always been baffled as to its lesson about who is allowed to cheat. Error statisticians require the overall and not the ‘computed’ significance level be reported. To us, what would be cheating would be reporting the significance level you got after trying and trying again in just the same way as if the test had a fixed sample size. True, we are forced to fret about how stopping rules alter the error probabilities of tests, while the Bayesian is free to ignore them, but why isn’t the real lesson that the Bayesian is allowed to cheat?” (A published version of my remark may be found in EGEK p. 351: “As often as my distinguished colleague presents this point…”)

 To my surprise, or actually shock, after pondering this a bit, Jack said something like, “Hmm, I never thought of it this way.”

By the way, the story of the “after dinner Bayesian comedy hour” on this blog did not allude to Jack but to someone who gave a much more embellished version. Since it’s Saturday night, let’s once again listen in on the comedy hour that unfolded at my dinner table at an academic conference:

 Did you hear the one about the researcher who gets a phone call from the guy analyzing his data? First the guy congratulates him and says, “The results show a Continue reading

Categories: Bayesian/frequentist, Comedy, Statistics | Tags: , , | 68 Comments

Guest Post. Kent Staley: On the Five Sigma Standard in Particle Physics

Kent Staley
Associate Professor
Department of Philosophy
Saint Louis University

Regular visitors to Error Statistics Philosophy may recall a discussion that broke out here and on other sites last summer when the CMS and ATLAS collaborations at the Large Hadron Collider announced that they had discovered a new particle, in their search for the Higgs boson, that had at least some of the properties expected of the Higgs. Both collaborations emphasized that they had results that were significant at the level of “five sigma,” and the press coverage presented this as a requirement in high energy particle physics for claiming a new discovery. Both the use of significance testing and the reliance on the five sigma standard became a matter of debate.

Mayo has already commented on the recent updates to the Higgs search results (here and here); these seem to have further solidified the evidence for a new boson and the identification of that boson with the Higgs of the Standard Model. I have been thinking recently about the five sigma standard of discovery and what we might learn from reflecting on its role in particle physics. (I gave a talk on this at a workshop sponsored by the “Epistemology of the Large Hadron Collider” project at Wuppertal [i], which included both philosophers of science and physicists associated with the ATLAS collaboration.)

Just to refresh our memories, back in July 2012, Tony O’Hagan posted at the ISBA forum (prompted by “a question from Dennis Lindley”) three questions regarding the five-sigma claim:

  1. “Why such an extreme evidence requirement?” We know from a Bayesian perspective that this only makes sense if (a) the existence of the Higgs boson (or some other particle sharing some of its properties) has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme. Neither seems to be the case, so why 5-sigma?
  2. “Rather than ad hoc justification of a p-value, it is of course better to do a proper Bayesian analysis. Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?
  3. “We know that given enough data it is nearly always possible for a significance test to reject the null hypothesis at arbitrarily low p-values, simply because the parameter will never be exactly equal to its null value. And apparently the LHC has accumulated a very large quantity of data. So could even this extreme p-value be illusory?”

O’Hagan received a lot of responses to this post, and he very helpfully wrote up and posted a digest of those responses, discussed on this blog here and here. Continue reading

Categories: Error Statistics, P-values, Statistics | 26 Comments

Flawed Science and Stapel: Priming for a Backlash?

Diederik Stapel is back in the news, given the availability of the English translation of the Tilburg (Levelt and Noort Committees) Report, as well as his book, Ontsporing (Dutch for “Off the Rails”), in which he tries to explain his fraud. An earlier post on him is here. While the disgraced social psychologist was shown to have fabricated the data for something like 50 papers, it seems that some people think he deserves a second chance. A childhood friend, Simon Kuper, in an article “The Sin of Bad Science,” describes a phone conversation with Stapel:

“I’ve lost everything,” the disgraced former psychology professor tells me over the phone from the Netherlands. He is almost bankrupt. … He has tarnished his own discipline of social psychology. And he has become a national pariah. …

Very few social psychologists make stuff up, but he was working in a discipline where cavalier use of data was common. This is perhaps the main finding of the three Dutch academic committees which investigated his fraud. The committees found many bad practices: researchers who keep rerunning an experiment until they get the right result, who omit inconvenient data, misunderstand statistics, don’t share their data, and so on….

Chapter 5 of the Report (pp. 47-54) is extremely illuminating about the general practices they discovered in examining Stapel’s papers; I recommend it.

Social psychology might recover. However, Stapel might not. A country’s way of dealing with sinners is often shaped by its religious heritage. In Catholicism, sinners can get absolution in the secrecy of confession. … …In many American versions of Protestantism, the sinner can be “born again”. …Stapel’s misfortune is to be Dutch. The dominant Dutch tradition is Calvinist, and Calvinism believes in eternal sin. …But the downside to not forgiving sinners is that there are almost no second acts in Dutch lives.

http://www.ft.com/intl/cms/s/2/d1e53488-48cd-11e2-a6b3-00144feab49a.html#axzz2PAPIxuHx

But it isn’t just old acquaintances who think Stapel might be ready for a comeback. A few researchers are beginning to defend the field from the broader accusations the Report levels against the scientific integrity of social psychology. They do not deny the “cavalier” practices, but regard them as acceptable and even necessary! This might even pave the way for Stapel’s rehabilitation. An article by a delegate for the 3rd World Conference on Research Integrity (wcri2013.org) in Montreal, Canada, in May reports on members of a new group critical of the Report, including some who were interviewed by the Tilburg Committees: Continue reading

Categories: junk science, Statistics | 21 Comments

possible progress on the comedy hour circuit?

It’s not April Fool’s Day yet, so I take it that Corey Yanofsky, one of the top 6 commentators on this blog, is serious in today’s exchange, despite claiming to be a Jaynesian (whatever that is). I dare not scratch too deep or look too close…along the lines of not looking a gift horse in the mouth, or however that goes. So here’s a not-too-selective report from our exchange in the comments on my previous blogpost:

Mayo: You wrote:”I think I wrote something to the effect that your philosophy was the only one I have encountered that could possibly put frequentist procedures on a sound footing; I stand by that.” I’m curious as to why I deserve this honor ….

Corey: Mayo: It was always obvious no competent frequentist statistician would use a procedure criticized by the howlers; the problem was that I had never seen a compelling explanation why (beyond “that’s obviously stupid”). So you deserve the honor for putting forth a single principle from which error statistical procedures flow that refutes all of the howlers at once.

Mayo: Corey: Wow, that’s a big concession, even coupled with your remaining doubts…maybe I should highlight this portion of our exchange for our patient readers, looking for any sign of progress…

Corey: Mayo: Feel free to highlight it. I will point out that this “concession” shouldn’t be news to you: in an email I sent you on September 11, 2012, I wrote, ‘I now appreciate how the severity-based approach fully addresses all the typical criticisms offered during “Bayesian comedy hour”. Now, when I encounter these canards in Bayesian writings, I feel chagrin that they are being propagated; I certainly shall not be repeating them myself.’

Mayo: Ok, so you get an Honorable Mention, especially as I’m always pushing this boulder, or maybe it’s a stone egg. It will be a miracle if any to-be-published Bayesian texts or new editions excise some of the howlers!

But I still don’t understand the hesitancy in coming over to the error statistical side….

Categories: Uncategorized | 42 Comments

Higgs analysis and statistical flukes (part 2)

Everyone was excited when the Higgs boson results were reported on July 4, 2012, indicating evidence for a Higgs-like particle based on a “5 sigma observed effect”. The observed effect refers to the number of excess events of a given type that are “observed” in comparison to the number (or proportion) that would be expected from background alone, and not due to a Higgs particle. This continues my earlier post. This, too, is a rough outsider’s angle on one small aspect of the statistical inferences involved. (Doubtless there will be corrections.) But that, apart from being fascinated by it, is precisely why I have chosen to discuss it: we should be able to employ a general philosophy of inference to get an understanding of what is true about the controversial concepts we purport to illuminate, e.g., significance levels.

Following an official report from ATLAS, researchers define a “global signal strength” parameter “such that μ = 0 corresponds to the background only hypothesis and μ = 1 corresponds to the SM Higgs boson signal in addition to the background” (where SM is the Standard Model). The statistical test may be framed as a one-sided test, where the test statistic (which is actually a ratio) records differences in the positive direction, in standard deviation (sigma) units. Reports such as: Continue reading
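Not from the ATLAS analysis itself — just a minimal sketch of the arithmetic behind “5 sigma”: for a one-sided test framed in standard deviation units, the p-value is the upper tail of a standard normal at z = 5.

```python
import math

def one_sided_p(z):
    """One-sided p-value: probability a standard normal exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p5 = one_sided_p(5.0)
print(f"5 sigma -> p = {p5:.3g}")  # roughly 3e-07, about 1 in 3.5 million
```

That number is a claim about how improbably large the excess would be under the background-only hypothesis (μ = 0); it is not the probability that the background-only hypothesis is true.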

Categories: P-values, statistical tests, Statistics | 33 Comments

Is NASA suspending public education and outreach?

In connection with my last post on public communication of science, a reader sent me this.[i]

NASA Internal Memo: Guidance for Education and Public Outreach Activities Under Sequestration

Source: NASA Internal Memo: Guidance for Education and Public Outreach Activities Under Sequestration

Posted Friday, March 22, 2013

Subject: Guidance for Education and Public Outreach Activities Under Sequestration

As you know, we have taken the first steps in addressing the mandatory spending cuts called for in the Budget Control Act of 2011. The law mandates a series of indiscriminate and significant across-the-board spending reductions totaling $1.2 trillion over 10 years.

As a result, we are forced to implement a number of new cost-saving measures, policies, and reviews in order to minimize impacts to the mission-critical activities of the Agency. We have already provided new guidance regarding conferences, travel, and training that reflect the new fiscal reality in which we find ourselves. Some have asked for more specific guidance as it relates to public outreach and engagement activities. That guidance is provided below. Continue reading

Categories: science communication | 2 Comments

Telling the public why the Higgs particle matters

There’s been some controversy in the past two days regarding public comments made about the importance of the Higgs. Professor Matt Strassler, on his blog, “Of Particular Significance,” expresses a bit of outrage:

“Why, Professor Kaku? Why?”

Posted on March 19, 2013 | 70 Comments

Professor Michio Kaku, of City College (part of the City University of New York), is well-known for his work on string theory in the 1960s and 1970s, and best known today for his outreach efforts through his books and his appearances on radio and television.  His most recent appearance was a couple of days ago, in an interview on CBS television, which made its way into this CBS news article about the importance of the Higgs particle.

Unfortunately, what that CBS news article says about “why the Higgs particle matters” is completely wrong.  Why?  Because it’s based on what Professor Kaku said about the Higgs particle, and what he said is wrong.  Worse, he presumably knew that it was wrong.  (If he didn’t, that’s also pretty bad.) It seems that Professor Kaku feels it necessary, in order to engage the imagination of the public, to make spectacular distortions of the physics behind the Higgs field and the Higgs particle, even to the point of suggesting the Higgs particle triggered the Big Bang. Continue reading

Categories: science communication | Leave a comment

Update on Higgs data analysis: statistical flukes (part 1)

I am always impressed at how researchers flout the popular philosophical conception of scientists as being happy as clams when their theories are ‘borne out’ by data, while terribly dismayed to find any anomalies that might demand “revolutionary science” (as Kuhn famously called it). Scientists, says Kuhn, are really only trained to do “normal science”—science within a paradigm of hard core theories that are almost never, ever to be questioned.[i] It is rather the opposite, and the reports out last week updating the Higgs data analysis reflect this yen to uncover radical anomalies from which scientists can push the boundaries of knowledge. While it is welcome news that the new data do not invalidate the earlier inference of a Higgs-like particle, many scientists are somewhat dismayed to learn that it appears to be quite in keeping with the Standard Model. In a March 15 article in National Geographic News:

Although a full picture of the Higgs boson has yet to emerge, some physicists have expressed disappointment that the new particle is so far behaving exactly as theory predicts. Continue reading

Categories: significance tests, Statistics | 30 Comments

Normal Deviate: Double Misunderstandings About p-values


I’m really glad to see that the Normal Deviate has posted about the error in taking the p-value as any kind of conditional probability. I consider the “second” misunderstanding to be the (indirect) culprit behind the “first”.

Double Misunderstandings About p-values

March 14, 2013 – 7:57 pm

It’s been said a million times and in a million places that a p-value is not the probability of H0 given the data.

But there is a different type of confusion about p-values. This issue arose in a discussion on Andrew’s blog.

Andrew criticizes the New York Times for giving a poor description of the meaning of p-values. Of course, I agree with him that being precise about these things is important. But, in reading the comments on Andrew’s blog, it occurred to me that there is often a double misunderstanding.

First, let me say that I am neither defending nor criticizing p-values in this post. I am just going to point out that there are really two misunderstandings floating around. Continue reading
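To make the first misunderstanding concrete, here is a toy simulation of my own (the setup — a half-and-half prior over hypotheses and a unit effect size — is invented for illustration, not taken from the post): even among results significant at the 0.05 level, the proportion in which H0 is actually true need not equal 0.05.

```python
import math
import random

random.seed(1)

def one_sided_p(z):
    """One-sided p-value: probability a standard normal exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Toy setup: H0 (mu = 0) and H1 (mu = 1) each true half the time a priori.
# Observe one z-statistic per experiment; among experiments reaching
# significance (p < 0.05), count how often H0 was in fact true.
n, sig_total, sig_h0 = 100_000, 0, 0
for _ in range(n):
    h0_true = random.random() < 0.5
    z = random.gauss(0.0 if h0_true else 1.0, 1.0)
    if one_sided_p(z) < 0.05:
        sig_total += 1
        sig_h0 += h0_true

# With these made-up settings the fraction comes out around 0.16, not 0.05.
print(f"Among significant results, fraction with H0 true: {sig_h0 / sig_total:.2f}")
```

So “p < 0.05” and “the probability of H0 given significance is 0.05” come apart: the latter depends on the prior prevalence of true nulls and on the test’s power, neither of which the p-value refers to.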

Categories: P-values | 3 Comments

Risk-Based Security: Knives and Axes

After a 6-week hiatus from flying, I’m back in the role of female opt-out[i] in a brand new Delta[ii] terminal with free internet and iPads[iii]. I heard last week that the TSA plans to allow small knives in carry-ons, for the first time since 9/11, as “part of an overall risk-based security approach”. But now it appears that flight attendants, pilot unions, a number of elected officials, and even federal air marshals are speaking out against the move, writing letters and petitions of opposition.

“The Flight Attendants Union Coalition, representing nearly 90,000 flight attendants, and the Coalition of Airline Pilots Associations, which represents 22,000 airline pilots, also oppose the rule change.”

Former flight attendant Tiffany Hawk is “stupefied” by the move, “especially since the process that turns checkpoints into maddening logjams — removing shoes, liquids and computers — remains unchanged,” she wrote in an opinion column for CNN. Link is here. Continue reading

Categories: evidence-based policy, Rejected Posts, Statistics | 17 Comments

S. Stanley Young: Scientific Integrity and Transparency

Stanley Young recently shared his summary testimony with me, and has agreed to my posting it.

S. Stanley Young, PhD
Assistant Director for Bioinformatics
National Institute of Statistical Sciences
Research Triangle Park, NC

One-page Summary, S. Stanley Young
Testimony to the Committee on Science, Space and Technology, 5 March 2013
Scientific Integrity and Transparency
S. Stanley Young, PhD, FASA, FAAAS

Integrity and transparency are two sides of the same coin. Transparency leads to integrity. Transparency means that study protocol, statistical analysis code and data sets used in papers supporting regulation by the EPA should be publicly available as quickly as possible and not just going forward. Some might think that peer review is enough to ensure the validity of claims made in scientific papers. Peer review only says that the work meets the common standards of the discipline and, on the face of it, the claims are plausible (Feinstein, Science, 1988). Peer review is not enough. Continue reading

Categories: evidence-based policy, Statistics | 10 Comments

Blog Contents 2013 (Jan & Feb)

Error Statistics Philosophy Blog: Table of Contents 2013 (Jan & Feb)
Organized by Nicole Jinn & Jean Miller

January 2013

(1/2) Severity as a ‘Metastatistical’ Assessment
(1/4) Severity Calculator
(1/6) Guest post: Bad Pharma? (S. Senn)
(1/9) RCTs, skeptics, and evidence-based policy
(1/10) James M. Buchanan
(1/11) Aris Spanos: James M. Buchanan: a scholar, teacher and friend
(1/12) Error Statistics Blog: Table of Contents
(1/15) Ontology & Methodology: Second call for Abstracts, Papers Continue reading

Categories: Metablog | Leave a comment

Stephen Senn: Casting Stones

Casting Stones, by Stephen Senn*

At the end of last year I received a strange email from the editor of the British Medical Journal (BMJ) appealing for ‘evidence’ to persuade the UK parliament of the necessity of making sure that data for clinical trials conducted by the pharmaceutical industry are made readily available to all and sundry. I don’t disagree with this aim. In fact, in an article (1) I published over a dozen years ago I wrote ‘No sponsor who refuses to provide end-users with trial data deserves to sell drugs.’ (p. 26)

However, the way in which the BMJ is choosing to collect evidence does not set a good example. It is one I hope that all scientists would disown and one of which even journalists should be ashamed.

The letter reads

“Dear Prof Senn,

We need your help to show the House of Commons Science and Technology Select Committee the true scale of the problem of missing clinical data by collating a list of examples. Continue reading

Categories: evidence-based policy, Statistics | 28 Comments

Big Data or Pig Data?

I don’t know if my reading of this Orwellian* piece is in sync with what Rameez intended, but he thought it was fine for me to post it here. See what you think:

“Big Data or Pig Data” (A fable on huge amounts of data and why we don’t need models) by Rameez Rahman, computer scientist: posted at Realm of the SCENSCI

 There was a pig who wanted to be a scientist. He was not interested in models. When asked how he planned on making sense of the world, the pig would say in a deep mysterious voice, “I don’t do models: the world is my model” and then with a twinkle in his eyes, look at his interlocutor smugly.

By his phrase, “I don’t do models, the world is my model”, he meant that the world’s data was enough for him, the pig scientist. The more the data, the pig declared, the more accurately he would be able to predict what might happen in the world. Continue reading

Categories: Statistics | 22 Comments
