Mayo: Meanderings on the Onto-Methodology Conference

Writing a blog like this, a strange and often puzzling exercise[1], does offer a forum for sharing half-baked chicken-scratchings from the back of frayed pages on themes from our Onto-Meth[2] conference from two weeks ago[3]. (The previous post had notes from blogger and attendee, Gandenberger.)

Onto-Meth conference

Several of the talks reflect a push-back against the idea that the determination of “ontology” in science—e.g., the objects and processes of theories, models and hypotheses—is (or should strive to correspond to?)  “real” objects in the world and/or what is approximately the case about them. Instead, at least some of the speakers wish to liberate ontology to recognize how “merely” pragmatic goals, needs, and desires are not just second-class citizens, but can and do (and should?) determine the categories of reality. Well there are a dozen equivocations here, most of which we did not really discuss at the conference.

In my own half of the Spanos-Mayo (D & P presentation[4]) I granted and even promoted the idea of a methodology that was pragmatic while also objective, so I’m not objecting to that part. The measurement of my weight is a product of “discretionary” judgments (e.g., to weigh in pounds with a scale having a given precision), but it is also a product of how much I really weigh (no getting around it). By understanding the properties of methodological tools and measuring systems, it is possible to “subtract out” the influence of the judgments to get at what is actually the case. At least approximately. But that view is different, it seems to me, from someone like Larry Laudan (at least in his later metamorphosis). Even though he considers his “reticulated” view a fairly hard-nosed spin on the Kuhnian idea of scientific paradigms as invariably containing an ontology (e.g., theories), a methodology, and (what he called) an “axiology” or set of aims (OMA), Laudan seems to think standards are so variable that what counts as evidence is constantly fluctuating (aside from maybe retaining the goal of fitting diverse facts). So I wonder if these pragmatic leanings are more like Laudan or more like me (and my view here, I take it, is essentially that of Peirce). I am perfectly sympathetic to the piecemeal “locavoracity” idea in Ruetsche, by the way.

My worry, one of them, is that all kinds of rival entities and processes arise to account for (accord with, predict, and purportedly explain) data and patterns in data, and don’t we need ways to discriminate them? During the open discussion, I mentioned several examples, some of which I can make out all scrunched up in the corners of my coffee-logged program, such as appeals to “cultural theories” of risk and risk perceptions. These theories say appeals to supposedly “real” hazards, e.g, chance of disease, death, catastrophe, and other “objective” risk assessments are wrong.  They say it is not only possible but preferable (truer?) to capture attitudes toward risks, e.g., GM foods, nuclear energy, climate change, breast implants, etc. by means of one or another favorite politico-cultural grid-group categories (e.g., marginal-individualists, passive-egalitarians, hierarchical-border people, fatalists,  etc.). (Your objections to these vague category schemes are often taken as further evidence that you belong in one of the pigeon-holes!) And the other day I heard a behavioral economist declare that he had found the “mechanism” to explain deciding between options in virtually all walks of life using a regression parameter, he called it beta, and guess what? beta = 1/3! He proved it worked statistically too. He might be right, he had a lot of data. Anyway, in my deliberate attempt to trigger discussion at the conference end, I was wondering if some of the speakers and/or attendees (Danks, Woodward, Glymour? Anyone?) had anything to say about cases that some of us might wish to call reification.

I am tempted to view Woodward’s idea of developing heuristics for choosing variables as well-captured by my favorite goal: finding things out via severe tests (be stringent but learn something, promote error correcting improvements, etc.). One can start almost anywhere, and with adequate error probes speed up the goals of finding things out (yet another Peircean theme). Woodward did not say whether the rationale behind his heuristics would be something along these lines. But a rationale is needed, or so I would claim. I was trying (in the discussion) to drive home this felt need to articulate a rationale, without which I suspect one overlooks the creative drive toward satisfying these heuristics; I mean why prefer these heuristics? That they may be found satisfied in “successful science” (after the fact) would not necessarily mean they identified forward-looking rules or criteria.

Maybe it’s the contrarian in me, but I might like to add a heuristic such as:

  • find ways to suspect your variables and model even though all the previous heuristic rules are well-satisfied.

Or,

  • pursue variables that fail to satisfy your preference for “variables that have unambiguous effects under manipulation” (as of now, given all we know) and discover a novel way to discriminate them anyway.

Or, another way to get at my contrarian inklings, suppose variables have been chosen along the lines of Woodward’s heuristics, and everything seems hunky dory. What impetus is there to find out how the model may be wrong (despite satisfying all those nice expectations)? Retrospectively, these rules might be satisfied, but prospectively, might not they encourage leaning back? (not to allude to the one year anniversary of Facebook’s IPO).

There are some other chicken scratchings I may come back to if I hear from anyone….


[1] But from time to time, someone tells me that they found something of value….

[2] Ben Jantzen discovered that abbreviating our conference this way would lead people to methamphetamine websites; so we didn’t use it officially. Thus I use it here.

[3] I’ve been deeply engaged in something I’ll explain later on–not to mention traveling to faraway places–and anyway, for some reason the blog is getting tons and tons of spam. I’m not sure what has changed over at wordpress.

[4] Dog and pony. I may post my pony slides.

Categories: O & M conference, Statistics | 6 Comments

Gandenberger on Ontology and Methodology (May 4) Conference: Virginia Tech

Gregory Gandenberger
Ph.D. graduate student: Dept. of History and Philosophy of Science & Dept. of Statistics
University of Pittsburgh
http://gsganden.tumblr.com/

Onto-Meth conference


Some Thoughts on the O&M 2013 Conference
I was struck by how little speakers at the Ontology and Methodology conference engaged with the realism/antirealism debate. Laura Ruetsche defended a version of Arthur Fine’s Natural Ontological Attitude (NOA) in the first talk of the conference, but none of the speakers after her addressed the debate directly. David Danks and Jim Woodward made it particularly clear that they were deliberately avoiding questions about realism in favor of questions about what kinds of ontologies our theories should have in order to best serve the various purposes for which we develop them.

I am not criticizing the speakers! I am inclined to agree with Clark Glymour that the kinds of questions Danks and Woodward addressed are more interesting and important than questions about “what’s really real.” On the other hand, I worry that we lose something when we focus only on the use of science toward such ends as prediction and control. During the discussion period at the end of the conference, Peter Godfrey-Smith argued that science has some value simply for telling us what really is the case. For instance, science tells us that all living things on earth have a common ancestor, and that fact is a good thing to know regardless of whether or not it helps us predict or control anything.

One feature of the realism/antirealism debate that has long bothered me is that it treats all of “our best sciences” as if they had roughly the same epistemic status. In fact, realism about quantum field theory, for instance, is much harder to defend than realism about evolutionary biology. I am inclined to dismiss the realism debate as ill-formed insofar as it presumes that the question of scientific realism is a single question that spans all of the sciences. I am also suspicious of the debate in its bread-and-butter domain of fundamental physics. It is not clear to me that there is such a thing as fundamental physics; that if there is such a thing as fundamental physics, then it is converging toward a unified ontology; that if it is converging toward a unified ontology, then we can make sense of the question whether or not that ontology is correct; or that if we can make sense of the question whether or not that ontology is correct, then we have the means to give a justified answer to that question.

Nevertheless, as Glymour pointed out during the open discussion period, there are still good and open questions to address about whether and how we are justified in believing that science tells us the truth in other domains (such as evolutionary theory) where the realism question seems relatively well-formed and answerable. We can dismiss questions about “what’s really real” at a “fundamental level” while still thinking that philosophers of science should have a story to tell the 46% of Americans who believe that human beings were created in more or less their current form within the last 10,000 years—not a story about how science serves purposes of prediction and control, but a story about how science can help us find the truth.

Categories: O & M conference | 3 Comments

“A sense of security regarding the future of statistical science…” Anon review of Error and Inference

Aris Spanos, my colleague and co-author (Economics), recently came across this seemingly anonymous review of our Error and Inference (2010) [E & I]. It’s interesting that the reviewer remarks that “The book gives a sense of security regarding the future of statistical science and its importance in many walks of life.” I wish I knew just what the reviewer means–but it’s appreciated regardless.

2010 American Statistical Association and the American Society for Quality

TECHNOMETRICS, AUGUST 2010, VOL. 52, NO. 3, Book Reviews, 52:3, pp. 362-370.

Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, edited by Deborah G. MAYO and Aris SPANOS, New York: Cambridge University Press, 2010, ISBN 978-0-521-88008-4, xvii+419 pp., $60.00.

This edited volume contemplates the interests of both scientists and philosophers regarding gathering reliable information about the problem/question at hand in the presence of error, uncertainty, and with limited data information.

The volume makes a significant contribution in bridging the gap between scientific practice and the philosophy of science. The main contribution of this volume pertains to issues of error and inference, and showcases intriguing discussions on statistical testing and providing alternative strategy to Bayesian inference. In words, it provides cumulative information towards the philosophical and methodological issues of scientific inquiry at large.

The target audience of this volume is quite general and open to a broad readership. With some reasonable knowledge of probability theory and statistical science, one can get the maximum benefit from most of the chapters of the volume. The volume contains original and fascinating articles by eminent scholars (nine, including the editors) who range from names in statistical science to philosophy, including D. R. Cox, a name well known to statisticians.

The editors have done a superb job in presenting, organizing, and structuring the material in a logical order. The “Introduction and Background” is nicely presented and summarized, allowing for a smooth reading of the rest of the volume. There is a broad range of carefully selected topics from various related fields reflecting recent developments in these areas. The rest of the volume is divided in nine chapters/sections as follows:

1. Learning from Error, Severe Testing, and the Growth of Theoretical Knowledge

2. The Life of Theory in the New Experimentalism

3. Revisiting Critical Rationalism

4. Theory Confirmation and Novel Evidence

5. Induction and Severe Testing

6. Theory Testing in Economics and the Error-Statistical Perspective

7. New Perspectives on (Some Old) Problems of Frequentist Statistics

8. Causal Modeling, Explanation and Severe Testing

9. Error and Legal Epistemology

In summary, this volume contains a wealth of knowledge and fascinating debates on a host of important and controversial topics equally important to the philosophy of science and scientific practice. This is a must-read—I enjoyed reading it and I am sure you will too! The book gives a sense of security regarding the future of statistical science and its importance in many walks of life. I also want to take the opportunity to suggest another seemingly related book by Harman and Kulkarni (2007). The review of this book appeared in Technometrics in May 2008 (Ahmed 2008).

The following are chapters in E & I (2010) written by Mayo and/or Spanos, if you’re interested. If you produce a palindrome meeting the extremely simple requirements for May (by May 25 or so), you can win a free copy!

  • Spanos, A. (2010). Theory Testing in Economics and the Error-Statistical Perspective in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos eds.), Cambridge: Cambridge University Press:  202-246.
  • Spanos, A. (2010). On a New Philosophy of Frequentist Inference: Exchanges with David Cox and Deborah G. Mayo in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos eds.), Cambridge: Cambridge University Press:  325-330.
  • Spanos, A. (2010). Graphical Causal Modeling and Error Statistics: Exchanges with Clark Glymour in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos eds.), Cambridge: Cambridge University Press:  364-375.
Categories: Review of Error and Inference, Statistics | 3 Comments

‘No-Shame’ Psychics Keep Their Predictions Vague: New Rejected post

See new rejected post. (You may comment here or on the Rejected Posts blog.)

Categories: msc kvetch, rejected post | Leave a comment

If it’s called “The High Quality Research Act,” then ….

Among the (less technical) items sent my way over the past few days are discussions of the so-called High Quality Research Act. I’d not heard of it, but it’s apparently an outgrowth of the recent hand-wringing over junk science, flawed statistics, non-replicable studies, and fraud (discussed at times on this blog). And it’s clearly a hot topic. Let me just run this by you and invite your comments (before giving my impression). Following the Bill, below, is a list of five NSF projects about which the HQRA’s sponsor has requested further information, and then part of an article from today’s New Yorker on this “divisive new bill”: “Not Safe for Funding: The N.S.F. and the Economics of Science”.

[DISCUSSION DRAFT]

A BILL

April 18, 2013

TO [BE SUPPLIED]

Be it enacted by the Senate and House of Representatives of the United States of America in Congress assembled,

SECTION 1. SHORT TITLE.

This act may be cited as the “High Quality Research Act”.

SECTION 2. HIGH QUALITY RESEARCH.

(a) CERTIFICATION.—Prior to making an award of any contract or grant funding for a scientific research project, the Director of the NSF shall publish a statement on the public website of the Foundation that certifies that the research project—

(1) is in the interests of the U.S. to advance the national health, prosperity, or welfare, and to secure the national defense by promoting the progress of science;

(2) is the finest quality, is ground breaking, and answers questions or solves problems that are of utmost importance to society at large; and

(3) is not duplicative of other research projects being funded by the Foundation or other Federal Science agencies.

(b) TRANSFER OF FUNDS.—Any unobligated funds for projects not meeting the requirements of subsection (a) may be awarded to other scientific research projects that do meet such requirements.

(c) INITIAL IMPLEMENTATION REPORT.—Not later than 60 days after the date of enactment of this Act, the Director shall report to the Committee on Commerce, Science, and Transportation of the Senate and the Committee on Science, Space, and Technology of the House of Representatives on how the requirements set forth in subsection (a) are being implemented.

(d) NATIONAL SCIENCE BOARD IMPLEMENTATION REPORT.—Not later than 1 year after the date of enactment of this Act, the National Science Board shall report to the Committee on Commerce, Science, and Transportation of the Senate and the Committee on Science, Space, and Technology of the House of Representatives its findings and recommendations on how the requirements of subsection (a) are being implemented.

etc. etc.

Link to the Bill

Rep. Lamar Smith, author of the Bill, listed five NSF projects about which he has requested further information.

1. Award Abstract #1247824: “Picturing Animals in National Geographic, 1888-2008,” March 15, 2013 ($227,437);

2. Award Abstract #1230911: “Comparative Histories of Scientific Conservation: Nature, Science, and Society in Patagonian and Amazonian South America,” September 1, 2012 ($195,761);

3. Award Abstract #1230365: “The International Criminal Court and the Pursuit of Justice,” August 15, 2012 ($260,001);

4. Award Abstract #1226483: “Comparative Network Analysis: Mapping Global Social Interactions,” August 15, 2012 ($435,000); and

5. Award Abstract #1157551: “Regulating Accountability and Transparency in China’s Dairy Industry,” June 1, 2012 ($152,464).

________________________

MAY 9, 2013

NOT SAFE FOR FUNDING: THE N.S.F. AND THE ECONOMICS OF SCIENCE

POSTED BY DYLAN WALSH (The New Yorker)

Last month, Representative Lamar Smith, chairman of the House Committee on Science, Space, and Technology, introduced a divisive new bill, the High Quality Research Act, that would change the criteria by which the National Science Foundation evaluates research projects and awards funding. (The N.S.F., with a budget of seven billion dollars, funds roughly twenty per cent of federally supported basic research in American universities.) Currently, proposals are evaluated through a traditional peer-review process, in which scientists and experts with knowledge of the relevant fields evaluate the projects’ “intellectual merits” and “broader impacts.” Peer review is a central tenet of modern academic science, and, according to critics, the new bill threatens to supersede it with politics.

John Holdren, the director of the White House Office of Science and Technology Policy, said last week that “adding Congress as reviewers is a mistake.” Representative Eddie Bernice Johnson warned more forcefully that Representative Smith was “sending a chilling message to the entire scientific community that peer review may always be trumped by political review.” But in a statement, Representative Smith said the draft bill “improves on [the peer-review process] by adding a layer of accountability.” The bill’s new three-point criteria for funding require that a project be “in the interests of the United States to advance the national health, prosperity, or welfare, and to secure the national defense”; solve “problems that are of the utmost importance to society at large”; and not be “duplicative of other research projects being funded by the Foundation or other Federal science agencies.”

Implicit in the proposal’s language is a desire for oversight built into the process of determining which areas of study are significant. (Representative Smith cited five N.S.F.-funded social science projects, with concerns as to whether they “adhere to NSF’s ‘intellectual merit’ guideline” (above).) To Smith’s point, despite sizable public investment in research and development (nearly $150 billion this year), relatively scant attention is devoted to investigating whether the process of science, in its current form, is well-designed for generating knowledge.

As it turns out, the N.S.F. recently awarded a grant to Kevin Zollman, an assistant professor of philosophy at Carnegie Mellon University, to investigate what drives researchers to pursue particular projects, and how they secure funding for them. In an ideal world, all science would proceed from the simple and lofty social goal of expanding human knowledge, but researchers are of course subject to the constraints of economic reality. To disentangle these knotted incentives, Zollman is applying a branch of game theory known as “mechanism design,” which uses simple models to understand how individual players within a system balance competing motivations to arrive at different ends.

Zollman’s first object of study is the “priority rule,” which states that all of the credit earned by scientific discovery—the money, the praise, the promotion and adulation—is directed to the first researcher across the finish line, whether by weeks, days, or hours. The priority rule was first elaborated in the nineteen-fifties by the sociologist Robert K. Merton, who observed that “In the institution of science, originality is at a premium.” He described its role over centuries of scientific progress, with fierce and stubborn quarrels punctuating the time line of discovery: Newton and Leibniz fought over the invention of calculus, Galileo and Simon Mayr over the discovery of Jupiter’s moons.

As researchers scramble to carve out intellectual real estate, however, overly aggressive competition and singular focus on originality can elicit a host of negative behaviors: bias toward reporting positive or rushed results, withholding or fabricating data, and counterproductive levels of output. Of fifty million scholarly articles published since 1665, more than half of them appeared in the last twenty-five years. While a number of factors contribute to this glut, one of them is the pressure to publish results even when they’re not immediately relevant. “Much of the recent scientific literature is repetitive, unimportant, poorly conceived or executed, and oversold; perhaps deservingly, much of it is ignored,” wrote the microbiologist Ferric Fang in a 2012 editorial. The problem is becoming particularly acute as a growing pool of scientists face a shrinking pot of money, both from drying stimulus funds scheduled to disappear this September and 2.9-percent budget cuts forced on the N.S.F. through sequestration. The currently overcharged pull of priority, by spurring unnecessary questions and hairs to be split, can thus dilute or stifle the advancement of science.

In light of these problems, Representative Smith proposed his bill. Yet the economic straits in science already encourage research that targets specific, funder-driven priorities over riskier, more open-ended questions. As the Nobel laureate Roger Kornberg lamented in 2007, “If the work that you propose to do isn’t virtually certain of success, then it won’t be funded.” Elevated political scrutiny would likely only reduce the willingness of agencies like the N.S.F. to fund projects without clearly defined, or even expected, outcomes. In tension with this reality is the fact that revolution in science is often indebted to prolonged exploration of basic research projects, which may at their outset fail to meet Representative Smith’s criteria. Forecasting how well research will “advance the national health, prosperity, or welfare,” or predetermining the degree to which a project is or is not “groundbreaking,” demands an unlikely prescience. Further embedding specific requirements in grant allocation could make improvements at the margins by defunding egregiously conspicuous research. But it also threatens to close off a large landscape of research questions with unforeseen potential.

The article is here.

Categories: junk science, science communication, Statistics | 14 Comments

Professorships in Scandal?

On page 1 of the New York Times yesterday was an article, “The Last Refuge From Scandal? Professorships”:

The traditional path to an academic job is long and laborious: the solitude and penury of graduate study, the scramble for one of the few open positions in each field, the blood sport of competitive publishing. But while colleges have always courted accomplished public figures, a leap to the front of the class has now become a natural move for those who have suffered spectacular career flameouts. At this point, the transition from public disgrace to college lectern is so familiar that when Mr. Galliano merely stepped foot on the campus of Central Saint Martins, an art and design school in London, speculation rippled around the world— incorrectly — that he would soon be teaching there.

I guess this shouldn’t surprise anyone. Sexy course titles and “novelty academics” are pretty old-hat; power and scandal, even if on the sleazy side, attract students; and if students are buying, universities can’t be blamed for selling. Or can they? Here are some examples they cite:

After a sex scandal forced Eliot Spitzer from the governor’s mansion in Albany, he turned up at City College, teaching a course called “Law and Public Policy.” …

More recently, Parsons the New School for Design announced that John Galliano, the celebrated clothing designer who lost his job at Christian Dior after unleashing a torrent of anti-Semitic vitriol in a bar, would be leading a four-day workshop and discussion called “Show Me Emotion.”

And David H. Petraeus, the general turned intelligence chief turned ribald punch line, will have not one college paycheck, but two. Last month, the City University of New York said he would be the next visiting professor of public policy at Macaulay Honors College. On Thursday, the University of Southern California announced that Mr. Petraeus would also be teaching there…

Despite a petition objecting to Galliano, there seems to be little public concern that offering such courses threatens a university’s ethical standards, especially, perhaps, if “only” sexual transgressions are involved. Still, while I can see students wanting to enroll in a course taught by a Petraeus or a Spitzer, I doubt the same would be true for one run by a Diederik Stapel*. Is it because in the former cases the scandal does not directly touch on their accomplishments? Is there a justifiable principle of distinction operating?** (Or might it depend on the course?)

….Though they rarely pay much, arrangements like these have obvious benefits. For the new professors, the jobs offer a chance to do something positive rather than sitting home with their regrets, and to begin rehabilitating their image by associating themselves with intellectual pursuits. The students get to learn about history from people who made it — though the lessons generally steer well clear of the professors’ less noble accomplishments. And the colleges get to hire someone who might otherwise be out of reach…

But if scandal presents these schools with a bargain, it can also carry a cost. Mr. Galliano has apologized for his remarks, but an online petition opposing his arrival at Parsons has nonetheless attracted more than 2,000 signatures. Joel Towers, the executive dean of Parsons, has called the class a chance “to learn from positive and negative examples.” Mr. Galliano is volunteering his time.

Anthony D. Weiner, the former congressman who had a new kind of sex scandal played out in 140-character Twitter posts, said he had had “very preliminary discussions” about a teaching role somewhere but decided it would create “too much of a commotion.” He is now contemplating a run for mayor of New York, but says he has not ruled out teaching.

Aside from offering avenues of rehabilitation for fallen politicians, ex-powerful men, and celebrities, it may be argued that these courses offer valuable learning opportunities for students. Yet apparently they deliberately steer clear of their particular scandals.

For Mr. Spitzer’s weekly seminar, which he taught from fall 2009 to spring 2012, he was paid slightly less than $5,000 a semester, which he donated to the school after other professors said it was on the high side of what adjuncts earn. ….Ms. Lynch [one of his students] recalled classroom discussions as intense and wide-ranging — with limits. “He would never let us get to a point, obviously, where we were going to address him stepping down,” she said…

Mr. McGreevey, who resigned as New Jersey governor after announcing that he was gay and had had an extramarital affair with an aide on his payroll, said that teaching offered a way “not merely to be engaged in the backwash of the recent past but to look forward.” He has taught since 2007 at Kean, a public university in New Jersey, earning $3,600 a semester.

He said he hoped his story might encourage others to teach, adding, “We need teachers and adjuncts who bring from their own private life a wealth of experiences to share with America’s future leaders, however flawed those experiences may be.” But Reina I. Valenzuela, who took his class in 2009 and still raves about it, said the class was structured around case studies, none of which were his own. “It was never a topic of discussion,” she said.

David A. Paterson, who stepped in to finish Mr. Spitzer’s term as governor, ended his campaign to win his own full term after revelations that his administration had intervened on behalf of an aide who was accused of assault. Rather than try his luck in the voting booth, he headed to New York University, where he led a seminar called “The Teachable Art of Governing.” (“I wanted to teach history, and they looked at me like I had three heads,” he recalled, saying that N.Y.U. responded, in essence: “‘You kidding? What do you think we brought you here for?’ ”)

The full article is here….***

*But my April fool’s post turned out not to be so very far from true, as I had assumed. Search this blog for Stapel, if interested.

**Even so, it might be awkward for a Spitzer to charge a student for plagiarizing.

***Catching up on blogs since the conference, I note too that Gelman has something on “cleaning up science”.

Categories: rejected post | 3 Comments

Schedule for Ontology & Methodology, 2013

May 4 (Saturday):

8:30-9:00: Pastries & Coffee (Continental Breakfast) outside of Pamplin 2030

MORNING SESSIONS:

9:00-9:15—Welcome talk
9:15-10:00 Ruetsche: “Method, Metaphysics, and Quantum Theory”
10:00-10:25: Discussion

10:25-10:40 coffee break

10:40-11:05 Shech, “Phase Transitions, Ontology and Earman’s Sound Principle”
11:05-11:20: Discussion

11:20-12:05 Godfrey-Smith, “Evolution and Agency: A Case Study in Ontology and Methodology”
12:05-12:30: Discussion

12:30-1:30 Box Lunch

AFTERNOON SESSIONS:

1:30-1:55: Clatterbuck, “Drift Beyond Wright-Fisher: The Predictive Inequivalency of Drift Models”
1:55-2:10: Discussion

2:10- 2:55: Mayo & Spanos, “Ontology & Methodology in Statistical Modeling”
2:55-3:20: Discussion

3:20-3:30 coffee break

3:30-3:55: Karaca, “The method of robustness analysis and the problem of data-selection at the ATLAS experiment”
3:55-4:10: Discussion

4:10 – 4:55 Patton, “Theory Assessment and Ontological Argument”
4:55-5:20: Discussion

5:20- 5:45 Marcellesi, “Counterfactuals and ‘nonmanipulable’ properties: When bad methodology leads to bad metaphysics (and vice-versa)”
5:45-6:00: Discussion

7:15–DINNER

May 5 (Sunday):

8:30-9:00: Pastries & Coffee (Continental Breakfast) outside of Pamplin 2030

MORNING SESSIONS:

9:00-9:45: Danks, “The Myriad Influences of Goals on Ontology”

9:45-10:10 Discussion

10:10-10:55 Hoover, “The Ontological Status of Shocks and Trends in Macroeconomics”
10:55-11:20: Discussion

11:20-11:35 Coffee break

11:35-12:00  Angner, “Behavioral vs. Neoclassical Economics: A Weberian Analysis”
12:00-12:15: Discussion

12:15-1:45 BREAK FOR LUNCH

AFTERNOON SESSIONS:

1:45-2:30 Woodward, “The Problem of Variable Choice”
2:30-2:55: Discussion

2:55-3:40 Jantzen, “The Algebraic Conception of Natural Kinds”
3:40-4:05: Discussion

4:05-5:15 Overview: Open Discussion & libations

Categories: Announcement | Leave a comment

What should philosophers of science do? (Higgs, statistics, Marilyn)

Marilyn Monroe not walking past a Higgs boson and not making it decay, whatever philosophers might say.

My colleague, Lydia Patton, sent me this interesting article, “The Philosophy of the Higgs,” (from The Guardian, March 24, 2013) when I began the posts on “statistical flukes” in relation to the Higgs experiments (here and here); I held off posting it partly because of the slightly sexist attention-getter pic of Marilyn (in reference to an “irrelevant blonde”[1]), and I was going to replace it, but with what? All the men I regard as good-looking have dark hair (or no hair). But I wanted to take up something in the article around now, so here it is, a bit dimmed. Anyway, apparently MM was not the idea of the author, particle physicist Michael Krämer, but rather of a group of philosophers at a meeting discussing philosophy of science and science. In the article, Krämer tells us:

For quite some time now, I have collaborated on an interdisciplinary project which explores various philosophical, historical and sociological aspects of particle physics at the Large Hadron Collider (LHC). For me it has always been evident that science profits from a critical assessment of its methods. “What is knowledge?”, and “How is it acquired?” are philosophical questions that matter for science. The relationship between experiment and theory (what impact does theoretical prejudice have on empirical findings?) or the role of models (how can we assess the uncertainty of a simplified representation of reality?) are scientific issues, but also issues from the foundation of philosophy of science. In that sense they are equally important for both fields, and philosophy may add a wider and critical perspective to the scientific discussion. And while not every particle physicist may be concerned with the ontological question of whether particles or fields are the more fundamental objects, our research practice is shaped by philosophical concepts. We do, for example, demand that a physical theory can be tested experimentally and thereby falsified, a criterion that has been emphasized by the philosopher Karl Popper already in 1934. The Higgs mechanism can be falsified, because it predicts how Higgs particles are produced and how they can be detected at the Large Hadron Collider.

On the other hand, some philosophers tell us that falsification is strictly speaking not possible: What if a Higgs property does not agree with the standard theory of particle physics? How do we know it is not influenced by some unknown and thus unaccounted factor, like a mysterious blonde walking past the LHC experiments and triggering the Higgs to decay? (This was an actual argument given in the meeting!) Many interesting aspects of falsification have been discussed in the philosophical literature. “Mysterious blonde”-type arguments, however, are philosophical quibbles and irrelevant for scientific practice, and they may contribute to the fact that scientists do not listen to philosophers.

I entirely agree that philosophers have wasted a good deal of energy maintaining that it is impossible to solve Duhemian problems of where to lay the blame for anomalies. They misrepresent the very problem by supposing there is a need to string together a tremendously long conjunction consisting of a hypothesis H and a bunch of auxiliaries Ai which are presumed to entail observation e. But neither scientists nor ordinary people would go about things in this manner. The mere ability to distinguish the effects of different sources suffices to pinpoint blame for an anomaly. For some posts on falsification, see here and here*.

The question of why scientists do not listen to philosophers was also a central theme of the recent inaugural conference of the German Society for Philosophy of Science. I attended the conference to present some of the results of our interdisciplinary research group on the philosophy of the Higgs. I found the meeting very exciting and enjoyable, but was also surprised by the amount of critical self-reflection.

In the opening talk Peter Godfrey-Smith from the City University of New York emphasized three roles for philosophy: an integrative role, whereby philosophy can assess and connect various fields with an emphasis on generic categories and perspectives; an incubator role, where philosophy develops new ideas in a broad and speculative form, which are then pursued in a more focussed and specific way within an individual science; and an educative role, where philosophy teaches various general skills, including critical and abstract thinking. The problem I see with the integrative and incubator roles of philosophy is the high degree of specialization in modern science. It is very hard for a philosopher to keep up with scientific progress, and how could one integrate various fields without having fully appreciated the essential features of the individual sciences? As Margaret Morrison from the University of Toronto pointed out in her talk, if philosophy steps back too far from the individual sciences, the account becomes too general and isolated from scientific practice. On the other hand, if philosophy is too close to an individual science, it may not be philosophy any longer.

I think philosophy of science should not consider itself primarily as a service to science, but rather identify and answer questions within its own domain. I certainly would not be concerned if my own research went unnoticed by biologists, chemists, or philosophers, as long as it advances particle physics. On the other hand, as Morrison pointed out, science does generate its own philosophical problems, and philosophy may provide some kind of broader perspective for understanding those problems.

So then, should we physicists listen to philosophers?

An emphatic “No!”, if philosophers want to impose their preconceptions of how science should be done. I do not subscribe to Feyerabend’s provocative claim that “anything goes” in science, but I believe that many things go, and certainly many things should be tried.

But then, “Yes!”, we should listen, as philosophy can provide a critical assessment of our methods, in particular if we consider physics to be more than predicting numbers and collecting data, but rather an attempt to understand and explain the world. And even if philosophy might be of no direct help to science, it may be of help to scientists through its educational role, and sharpen our awareness of conceptional problems in our research**.

What I want to talk about are the roles of philosophers of science. While I do not disagree with the roles Godfrey-Smith allots philosophers of science, to incubate, integrate, and educate (about things like logic and critical thinking), and his list would not preclude what I have in mind, I would press to go much further. To focus just on one of my own areas of interest, there is enormous unclarity in discussions by statistical practitioners regarding such philosophical notions as objectivity, truth, falsifiability, evidence, inductive inference, and the roles of probability in modeling and inference. It is as if a certain trepidation and groupthink take over when it comes to philosophically tinged notions, and philosophers are rarely consulted to lend insight. When they are, I’m afraid, they do not escape the criticism Steven Weinberg raises in the linked Godfrey-Smith article (i.e., being wedded to a position that grows out of “theory-laden” philosophy, where the theories are philosophical). Fresh methodological problems arise in practice, but philosophers of science are not consulted. Nor is it surprising. Peter Achinstein[2] has often said that scientists do not and should not consult philosophical accounts about evidence, because while scientists evaluate evidence empirically, philosophical accounts are merely based on a priori computations. Sad, if still true.

By and large, philosophers of science have reneged on the promise of the 80s to be relevant to science. In some areas, in particular the one I know best, philosophers of science have gone backwards. Philosophers of statistics were ahead of their time in the 70s and early 80s, engaging in discussions side by side with statistical practitioners (Godambe and Sprott 1971 and Harper and Hooker 1976 come to mind). Contributions to the field were as likely to be by a philosopher as by a statistician. I talk about this much more elsewhere (e.g., the introduction to Mayo and Spanos, Error and Inference (CUP 2010)), so I’m being quick here. Soon after I got my Ph.D., things seemed to dissipate…

Nowadays, while the foundations of statistics are being considered anew by many statisticians, philosophers of statistics are almost nowhere to be found.  Arguments given for some very popular slogans (mostly by non-philosophers), are too readily taken on faith as canon by others, and are repeated as gospel. Examples are easily found: all models are false, no models are falsifiable, everything is subjective, or equally subjective and objective, and the only properly epistemological use of probability is to supply posterior probabilities for quantifying actual or rational degrees of belief. Then there is the cluster of “howlers” allegedly committed by frequentist error statistical methods repeated verbatim (discussed on this blog). Margaret Morrison is right that many ask: is truly relevant philosophy really philosophy? I and a few others[3] think the answer is Yes! I have organized conferences[4] and published papers that address these issues, and it is the focus of a current book, nearing completion.

Even in the Higgs example, recall the controversy about whether particle physicists were misinterpreting their p-values; the letter-writing campaign by subjective Bayesians, etc. [5]. There is a valid question as to whether it is the philosopher of X’s responsibility to solve philosophical problems in domain X; and the answer will surely depend on the field. But in statistical science, itself sometimes regarded as “applied philosophy of science,” I say the answer is, emphatically, yes! Philosophers’ failure to do so has left them out of one of the most interesting periods in the areas of statistical science as well as machine learning.

*For a unit on Popper that includes Duhem’s problem and falsification, see http://errorstatistics.com/2012/02/01/no-pain-philosophy-skepticism-rationality-popper-and-all-that-part-2-duhems-problem-methodological-falsification/

**Michael Krämer is a theoretical particle physicist at the RWTH Aachen University and likes philosophy. Follow him on Twitter at @mikraemer


[1] The article’s subtitle is: “Particle physicist Michael Krämer hangs out with philosophers and learns that one should be wary of irrelevant blondes” (whatever that means).

[2] See, for example, p. 2 of the intro to E&I.

[3] Clark Glymour, Jim Woodward come to mind. They, as well as Godfrey-Smith, will be at our O&M conference at Virginia Tech next weekend!

[4] E.g., Statistical Science and Philosophy of Science: Where Do/Should They Meet? For selected contributions and related papers see http://www.rmm-journal.de/htdocs/st01.html. Several of these papers have been discussed in “U-Phils” on this blog. Search for the author or title.

[5] O’Hagan: digest of  responses, discussed on this blog here and here.

  • Achinstein, P. (2001). The Book of Evidence. Oxford: Oxford University Press.
  • Cox and Mayo 2010.
  • Godambe, V. and Sprott, D. (eds.) (1971). Foundations of Statistical Inference. Toronto: Holt, Rinehart and Winston of Canada.
  • Harper, W. L. and Hooker, C. A. (eds.) (1976). Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science, Vol. 2. Dordrecht, The Netherlands: D. Reidel.
  • Mayo, D. G. and Spanos, A. (2010). “Introduction and Background: Part I: Central Goals, Themes, and Questions; Part II The Error-Statistical Philosophy” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.). Cambridge: Cambridge University Press, 1-14, 15-27.
Categories: Higgs, Statistics, StatSci meets PhilSci | 88 Comments

Getting Credit (or blame) for Something You Didn’t Do (BP oil spill, comedy hour)

Three years ago, many of us were glued to the “spill cam” showing, in real time, the gushing oil from the April 20, 2010 explosion sinking the Deepwater Horizon oil rig in the Gulf of Mexico, killing 11, and spewing oil until July 15. Trials have been taking place this month, as people try to meet the 3 year deadline to sue BP and others. But what happened to the 200 million gallons of oil? (Is anyone up to date on this?) Has it vanished, or was it just sunk to the bottom of the sea by dispersants which may have caused hidden destruction of sea life? I don’t know, but given it’s Saturday night around the 3 year anniversary, let’s listen in to a reblog of a spill-related variation on the second of two original “overheard at the comedy hour” jokes.

In effect, it accuses the frequentist error-statistical account of licensing the following (make-believe) argument after the 2010 oil spill:

Oil Exec: We had highly reliable evidence that H: the pressure was at normal levels on April 20, 2010!

Senator: But you conceded that whenever your measuring tool showed dangerous or ambiguous readings, you continually lowered the pressure, and that the stringent “cement bond log” test was entirely skipped.

 Oil Exec:  Granted, we omitted reliable checks on April 20, 2010, but usually we do a better job—I am reporting the average!  You see, we use a randomizer that most of the time directs us to run the gold-standard check on pressure. But, but April  20 just happened to be one of those times we did the nonstringent test; but on average we do ok.

Senator:  But you don’t know that your system would have passed the more stringent test you didn’t perform!

Oil Exec: That’s the beauty of the frequentist test!

Even if we grant (for the sake of the joke) that overall, this “test” rarely errs in the report it outputs (pass or fail), that is irrelevant to appraising the inference from the data on April 20, 2010 (which would have differed had the more stringent test been run). That interpretation violates the severity criterion: the observed passing result was altogether common if generated from a source where the pressure level was unacceptably high. Therefore it misinterprets the actual data. The question is why anyone would saddle the frequentist with such shenanigans on averages? … Lest anyone think I am inventing a criticism, here is a familiar statistical instantiation, where the probability of choosing each experiment is given to be .5 (Cox 1958).

Two Measuring Instruments with Different Precisions:

A single observation X is to be made on a normally distributed random variable with unknown mean µ, but the measurement instrument is chosen by a coin flip: with heads we use instrument E’ with a known small variance, say 10⁻⁴, while with tails, we use E”, with a known large variance, say 10⁴. The full data indicates whether E’ or E” was performed, and the particular value observed, which we can write as x’ and x”, respectively. (This example comes up in ton o’bricks.)

In applying our test T+ (see November 2011 blog post ) to a null hypothesis, say, µ = 0, the “same” value of X would correspond to a much smaller p-value were it to have come from E’ than if it had come from E”.  Denote the two p-values as p’ and p”, respectively.  However, or so the criticism proceeds, the error statistician would report the average p-value:  .5(p’ + p”).

But this would give a misleading assessment of the precision and corresponding severity with either measurement! Instead you should report the p-value of the result in the experiment actually run (this is Cox’s Weak Conditionality Principle, WCP).
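To see how far apart the conditional and the averaged reports can be, here is a minimal sketch in Python. It assumes that T+ is a one-sided (upper-tail) test of µ = 0 against µ > 0, and it uses a made-up observed value x = 0.05; the function name `upper_tail_p` and the constants are mine, introduced only for illustration.

```python
import math

def upper_tail_p(x, sigma, mu0=0.0):
    """One-sided p-value P(X >= x; mu = mu0) for a single normal observation."""
    z = (x - mu0) / sigma
    # Standard normal survival function written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

SIGMA_PRECISE = 1e-2     # instrument E': variance 10^-4
SIGMA_IMPRECISE = 1e2    # instrument E'': variance 10^4

x = 0.05  # hypothetical observed value, chosen only for illustration

p_precise = upper_tail_p(x, SIGMA_PRECISE)      # p': E' was actually run
p_imprecise = upper_tail_p(x, SIGMA_IMPRECISE)  # p'': E'' was actually run
p_averaged = 0.5 * (p_precise + p_imprecise)    # the report the critic imputes

print(f"p'  (precise instrument):   {p_precise:.3g}")
print(f"p'' (imprecise instrument): {p_imprecise:.3g}")
print(f"unconditional average:      {p_averaged:.3g}")
```

With these assumed numbers the same x is overwhelming evidence against µ = 0 under E’ and no evidence at all under E”, while the average describes neither experiment.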

But what could lead the critic to suppose the error statistician must average over experiments not even performed?  Rule #2 for legitimate criticism is to give the position being criticized the most generous construal one can think of.  Perhaps the critic supposes what is actually a distortion of even the most radical behavioristic construal:

  •   If you consider outcomes that could have occurred in hypothetical repetitions of this experiment, you must also consider other experiments you did not run (but could have been run) in reasoning from the data observed (from the test you actually ran), and report some kind of frequentist average!

The severity requirement makes explicit that such a construal is to be rejected—I would have thought it obvious, and not in need of identifying a special principle. Since it wasn’t, I articulated this special notion for interpreting tests and the corresponding severity criterion.

Let me now give a special (the first!) honorary mention to Christian Robert [2] on this point, as raised in Cox and Mayo (2010). He writes (p. 9, http://arxiv.org/abs/1111.5827):

A compelling section is the one about the weak conditionality principle (pp.294- 298), as it objects to the usual statement that a frequency approach breaks this principle. In a mixture experiment about the same parameter θ, inferences made conditional on the experiment “are appropriately drawn in terms of the sampling behaviour in the experiment known to have been performed” (p. 296). This seems hardly objectionable, as stated. And I must confess the sin of stating the opposite as The Bayesian Choice has this remark (Robert (2007), Example 1.3.7, p.18) that the classical confidence interval averages over the experiments. The term experiment validates the above conditioning in that several experiments could be used to measure θ, each with a different p-value. I will not argue with this.

He would want me to mention that he does raise some caveats:

I could, however, [argue] about ‘conditioning is warranted to achieve objective frequentist goals’ (p. 298) in that the choice of the conditioning, among other things, weakens the objectivity of the analysis. In a sense the above pirouette out of the conditioning principle paradox suffers from the same weakness, namely that when two distributions characterise the same data (the mixture and the conditional distributions), there is a choice to be made between “good” and “bad”.

But there is nothing arbitrary about regarding as “good” the only experiment actually run and from which the actual data arose.  The severity criterion only makes explicit what is/should be already obvious. Objectivity, for us, is directed by the goal of making correct and warranted inferences, not freedom from thinking. After all, any time an experiment E is performed, the critic could insist that the decision to perform E is the result of some chance circumstances and with some probability we might have felt differently that day and have run some other test, perhaps a highly imprecise test or a much more precise test or anything in between, and demand that we report whatever average properties they come up with.  The error statistician can only shake her head in wonder that this gambit is at the heart of criticisms of frequentist tests.

Still, we exiled ones can’t be too fussy, and Robert still gets the mention for conceding that we have  a solid leg on which to pirouette.


[1] You can search the blog for connections between this event, the June 2010 conference at the LSE (especially the RMM volume), my introduction to deepwater drilling, and the blog’s “mascot” stock, Diamond offshore, DO, which, incidentally, just had earnings.

[2] There have been around 4-5 others since then, not sure.


Categories: Bayesian/frequentist, Comedy, Statistics | 2 Comments

Blog Contents 2013 (March)

Error Statistics Philosophy Blog: March 2013* (Frequentists in Exile-the blog)**:

(3/1) capitalizing on chance
(3/4) Big Data or Pig Data?
(3/7) Stephen Senn: Casting Stones
(3/10) Blog Contents 2013 (Jan & Feb)
(3/11) S. Stanley Young: Scientific Integrity and Transparency
(3/13) Risk-Based Security: Knives and Axes
(3/15) Normal Deviate: Double Misunderstandings About p-values
(3/17) Update on Higgs data analysis: statistical flukes (1)
(3/21) Telling the public why the Higgs particle matters
(3/23) Is NASA suspending public education and outreach?
(3/27) Higgs analysis and statistical flukes (part 2)
(3/31) possible progress on the comedy hour circuit?

*March was incredibly busy here; I’m saving up several partially-baked posts on draft. Also, while I love this old typewriter, I’ve had to have special keys made for common statistical symbols, and that has delayed me some. I hope people will scan the previous contents starting from the beginning (e.g., with “prionvac“): it’s philosophy, remember, and philosophy has to be reread many times over.  January and February 2013 contents are here.

**compiled by Jean Miller and Nicole Jinn.

Categories: Metablog, Statistics | Leave a comment

PhilStock: Applectomy? (rejected post)

Apple (AAPL) stock is a perfect example of how psychology, fear and superstition enter into stock prices as much as do measures of valuation. Any predictions for this afternoon’s earnings? In general, here’s a field where regardless of what happens, “experts” never have to say they were wrong–especially about Tech. So, certainly we don’t. Thus, a wild guess–AAPL (currently down 300 points over its high) goes up with earnings, but not massively (~5-10pts). Still, there’s such a fear of its being “RIMMED” (i.e., dramatically losing its status as top tech, as did Research in Motion), that it may be beaten down some more.

(To be placed in rejected posts blog)

Categories: Rejected Posts | 6 Comments

Majority say no to inflight cell phone use, knives, toy bats, bow and arrows, according to survey

The Transportation Security Administration (TSA) has just announced it is backing off its decision to permit, beginning Thursday, 25 April, pocket knives, toy bats, golf clubs (limit 2), lacrosse sticks, billiard cues, ski poles, fishing reels, and other assorted sports equipment, at least for the time being. See my post on “risk based security”. Apparently, Pistole (TSA chief) could not entirely ignore the vociferous objections of numerous stakeholders, whom he had not even bothered to consult, after all. Recall that the former TSA chief, Hawley, had actually wanted to go further, saying

 “They ought to let everything on that is sharp and pointy. Battle axes, machetes … you will not be able to take over the plane. It is as simple as that,” he said. (Link is here.)

I don’t have a strong feeling about blades, but I am very much in sync with the survey that influenced Pistole’s about-face as regards cell phones (against) and liquids in carry-ons (for).

Vast majority of Americans say no to cell phone use and pocket knives inflight according to new survey

In a new, nationwide survey, Travel Leaders Group asked Americans across the country if they are in favor of the change and 73% of those polled do not want pocket knives allowed in airplane cabins. Also, a vast majority (nearly 80%) indicate they do not want fellow airline passengers to have the ability to make cell phone calls inflight. The survey includes responses from 1,788 consumers throughout the United States and was conducted by Travel Leaders Group – an $18 billion powerhouse in the travel industry – from March 15 to April 8, 2013.

“The results are very clear. Most Americans would prefer the status quo with regard to cell phone use inflight. Because so many planes are flying at near capacity and many passengers already feel a lack of personal space within the airplane cabin, it’s understandable that they want to continue to have some amount of peace and quiet whether they are on a short commuter flight or a flight that lasts several hours,” stated Travel Leaders Group CEO Barry Liben.

I’m really heartened to see that people are flouting the knee-jerk expectation that they’d want as much high tech as possible, and are weighing in against cell phones on planes. Recall my post on cell phones (now in rejected posts). Here are some of the statistics from the survey:

When asked, “Are you in favor of this change or against it?” 73% of those polled said they are not in favor of allowing pocket knives on planes.

I’m OK with it: 23.6%
I’m OK with everything except pocket knives: 18.2%
I don’t think these items should be allowed: 54.8%
I don’t know: 3.5%


Cell Phone Use Inflight

Studies are underway to determine if full cell phone use is safe while inflight and a decision on whether to allow such use (not just “airplane mode”) is expected this summer.  In Travel Leaders Group’s survey, nearly 80% of those polled are against allowing passengers to make cell phone calls during flight.  Here are the detailed responses:

I am opposed to it: 47.9%
I am in favor as long as it is not used for conversations: 31.3%
I am in favor of it: 10.7%
I don’t know: 10.1%

Additional Statistics and Findings:

  • Eliminate One TSA Security Measure: With regard to TSA security screening at the airport, when asked, “Which of the following TSA security measures would you most like to eliminate?” the top responses were: “removing of shoes” (27.9%), “limits on liquids in carry-on baggage” (24.1%), and “none, do not eliminate any security measures” (19.8%).

  • Airport Security Satisfaction: When asked, “What is your level of satisfaction with airport security today?” 82.0% indicate they are satisfied or neutral with today’s security measures (62.2% indicate they are “satisfied,”19.8% are “neither satisfied nor unsatisfied” and 18.0% are “unsatisfied”).

  • Coach Class Flyers: When asked, “Do you ever fly in Coach Class?” over 94% of those polled said “Yes.” And of those who indicate they fly in Coach Class, when asked what makes flying in Coach most uncomfortable, the top responses were: “Lack of leg room” (49.5%); “seat size” (17.2%) and “pitch of the seat – person in front of me reclines too much” (15.0%).

  • This is the fifth consecutive year for this travel survey.  American consumers were engaged predominantly through social media channels such as Facebook and Twitter, as well as through direct contact with travel clients for the following Travel Leaders Group companies: Nexion, Results! Travel, Travel Leaders, Tzell Travel Group and Vacation.com.  (www.travelleadersgroup.com)

 So a tiny bit of good news among the forced air traffic control reductions and FAA cuts that began yesterday: See

http://rejectedpostsofdmayo.com/2013/04/22/msc-kvetch-air-traffic-control-cuts/

Categories: Uncategorized | 6 Comments

Stephen Senn: When relevance is irrelevant

(guest post) When Relevance is Irrelevant, by Stephen Senn

Head of Competence Center for Methodology and Statistics (CCMS)

Applied statisticians tend to perform analyses on additive scales and additivity is an important aspect of an analysis to try to check. Consider survival analysis. The most important model used, the default in many cases, is the proportional hazards model introduced by David Cox in 1972[1] and sometimes referred to as Cox regression. In fact, from one point of view, analysis takes place on the log-hazard scale and so the model could equally be referred to by the rather clumsier title additive log-hazards model and there is quite a literature on how the proportionality (or equivalently, additivity) assumption can be checked.

Words have a definite power on the mind and you sometimes encounter the nonsensical claim that if the proportionality assumption does not apply you should consider a log-rank test instead. In fact, when testing the null hypothesis that two treatments are identical, neither the log-rank test nor the score test using the proportional hazards model requires the assumption of proportionality: the assumption is trivially satisfied by the fact of two treatments being identical. Furthermore, the log-rank test is just a special case of proportional hazards: the score test for a proportional hazards model without any covariates is the log-rank test. Finally, it is easy to produce examples where proportional hazards would apply in a model with covariates but not in the model without covariates, but very difficult to produce the converse.
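The identity between the two tests can be checked numerically. The following is a minimal sketch in pure Python, not a substitute for a survival analysis package: the data are made up, the function names are hypothetical, and it assumes there are no tied event times (with ties, the agreement depends on how the partial likelihood handles them).

```python
def logrank_statistic(times, events, group):
    """Log-rank chi-square for two groups coded 0/1 (hypergeometric variance)."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(g for ti, e, g in zip(times, events, group) if e and ti == t)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:  # a risk set of size one contributes no variance
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance


def cox_score_statistic(times, events, z):
    """Score test of beta = 0 from the Cox partial likelihood, treatment only."""
    score = 0.0
    information = 0.0
    for ti, ei, zi in zip(times, events, z):
        if not ei:
            continue  # censored observations contribute only through risk sets
        risk_z = [zj for tj, zj in zip(times, z) if tj >= ti]
        mean_z = sum(risk_z) / len(risk_z)
        mean_z2 = sum(v * v for v in risk_z) / len(risk_z)
        score += zi - mean_z                    # first derivative at beta = 0
        information += mean_z2 - mean_z ** 2    # minus second derivative at beta = 0
    return score ** 2 / information


# Made-up data: survival times, event indicator (1 = death, 0 = censored),
# and treatment group (0 = control, 1 = treated). No tied times.
times = [2, 3, 5, 7, 8, 11, 12, 14, 17, 20]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
group = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

logrank = logrank_statistic(times, events, group)
cox_score = cox_score_statistic(times, events, group)
print(f"log-rank chi-square:   {logrank:.6f}")
print(f"Cox score chi-square:  {cox_score:.6f}")
```

The two statistics agree because, at the null, each event contributes the same observed-minus-expected term and the same variance term to both calculations.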

An objection often made regarding such models is that they are very difficult for physicians to understand. My reply is to ask what is preferable: a difficult truth or an easy lie? Ah yes, it is sometimes countered, but surely I agree on the importance of clinical relevance. It is surely far more useful to express the results of a proportional hazards analysis in clinically relevant terms that can be understood, such as difference in median length of survival or the difference in the event rate up to a particular census point (say one year after treatment).

A disturbing paper by Snapinn and Jiang[2] points to a problem, however, and to explain it I can do no better than cite the abstract:

The standard analysis of a time-to-event variable often involves the calculation of a hazard ratio based on a survival model such as Cox regression; however, many people consider such relative measures of effect to be poor expressions of clinical meaningfulness. Two absolute measures of effect are often used to assess clinical meaningfulness: (1) many disease areas frequently use the absolute difference in event rates (or its inverse, the number-needed-to-treat) and (2) oncology frequently uses the difference between the median survival times in the two groups. While both of these measures appear reasonable, they directly contradict each other. This paper describes the basic mathematics leading to the two measures and shows examples. The contradiction described here raises questions about the concept of clinical meaningfulness. (p2341)

To see the problem, consider the following. The more serious the disease, the less a given difference in the rate at which people die will impact on the time survived and hence on differences in median survival. However, generally, the higher the baseline mortality rate the greater the difference in survival at a given time point that will be conveyed by a given treatment benefit.
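A small numerical sketch may help here. It assumes exponential survival in both arms and a common hazard ratio of 0.75, neither of which is taken from the paper; the baseline hazards and the function names are likewise illustrative choices of mine.

```python
import math

HAZARD_RATIO = 0.75   # assumed treatment effect, identical in every scenario
CENSUS_TIME = 1.0     # e.g. one year after treatment

def median_gain(baseline_hazard, hazard_ratio=HAZARD_RATIO):
    """Difference in median survival (treated minus control), exponential model."""
    control_median = math.log(2) / baseline_hazard
    treated_median = math.log(2) / (hazard_ratio * baseline_hazard)
    return treated_median - control_median

def survival_gain(baseline_hazard, t=CENSUS_TIME, hazard_ratio=HAZARD_RATIO):
    """Difference in the proportion surviving to time t (treated minus control)."""
    return math.exp(-hazard_ratio * baseline_hazard * t) - math.exp(-baseline_hazard * t)

baseline_hazards = [0.05, 0.1, 0.2, 0.4]  # deaths per person-year, illustrative only
rows = [(h, median_gain(h), survival_gain(h)) for h in baseline_hazards]

for h, gain_median, gain_survival in rows:
    print(f"baseline hazard {h:.2f}: median gain {gain_median:5.2f} years, "
          f"1-year survival gain {100 * gain_survival:4.1f} points")
```

Over this range the same relative effect looks steadily less impressive on the median scale and steadily more impressive on the event-rate scale as the disease becomes more serious. Under these assumptions the survival gain at a fixed time point eventually shrinks again for very high hazards, which is why “generally” is the right word.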

If you find this less than clear, you have my sympathy. The only solution I can offer is to suggest that you read the paper by Snapinn and Jiang[2]. However, in that case also consider the following point. If the point is so subtle, how many physicians who cannot understand proportional hazards can understand numbers needed to treat or differences in median survival? My opinion is that they can be counted on the fingers of one foot.

Let me explain the point at issue by analogy. If one were to study road traffic accidents one would find that among the very many factors affecting seriousness of the consequences of an accident would be the relative velocity at impact. However, if one looks at Newton’s laws of motion one finds that the second law speaks of the relationship between force, mass, and acceleration but not velocity. Now it is clear that a) acceleration, being a concept that is defined (or at least understood) in terms of velocity (it is, indeed, the derivative of velocity), is a more complicated concept than velocity, b) all cars have speedometers that show velocity but not acceleration, c) the traffic laws are couched in terms of velocity rather than acceleration and d) it seems that from the point of view of human health it is velocity that is important.

None of this remotely constitutes an argument for replacing Newton’s second law. On the contrary what it implies is that you might need to work a little to use Newton’s laws to translate the effect of relative velocity into accident survivability. However, any attempt to simplify will run the danger of being an oversimplification.

This point was very well understood by a somewhat neglected scientist, the centenary of whose death falls this year: James Berry (1852-1913), an English executioner or hangman, but one who recognised the value of physics. Ronald Meek, the Marxist economist whose work I was expected to study when a student of Economics and Statistics in the 1970s, devotes a chapter of his entertaining book, Figuring out Society[3], to Mr Berry. Berry decided that the length of the drop in an execution required scientific study: too short and death was not instantaneous, too long and decapitation ensued. He soon realised that a simple law of hanging, linear in the height of the drop, was wrong and instead came upon the idea of a ‘striking force’. This enabled him to hang criminals along a curve. (I understand that in American universities professors also sometimes execute judgement along a curve.) The striking force required was adjusted according to the weight and neck musculature of the condemned and the height was then determined from the curve.

Many of the current proponents of evidence based medicine could learn from Berry’s example. NNTs derived from clinical trials are misleading indicators as to what will happen in clinical practice. For them to be reliable indicators would require that the patients in the clinical trials we run were a representative sample of the population of patients. They are not, and if they were, the fact that we set such store on concurrent control would be inexplicable. To translate the results of clinical trials into practice may require a lot of work involving modelling and further background information. ‘Additive at the point of analysis but relevant at the point of application’ should be the motto. Sometimes short cuts lead to long delays.

References

1. Cox DR. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society Series B 1972; 34: 187-220.

2. Snapinn S, Jiang Q. On the clinical meaningfulness of a treatment’s effect on a time-to-event variable. Statistics in Medicine 2011; 30: 2341-2348.

3. Meek RL. Figuring out society. Fontana, 1971.

Categories: Statistics | 9 Comments

Does statistics have an ontology? Does it need one? (draft 2)

Chance, rational beliefs, decision, uncertainty, probability, error probabilities, truth, random sampling, resampling, opinion, expectations. These are some of the concepts we bandy about by giving various interpretations to mathematical statistics, to statistical theory, and to probabilistic models. But are they real? The question of “ontology” asks about such things, and given the “Ontology and Methodology” conference here at Virginia Tech (May 4, 5), I’d like to get your thoughts (for possible inclusion in a Mayo-Spanos presentation).* Also, please consider attending**.

Interestingly, I noticed the posts that have garnered the most comments have touched on philosophical questions of the nature of entities and processes behind statistical idealizations (e.g., http://errorstatistics.com/2012/10/18/query/).

1. When an interpretation is supplied for a formal statistical account, its theorems may well turn out to express approximately true claims, and the interpretation may be deemed useful, but this does not mean the concepts give correct descriptions of reality. The interpreted axioms, and inference principles, are chosen to reflect a given philosophy, or set of intended aims: roughly, to use probabilistic ideas (i) to control error probabilities of methods (Neyman-Pearson, Fisher), or (ii) to assign and update degrees of belief, actual or rational (Bayesian). But this does not mean its adherents have to take seriously the realism of all the concepts generated. In fact, we often (on this blog) see supporters of various stripes of frequentist and Bayesian accounts running far away from taking their accounts literally, even as those interpretations are, or at least were, the basis and motivation for the development of the formal edifice (“we never meant this literally”). But are these caveats on the same order? Or do some threaten the entire edifice of the account?

Starting with the error statistical account, recall Egon Pearson in his “Statistical Concepts in Their Relation to Reality” making it clear to Fisher that the business of controlling erroneous actions in the long run, acceptance sampling in industry and 5-year plans, only arose with Wald, and were never really part of the original Neyman-Pearson tests (declaring that the behaviorist philosophy was Neyman’s, not his).  The paper itself may be found here. I was interested to hear (Mayo 2005)  Neyman’s arch opponent, Bruno de Finetti, remark (quite correctly) that the expression “inductive behavior…that was for Neyman simply a slogan underlining and explaining the difference between his, the Bayesian and the Fisherian formulations” became with Abraham Wald’s work, “something much more substantial” (de Finetti 1972, 176).

Granted, it has not been obvious to people just how to interpret N-P tests “evidentially” or “inferentially”, which has been the subject of my work over many years. But there always seemed to me to be enough hints and examples to see what was intended: A statistical hypothesis H assigns probabilities to possible outcomes, and the warrant for accepting H as adequate, for an error statistician, is in terms of how well corroborated H is: how well H has stood up to tests that would have detected flaws in H, at least with very high probability. So the grounds for holding or using H are error statistical. The control and assessment of error probabilities may be used inferentially to determine the capabilities of methods to detect the adequacy/inadequacy of models, and express the extent of the discrepancies that have been identified. We also employ these ideas to detect gambits that make it too easy to find evidence for claims, even if the claims have been subjected to weak tests and biased procedures. A recent post is here.

The account has never professed to supply a unified logic, or any kind of logic for inference. The idea that there was a single rational way to make inferences was ridiculed by Neyman (whose birthday is April 16).

2. Proposed (“we never meant this literally”) withdrawals from the Bayesian interpretations do not seem so innocuous. Perhaps some will say this just shows my bias. Let me grant that the popular idea of interpreting prior probability distributions as non-subjective, in some sense or other, is not so radical (though I’d still want to know how to interpret posteriors and why). But what we usually see now is some blurring of the two: touting the advantage of Bayesian methods because they incorporate background beliefs, while also advertising “conventional” (default, reference, or “objective”) priors as having minimal influence on inference. [1] See “Grace and amen Bayesianism” within this deconstruction. Also relevant: Irony and Bad Faith: Deconstructing Bayesians.

Perhaps the most popular view nowadays regards the prior as some kind of uninterpreted mathematical construct, merely serving to get a posterior. These same Bayesians, some of them, advocate “testing” the prior, but this is hard to grasp if we do not know what the priors are intended to be, or stand for. Then there are those Bayesians, perhaps they are a radical (but influential) subgroup, who deny the machine of updating by Bayes’ theorem altogether. In Gelman (2011) (our special topic of RMM):

“Our key departure from the mainstream Bayesian view (as expressed, for example, [in Wikipedia]) is that we do not attempt to assign posterior probabilities to models or to select or average over them using posterior probabilities. Instead, we use predictive checks to compare models to data and use the information thus learned about anomalies to motivate model improvements.” (p. 71).

In Gelman and Robert (2013), we hear that a major source of Bayesian criticism comes from assuming “that Bayesians actually seem to believe their assumptions rather than merely treating them as counters in a mathematical game.” (p. 3) This comes as a surprise to those of us who thought the Bayesians really meant it. So what is the game being played?

[W]e make strong assumptions and use subjective knowledge in order to make inferences and predictions that can be tested by comparing to observed and new data (see Gelman and Shalizi, 2012, or Mayo, 1996 for a similar attitude coming from a non-Bayesian direction). (p. 3)

So maybe some kind of a “non-Bayesian checking of Bayesian models” would offer a more promising foundation, at least for Gelman’s brand of “Bayesian falsificationism” (Gelman 2011). See my 2013 Comments on Gelman and Shalizi [2]. On the face of it, any inference, whether to the adequacy of a model (for a given purpose), or to a posterior probability, can be said to be warranted just to the extent that the inference has withstood severe testing: testing with a high probability of having found flaws were they present. The ontology matters less than the epistemology.

Thus, the severity idea could conceivably illuminate what’s going on with Gelman’s model checking; I find the idea promising, but do not really know what he thinks.
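To make “withstood severe testing” a little more concrete, here is a minimal sketch under assumptions that go beyond anything spelled out in this post: a one-sided test about a Normal mean with known standard deviation, with the severity for the claim that the mean exceeds mu1 taken to be the probability of a sample mean no larger than the one observed, were the mean only mu1. The function name and all the numbers are hypothetical.

```python
from statistics import NormalDist
import math

def severity_mu_greater_than(mu1, xbar_obs, sigma, n):
    """Severity for the claim mu > mu1, given the observed mean xbar_obs.

    Computed as P(sample mean <= xbar_obs; mu = mu1) for Normal data with
    known sigma: the probability of a less impressive result than the one
    observed, if the mean were only as large as mu1.
    """
    standard_error = sigma / math.sqrt(n)
    return NormalDist(mu=mu1, sigma=standard_error).cdf(xbar_obs)

# Hypothetical data summary: standard error 0.1, observed mean 0.2
SIGMA, N, XBAR = 1.0, 100, 0.2

severities = {
    mu1: severity_mu_greater_than(mu1, XBAR, SIGMA, N)
    for mu1 in (0.0, 0.1, 0.2, 0.3)
}

for mu1, sev in severities.items():
    print(f"claim mu > {mu1:.1f}: severity {sev:.3f}")
```

In this toy case the same data probe “mu > 0” quite severely but “mu > 0.3” very poorly, which is the kind of report on what has and has not been well probed that the paragraph above has in mind.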

But to pursue such an avenue still requires reckoning with a fundamental issue at the foundations of Bayesian method: the interpretation of and justification for the prior probability distribution. Error statisticians use idealizations, but they are tightly constrained by the need for error probabilities, in a statistical model, to approximate the actual ones, even if only hypothetical, or checked by simulation. We are modeling real processes, not knowledge of processes.

Gelman and Robert (2013) allow:

“that many Bayesians over the years have muddied the waters by describing parameters as random rather than fixed. Once again, for Bayesians as much as for any other statistician, parameters are (typically) fixed but unknown. It is the knowledge about these unknowns that Bayesians model as random” (p. 4).

Bayesians will …assign a probability distribution to a parameter that one could not possibly imagine to have been generated by a random process, parameters such as the coefficient of party identification in a regression on vote choice, or the overdispersion in a network model, or Hubble’s constant in cosmology. There is no inconsistency in this opposition once one realizes that priors are not reflections of a hidden “truth” but rather evaluations of the modeler’s uncertainty about the parameter. (p. 3)

The choice, of course, is not between modeling a “hidden ‘truth’” and modeling “the modeler’s uncertainty”. Actually, in the majority of the examples I have seen, it seems better to imagine the parameter being generated by a random process.  On the other hand, “the modeler’s uncertainty about the parameter” is one of the most unclear parts of Bayesian modeling. It is not that we can’t see measuring the degree of evidence, corroboration, severity of test, or the like, that is accorded a claim about a fixed parameter.  We can and do. It is just that those measures will not be well represented as posterior or prior probabilities, obeying the probability calculus.

Possibly an idea I once proposed, a variation on a view held by the frequentist Reichenbach, can work (in EGEK 1996, ch. 4). Reichenbach suggested that scientists might eventually be able to assess the relative frequency with which a given type of hypothesis or theory is true. This might provide it a frequentist probability assignment. I don’t see how one could get such a relative frequency (or rather I can see many different reference sets that could be used), nor why knowing such quantities would be useful in appraising the evidence for a given hypothesis H. My variation (Chapter 4 Duhem, Kuhn, and Bayes, pp 120-4) is to consider the relative frequency with which evidence of a certain strength (e.g., passing k tests with increasingly impressive error probabilities) is generated, despite H being false. This is attainable. But that of course takes us to an error probabilistic assessment!

Maybe this style of Bayesianism doesn’t need a clear ontology so long as it’s got a clear epistemology. But does it?***

What do readers think?

*To see the full list of speakers: “Ontology and Methodology” conference. Actually our presentation will likely take a different tack, but I still want to hear your thoughts.

**Registration is free, but required, by April 20-25.

***I should say right off (for those who do not know) that my work is not in metaphysics, but on philosophical problems about inductive-statistical inference, experiment and evidence. My colleague (and co-conference organizer) Ben Jantzen is the “ontology” guy, and the third colleague involved, Lydia Patton, does O & M as well as HPS.

For further references, see those within posts and papers linked here, or search this blog.

De Finetti, B. (1972), Probability, Induction, and Statistics: The Art of Guessing. NY, Wiley.

Gelman, A. (2011). Induction and deduction in Bayesian data analysis. Rationality, Markets and Morals (RMM) 2, 67–78.

Gelman, A. and C. Shalizi. (Article first published online: 24 Feb 2012). “Philosophy and the Practice of Bayesian statistics (with discussion)”. British Journal of Mathematical and Statistical Psychology (BJMSP).

Gelman, A, and Robert, C. (2013). Not only defended but also applied: The perceived absurdity of Bayesian inference.

http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf

Kass, R. and Wasserman, L. (1996). The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association 91, 1343-1370.

Mayo, D. G. (1996). [EGEK] Error and the growth of experimental knowledge. Chicago: University of Chicago Press.

_____ (2005). Evidence as passing severe tests: Highly probable vs. highly probed hypotheses. In P. Achinstein (Ed.), Scientific Evidence (pp. 95-127). Baltimore: Johns Hopkins University Press.

_____ (2011). Statistical science and philosophy of science: where do/should they meet in 2011 (and beyond)? Rationality, Markets and Morals (RMM) 2, Special Topic: Statistical Science and Philosophy of Science, 79–102.

_____ (2013). Comments on A. Gelman and C. Shalizi: Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, forthcoming.

Mayo, D. and Cox, D. (2010). Frequentist statistics as a theory of inductive inference. In D. Mayo and A. Spanos (Eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (pp. 247-275). Cambridge: Cambridge University Press. This paper appeared in The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, 247-275.

Mayo, D. and Spanos, A. (2011). Error statistics. In P. Bandyopadhyay and M. Forster (Volume Eds.); D. M.Gabbay, P. Thagard and J. Woods (General Eds.). Philosophy of statistics: Handbook of philosophy of science Vol 7 (pp. 1-46). The Netherlands: Elsevier.

Pearson, E. S. (1955). Statistical concepts in their relation to reality. Journal of the Royal Statistical Society B 17, 204-207.

Senn, S. (2011). You may believe you are a Bayesian but you are probably wrong. Rationality, Markets and Morals (RMM) 2, Special Topic: Statistical Science and Philosophy of Science, 48-66.


[1] For a thorough account of problems with the latter, see Kass and Wasserman (1996).

[2] I take Gelman-Shalizi (2012) to be an attempt at a meeting of the minds between Bayesian Gelman and error statistical Shalizi. I may be wrong.

Categories: Bayesian/frequentist, Error Statistics, Statistics | 59 Comments

O & M Conference (upcoming) and a bit more on triggering from a participant…..

I notice that one of the contributed speakers, Koray Karaca*, at the upcoming Ontology and Methodology Conference at Virginia Tech (May 4-5) focuses his paper on triggering! I entirely agree with the emphasis on the need to distinguish different questions at multiple stages of an inquiry or research endeavor from the design, collection and modeling of data to a series of hypotheses, questions, problems, and threats of error. I do note a couple of queries below that I hope will be discussed at some point. Here’s part of his abstract, which may be found on the just created O & M Conference Blog (link is also at the O&M page on this blog). Recent posts on the Higgs data analysis are here, here, and here. Kent Staley had a recent post on the Higgs as well. (For earlier Higgs discussions search this blog.)

Koray Karaca
The method of robustness analysis and the problem of data-selection at the ATLAS experiment

In the first part, I characterize and distinguish between two problems of “methodological justification” that arise in the context of scientific experimentation. What I shall call the “problem of validation” concerns the accuracy and reliability of experimental procedures through which a particular set of experimental data is first acquired and later transformed into an experimental result. Therefore, the problem of validation can be phrased as follows: how to justify that a particular set of data as well as the procedures that transform it into an experimental result are accurate and reliable, so that the experimental result obtained at the end of the experiment can be taken as valid.  On the other hand, what I shall call the “problem of exploration” is concerned with the methodological question of whether an experiment is able, either or both, (1) to provide a genuine test of the conclusions of a scientific theory or hypothesis if the theory in question has not been previously (experimentally) tested, or to provide a novel test if the theory or hypothesis in question has already been tested, and (2) to discover completely novel phenomena; i.e., phenomena which have not been predicted by present theories and detected in previous theories. Even though the problem of validation and the ways it is dealt with in scientific practice has been thoroughly discussed in the literature of scientific experimentation, the significance of the problem of exploration has not yet been fully appreciated. In this work, I shall address this problem and examine the way it is handled in the present-day high collision-rate particle physics experiments. To this end, I shall consider the ATLAS experiment, which is one of the Large Hadron Collider (LHC) experiments currently running at CERN. 
…What are called “interesting events” are those collision events that are taken to serve to test the as-yet-untested predictions of the Standard Model of particle physics (SM) and its possible extensions, as well as to discover completely novel phenomena not predicted before by any theories or theoretical models.

To read the rest of the abstract, go to our just-made-public O & M conference blog.

First let me say that I’m delighted this case will be discussed at the O&M conference, and look forward to doing so. Here are a couple of reflections from the abstract, partly on terminology. First, I find it interesting that he places “triggering” (what I alluded to in my last post as a behavioristic, pre-data, task) under “exploratory”. He may be focussed more on what occurs (in relation to this one episode anyhow) when data are later used to check for indications of anomalies for the Standard Model Higgs, having been “parked” for later analysis. I thought the exploratory stage is usually a stage of informal or semi-formal data analysis to find interesting patterns and potential ingredients (variables, functions) for models, model building, and possible theory development. When Strassler heard there would be “parked data” for probing anomalies, I take it his theories kicked in to program those exotic indicators. Second, it seems to me that philosophers of science and “confirmation theorists” of various sorts have focussed on when “data,” all neat and tidied up, count as supporting, confirming, falsifying hypotheses and theories. I wouldn’t have thought the problem of data collection, modeling or justifying data was “thoroughly discussed”. It absolutely should be; it is just that such discussion seems all too rare. I may be wrong (I’d be glad to see references).

*Koray is a postdoctoral research fellow at the University of Wuppertal, and he knows I’m mentioning him here.

Categories: experiment & modeling | 7 Comments

Statistical flukes (3): triggering the switch to throw out 99.99% of the data

This is the last of my 3 parts on “statistical flukes” in the Higgs data analysis. The others are here and here. Kent Staley had a recent post on the Higgs as well.

Many preliminary steps in the Higgs data generation and analysis fall under an aim that I call “behavioristic” and performance oriented: the goal being to control error rates on the way toward finding out something else–here, excess events or bumps of interest.

(a) Triggering. First of all, 99.99% of the data must be thrown away! So there needs to be a trigger to “accept or reject” collision data for analysis, whether for immediate processing or for later on, as in so-called “data parking”.

With triggering we are not far off the idea that a result of a “test”, or single piece of data analysis, is to take one “action” or another:

reject the null -> retain the data;

do not reject -> discard the data.

(Here the null might, in effect, hypothesize that the data are not interesting.) It is an automatic classification scheme, given limits of processing and storing; the goal of controlling the rates of retaining uninteresting and discarding potentially interesting data is paramount.[i] It is common for performance oriented tasks to enter, especially in getting the data for analysis, and they too are very much under the error statistical umbrella.
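The accept/reject logic, and the two error rates it trades off, can be caricatured in a few lines. This is only a toy sketch with made-up ingredients: a single numerical “score” per event and two hypothetical Normal score distributions. Real trigger systems are vastly more elaborate, and none of these names or numbers comes from the experiments.

```python
from statistics import NormalDist

# Hypothetical score distributions for the two kinds of event
uninteresting = NormalDist(mu=0.0, sigma=1.0)
interesting = NormalDist(mu=3.0, sigma=1.0)

RETAIN_RATE_UNINTERESTING = 1e-4  # i.e., throw away 99.99% of uninteresting events

# Score threshold that achieves the chosen retention rate for uninteresting events
threshold = uninteresting.inv_cdf(1.0 - RETAIN_RATE_UNINTERESTING)

def trigger(score):
    """Reject the null ('not interesting') and retain the event if the score is high enough."""
    return score > threshold

# The two error rates of this automatic classification rule
rate_retain_uninteresting = 1.0 - uninteresting.cdf(threshold)
rate_discard_interesting = interesting.cdf(threshold)

print(f"threshold: {threshold:.3f}")
print(f"rate of retaining uninteresting events: {rate_retain_uninteresting:.1e}")
print(f"rate of discarding interesting events:  {rate_discard_interesting:.3f}")
```

Fixing the first error rate at a tiny value forces the second to be whatever the score distributions dictate, so the performance of the rule depends on how well the score separates the two kinds of event.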

Particle physicist Matt Strassler has excellent discussions of triggering and parking on his blog “Of Particular Significance”. Here’s just one passage:

Data Parking at CMS (and the Delayed Data Stream at ATLAS) takes advantage of the fact that the computing bottleneck for dealing with all this data is not data storage, but data processing. The experiments only have enough computing power to process about 300 – 400 bunch-crossings per second. But at some point the experimenters concluded that they could afford to store more than this, as long as they had time to process it later. That would never happen if the LHC were running continuously, because all the computers needed to process the stored data from the previous year would instead be needed to process the new data from the current year. But the 2013-2014 shutdown of the LHC, for repairs and for upgrading the energy from 8 TeV toward 14 TeV, allows for the following possibility: record and store extra data in 2012, but don’t process it until 2013, when there won’t be additional data coming in. It’s like catching more fish faster than you can possibly clean and cook them — a complete waste of effort — until you realize that summer’s coming to an end, and there’s a huge freezer next door in which you can store the extra fish until winter, when you won’t be fishing and will have time to process them.

(b) Bump indication. Then there are rules for identifying bumps, excesses more than 2 or 3 standard deviations above what is expected or predicted. This may be the typical single significance test serving as more of an indicator rule.  Observed signals are classified as either rejecting, or failing to reject, a null hypothesis of “mere background”; non-null indications are bumps, deemed potentially interesting. Estimates of the magnitude of any departures are reported and graphically displayed. They are not merely searching for discrepancies with the “no Higgs particle” hypothesis, they are looking for discrepancies with the simplest type, the simple Standard Model Higgs. I discussed this in my first flukes post.
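For readers who want to translate “sigmas” into tail-area probabilities, here is a small sketch. It assumes the usual convention of a one-sided upper tail of a standard Normal distribution; the function name is mine.

```python
import math

def upper_tail_probability(n_sigma):
    """One-sided P(Z >= n_sigma) for a standard Normal Z."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

# Tail areas for the 2 and 3 sigma indications and the 5 sigma standard
tail = {k: upper_tail_probability(k) for k in (2, 3, 5)}

for k, p in tail.items():
    print(f"{k} sigma: one-sided tail area about {p:.3g}")
```

On this convention a 2 sigma excess corresponds to a tail area of roughly 0.023, a 3 sigma excess to roughly 0.0013, and a 5 sigma excess to roughly 3 in 10 million.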

How much additional checking of assumptions, data analysis, etc. should be required before reporting a prima facie statistically significant effect? It varies. Knowing the discretionary standard used, we “consumers” can scrutinize them. We must also ask, report to whom? It appears that in experimental particle physics, at least in this case, they will internally report an indication of a statistically significant bump, but they are careful not to leak it even to the full physics community, at least if it seems to indicate some exotic particle at odds with the simple Standard Model Higgs. At least that is my impression from reading some of the literature. Thus far, all of the potentially exciting (anomalous) indications disappear with further data; the very thing that is expected were the anomalous effects mere flukes. This is not a flaw with the “bump indication” rule, at least not in an experimental particle physicist’s context, because it is known that these indications will be checked and cross-checked, that effects will have to stand up to severe scrutiny.

From indication to evidence

This takes us back to where I began (here). Evidence of a 5 sigma effect, then, refers to the evidence, after scrutiny, of bumps that will not go away, and to the 2013 data on many more collisions. The particle’s properties with respect to a given type of decay do not change; finding it again and again substantiates a strong “argument from coincidence” of the sort that severity requires.

But that is not all that is inferred, in my view. It is also crucial to infer, and report on, what has not been well-probed or severely indicated. In particular, here, they have not distinguished various properties of the particle, and have not ruled out alternatives to the Standard Model. Knowing what has been poorly distinguished will surely be the basis for future design recommendations. Progress is not inferring a hypothesis or theory “out there” but the growing understanding of the phenomenon, as modeled, triggered, simulated, and probed.

Categories: Error Statistics | 1 Comment

Who is allowed to cheat? I.J. Good and that after dinner comedy hour….

It was from my Virginia Tech colleague I.J. Good (in statistics), who died four years ago (April 5, 2009), at 93, that I learned most of what I call “howlers” on this blog. His favorites were based on the “paradoxes” of stopping rules.

“In conversation I have emphasized to other statisticians, starting in 1950, that, in virtue of the ‘law of the iterated logarithm,’ by optional stopping an arbitrarily high sigmage, and therefore an arbitrarily small tail-area probability, can be attained even when the null hypothesis is true. In other words if a Fisherian is prepared to use optional stopping (which usually he is not) he can be sure of rejecting a true null hypothesis provided that he is prepared to go on sampling for a long time. The way I usually express this ‘paradox’ is that a Fisherian [but not a Bayesian] can cheat by pretending he has a plane to catch like a gambler who leaves the table when he is ahead” (Good 1983, 135) [*]

This paper came from a conference where we both presented, and he was extremely critical of my error statistical defense on this point. (I was a year out of grad school, and he a University Distinguished Professor.) 

One time, years later, after hearing Jack give this howler for the nth time, “a Fisherian [but not a Bayesian] can cheat, etc.,” I was driving him to his office, and suddenly blurted out what I really thought:

“You know Jack, as many times as I have heard you tell this, I’ve always been baffled as to its lesson about who is allowed to cheat. Error statisticians require the overall and not the ‘computed’ significance level be reported. To us, what would be cheating would be reporting the significance level you got after trying and trying again in just the same way as if the test had a fixed sample size. True, we are forced to fret about how stopping rules alter the error probabilities of tests, while the Bayesian is free to ignore them, but why isn’t the real lesson that the Bayesian is allowed to cheat?” (A published version of my remark may be found in EGEK p. 351: “As often as my distinguished colleague presents this point…”)

 To my surprise, or actually shock, after pondering this a bit, Jack said something like, “Hmm, I never thought of it this way.”
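The effect of “trying and trying again” on the actual, as opposed to the computed, significance level is easy to check by simulation. The following is a minimal sketch, not anything from Good’s paper or from EGEK: it assumes Normal data with known unit variance, a true null of mean zero, a nominal two-sided 0.05 level test applied after every observation, and a hypothetical cap of 100 observations.

```python
import math
import random

def rejects_with_optional_stopping(rng, max_n, z_crit=1.96):
    """Sample N(0,1) data (null true), testing after every observation.

    Returns True if the nominal two-sided z-test ever rejects by max_n.
    """
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        z = total / math.sqrt(n)
        if abs(z) > z_crit:
            return True  # stop here and "catch the plane"
    return False

def rejects_with_fixed_n(rng, n, z_crit=1.96):
    """Same test, but applied only once, at the preset sample size n."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(total / math.sqrt(n)) > z_crit

rng = random.Random(2013)  # fixed seed so the sketch is reproducible
TRIALS = 2000
MAX_N = 100

optional_rate = sum(
    rejects_with_optional_stopping(rng, MAX_N) for _ in range(TRIALS)
) / TRIALS
fixed_rate = sum(rejects_with_fixed_n(rng, MAX_N) for _ in range(TRIALS)) / TRIALS

print(f"actual type I error, optional stopping: {optional_rate:.3f}")
print(f"actual type I error, fixed n = {MAX_N}: {fixed_rate:.3f}")
```

Even with the cap at 100 observations, the proportion of true nulls rejected under optional stopping comes out well above the nominal 0.05, while the fixed sample size test stays close to it. Reporting 0.05 in the first case is the cheating at issue.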

By the way, the story of the “after dinner Bayesian comedy hour” on this blog did not allude to Jack but to someone who gave a much more embellished version. Since it’s Saturday night, let’s once again listen in to the comedy hour that unfolded at my dinner table at an academic conference:

 Did you hear the one about the researcher who gets a phone call from the guy analyzing his data? First the guy congratulates him and says, “The results show a …

Categories: Bayesian/frequentist, Comedy, Statistics | 68 Comments

Guest Post. Kent Staley: On the Five Sigma Standard in Particle Physics

Kent Staley
Associate Professor
Department of Philosophy
Saint Louis University

Regular visitors to Error Statistics Philosophy may recall a discussion that broke out here and on other sites last summer when the CMS and ATLAS collaborations at the Large Hadron Collider announced that they had discovered a new particle in their search for the Higgs boson that had at least some of the properties expected of the Higgs. Both collaborations emphasized that they had results that were significant at the level of “five sigma,” and the press coverage presented this as a requirement in high energy particle physics for claiming a new discovery. Both the use of significance testing and the reliance on the five sigma standard became a matter of debate.

Mayo has already commented on the recent updates to the Higgs search results (here and here); these seem to have further solidified the evidence for a new boson and the identification of that boson with the Higgs of the Standard Model. I have been thinking recently about the five sigma standard of discovery and what we might learn from reflecting on its role in particle physics. (I gave a talk on this at a workshop sponsored by the “Epistemology of the Large Hadron Collider” project at Wuppertal [i], which included both philosophers of science and physicists associated with the ATLAS collaboration.)

Just to refresh our memories, back in July 2012, Tony O’Hagan posted at the ISBA forum (prompted by “a question from Dennis Lindley”) three questions regarding the five-sigma claim:

  1. “Why such an extreme evidence requirement? We know from a Bayesian perspective that this only makes sense if (a) the existence of the Higgs boson (or some other particle sharing some of its properties) has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme. Neither seems to be the case, so why 5-sigma?
  2. “Rather than ad hoc justification of a p-value, it is of course better to do a proper Bayesian analysis. Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?
  3. “We know that given enough data it is nearly always possible for a significance test to reject the null hypothesis at arbitrarily low p-values, simply because the parameter will never be exactly equal to its null value. And apparently the LHC has accumulated a very large quantity of data. So could even this extreme p-value be illusory?”

O’Hagan received a lot of responses to this post, and he very helpfully wrote up and posted a digest of those responses, discussed on this blog here and here. Read more »

Categories: Error Statistics, P-values, Statistics | 26 Comments

Flawed Science and Stapel: Priming for a Backlash?

Diederik Stapel is back in the news, given the availability of the English translation of the Tilburg (Levelt and Noort Committees) Report as well as his book, Ontsporing (Dutch for “Off the Rails”), where he tries to explain his fraud. An earlier post on him is here. While the disgraced social psychologist was shown to have fabricated the data for something like 50 papers, it seems that some people think he deserves a second chance. A childhood friend, Simon Kuper, in an article “The Sin of Bad Science,” describes a phone conversation with Stapel:

“I’ve lost everything,” the disgraced former psychology professor tells me over the phone from the Netherlands. He is almost bankrupt. … He has tarnished his own discipline of social psychology. And he has become a national pariah. …

Very few social psychologists make stuff up, but he was working in a discipline where cavalier use of data was common. This is perhaps the main finding of the three Dutch academic committees which investigated his fraud. The committees found many bad practices: researchers who keep rerunning an experiment until they get the right result, who omit inconvenient data, misunderstand statistics, don’t share their data, and so on….

Chapter 5 of the Report, pp 47-54, is extremely illuminating about the general practices they discovered in examining Stapel’s papers; I recommend it.

Social psychology might recover. However, Stapel might not. A country’s way of dealing with sinners is often shaped by its religious heritage. In Catholicism, sinners can get absolution in the secrecy of confession. … …In many American versions of Protestantism, the sinner can be “born again”. …Stapel’s misfortune is to be Dutch. The dominant Dutch tradition is Calvinist, and Calvinism believes in eternal sin. …But the downside to not forgiving sinners is that there are almost no second acts in Dutch lives.

http://www.ft.com/intl/cms/s/2/d1e53488-48cd-11e2-a6b3-00144feab49a.html#axzz2PAPIxuHx

But it isn’t just old acquaintances who think Stapel might be ready for a comeback. A few researchers are beginning to defend the field from the broader accusations the Report levels against the scientific integrity of social psychology. They do not deny the “cavalier” practices, but regard them as acceptable and even necessary! This might even pave the way for Stapel’s rehabilitation. An article by a delegate for the 3rd World Conference on Research Integrity (wcri2013.org) in Montreal, Canada, in May reports on members of a new group critical of the Report, including some who were interviewed by the Tilburg Committees: Read more »

Categories: junk science, Statistics | 21 Comments

possible progress on the comedy hour circuit?

It’s not April Fool’s Day yet, so I take it that Corey Yanofsky, one of the top 6 commentators on this blog, is serious in today’s exchange, despite claiming to be a Jaynesian (whatever that is). I dare not scratch too deep or look too close…along the lines of not looking a gift horse in the mouth, or however that goes. So here’s a not-too selective report from our exchange in the comments on my previous blogpost:

Mayo: You wrote: “I think I wrote something to the effect that your philosophy was the only one I have encountered that could possibly put frequentist procedures on a sound footing; I stand by that.” I’m curious as to why I deserve this honor ….

Corey: Mayo: It was always obvious no competent frequentist statistician would use a procedure criticized by the howlers; the problem was that I had never seen a compelling explanation why (beyond “that’s obviously stupid”). So you deserve the honor for putting forth a single principle from which error statistical procedures flow that refutes all of the howlers at once.

Mayo: Corey: Wow, that’s a big concession even coupled with your remaining doubts….maybe I should highlight this portion of our exchange for our patient readers, looking for any sign of progress…

Corey: Mayo: Feel free to highlight it. I will point out that this “concession” shouldn’t be news to you: in an email I sent you on September 11, 2012, I wrote, ‘I now appreciate how the severity-based approach fully addresses all the typical criticisms offered during “Bayesian comedy hour”. Now, when I encounter these canards in Bayesian writings, I feel chagrin that they are being propagated; I certainly shall not be repeating them myself.’

Mayo: Ok, so you get an Honorable Mention, especially as I’m always pushing this boulder, or maybe it’s a stone egg. It will be a miracle if any to-be-published Bayesian texts or new editions excise some of the howlers!

But I still don’t understand the hesitancy in coming over to the error statistical side….

Categories: Uncategorized | 42 Comments
