The error statistician has a complex, messy, subtle, ingenious, piece-meal approach

Posted on December 14, 2013 by Mayo

A comment today by Stephen Senn leads me to post the last few sentences of my (2010) paper with David Cox, “Frequentist Statistics as a Theory of Inductive Inference”:

“A fundamental tenet of the conception of inductive learning most at home with the frequentist philosophy is that inductive inference requires building up incisive arguments and inferences by putting together several different piece-meal results; we have set out considerations to guide these pieces[i]. Although the complexity of the issues makes it more difficult to set out neatly, as, for example, one could by imagining that a single algorithm encompasses the whole of inductive inference, the payoff is an account that approaches the kind of arguments that scientists build up in order to obtain reliable knowledge and understanding of a field.” (273)[ii]

A reread for Saturday night?

[i]The pieces hang together by dint of the rationale growing out of a severity criterion (or something akin but using a different term.)

[ii]Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.), Cambridge: Cambridge University Press: 1-27. This paper appeared in The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, pp. 247-275.

Categories: Bayesian/frequentist, Error Statistics | 20 Comments

20 thoughts on “The error statistician has a complex, messy, subtle, ingenious, piece-meal approach”

December 14, 2013

Anon

Dr. Mayo,

Do you believe that if two different statisticians use the exact same evidence to answer a question, and break the problem down into different but both legitimate (by whatever criterion you favor) piecemeal steps, then they should, when they put it all together, get the same answer? Or at least, answers that don’t contradict each other?

If “no”, then on what basis can their answers be “objective” when the conclusions depend on how the analysis was broken down into pieces?

If “yes”, then do you believe Error Statistics, and in particular SEV, has this characteristic?

Or alternatively, do you think it’s worth looking at the mathematical consequences of forcing answers from both legitimate “paths” to be consistent with each other?

December 14, 2013

Mayo

Anon: I don’t know what you mean by “forcing answers from both legitimate ‘paths’ to be consistent.” But objectivity, on my view, isn’t a matter of getting the same answer, even in the mythical situation of having the same evidence.

December 14, 2013

Anon

For example, two analysts are investigating H_0. They make the same modeling assumptions and use the same data D.

However, analyst A uses tests T_1, T_2, T_3, while analyst B uses tests T_4, T_5. Assume they carry out the tests correctly. Call these different sequences of tests “different research paths”.

If analyst A says “my work shows there’s strong evidence for H_0”, while analyst B says “my work shows there’s strong evidence against H_0” then these results aren’t consistent.

In this case, the inconsistency doesn’t arise from different data or modeling assumptions. It arises purely because they chose different sets of piecemeal analysis.

So my questions are:

A: if this can happen do you think it’s a problem?

B: Do you think SEV has this problem?

C: If it is a problem and we agree to only look at methods that don’t have this flaw, then what general form must those methods have?

December 14, 2013

Mayo

Anon: I think this is a mindset that directs one to consider “methods” and “rules” that are on automatic pilot. We are forced to “use the old noodle” as Le Cam put it. I think it would be a good idea for people to consider how non-statistical science makes progress at the frontiers (I began this blog talking about prions and Mad Cow–one or two parts had statistical questions (the rates of infection), but the bulk of it does not). In solving various problems, disparate answers arise.
The severity approach, don’t forget, always requires an inference or assessment about what was not probed severely. That’s an important part of adjudicating/understanding disparate answers. The key is to be able to understand why apparently different answers to the same problem arise.

December 14, 2013

Anon

My point has nothing to do with “rules” that put inference on automatic pilot.

If what I describe can happen, then every researcher at the outset faces the very practical problem of deciding which piecemeal path to take. Their arbitrary choice can dramatically affect the answer in that case.

Your only practical advice to someone facing that problem is to tell them science is messy.

December 14, 2013

Mayo

Anon: I’m not giving practical advice to someone. If I were, I’d deny we should limit ourselves to methods where approaching a problem from different paths couldn’t lead to disparate results. As a general rule, that would be silly.

But I take it you’re actually interested in a specific case where one approach is found defective from the perspective of given goals and criteria. And further, you know why. It’s not fashion; you can show, I presume, that one approach gets it wrong. Good. That’s what I mean by having a rationale.

December 14, 2013

Anon

No, I’m not at all interested in “one approach is found defective from the perspective of given goals and criteria”; I have no idea how you got that from my comments. It couldn’t be further from what I’m aiming at.

“I’d deny we should limit ourselves to methods where approaching a problem from different paths couldn’t lead to disparate results. As a general rule, that would be silly.”

So you think as a matter of principle that given the same data and modeling assumptions, and without using any other information, it’s ok if one choice of test statistics strongly confirms H_0 while another strongly discredits H_0.

How much of Error Statistics depends on this principle?

December 14, 2013

Mayo

Anon: Straw man! And it comes at the perfect time, as I’m busy writing up the final exam for my critical thinking class.
If there’s an example where this allegedly occurs, I’d be glad to have a look later on.

December 14, 2013

Anon

Are you willing to stake the reputation of Error Statistics on the claim this doesn’t happen?

I only ask because it’s easy to find examples of it. I’m shocked you hadn’t discovered this already in your applied work utilizing Severity. Unfortunately, if you do only consider methods consistent in this way, it leads naturally toward the Bayesian formalism.

December 15, 2013

Mayo

Anon:
We are prepared to allow anonymous comments so long as that isn’t working too badly, but I think it’s important to prevent an unconstructive turn, particularly when it’s entirely unnecessary, as when it’s not at all clear what’s being alleged. You’re shocked that I hadn’t discovered that approaching a problem from different paths could lead to disparate results? But I never said that, quite the opposite. I said I’d deny we should limit ourselves to methods where approaching a problem from different paths couldn’t lead to disparate results.
Tests with different properties, for just one example, yield different results. Understanding the properties of the tests allows understanding why, and critically scrutinizing warranted inferences.
But I would also deny that an error statistician who is consistent is led naturally toward assigning prior probabilities to hypotheses in order to go down a Bayesian path. So, basically, I think we made a wrong turn in understanding somewhere, but I’m unable to write further in reply.

December 14, 2013

Mayo student

Dr Mayo: You missed the fallacy of complex question: How much of your book depends on a principle leading to diametrically opposite inductive inferences? Little bit of humour and ridicule too. This sure beats studying.

December 15, 2013

Mayo

Student: Go back to studying, now I can’t include this on the test.

December 14, 2013

E.Berk

Neyman proved that with a single test hypothesis, there are test statistics x, y such that when |x| is large, |y| is small. That’s why he and Pearson introduced the alternative to the null.

December 15, 2013

Mayo

E. Berk: True. Of course I spoze it’s possible to always say that approaching a problem from one path rather than another reflects different information—but that’s a stretch. It’s not just the fact that different results are to be expected with different designs, questions and what not, it’s that it’s necessary in order to learn, empirically, the different properties of the tools, and perhaps reject one route in favor of another (as in Senn’s example which started my whole post). Aside from that there’s (literally) the role of chance. They used one telescope at Principe, another at Sobral (in the 1919 eclipse results) and all kinds of accidents wound up making some plates more informative. Others needed to be scrapped entirely (for good reason). Usable sample size is a post data affair as well.

December 15, 2013

E.Berk

Your interlocutor wrote: “the inconsistency doesn’t arise from different data or modeling assumptions”, which is incorrect in these cases.

December 15, 2013

Mayo

E.Berk: Clearly so. I think he’s just trying to rattle my chain.

December 16, 2013

john byrd

I had started to challenge the interlocutor to provide an example, a real one. I was wondering what world this person was living in. But I was traveling and lost the wireless connection in an airport, and arrived home to see that perhaps it was something of a joke? Perhaps another attempt at a howler?

December 16, 2013

Mayo

John: Well it would have been good if you had (jumped in). No it was definitely not a joke.

December 19, 2013

Christian Hennig

Anon (and the others): Two different tests can give different results, but taken together they give a fuller picture of what is going on (although one needs to worry about multiple testing when putting too many tests together).
I think that it’s very important here to distinguish between situations in which a binary decision is needed in finite time, and situations where it is about collecting more and more knowledge, “finding things out”, but always leaving open the possibility of criticising what once was seen as “secured” knowledge.
In decision setups, people using different but equally legitimate tests may make conflicting decisions.
In “collecting information” setups, different tests computed on the same data give different bits of information that can only be seen as contradictory if over-interpreted.
Keep in mind that a non-rejection never means that the null is true. Even if a test has a high severity (and may therefore be seen as positive confirmation of a H0 if not significant), it can only distinguish the H0 from specific alternatives, and there are always further alternatives open with the potential to create a consistent story out of all the apparently contradictory test results.

December 19, 2013

Mayo

Christian: Thanks for this sane reflection.

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.
