# “It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based”


My new book, *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*, you might have discovered, includes Souvenirs throughout (A–Z). But there are some highlights within sections that might be missed in the excerpts I’m posting. One such “keepsake” is a quote from Fisher at the very end of Section 2.1.

These are some of the first clues we’ll be collecting on a wide difference between statistical inference as a deductive logic of probability, and an inductive testing account sought by the error statistician. When it comes to inductive learning, we want our inferences to go beyond the data: we want lift-off. To my knowledge, Fisher is the only other writer on statistical inference, aside from Peirce, to emphasize this distinction.

In deductive reasoning all knowledge obtainable is already latent in the postulates. Rigour is needed to prevent the successive inferences growing less and less accurate as we proceed. The conclusions are never more accurate than the data. In inductive reasoning we are performing part of the process by which new knowledge is created. The conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based. (Fisher 1935b, p. 54)

How do you understand this remark of Fisher’s? (Please share your thoughts in the comments.) My interpretation, and its relation to the “lift-off” needed to warrant inductive inferences, is discussed in an earlier section, 1.2, posted here. Here’s part of that.

The weakest version of the severity requirement (Section 1.1), in the sense of easiest to justify, is negative, warning us when BENT data are at hand, and a surprising amount of mileage may be had from that negative principle alone. It is when we recognize how poorly certain claims are warranted that we get ideas for improved inquiries. In fact, if you wish to stop at the negative requirement, you can still go pretty far along with me. I also advocate the positive counterpart:

Severity (strong): We have evidence for a claim C just to the extent it survives a stringent scrutiny. If C passes a test that was highly capable of finding flaws or discrepancies from C, and yet none or few are found, then the passing result, x, is evidence for C.

One way this can be achieved is by an argument from coincidence. The most vivid cases occur outside formal statistics.

Some of my strongest examples tend to revolve around my weight. Before leaving the USA for the UK, I record my weight on two scales at home, one digital, one not, and the big medical scale at my doctor’s office. Suppose they are well calibrated and nearly identical in their readings, and they also all pick up on the extra 3 pounds when I’m weighed carrying three copies of my 1-pound book, Error and the Growth of Experimental Knowledge (EGEK). Returning from the UK, to my astonishment, not one but all three scales show a gain of anywhere from 4 to 5 pounds. There’s no difference when I place the three books on the scales, so I must conclude, unfortunately, that I’ve gained around 4 pounds. Even for me, that’s a lot. I’ve surely falsified the supposition that I lost weight! From this informal example, we may make two rather obvious points that will serve for less obvious cases. First, there’s the idea I call lift-off.

Lift-off: An overall inference can be more reliable and precise than its premises individually.

Each scale, by itself, has some possibility of error, and limited precision. But the fact that all of them have me at an over 4-pound gain, while none show any difference in the weights of EGEK, pretty well seals it. Were one scale off balance, it would be discovered by another, and would show up in the weighing of books. They cannot all be systematically misleading just when it comes to objects of unknown weight, can they? Rejecting a conspiracy of the scales, I conclude I’ve gained weight, at least 4 pounds. We may call this an argument from coincidence, and by its means we can attain lift-off. Lift-off runs directly counter to a seemingly obvious claim of drag-down.

Drag-down: An overall inference is only as reliable/precise as is its weakest premise.

The drag-down assumption is common among empiricist philosophers: As they like to say, “It’s turtles all the way down.” Sometimes our inferences do stand as a kind of tower built on linked stones – if even one stone fails they all come tumbling down. Call that a linked argument.

Our most prized scientific inferences would be in a very bad way if piling on assumptions invariably led to weakened conclusions. Fortunately, we can also build what may be called convergent arguments, where lift-off is attained. This seemingly banal point suffices to combat some of the most well-entrenched skepticisms in philosophy of science. And statistics happens to be the science par excellence for demonstrating lift-off!
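Lift-off from convergent measurements is easy to check numerically. Here is a minimal sketch (not from the book; the true gain, the number of trials, and the per-scale error size are all made-up assumptions): three independent scales with the same error distribution are simulated, and the pooled reading is compared against any single scale.

```python
import random

random.seed(0)

TRUE_GAIN = 4.3  # hypothetical true weight gain, in pounds
SCALE_SD = 0.5   # assumed per-scale measurement error (sd, pounds)

def read_scale(true_value, sd=SCALE_SD):
    """One noisy, independent scale reading."""
    return random.gauss(true_value, sd)

# Repeat the three-scale weighing many times and compare typical errors.
trials = 10_000
single_errors, combined_errors = [], []
for _ in range(trials):
    readings = [read_scale(TRUE_GAIN) for _ in range(3)]
    single_errors.append(abs(readings[0] - TRUE_GAIN))          # one scale
    combined_errors.append(abs(sum(readings) / 3 - TRUE_GAIN))  # pooled

avg_single = sum(single_errors) / trials
avg_combined = sum(combined_errors) / trials
print(f"mean |error|, one scale:    {avg_single:.3f} lb")
print(f"mean |error|, three scales: {avg_combined:.3f} lb")
```

With independent errors, the standard error of the average of three readings shrinks by a factor of √3, which is one concrete way an overall inference can be more precise than any of its premises taken singly.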

Now consider what justifies my weight conclusion, based, as we are supposing it is, on a strong argument from coincidence. No one would say: “I can be assured that by following such a procedure, in the long run I would rarely report weight gains erroneously, but I can tell nothing from these readings about my weight now.” To justify my conclusion by long-run performance would be absurd. Instead we say that the procedure had enormous capacity to reveal if any of the scales were wrong, and from this I argue about the source of the readings: H: I’ve gained weight. Simple as that. It would be a preposterous coincidence if none of the scales registered even slight weight shifts when weighing objects of known weight, and yet were systematically misleading when applied to my weight. You see where I’m going with this. This is the key – granted with a homely example – that can fill a very important gap in frequentist foundations: Just because an account is touted as having a long-run rationale, it does not mean it lacks a short-run rationale, or even one relevant for the particular case at hand. Nor is it merely the improbability of all the results were H false; it is rather like denying an evil demon has read my mind just in the cases where I do not know the weight of an object, and deliberately deceived me. The argument to “weight gain” is an example of an argument from coincidence to the absence of an error, what I call:

Arguing from Error: There is evidence an error is absent to the extent that a procedure with a very high capability of signaling the error, if and only if it is present, nevertheless detects no error.

I am using “signaling” and “detecting” synonymously: It is important to keep in mind that we don’t know if the test output is correct, only that it gives a signal or alert, like sounding a bell. Methods that enable strong arguments to the absence (or presence) of an error I call strong error probes. Our ability to develop strong arguments from coincidence, I will argue, is the basis for solving the “problem of induction.”


*Where you are in the Journey: I posted all of Excursion 1 Tour I here, here, and here, and omitted Tour II (but blog posts on the Law of Likelihood, Royall, and optional stopping may be found by searching this blog). SIST Itinerary

### 7 thoughts on ““It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based””

1. Thanatos Savehn

In my view, his “never”s give the game away. Once you accept a Leibniz/Newton actually infinite universe model it all makes sense. Well, that is until you realize it doesn’t. Having said as much, there’s something to the infinite, though it lies forever beyond our ken. We sailors can tack back and forth but never can we aim straightly at the truth. Such is our fate, glorious as uncertainty may be.

2. Deborah:

Your example baffles me. I don’t see the need for lift-off or whatever. From a Bayesian perspective, it’s simple: you have a model for what your weight might be, you have a model for the errors of the measurement on the scales, and you have some data; put that all together and you have inference about your weight and about the scales. Under just about any reasonable model I can think of, your inference is that you’ve probably gained about 3 pounds, but there is some probability that all the scales are biased by enough that you’ve gained less than 3 pounds, or possibly more: there will be some posterior distribution that depends on the details of your model, but the point is that this is all plain old deductive reasoning.

Depending on the data, some model expansion might be required. For example, suppose the scales said that you gained 30 pounds, but you’re doubtful: your clothes still fit and you don’t look any bigger than before. Then: (a) you probably will want to expand your model to include possibilities such as, the scale is completely busted, someone’s playing a trick on you, etc., and (b) you’re including additional information in your inference (in this case, information about how your clothes fit and how you look). As I’ve written in various places, this sort of model expansion represents a Cantorian hole in Bayesian inference (or, for that matter, any sort of statistical inference), so it’s something to worry about. But the lift-off thing you’re talking about: I don’t see the need for it as a separate concept. It seems to me that inference for specific cases comes deductively from an assumed model, which is how we go about the world in so many ways.
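The deductive Bayesian updating this comment describes can be made concrete with a conjugate normal–normal sketch. Everything below is a made-up illustration, not anyone’s actual model: a vague prior on the weight gain, three hypothetical scale readings with a known error sd, and the posterior that falls out by calculation alone.

```python
# Conjugate normal-normal update: a prior on the weight gain plus three
# independent scale readings with known measurement sd. All numbers here
# are invented for illustration.
prior_mean, prior_sd = 0.0, 2.0   # prior: little change expected, but vague
meas_sd = 0.5                     # assumed per-scale error sd (pounds)
readings = [4.2, 4.4, 4.1]        # hypothetical scale readings (pounds)

# Precisions (1/variance) add for normal likelihoods with a normal prior.
prior_prec = 1 / prior_sd**2
data_prec = len(readings) / meas_sd**2
post_prec = prior_prec + data_prec

# Posterior mean is the precision-weighted average of prior and data.
post_mean = (prior_prec * prior_mean +
             data_prec * (sum(readings) / len(readings))) / post_prec
post_sd = post_prec ** -0.5

print(f"posterior: mean {post_mean:.2f} lb, sd {post_sd:.2f} lb")
```

Under this toy model the posterior sd (about 0.29 lb) is smaller than any single scale’s error (0.5 lb): the same pooling of precision, rendered as a deductive consequence of the assumed model.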

• Andrew: I don’t know how I missed your comment, sorry. The blog doesn’t show me when comments are there as it used to. A little orange dot used to show up. The idea of lift-off is the essence of inductive, understood as ampliative, reasoning: an argument from coincidence or via triangulation. It’s not so much a new concept, but simply a concept to identify an interlocking argument whereby the conclusion is more reliable than the premises. But I may be missing your question about my weight.

• Andrew: To reply more fully to your comment, the reference to the weighing method was deliberately to appeal to a homely example that does not use formal statistics, but appeals to properties of measuring instruments. Perhaps putting it together with my query about Fisher–in this post, but it is not in the same tour in the book– distracted from that. I don’t know exactly what Fisher means, but I know what Peirce means and what I mean. My point in the weighing example is that the justification for the inference is not about long-runs.

“Now consider what justifies my weight conclusion, based, as we are supposing it is, on a strong argument from coincidence. No one would say: ‘I can be assured that by following such a procedure, in the long run I would rarely report weight gains erroneously, but I can tell nothing from these readings about my weight now.’ To justify my conclusion by long-run performance would be absurd. Instead we say that the procedure had enormous capacity to reveal if any of the scales were wrong, and from this I argue about the source of the readings: H: I’ve gained weight. Simple as that. It would be a preposterous coincidence if none of the scales registered even slight weight shifts when weighing objects of known weight, and yet were systematically misleading when applied to my weight. You see where I’m going with this. This is the key – granted with a homely example – that can fill a very important gap in frequentist foundations: Just because an account is touted as having a long-run rationale, it does not mean it lacks a short-run rationale, or even one relevant for the particular case at hand.”

3. Great quote.

4. Steven McKinney

In the 1935 paper “The Logic of Inductive Inference” Fisher discusses the average value of the second derivative of the likelihood, stating “We shall come later to regard i as the amount of information supplied by each of our observations, and the inequality

1/V ≤ ni = I

as a statement that the reciprocal of the variance, or the invariance of the estimate, cannot exceed the amount of information in the sample."

This is one of his early references to what we now know as Fisher information.

After the phrase you quote above, the paragraph finishes off with

"Statistical data are always erroneous, in greater or less degree. The study of inductive reasoning is the study of the embryology of knowledge, of the processes by means of which truth is extracted from its native ore in which it is fused with much error."

So I think what Fisher was saying is that from noisy individual bits, a far less noisy understanding can be gained, as information accumulates from all the noisy bits.

The conclusions are indeed more accurate than the data on which they are based, provided the distributional properties generating the data are not too pathological (e.g. Cauchy). Accumulated information leaves us with a more accurate understanding than is available from individual observations. This is the power of inductive inference.
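The Cauchy caveat can be demonstrated in a few lines. In this made-up simulation (standard Cauchy centered at 0), the sample mean’s typical error does not shrink as n grows, because the mean of n Cauchy draws is itself standard Cauchy, while the sample median keeps accumulating information about the center.

```python
import math
import random
import statistics

random.seed(1)

def cauchy_sample(n):
    """n draws from a standard Cauchy via the tangent transform of a uniform."""
    return [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

def typical_abs_error(estimator, n, reps=1000):
    """Median absolute error of an estimator of the true center (0)."""
    errs = sorted(abs(estimator(cauchy_sample(n))) for _ in range(reps))
    return errs[reps // 2]

results = {}
for n in (10, 1000):
    results[n] = (typical_abs_error(statistics.fmean, n),
                  typical_abs_error(statistics.median, n))
    print(f"n={n:5d}  mean-est error: {results[n][0]:.3f}  "
          f"median-est error: {results[n][1]:.3f}")
```

The Cauchy does supply a finite amount of information per observation about its center; the sample mean simply fails to accumulate it, whereas an estimator like the median (or the MLE) does, so the conclusions there do grow more accurate than the individual data.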

5. Steve: Thanks for your comment, I missed it. You’re saying the remark links to the idea of Fisher information. Or is it that the Fisher info remark is why the deductive case has an upper bound, unlike the inductive case? In any event, I totally agree that “what Fisher was saying is that from noisy individual bits, a far less noisy understanding can be gained, as information accumulates from all the noisy bits.”