# “It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based”


My new book, *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*, you might have discovered, includes Souvenirs throughout (A–Z). But there are some highlights within sections that might be missed in the excerpts I’m posting. One such “keepsake” is a quote from Fisher at the very end of Section 2.1.

These are some of the first clues we’ll be collecting on a wide difference between statistical inference as a deductive logic of probability, and an inductive testing account sought by the error statistician. When it comes to inductive learning, we want our inferences to go beyond the data: we want lift-off. To my knowledge, Fisher is the only other writer on statistical inference, aside from Peirce, to emphasize this distinction.

In deductive reasoning all knowledge obtainable is already latent in the postulates. Rigour is needed to prevent the successive inferences growing less and less accurate as we proceed. The conclusions are never more accurate than the data. In inductive reasoning we are performing part of the process by which new knowledge is created. The conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based. (Fisher 1935b, p. 54)

How do you understand this remark of Fisher’s? (Please share your thoughts in the comments.) My interpretation, and its relation to the “lift-off” needed to warrant inductive inferences, is discussed in an earlier section, 1.2, posted here. Here’s part of that.

The weakest version of the severity requirement (Section 1.1), in the sense of easiest to justify, is negative, warning us when BENT data are at hand, and a surprising amount of mileage may be had from that negative principle alone. It is when we recognize how poorly certain claims are warranted that we get ideas for improved inquiries. In fact, if you wish to stop at the negative requirement, you can still go pretty far along with me. I also advocate the positive counterpart:

Severity (strong): We have evidence for a claim C just to the extent it survives a stringent scrutiny. If C passes a test that was highly capable of finding flaws or discrepancies from C, and yet none or few are found, then the passing result, x, is evidence for C.

One way this can be achieved is by an argument from coincidence. The most vivid cases occur outside formal statistics.

Some of my strongest examples tend to revolve around my weight. Before leaving the USA for the UK, I record my weight on two scales at home, one digital, one not, and the big medical scale at my doctor’s office. Suppose they are well calibrated and nearly identical in their readings, and they also all pick up on the extra 3 pounds when I’m weighed carrying three copies of my 1-pound book, Error and the Growth of Experimental Knowledge (EGEK). Returning from the UK, to my astonishment, not one but all three scales show a gain of anywhere from 4 to 5 pounds. There’s no difference when I place the three books on the scales, so I must conclude, unfortunately, that I’ve gained around 4 pounds. Even for me, that’s a lot. I’ve surely falsified the supposition that I lost weight! From this informal example, we may make two rather obvious points that will serve for less obvious cases. First, there’s the idea I call lift-off.

Lift-off: An overall inference can be more reliable and precise than its premises individually.

Each scale, by itself, has some possibility of error, and limited precision. But the fact that all of them have me at an over 4-pound gain, while none show any difference in the weights of EGEK, pretty well seals it. Were one scale off balance, it would be discovered by another, and would show up in the weighing of books. They cannot all be systematically misleading just when it comes to objects of unknown weight, can they? Rejecting a conspiracy of the scales, I conclude I’ve gained weight, at least 4 pounds. We may call this an argument from coincidence, and by its means we can attain lift-off. Lift-off runs directly counter to a seemingly obvious claim of drag-down.

Drag-down: An overall inference is only as reliable/precise as is its weakest premise.

The drag-down assumption is common among empiricist philosophers: As they like to say, “It’s turtles all the way down.” Sometimes our inferences do stand as a kind of tower built on linked stones – if even one stone fails they all come tumbling down. Call that a linked argument.

Our most prized scientific inferences would be in a very bad way if piling on assumptions invariably led to weakened conclusions. Fortunately, we can also build what may be called convergent arguments, where lift-off is attained. This seemingly banal point suffices to combat some of the most well-entrenched skepticisms in philosophy of science. And statistics happens to be the science par excellence for demonstrating lift-off!
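Lift-off from convergent measurements is easy to check numerically. Here is a minimal sketch (not from the book; the true gain, the number of trials, and the per-scale error size are all made-up assumptions): three independent scales with the same error distribution are simulated, and the pooled reading is compared against any single scale.

```python
import random

random.seed(0)

TRUE_GAIN = 4.3  # hypothetical true weight gain, in pounds
SCALE_SD = 0.5   # assumed per-scale measurement error (sd, pounds)

def read_scale(true_value, sd=SCALE_SD):
    """One noisy, independent scale reading."""
    return random.gauss(true_value, sd)

# Repeat the three-scale weighing many times and compare typical errors.
trials = 10_000
single_errors, combined_errors = [], []
for _ in range(trials):
    readings = [read_scale(TRUE_GAIN) for _ in range(3)]
    single_errors.append(abs(readings[0] - TRUE_GAIN))          # one scale
    combined_errors.append(abs(sum(readings) / 3 - TRUE_GAIN))  # pooled

avg_single = sum(single_errors) / trials
avg_combined = sum(combined_errors) / trials
print(f"mean |error|, one scale:    {avg_single:.3f} lb")
print(f"mean |error|, three scales: {avg_combined:.3f} lb")
```

With independent errors, the standard error of the average of three readings shrinks by a factor of √3, which is one concrete way an overall inference can be more precise than any of its premises taken singly.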

Now consider what justifies my weight conclusion, based, as we are supposing it is, on a strong argument from coincidence. No one would say: “I can be assured that by following such a procedure, in the long run I would rarely report weight gains erroneously, but I can tell nothing from these readings about my weight now.” To justify my conclusion by long-run performance would be absurd. Instead we say that the procedure had enormous capacity to reveal if any of the scales were wrong, and from this I argue about the source of the readings: H: I’ve gained weight. Simple as that. It would be a preposterous coincidence if none of the scales registered even slight weight shifts when weighing objects of known weight, and yet were systematically misleading when applied to my weight. You see where I’m going with this. This is the key – granted with a homely example – that can fill a very important gap in frequentist foundations: Just because an account is touted as having a long-run rationale, it does not mean it lacks a short-run rationale, or even one relevant for the particular case at hand. Nor is it merely the improbability of all the results were H false; it is rather like denying an evil demon has read my mind just in the cases where I do not know the weight of an object, and deliberately deceived me. The argument to “weight gain” is an example of an argument from coincidence to the absence of an error, what I call:

Arguing from Error: There is evidence an error is absent to the extent that a procedure with a very high capability of signaling the error, if and only if it is present, nevertheless detects no error.

I am using “signaling” and “detecting” synonymously: It is important to keep in mind that we don’t know if the test output is correct, only that it gives a signal or alert, like sounding a bell. Methods that enable strong arguments to the absence (or presence) of an error I call strong error probes. Our ability to develop strong arguments from coincidence, I will argue, is the basis for solving the “problem of induction.”


*Where you are in the Journey: I posted all of Excursion 1 Tour I here, here, and here, and omitted Tour II (but blog posts on the Law of Likelihood, Royall, and optional stopping may be found by searching this blog). SIST Itinerary

### 7 thoughts on ““It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based””

1. Thanatos Savehn

In my view, his “never”s give the game away. Once you accept a Leibniz/Newton actually infinite universe model it all makes sense. Well, that is until you realize it doesn’t. Having said as much, there’s something to the infinite, though it lies forever beyond our ken. We sailors can tack back and forth but never can we aim straightly at the truth. Such is our fate, glorious as uncertainty may be.

2. Deborah:

Your example baffles me. I don’t see the need for lift-off or whatever. From a Bayesian perspective, it’s simple: you have a model for what your weight might be, you have a model for the errors of the measurement on the scales, and you have some data; put that all together and you have inference about your weight and about the scales. Under just about any reasonable model I can think of, your inference is that you’ve probably gained about 3 pounds, but there is some probability that all the scales are biased by enough that you’ve gained less than 3 pounds, or possibly more: there will be some posterior distribution that depends on the details of your model, but the point is that this is all plain old deductive reasoning.

Depending on the data, some model expansion might be required. For example, suppose the scales said that you gained 30 pounds, but you’re doubtful: your clothes still fit and you don’t look any bigger than before. Then: (a) you probably will want to expand your model to include possibilities such as, the scale is completely busted, someone’s playing a trick on you, etc., and (b) you’re including additional information in your inference (in this case, information about how your clothes fit and how you look). As I’ve written in various places, this sort of model expansion represents a Cantorian hole in Bayesian inference (or, for that matter, any sort of statistical inference), so it’s something to worry about. But the lift-off thing you’re talking about: I don’t see the need for it as a separate concept. It seems to me that inference for specific cases comes deductively from an assumed model, which is how we go about the world in so many ways.
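The deductive Bayesian updating this comment describes can be made concrete with a conjugate normal–normal sketch. Everything below is a made-up illustration, not anyone’s actual model: a vague prior on the weight gain, three hypothetical scale readings with a known error sd, and the posterior that falls out by calculation alone.

```python
# Conjugate normal-normal update: a prior on the weight gain plus three
# independent scale readings with known measurement sd. All numbers here
# are invented for illustration.
prior_mean, prior_sd = 0.0, 2.0   # prior: little change expected, but vague
meas_sd = 0.5                     # assumed per-scale error sd (pounds)
readings = [4.2, 4.4, 4.1]        # hypothetical scale readings (pounds)

# Precisions (1/variance) add for normal likelihoods with a normal prior.
prior_prec = 1 / prior_sd**2
data_prec = len(readings) / meas_sd**2
post_prec = prior_prec + data_prec

# Posterior mean is the precision-weighted average of prior and data.
post_mean = (prior_prec * prior_mean +
             data_prec * (sum(readings) / len(readings))) / post_prec
post_sd = post_prec ** -0.5

print(f"posterior: mean {post_mean:.2f} lb, sd {post_sd:.2f} lb")
```

Under this toy model the posterior sd (about 0.29 lb) is smaller than any single scale’s error (0.5 lb): the same pooling of precision, rendered as a deductive consequence of the assumed model.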

• Andrew: I don’t know how I missed your comment, sorry. The blog doesn’t show me when comments are there as it used to. A little orange dot used to show up. The idea of lift-off is the essence of inductive, understood as ampliative, reasoning: an argument from coincidence or via triangulation. It’s not so much a new concept, but simply a concept to identify an interlocking argument whereby the conclusion is more reliable than the premises. But I may be missing your question about my weight.

• Andrew: To reply more fully to your comment, the reference to the weighing method was deliberately to appeal to a homely example that does not use formal statistics, but appeals to properties of measuring instruments. Perhaps putting it together with my query about Fisher–in this post, but it is not in the same tour in the book– distracted from that. I don’t know exactly what Fisher means, but I know what Peirce means and what I mean. My point in the weighing example is that the justification for the inference is not about long-runs.

“Now consider what justifies my weight conclusion, based, as we are supposing it is, on a strong argument from coincidence. No one would say: ‘I can be assured that by following such a procedure, in the long run I would rarely report weight gains erroneously, but I can tell nothing from these readings about my weight now.’ To justify my conclusion by long-run performance would be absurd. Instead we say that the procedure had enormous capacity to reveal if any of the scales were wrong, and from this I argue about the source of the readings: H: I’ve gained weight. Simple as that. It would be a preposterous coincidence if none of the scales registered even slight weight shifts when weighing objects of known weight, and yet were systematically misleading when applied to my weight. You see where I’m going with this. This is the key – granted with a homely example – that can fill a very important gap in frequentist foundations: Just because an account is touted as having a long-run rationale, it does not mean it lacks a short-run rationale, or even one relevant for the particular case at hand.”

3. Great quote.

4. Steven McKinney

In the 1935 paper “The Logic of Inductive Inference” Fisher discusses the average value of the second derivative of the likelihood, stating “We shall come later to regard i as the amount of information supplied by each of our observations, and the inequality

1/V ≤ ni = I

as a statement that the reciprocal of the variance, or the invariance of the estimate, cannot exceed the amount of information in the sample."

This is one of his early references to what we now know as Fisher information.

After the phrase you quote above, the paragraph finishes off with

"Statistical data are always erroneous, in greater or less degree. The study of inductive reasoning is the study of the embryology of knowledge, of the processes by means of which truth is extracted from its native ore in which it is fused with much error."

So I think what Fisher was saying is that from noisy individual bits, a far less noisy understanding can be gained, as information accumulates from all the noisy bits.

The conclusions are indeed more accurate than the data on which they are based, provided the distributional properties generating the data are not too pathological (e.g. Cauchy). Accumulated information leaves us with a more accurate understanding than is available from individual observations. This is the power of inductive inference.
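The Cauchy caveat can be demonstrated in a few lines. In this made-up simulation (standard Cauchy centered at 0), the sample mean’s typical error does not shrink as n grows, because the mean of n Cauchy draws is itself standard Cauchy, while the sample median keeps accumulating information about the center.

```python
import math
import random
import statistics

random.seed(1)

def cauchy_sample(n):
    """n draws from a standard Cauchy via the tangent transform of a uniform."""
    return [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

def typical_abs_error(estimator, n, reps=1000):
    """Median absolute error of an estimator of the true center (0)."""
    errs = sorted(abs(estimator(cauchy_sample(n))) for _ in range(reps))
    return errs[reps // 2]

results = {}
for n in (10, 1000):
    results[n] = (typical_abs_error(statistics.fmean, n),
                  typical_abs_error(statistics.median, n))
    print(f"n={n:5d}  mean-est error: {results[n][0]:.3f}  "
          f"median-est error: {results[n][1]:.3f}")
```

The Cauchy does supply a finite amount of information per observation about its center; the sample mean simply fails to accumulate it, whereas an estimator like the median (or the MLE) does, so the conclusions there do grow more accurate than the individual data.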

5. Steve: Thanks for your comment, I missed it. You’re saying the remark links to the idea of Fisher information. Or is it that the Fisher info remark is why the deductive case has an upper bound, unlike the inductive case? In any event, I totally agree that “what Fisher was saying is that from noisy individual bits, a far less noisy understanding can be gained, as information accumulates from all the noisy bits.”