Jeffreys

Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

Comedy hour icon

This headliner appeared two years ago, but to a sparse audience (likely because it was during winter break), so Management’s giving him another chance… 

You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno [1] who is standing up there at the mike ….

IMG_1547It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler* in criticizing the use of p-values.

“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?

‘What’s unusual about that?’ you ask?

What’s unusual, is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

Categories: Comedy, Fisher, Jeffreys, P-values | 9 Comments

Stephen Senn: Double Jeopardy?: Judge Jeffreys Upholds the Law (sequel to the pathetic P-value)[4]

S. Senn

S. Senn

Stephen Senn
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

Double Jeopardy?: Judge Jeffreys Upholds the Law*[4]

“But this could be dealt with in a rough empirical way by taking twice the standard error as a criterion for possible genuineness and three times the standard error for definite acceptance”. Harold Jeffreys(1) (p386)

This is the second of two posts on P-values. In the first, The Pathetic P-Value, I considered the relation of P-values to Laplace’s Bayesian formulation of induction, pointing out that that P-values, whilst they had a very different interpretation, were numerically very similar to a type of Bayesian posterior probability. In this one, I consider their relation or lack of it, to Harold Jeffreys’s radically different approach to significance testing. (An excellent account of the development of Jeffreys’s thought is given by Howie(2), which I recommend highly.)

The story starts with Cambridge philosopher CD Broad (1887-1971), who in 1918 pointed to a difficulty with Laplace’s Law of Succession. Broad considers the problem of drawing counters from an urn containing n counters and supposes that all m drawn had been observed to be white. He now considers two very different questions, which have two very different probabilities and writes: Continue reading

Categories: Jeffreys, P-values, reforming the reformers, Stephen Senn | Tags: | 11 Comments

Stephen Senn: Double Jeopardy?: Judge Jeffreys Upholds the Law (sequel to the pathetic P-value)

S. Senn

S. Senn

Stephen Senn
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

Double Jeopardy?: Judge Jeffreys Upholds the Law

“But this could be dealt with in a rough empirical way by taking twice the standard error as a criterion for possible genuineness and three times the standard error for definite acceptance”. Harold Jeffreys(1) (p386)

This is the second of two posts on P-values. In the first, The Pathetic P-Value, I considered the relation of P-values to Laplace’s Bayesian formulation of induction, pointing out that that P-values, whilst they had a very different interpretation, were numerically very similar to a type of Bayesian posterior probability. In this one, I consider their relation or lack of it, to Harold Jeffreys’s radically different approach to significance testing. (An excellent account of the development of Jeffreys’s thought is given by Howie(2), which I recommend highly.)

The story starts with Cambridge philosopher CD Broad (1887-1971), who in 1918 pointed to a difficulty with Laplace’s Law of Succession. Broad considers the problem of drawing counters from an urn containing n counters and supposes that all m drawn had been observed to be white. He now considers two very different questions, which have two very different probabilities and writes:

C.D. Broad quoteNote that in the case that only one counter remains we have n = m + 1 and the two probabilities are the same. However, if n > m+1 they are not the same and in particular if m is large but n is much larger, the first probability can approach 1 whilst the second remains small.

The practical implication of this just because Bayesian induction implies that a large sequence of successes (and no failures) supports belief that the next trial will be a success, it does not follow that one should believe that all future trials will be so. This distinction is often misunderstood. This is The Economist getting it wrong in September 2000

The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child’s degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.

See Dicing with Death(3) (pp76-78).

The practical relevance of this is that scientific laws cannot be established by Laplacian induction. Jeffreys (1891-1989) puts it thus

Thus I may have seen 1 in 1000 of the ‘animals with feathers’ in England; on Laplace’s theory the probability of the proposition, ‘all animals with feathers have beaks’, would be about 1/1000. This does not correspond to my state of belief or anybody else’s. (P128)

Continue reading

Categories: Jeffreys, P-values, reforming the reformers, Statistics, Stephen Senn | 42 Comments

Sir Harold Jeffreys’ (tail area) one-liner: Saturday night comedy (b)

Comedy hour icon

.

This headliner appeared before, but to a sparse audience, so Management’s giving him another chance… His joke relates to both Senn’s post (about alternatives), and to my recent post about using (1 – β)/α as a likelihood ratio--but for very different reasons. (I’ve explained at the bottom of this “(b) draft”.)

 ….If you look closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike, (especially as he’s no longer doing the Tonight Show) ….

IMG_1547

.

It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler joke* in criticizing the use of p-values.

“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?

‘What’s unusual about that?’ you ask?

What’s unusual is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

Categories: Comedy, Discussion continued, Fisher, Jeffreys, P-values, Statistics, Stephen Senn | 7 Comments

Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

Comedy hour icon

This headliner appeared last month, but to a sparse audience (likely because it was during winter break), so Management’s giving him another chance… 

You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike ….

IMG_1547It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler* in criticizing the use of p-values.

“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?

‘What’s unusual about that?’ you ask?

What’s unusual, is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

Categories: Comedy, Fisher, Jeffreys, P-values, Stephen Senn | Leave a comment

Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

Comedy hour iconYou might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike ….

IMG_1547It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler* in criticizing the use of p-values.

“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?

‘What’s unusual about that?’ you ask?

Well, what’s unusual, is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation.

We can view p-values in terms of rejecting H0, as in the joke: There’s a test statistic D such that H0 is rejected if its observed value d0 reaches or exceeds a cut-off d* where Pr(D > d*; H0) is small, say .025.
           Reject H0 if Pr(D > d0H0) < .025.
The report might be “reject Hat level .025″.
Example:  H0: The mean light deflection effect is 0. So if we observe a 1.96 standard deviation difference (in one-sided Normal testing) we’d reject H0 .

Now it’s true that if the observation were further into the rejection region, say 2, 3 or 4 standard deviations, it too would result in rejecting the null, and with an even smaller p-value. It’s also true that H0 “has not predicted” a 2, 3, 4, 5 etc. standard deviation difference in the sense that differences so large are “far from” or improbable under the null. But wait a minute. What if we’ve only observed a 1 standard deviation difference (p-value = .16)? It is unfair to count it against the null that 1.96, 2, 3, 4 etc. standard deviation differences would have diverged seriously from the null, when we’ve only observed the 1 standard deviation difference. Yet the p-value tells you to compute Pr(D > 1; H0), which includes these more extreme outcomes! This is “a remarkable procedure” indeed! [i]

So much for making out the howler. The only problem is that significance tests do not do this, that is, they do not reject with, say, D = 1 because larger D values might have occurred (but did not). D = 1 does not reach the cut-off, and does not lead to rejecting H0. Moreover, looking at the tail area makes it harder, not easier, to reject the null (although this isn’t the only function of the tail area): since it requires not merely that Pr(D = d0 ; H0 ) be small, but that Pr(D > d0 ; H0 ) be small. And this is well justified because when this probability is not small, you should not regard it as evidence of discrepancy from the null. Before getting to this …. Continue reading

Categories: Comedy, Fisher, Jeffreys, P-values, Statistics, Stephen Senn | 12 Comments

Blog at WordPress.com.