Monthly Archives: February 2015

Big Data Is The New Phrenology?



It happens I’ve been reading a lot lately about the assumption in social psychology and psychology in general that what they’re studying is measurable, quantifiable. Addressing the problem has been shelved to the back burner for decades thanks to some redefinitions of what it is to “measure” in psych (anything for which there’s a rule to pop out a number says Stevens–an operationalist in the naive positivist spirit). This at any rate is what I’m reading, thanks to papers sent by a colleague of Meehl’s (N. Waller).  (Here’s one by Mitchell.) I think it’s time to reopen the question.The measures I see of “severity of moral judgment”, “degree of self-esteem” and much else in psychology appear to fall into this behavior in a very non-self critical manner. No statistical window-dressing (nor banning of statistical inference) can help them become more scientific. So when I saw this on Math Babe’s twitter I decided to try the “reblog” function and see what happened. Here it is (with her F word included). The article to which she alludes is “Recruiting Better Talent Through Brain Games” )


Have you ever heard of phrenology? It was, once upon a time, the “science” of measuring someone’s skull to understand their intellectual capabilities.

This sounds totally idiotic but was a huge fucking deal in the mid-1800’s, and really didn’t stop getting some credit until much later. I know that because I happen to own the 1911 edition of the Encyclopedia Britannica, which was written by the top scholars of the time but is now horribly and fascinatingly outdated.

For example, the entry for “Negro” is famously racist. Wikipedia has an excerpt: “Mentally the negro is inferior to the white… the arrest or even deterioration of mental development [after adolescence] is no doubt very largely due to the fact that after puberty sexual matters take the first place in the negro’s life and thoughts.”

But really that one line doesn’t tell the whole story. Here’s the whole thing…

View original post 351 more words

Categories: msc kvetch, scientism, Statistics | 3 Comments


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: February 2012. I am to mark in red three posts (or units) that seem most apt for general background on key issues in this blog. Given our Fisher reblogs, we’ve already seen many this month. So, I’m marking in red (1) The Triad, and (2) the Unit on Spanos’ misspecification tests. Plase see those posts for their discussion. The two posts from 2/8 are apt if you are interested in a famous case involving statistics at the Supreme Court. Beyond that it’s just my funny theatre of the absurd piece with Barnard. (Gelman’s is just a link to his blog.)


February 2012


  • (2/11) R.A. Fisher: Statistical Methods and Scientific Inference
  • (2/11)  JERZY NEYMAN: Note on an Article by Sir Ronald Fisher
  • (2/12) E.S. Pearson: Statistical Concepts in Their Relation to Reality





This new, once-a-month, feature began at the blog’s 3-year anniversary in Sept, 2014.


Jan. 2012

Dec. 2011

Nov. 2011

Oct. 2011

Sept. 2011 (Within “All She Wrote (so far))

Categories: 3-year memory lane, Statistics | 1 Comment

Sir Harold Jeffreys’ (tail area) one-liner: Saturday night comedy (b)

Comedy hour icon


This headliner appeared before, but to a sparse audience, so Management’s giving him another chance… His joke relates to both Senn’s post (about alternatives), and to my recent post about using (1 – β)/α as a likelihood ratio--but for very different reasons. (I’ve explained at the bottom of this “(b) draft”.)

 ….If you look closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike, (especially as he’s no longer doing the Tonight Show) ….



It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler joke* in criticizing the use of p-values.

“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?

‘What’s unusual about that?’ you ask?

What’s unusual is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

Categories: Comedy, Discussion continued, Fisher, Jeffreys, P-values, Statistics, Stephen Senn | 5 Comments

Stephen Senn: Fisher’s Alternative to the Alternative


As part of the week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962), I reblog Senn from 3 years ago.  

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.


The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading

Categories: Fisher, Statistics, Stephen Senn | Tags: , , , | 59 Comments

R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)



In recognition of R.A. Fisher’s birthday….

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Spanos, Statistics | 6 Comments

R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’: Just before breaking up (with N-P)

17 February 1890–29 July 1962

In recognition of R.A. Fisher’s birthday tomorrow, I will post several entries on him. I find this (1934) paper to be intriguing –immediately before the conflicts with Neyman and Pearson erupted. It represents essentially the last time he could take their work at face value, without the professional animosities that almost entirely caused, rather than being caused by, the apparent philosophical disagreements and name-calling everyone focuses on. Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a very similar place (no pun intended) while starting from different origins. I quote just the most relevant portions…the full article is linked below. I’d blogged it earlier here.  You may find some gems in it.

‘Two new Properties of Mathematical Likelihood’

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

  The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.

If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading

Categories: Fisher, phil/history of stat, Statistics | Tags: , , , | 3 Comments

Continuing the discussion on truncation, Bayesian convergence and testing of priors



My post “What’s wrong with taking (1 – β)/α, as a likelihood ratio comparing H0 and H1?” gave rise to a set of comments that were mostly off topic but interesting in their own right. Being too long to follow, I put what appears to be the last group of comments here, starting with Matloff’s query. Please feel free to continue the discussion here; we may want to come back to the topic. Feb 17: Please note one additional voice at the end. (Check back to that post if you want to see the history)


I see the conversation is continuing. I have not had time to follow it, but I do have a related question, on which I’d be curious as to the response of the Bayesians in our midst here.

Say the analyst is sure that μ > c, and chooses a prior distribution with support on (c,∞). That guarantees that the resulting estimate is > c. But suppose the analyst is wrong, and μ is actually less than c. (I believe that some here conceded this could happen in some cases in whcih the analyst is “sure” μ > c.) Doesn’t this violate one of the most cherished (by Bayesians) features of the Bayesian method — that the effect of the prior washes out as the sample size n goes to infinity?


(to Matloff),

The short answer is that assuming information such as “mu is greater than c” which isn’t true screws up the analysis. It’s like a mathematician starting a proof of by saying “assume 3 is an even number”. If it were possible to consistently get good results from false assumptions, there would be no need to ever get our assumptions right. Continue reading

Categories: Discussion continued, Statistics | 60 Comments

Induction, Popper and Pseudoscience



February is a good time to read or reread these pages from Popper’s Conjectures and Refutations. Below are (a) some of my newer reflections on Popper after rereading him in the graduate seminar I taught one year ago with Aris Spanos (Phil 6334), and (b) my slides on Popper and the philosophical problem of induction, first posted here. I welcome reader questions on either.

As is typical in rereading any deep philosopher, I discover (or rediscover) different morsels of clues to understanding—whether fully intended by the philosopher or a byproduct of their other insights, and a more contemporary reading. So it is with Popper. A couple of key ideas to emerge from the seminar discussion (my slides are below) are:

  1. Unlike the “naïve” empiricists of the day, Popper recognized that observations are not just given unproblematically, but also require an interpretation, an interest, a point of view, a problem. What came first, a hypothesis or an observation? Another hypothesis, if only at a lower level, says Popper.  He draws the contrast with Wittgenstein’s “verificationism”. In typical positivist style, the verificationist sees observations as the given “atoms,” and other knowledge is built up out of truth functional operations on those atoms.[1] However, scientific generalizations beyond the given observations cannot be so deduced, hence the traditional philosophical problem of induction isn’t solvable. One is left trying to build a formal “inductive logic” (generally deductive affairs, ironically) that is thought to capture intuitions about scientific inference (a largely degenerating program). The formal probabilists, as well as philosophical Bayesianism, may be seen as descendants of the logical positivists–instrumentalists, verificationists, operationalists (and the corresponding “isms”). So understanding Popper throws a great deal of light on current day philosophy of probability and statistics.

Continue reading

Categories: Phil 6334 class material, Popper, Statistics | 8 Comments

What’s wrong with taking (1 – β)/α, as a likelihood ratio comparing H0 and H1?



Here’s a quick note on something that I often find in discussions on tests, even though it treats “power”, which is a capacity-of-test notion, as if it were a fit-with-data notion…..

1. Take a one-sided Normal test T+: with n iid samples:

H0: µ ≤  0 against H1: µ >  0

σ = 10,  n = 100,  σ/√n =σx= 1,  α = .025.

So the test would reject H0 iff Z > c.025 =1.96. (1.96. is the “cut-off”.)


  1. Simple rules for alternatives against which T+ has high power:
  • If we add σx (here 1) to the cut-off (here, 1.96) we are at an alternative value for µ that test T+ has .84 power to detect.
  • If we add 3σto the cut-off we are at an alternative value for µ that test T+ has ~ .999 power to detect. This value, which we can write as µ.999 = 4.96

Let the observed outcome just reach the cut-off to reject the null,z= 1.96.

If we were to form a “likelihood ratio” of μ = 4.96 compared to μ0 = 0 using

[Power(T+, 4.96)]/α,

it would be 40.  (.999/.025).

It is absurd to say the alternative 4.96 is supported 40 times as much as the null, understanding support as likelihood or comparative likelihood. (The data 1.96 are even closer to 0 than to 4.96). The same point can be made with less extreme cases.) What is commonly done next is to assign priors of .5 to the two hypotheses, yielding

Pr(H0 |z0) = 1/ (1 + 40) = .024, so Pr(H1 |z0) = .976.

Such an inference is highly unwarranted and would almost always be wrong. Continue reading

Categories: Bayesian/frequentist, law of likelihood, Statistical power, statistical tests, Statistics, Stephen Senn | 87 Comments

Stephen Senn: Is Pooling Fooling? (Guest Post)

Stephen Senn


Stephen Senn
Head, Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS), Luxembourg

Is Pooling Fooling?

‘And take the case of a man who is ill. I call two physicians: they differ in opinion. I am not to lie down, and die between them: I must do something.’ Samuel Johnson, in Boswell’s A Journal of a Tour to the Hebrides

A common dilemma facing meta-analysts is what to put together with what? One may have a set of trials that seem to be approximately addressing the same question but some features may differ. For example, the inclusion criteria might have differed with some trials only admitting patients who were extremely ill but with other trials treating the moderately ill as well. Or it might be the case that different measurements have been taken in different trials. An even more extreme case occurs when different, if presumed similar, treatments have been used.

It is helpful to make a point of terminology here. In what follows I shall be talking about pooling results from various trials. This does not involve naïve pooling of patients across trials. I assume that each trial will provide a valid within- trial comparison of treatments. It is these comparisons that are to be pooled (appropriately).

A possible way to think of this is in terms of a Bayesian model with a prior distribution covering the extent to which results might differ as features of trials are changed. I don’t deny that this is sometimes an interesting way of looking at things (although I do maintain that it is much more tricky than many might suppose[1]) but I would also like to draw attention to the fact that there is a frequentist way of looking at this problem that is also useful.

Suppose that we have k ‘null’ hypotheses that we are interested in testing, each being capable of being tested in one of k trials. We can label these Hn1, Hn2, … Hnk. We are perfectly entitled to test the null hypothesis Hjoint that they are all jointly true. In doing this we can use appropriate judgement to construct a composite statistic based on all the trials whose distribution is known under the null. This is a justification for pooling. Continue reading

Categories: evidence-based policy, PhilPharma, S. Senn, Statistics | 19 Comments

Blog at