Stephen Senn
Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg
“The nuisance parameter nuisance”
A great deal of statistical debate concerns ‘univariate’ error, or disturbance, terms in models. I put ‘univariate’ in inverted commas because as soon as one writes a model of the form (say) Yi = Xiβ + εi, i = 1 … n, and starts to raise questions about the distribution of the disturbance terms, εi, one is frequently led into multivariate speculations, such as, ‘is the variance identical for every disturbance term?’ and, ‘are the disturbance terms independent?’, and not just speculations such as, ‘is the distribution of the disturbance terms Normal?’. Aris Spanos might also want me to put inverted commas around ‘disturbance’ (or ‘error’) since what I ought to be thinking about is the joint distribution of the outcomes, Yi, conditional on the predictors.
However, in my statistical world of planning and analysing clinical trials, the difference made to inferences by using parametric rather than non-parametric methods is often minor. Of course, using non-parametric methods does nothing to answer the problem of non-independent observations, but for experiments, as opposed to observational studies, you can frequently design in independence. That is a major potential pitfall avoided, but then there is still the issue of Normality. In my experience, however, this is rarely where the action is. Inferences rarely change dramatically on using ‘robust’ approaches (although one can always find examples with gross outliers where they do). There are, however, other sorts of problem that can affect data and that can make a very big difference.
I am now going to give you a numerical example, so if this sort of thing makes your eyeballs glaze over, skip forward several paragraphs. In fact the precise numerical details are unimportant and, since they include rank tests applied to data with some ties, would vary depending on how the tests are implemented in a given package. (In 2000, Bergmann et al. performed a Mann-Whitney test of a data-set and got at least six different answers from 11 packages!)
START NUMERICAL EXAMPLE
As an example of the similarity of inferences based on parametric and non-parametric models, consider the analysis of the cross-over trial given as example 3.1 in my book Cross-over Trials in Clinical Research (2nd edition 2002). The trial was in 13 asthmatic children treated on two separate occasions with either formoterol or salbutamol. Seven were allocated to receive formoterol on day one and salbutamol on day two and six to the reverse sequence. The key outcome measure was peak expiratory flow in litres per minute eight hours after treatment. The data are available on the web here: http://www.senns.demon.co.uk/Data/SJS%20Datasets.htm and are labelled ‘GS20’, which was the name of the trial.
A common approach to analysing such data is to fit a linear model with patient and period as effects in addition to treatment (the effect of interest). If you do this, then the point estimate is 47 L/min and the 95% confidence limits for the treatment effect (formoterol-salbutamol) are 22 and 70 L/min. The P-value, if you like that sort of thing, is 0.0012. However, if you worry about distributional assumptions, you can always go for the non-parametric equivalent. The Hodges-Lehmann estimate is 45 L/min and the confidence limits are 25 and 75 L/min. The P-value is 0.007.
These may seem like important differences but they are very small beer compared to how the inferences change when you start worrying about carry-over. The analyses above are only appropriate if you believe there is no (important) carry-over from period 1 into period 2. If you believe that carry-over could have occurred and might be anything at all, then you could just use the first period data. If you do that, however, then the inferences, whether parametric or non-parametric, are completely different. The parametric point estimate is 54 L/min with limits of -46 to 154 L/min and a P-value of 0.26. If you prefer the non-parametric version, then the HL estimate is 35 L/min and the confidence limits are -50 and 150 L/min, with a P-value of 0.35.
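For readers who like to see the arithmetic, here is a minimal sketch of the two kinds of analysis contrasted above. It rests on several assumptions that are mine and not taken from the book: the numbers are invented for illustration and are NOT the GS20 data, the function names are hypothetical, the period-adjusted estimate is computed as half the difference between the two sequence groups in their mean within-patient period differences (which, for a two-period, two-sequence design, is what the linear model with patient and period effects delivers), and the Hodges-Lehmann estimate is taken as half the median of all pairwise differences between the two groups' period differences.

```python
import statistics
from itertools import product

# HYPOTHETICAL peak-flow values (L/min), invented for illustration; NOT the GS20 data.
# Each tuple is (period 1, period 2) for one patient.
AB = [(310, 270), (400, 350), (280, 245), (350, 300)]  # treatment A first, then B
BA = [(250, 300), (330, 370), (290, 330)]              # treatment B first, then A


def pooled_se(x, y):
    """Standard error of mean(x) - mean(y) using a pooled variance."""
    n1, n2 = len(x), len(y)
    s2 = ((n1 - 1) * statistics.variance(x)
          + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (s2 * (1 / n1 + 1 / n2)) ** 0.5


def period_differences(seq):
    """Within-patient differences, period 1 minus period 2."""
    return [y1 - y2 for y1, y2 in seq]


def crossover_estimate(ab, ba):
    """Period-adjusted A - B estimate and its standard error (no carry-over assumed)."""
    d_ab, d_ba = period_differences(ab), period_differences(ba)
    est = (statistics.mean(d_ab) - statistics.mean(d_ba)) / 2
    return est, pooled_se(d_ab, d_ba) / 2


def hodges_lehmann(ab, ba):
    """Half the median of all pairwise differences between the two groups' period differences."""
    d_ab, d_ba = period_differences(ab), period_differences(ba)
    return statistics.median([a - b for a, b in product(d_ab, d_ba)]) / 2


def first_period_estimate(ab, ba):
    """A - B estimate from period 1 only: a parallel-group comparison between patients."""
    a = [y1 for y1, _ in ab]
    b = [y1 for y1, _ in ba]
    return statistics.mean(a) - statistics.mean(b), pooled_se(a, b)


if __name__ == "__main__":
    print("cross-over   (estimate, SE):", crossover_estimate(AB, BA))
    print("Hodges-Lehmann estimate    :", hodges_lehmann(AB, BA))
    print("first period (estimate, SE):", first_period_estimate(AB, BA))
```

The point of the sketch is the standard errors rather than the estimates. The cross-over analysis works with within-patient differences, so differences between patients cancel out; the first-period analysis compares different patients and has to carry that between-patient variation, which is why its interval is so much wider.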
END NUMERICAL EXAMPLE
I can put it like this. To move from parametric to non-parametric inferences is like moving towns. To account for carry-over, as opposed to ignoring it, is to change continents. A graphical representation makes this clear. The figure gives point estimates and confidence limits plotted against P-values for the four approaches according to whether carry-over is allowed for (Yes, red) or not (No, blue). Parametric intervals are shown as solid lines and non-parametric as dashed. What you assume about carry-over makes an enormous difference. What you assume about Normality makes hardly any.
Every applied statistician is faced with dilemmas like this where inferences change dramatically depending on what one is or is not prepared to assume about nuisance parameters. Every applied statistician also knows that going for the assumption-free solution can lead to absurd inferences.
Thus, whereas I have some sympathy with those who say we should reduce our dependence on assumptions, there is a warning bell ringing in my mind: “fine, but what’s the cost?”
REFERENCES:
Bergmann, R., Ludbrook, J. & Spooren, W. P. J. M. (2000), “Different Outcomes of the Wilcoxon-Mann-Whitney Test from Different Statistics Packages”, The American Statistician, 54(1), 72–77.
Senn, S. (2002), Cross-over Trials in Clinical Research (2nd ed.), The Atrium Society, Chichester, West Sussex, England: John Wiley & Sons, Ltd.
Stephen: Thanks so much for this. The issue of the desirability of “assumption-free” methods was discussed at some length in the set of posts revolving around Wasserman’s paper in our RMM volume:
For example:
https://errorstatistics.com/2012/08/11/u-phil-wasserman-replies-to-spanos-and-hennig/
Although your application concerns medical trials, I’m wondering if you surmise that in Wasserman’s high dimensional examples as well, the worry should be about “assumption-free” solutions possibly leading to absurd inferences.
“small beer”, cute.
Deborah: I would need to study Wasserman’s example and I haven’t, but there are many examples where trying to do assumption-free inference is absurd. Another, more complicated, example using four-period cross-over trials to compare Placebo (P) and Active (A) is given in the chapter I wrote with Philip Dawid in the book on which we are co-editors, Simplicity, Complexity and Modelling. That involves a design in which patients are randomised to either PPAA or AAPP. Four models for carry-over are considered: no carry-over, simple carry-over (it depends only on the engendering treatment), steady state (a treatment cannot carry over into itself) and general (any arbitrary carry-over). The variances of the treatment contrasts are proportional to 1, 1.1, 1.5 and 6. A six-fold increase in variance is something any applied statistician will be very reluctant to accept just to guarantee unbiasedness, and in practice one would choose to assume one of the simpler models (in my opinion ‘no carry-over’ is best) and proceed, even at the risk of some bias.
To put it another way, applied statistics is a bias-variance trade-off but to know whether the game is worth the candle you have to make assumptions about the bias.
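One way to make the trade-off concrete is the following rough sketch. It uses the relative variances 1, 1.1, 1.5 and 6 quoted above, and it adds two assumptions of my own that are not in the chapter: that mean squared error (variance plus squared bias) is the yardstick, and that the general carry-over model is unbiased. The function and dictionary names are hypothetical.

```python
import math

# Relative variances of the treatment contrast under the four carry-over models,
# as quoted in the comment above (no carry-over taken as the unit).
RELATIVE_VARIANCE = {
    "none": 1.0,
    "simple": 1.1,
    "steady_state": 1.5,
    "general": 6.0,
}


def mse(relative_variance, bias, unit_variance=1.0):
    """Mean squared error = variance + bias squared (ASSUMED criterion)."""
    return relative_variance * unit_variance + bias ** 2


def break_even_bias(model, unit_variance=1.0):
    """Largest absolute bias at which `model` is still no worse, in MSE terms,
    than the 'general' model, which is ASSUMED here to be unbiased."""
    gap = RELATIVE_VARIANCE["general"] - RELATIVE_VARIANCE[model]
    return math.sqrt(gap * unit_variance)


if __name__ == "__main__":
    for model in RELATIVE_VARIANCE:
        print(model, round(break_even_bias(model), 2))
```

On these assumptions, the ‘no carry-over’ analysis only loses to the general one if the bias from ignoring carry-over exceeds the square root of 5, a little over two standard errors of the simple estimate. Whether a bias of that size is plausible is exactly the assumption one has to be prepared to make.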
Stephen: It might illuminate how you’re using these terms to hear of the record of examples where people tried to do assumption-free inference and absurdities occurred. Or is this a more theoretical claim? By the way, I’m really glad you emphasize in your post the experimental route to satisfying assumptions; it seems we’ve often gotten directed to examples where the data are come across, rather than deliberately generated.