What can we learn from the debate on statistical significance?
The statistical community is in the midst of a crisis whose latest convulsion is a petition to abolish the concept of significance. The problem is perhaps neither with significance, nor with statistics, but with the inconsiderate way we use numbers, and with our present approach to quantification. Unless the crisis is resolved, there will be a loss of consensus in scientific arguments, with a corresponding decline of public trust in the findings of science.
# The sins of quantification
Every quantification which is unclear as to its scope and the context in which it is produced obscures rather than elucidates.
Traditionally, the strength of numbers in the making of an argument has rested on their purported objectivity and neutrality. Expressions such as “Concrete numbers”, “The numbers speak for themselves”, “The data/the model don’t lie” are common currency. Today, doubts about algorithmic instances of quantification (e.g. in promoting, detaining, or granting freedom or credit) are becoming more urgent and visible. Yet the doubt should be general. It is increasingly recognised that in every activity of quantification the techniques and methods are never neutral, because it is never possible to separate entirely the act of quantifying from the wishes and expectations of the quantifier. Thus, books apparently telling separate stories, such as Rigor Mortis, Weapons of Math Destruction, The Tyranny of Metrics, or Useless Arithmetic, dealing respectively with statistics, algorithms, indicators and models, share a common concern.
# Statisticians know
Statisticians are increasingly aware that each number presupposes an underlying narrative, a worldview, and a purpose of the exercise. The maturity of this debate in the house of statistics is not an accident. Statistics is a discipline, with recognized leaders and institutions, and although one might derive an impression of disorder from the use of a petition to influence a scientific argument, one cannot deny that the problems in statistics are being tackled head on, in the public arena, in spite of the obvious difficulty the lay public has in following the technicalities of the arguments. With its ongoing discussion of significance, the community of statistics is teaching us an important lesson about the tight coupling between technique and values. How so? We recap here some elements of the debate.
- For some, it would be better to throw away the concept of significance altogether, because the p-test, with its magical p<0.05 threshold, is being misused as a measure of veracity and publishability.
- Others object that the discussion should not be conducted by means of a petition, and that withdrawing tests of significance would make science even more uncertain.
- The former retort that since this discussion has been going on for decades in academic journals without the existing flaws being fixed, perhaps the time is ripe for action.
A good vantage point to look at this debate in its entirety is this section in Andrew Gelman’s blog.
# Different worlds
An important aspect of this discussion is that the contenders may inhabit different worlds. One world is full of important effects which are overlooked because the test of significance fails (p-value greater than 0.05, in statistical parlance). The other world is instead replete with bogus results passed on to the academic literature thanks to a low p-value (p<0.05).
A modicum of investigation reveals that the contention is normative, or indeed political. To take an example, some may fear the introduction on the market of ineffectual pharmaceutical products, others that important epidemiological effects of a pollutant on health may be overlooked. The first group would thus favour a more restrictive threshold for the test, the second group a less restrictive one.
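The trade-off between the two worlds can be made concrete with a small simulation. What follows is only an illustrative sketch under stated assumptions, not an analysis taken from the debate itself: the effect size (0.3 standard deviations), the sample size (30), the use of a two-sided z-test with known variance, and the function names are all hypothetical choices made for the example. It counts how often a world with no effect yields a "significant" result, and how often a world with a real effect fails to, as the threshold changes.

```python
import math
import random


def two_sided_p_value(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == 0, assuming a known sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def error_rates(alpha, true_effect=0.3, n=30, trials=2000, seed=0):
    """Return (false positive rate, missed effect rate) at threshold alpha.

    All default values are illustrative assumptions, not empirical figures.
    """
    rng = random.Random(seed)  # fixed seed: same samples for every alpha
    false_positives = 0
    missed_effects = 0
    for _ in range(trials):
        no_effect = [rng.gauss(0.0, 1.0) for _ in range(n)]
        real_effect = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        if two_sided_p_value(no_effect) < alpha:
            false_positives += 1  # bogus result passed on as a finding
        if two_sided_p_value(real_effect) >= alpha:
            missed_effects += 1  # real effect overlooked
    return false_positives / trials, missed_effects / trials


if __name__ == "__main__":
    for alpha in (0.005, 0.05, 0.10):
        fp, miss = error_rates(alpha)
        print(f"alpha={alpha:<5}  false positives={fp:.3f}  missed effects={miss:.3f}")
```

Under these assumptions, tightening the threshold lowers the share of bogus findings while raising the share of overlooked effects, and loosening it does the reverse. Nothing in the arithmetic says which error matters more: that choice depends on what is being tested and on what is at stake.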
All this is not new. Philosopher Richard Rudner had already written in 1953 that it is impossible to use a test of significance without knowing to what it is being applied, i.e. without making a value judgment. Interestingly, Rudner used this example to make the point that scientists do need to make value judgments.
# How about mathematical models?
In all this discussion mathematical models have enjoyed a relative immunity, perhaps because mathematical modelling is not a discipline. But the absence of awareness of a quality problem is not proof of the absence of a problem. And there are signals that the crisis there might be even worse than that which is recognised in statistics.
Implausible quantifications of the effect of climate change on the gross domestic product of a country in the year 2100, or of the safety of a nuclear waste disposal site a million years from now, or of the risk of the financial products at the heart of the latest financial crisis, are just a few examples easily found in the literature. Political decisions in the field of transport may be based on a model which needs as an input the average number of passengers sitting in a car several decades in the future. A scholar studying science and technology laments the generation of artefactual numbers through methods and concepts such as ‘expected utility’, ‘decision theory’, ‘life cycle assessment’, ‘ecosystem services’, ‘sound scientific decisions’ and ‘evidence-based policy’ to convey a spurious impression of certainty and control over important issues concerning health and the environment. A rhetorical use of quantification may thus be employed in evidence-based policy to hide important knowledge and power asymmetries: the production of evidence empowers those who can pay for it, a trend noted in both the US and Europe.
Since its inception the current of post-normal science (PNS) has insisted on the need to fight against instrumental or fantastic quantifications. PNS scholars suggested the use of pedigree for numerical information (NUSAP), and recently for mathematical models. Combined with PNS’s concept of extended peer communities, these tools are meant to facilitate a discussion of the various attributes of a quantification. This information includes not just its uncertainty, but also its history, the profile of its producers, its position within a system of power and norms, and overall its ‘fitness for function’, while also identifying the possible exclusion of competing stakes and worldviews.
Stat-Activisme, a recent French intellectual movement, proposes to ‘fight against’ as well as ‘fight with’ numbers. Stat-Activisme targets invasive metrics and biased statistics, with a rich repertoire of strategies from ‘statistical judo’ to the construction of alternative measures.
As philosopher Jerome Ravetz reminds us, so long as our modern scientific culture has faith in numbers as if they were ‘nuggets of truth’, we will be victims of ‘funny numbers’ employed to rule our technical society.
Note: A different version of this piece has been published in Italian in the journal Epidemiologia & Prevenzione.