David Hand: Trustworthiness of Statistical Analysis (LSE PH 500 presentation)

This was David Hand’s guest presentation (25 June) at our zoomed graduate research seminar (LSE PH500) on Current Controversies in Phil Stat (~30 min.)  I’ll make some remarks in the comments, and invite yours.


Trustworthiness of Statistical Analysis

David Hand

Abstract: Trust in statistical conclusions derives from the trustworthiness of the data and analysis methods. Trustworthiness of the analysis methods can be compromised by misunderstanding and incorrect application. However, that should stimulate a call for education and regulation, to ensure that methods are used correctly. The alternative of banning potentially useful methods, on the grounds that they are often misunderstood and misused is short-sighted, unscientific, and Procrustean. It damages the capability of science to advance, and feeds into public mistrust of the discipline.

Below are Prof. Hand’s slides w/o audio, followed by a video w/audio. You can also view them on the Meeting #6 post on the PhilStatWars blog (https://phil-stat-wars.com/2020/06/21/meeting-6-june-25/).



VIDEO: (Viewing in full screen mode helps with buffering issues.)

Categories: LSE PH 500


7 thoughts on “David Hand: Trustworthiness of Statistical Analysis (LSE PH 500 presentation)”

  1. rkenett

    David – I hope you do not mind my posting your inspiring foreword to our book on Information Quality. It seems to fit well with your great talk:

    “Foreword for Information Quality by Ron S. Kenett and Galit Shmueli

    I am often invited to assess research proposals. Included amongst the questions I have to ask myself in such assessments are: Are the goals stated sufficiently clearly? Does the study have a good chance of achieving the stated goals? Will the researchers be able to obtain sufficient quality data for the project? Are the analysis methods adequate to answer the questions? And so on. These questions are fundamental, not merely for research proposals, but for any empirical study – for any study aimed at extracting useful information from evidence or data. And yet they are rarely overtly stated. They tend to lurk in the background, with the capability of springing into the foreground to bite those who failed to think them through.

    These questions are precisely the sorts of questions addressed by the InfoQ – Information Quality – framework. Answering such questions allows funding bodies, corporations, national statistical institutes, and other organisations to rank proposals, balance costs against success probability, and also to identify the weaknesses and hence improve proposals and their chance of yielding useful and valuable information. In a context of increasing constraints on financial resources, it is critical that money is well spent, so that maximising the chance that studies will obtain useful information is becoming more and more important. The InfoQ framework provides a structure for maximising these chances.

    A glance at the statistics shelves of any technical library will reveal that most books focus narrowly on the details of data analytic methods. The same is true of almost all statistics teaching. This is all very well – it is certainly vital that such material be covered. After all, without an understanding of the basic tools, no analysis, no knowledge extraction would be possible. But such a narrow focus typically fails to place such work in the broader context, without which its chances of success are damaged. This volume will help to rectify that oversight. It will provide readers with insight into and understanding of other key parts of empirical analysis, parts which are vital if studies are to yield valid, accurate, and useful conclusions.

    But the book goes beyond merely providing a framework. It also delves into the details of these overlooked aspects of data analysis. It discusses the fact that the same data may be high quality for one purpose and low for another, and that the adequacy of an analysis depends on the data and the goal, as well as depending on other less obvious aspects, such as the accessibility, completeness, and confidentiality of the data. And it illustrates the ideas with a series of illuminating applications.

    With computers increasingly taking on the mechanical burden of data analytics the opportunities are becoming greater for us to shift our attention to the higher order aspects of analysis: to precise formulation of the questions, to consideration of data quality to answer those questions, to choice of the best method for the aims, taking account of the entire context of the analysis. In doing so we improve the quality of the conclusions we reach. And this, in turn, leads to improved decisions – for researchers, policy makers, managers, and others. This book will provide an important tool in this process.

    David J. Hand
    Imperial College, London”

  2. David Hand does a superb job zeroing in on the serious casualties of the wars surrounding statistical significance tests being waged by Ron Wasserstein and others. “Proposals to abandon the use of significance testing and play down the role of p-values risk implying that the statistical community accepts that those tools are unsuitable, rather than misuse of those tools is the problem. Abandoning the very tools which enable trust in conclusions would represent the most dramatic example of a scientific discipline shooting itself in the foot.” The argument for relinquishing tools known to be important for the central statistical goal of distinguishing genuine effects from random error, simply because the requirements for their valid use may be violated, is a very bad one. Replacing them with tools unable or less able to pick up on such violations just encourages the use of untrustworthy methods. The argument comes across as largely reflecting a preference for promoting a different statistical philosophy, rather than promoting more trustworthy science, and the result is further public mistrust of statistics.

  3. Excellent presentation by David Hand

  4. Thank you Prof. Hand. I also very much enjoyed “Dark Data”, one of my two favourite recent statistics books (along with SIST, of course).

    When I was younger, I dismissed the varied versions of “if you need statistics, you ought to have done a better experiment” as just an example of how even great scientists sometimes have huge blind spots. I now believe, however, that Rutherford may have been more perceptive than I gave him credit for. He might just have recognized that “if you are arguing over statistics, doing a better experiment” is usually the best way to resolve such controversies in physics. The challenge for life and social sciences is that people are expensive, different, and constantly changing, while physicists get to study (for example) electrons that are cheap, identical, and immutable. Hence it is relatively easy for physicists to do a “better experiment”, and when physicists make a new measurement of an interesting quantity, it is typically twice as precise as the best previous measurement of the same quantity, compared to medical sciences where new measurements are often less precise (http://rsos.royalsocietypublishing.org/content/4/1/160600). The trustworthiness of physics does not come primarily from its use of good statistical methods, but because physics is the easiest science.

    It is also good to remember that the existence of “several different schools of statistical thought” is not always a problem.
    The essential way that systematic and methodological errors are usually found or constrained is by making measurements in as many different ways as possible.
    If different statistical methods agree in their basic conclusion about some data or experiment, I am more likely to trust that conclusion.

    • David:
      Multiple methods are good when one can be trusted to unearth a mistake overlooked by another. Wasserstein et al. (2019) claim, in their enthusiasm to derogate the p-value, that no p-value can show the presence of an effect. So, if taken seriously, p-values couldn’t be used to check other methods. Of course, I doubt anyone will take it seriously because, to start with, the simple significance test is used to test assumptions of models and methods used in other approaches. However, this does not remove Professor Hand’s concern that Wasserstein et al. (2019) promote the view that statistics doesn’t trust its own methods.

      • rkenett


        I think the point is that one needs both a constructive proposal and a method to achieve it.

        David Hand presents various aspects of trust and trustworthiness, a great constructive proposal. Ron Wasserstein is discussing what not to do, and the effect of that discussion has been a reduction in trustworthiness, as mentioned by David.

        David Hand, in a 1994 paper, sketched a framework for deconstructing statistical questions (Deconstructing statistical questions (with discussion), Journal of the Royal Statistical Society, Series A, 157, 317-356, 1994).

        Twenty years later, Galit Shmueli and I used the deconstruction approach to lay out an information quality framework (On Information Quality, Journal of the Royal Statistical Society, Series A (with discussion), Vol. 177, No. 1, pp. 3-38, 2014).

        So achieving trustworthiness, like achieving information quality, requires a deconstruction exercise. Information quality is translated into 8 dimensions. How would one deconstruct trustworthiness? Severe testing is one aspect. There are other aspects to consider, such as communication.

        This again raises the issue that a broad perspective on applied statistics needs to consider several aspects beyond what is covered by statistical inference.

