“Error statistical modeling and inference: Where methodology meets ontology” A. Spanos and D. Mayo



A new joint paper….

“Error statistical modeling and inference: Where methodology meets ontology”

Aris Spanos · Deborah G. Mayo

Abstract: In empirical modeling, an important desideratum for deeming theoretical entities and processes real is that they be reproducible in a statistical sense. Current-day crises regarding replicability in science intertwine with the question of how statistical methods link data to statistical and substantive theories and models. Different answers to this question have important methodological consequences for inference, which are intertwined with a contrast between the ontological commitments of the two types of models. The key to untangling them is the realization that behind every substantive model there is a statistical model that pertains exclusively to the probabilistic assumptions imposed on the data. It is not that the methodology determines whether to be a realist about entities and processes in a substantive field. It is rather that the substantive and statistical models refer to different entities and processes, and therefore call for different criteria of adequacy.

Keywords: Error statistics · Statistical vs. substantive models · Statistical ontology · Misspecification testing · Replicability of inference · Statistical adequacy

To read the full paper: “Error statistical modeling and inference: Where methodology meets ontology.”

The related conference.

Mayo & Spanos spotlight

Reference: Spanos, A. & Mayo, D. G. (2015). “Error statistical modeling and inference: Where methodology meets ontology.” Synthese (online May 13, 2015), pp. 1-23.

Categories: Error Statistics, misspecification testing, O & M conference, reproducibility, Severity, Spanos


2 thoughts on ““Error statistical modeling and inference: Where methodology meets ontology” A. Spanos and D. Mayo”

  1. an important desideratum

  2. There is much in the paper with which I agree, but there is also much
    with which I disagree. On the agreement side, the authors emphasize that
    models are approximations and that they are adequate rather than true,
    as in `statistically adequate’. On the disagreement side, their vocabulary
    contains words associated with a concept of truth, such as `actual’ as in
    `actual error probabilities’ and `wrong’ as in `wrong
    likelihood’. I have no idea what an actual error probability is
    unless the concept is restricted to simulations. Does the use of `wrong
    likelihood’ mean that there is some `correct likelihood’ and, if so,
    how do we recognize it when we see it (or them)? The disagreement is
    about substantial matters, which are reflected in the vocabulary.

    The authors’ concept of statistical adequacy relies on the
    ability to simulate data sets under the model and to compare these
    simulated data sets with the real data. This is to be applauded, but
    unfortunately the form of comparison is never made precise. Here is
    how it is done in `Data Analysis and Approximate Models’. A model is a
    fully specified probability measure, that is, all parameters have
    explicit values, as they must have if the model is to be used for
    simulations. The next step is to decide which features of the data
    set are to be replicated by the model. Suppose for the sake of
    argument that the model is that of i.i.d. Gaussian random variables and
    that the features of interest are (i) shape, as measured by the Kolmogorov
    distance between the empirical and model distributions,
    $T_1=d_{ko}(\mathbb{P}_n, N(\mu,\sigma^2))$, where $\mathbb{P}_n$ is the
    empirical distribution of the data, and (ii) the lack of outliers, as
    measured by $T_2=\max_i \vert X_i-\mu\vert/\sigma$.
    These play the role of the authors’ mis-specification tests.
    One now generates data under the model $N(\mu,\sigma^2)$ and calculates,
    say, the 0.975-quantiles of $T_1$ and $T_2$, denoted $q_1(0.975)$ and
    $q_2(0.975)$ respectively. Given data $x_1,\ldots,x_n$, the set of
    adequate Gaussian models consists of those $N(\mu,\sigma^2)$ for which
    $d_{ko}(\mathbb{P}_n, N(\mu,\sigma^2))\le q_1(0.975)$ and $\max_i \vert
    x_i-\mu\vert/\sigma\le q_2(0.975)$.

    Note that this concept of adequacy specifies the parameter
    values. Maximum likelihood has nothing to add, although one can include
    the behaviour of the mean and standard deviation in the features to be
    replicated. One can also define mis-specification tests without specifying
    parameters. Thus $T_1$ can be replaced by $T_3=\inf_{\mu,\sigma}
    d_{ko}(\mathbb{P}_n, N(\mu,\sigma^2))$ and $T_2$ by $T_4=\max_i \vert
    x_i-\mathrm{mean}(x)\vert/\mathrm{sd}(x)$, where $\mathrm{mean}(x)$ and
    $\mathrm{sd}(x)$ are the mean and standard deviation of the data. Now a
    model can be declared adequate without specifying any parameter values.
    This leaves the statistician free to use, say, maximum likelihood in the
    interests of efficiency and severity. This can, however, go completely
    wrong, as the maximum likelihood estimate can produce parameter values
    for which the resulting model is an arbitrarily poor approximation to the data.
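
A sketch of the parameter-free statistics $T_3$ and $T_4$, again assuming i.i.d. data and using only the Python standard library; all names are hypothetical. Computing the infimum in $T_3$ exactly needs a proper optimizer, so here it is approximated from above by a coarse-to-fine grid search around the sample mean and standard deviation, which is a simplification of ours. For $\mathrm{sd}(x)$ the sketch assumes the $n-1$ divisor, which the comment does not specify. Both statistics are unchanged when the data are shifted and rescaled, so their null quantiles can be simulated under $N(0,1)$ alone; only after the Gaussian family passes are parameter values chosen (the demo uses the Gaussian maximum likelihood estimates purely as an example).

```python
import math
import random
from statistics import NormalDist, fmean, pstdev, stdev


def ks_distance(xs, mu, sigma):
    """Kolmogorov distance between the empirical CDF of xs and N(mu, sigma^2)."""
    cdf = NormalDist(mu, sigma).cdf
    ys = sorted(xs)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys):
        f = cdf(y)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def t3_min_kolmogorov(xs, rounds=3, size=7):
    """Approximate T_3 = inf over (mu, sigma) of ks_distance(xs, mu, sigma).

    Coarse-to-fine grid search (a simplification, not an exact minimizer)
    centred on the sample mean and standard deviation.
    """
    m, s = fmean(xs), stdev(xs)
    best = (ks_distance(xs, m, s), m, s)
    half_mu, half_log_sigma = 0.5 * s, 0.5
    for _ in range(rounds):
        _, mu0, sigma0 = best
        for i in range(size):
            mu = mu0 + half_mu * (2 * i / (size - 1) - 1)
            for j in range(size):
                sigma = sigma0 * math.exp(half_log_sigma * (2 * j / (size - 1) - 1))
                best = min(best, (ks_distance(xs, mu, sigma), mu, sigma))
        half_mu /= 3  # shrink the search box around the current best point
        half_log_sigma /= 3
    return best[0]


def t4_max_studentized(xs):
    """T_4 = max_i |x_i - mean(x)| / sd(x), with the n - 1 divisor for sd (an assumption)."""
    m, s = fmean(xs), stdev(xs)
    return max(abs(x - m) for x in xs) / s


def null_quantiles(n, level=0.975, reps=200, seed=1):
    """Monte Carlo quantiles of T_3 and T_4 for Gaussian samples of size n.

    Both statistics are unchanged by shifting and rescaling the data (the grid
    version of T_3 approximately), so simulating under N(0, 1) covers every
    Gaussian model at once.
    """
    rng = random.Random(seed)
    sims3, sims4 = [], []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sims3.append(t3_min_kolmogorov(z))
        sims4.append(t4_max_studentized(z))
    k = min(reps - 1, math.ceil(level * reps) - 1)
    return sorted(sims3)[k], sorted(sims4)[k]


def gaussian_family_adequate(xs, q3, q4):
    """Adequacy of the Gaussian family, decided without fixing (mu, sigma)."""
    return t3_min_kolmogorov(xs) <= q3 and t4_max_studentized(xs) <= q4


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.gauss(10.0, 2.0) for _ in range(30)]  # illustrative data only
    q3, q4 = null_quantiles(len(data))
    if gaussian_family_adequate(data, q3, q4):
        # Parameters are chosen only afterwards, here by Gaussian maximum likelihood.
        print("adequate; ML estimates:", fmean(data), pstdev(data))
    else:
        print("no Gaussian model passes T_3 and T_4")
```

Separating the adequacy decision from the choice of $(\mu,\sigma)$ is exactly what opens the door the comment warns about: nothing in this check ties the later estimates to the region of $(\mu,\sigma)$ values that actually fit the data.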

    Finally, a comment on severity. Suppose the model is $N(\mu,\sigma^2)$
    and the null hypothesis is $H_0: \mu=0$. Presumably a severe test will
    be based on the mean of the sample. However, the careful statistician
    first decides to check the adequacy of the model using some
    mis-specification tests. The data pass the mis-specification tests and the null
    hypothesis is accepted. Suppose we now consider all symmetric
    location/scale models which pass the mis-specification tests and then
    use maximum likelihood to define a severe test of $H_0$. It turns out
    that the test using the Gaussian model is the least severe of all the
    tests. The moral is that severity depends not only on the data but also on
    the model, and that severity can be imported from the model. Tukey
    calls this a free lunch. In mathematical terms, testing $H_0$ is an
    ill-posed problem if the model can also be chosen. The problem needs
    regularizing, and one way of doing this is to use minimum Fisher
    information models. The Gaussian model is one such. The test based on
    the mean is the severest test using the least severe model, that is,
    the model which does not introduce spurious severity. I miss a
    discussion of this problem in the paper.
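
The regularization mentioned above rests on a classical inequality: for a location density with finite variance $\sigma^2$ (regular enough for the Fisher information $I$ to exist), $I \ge 1/\sigma^2$, with equality only for the Gaussian. Among symmetric location models with a common variance, the Gaussian therefore carries the least information about location, which is one way to read "minimum Fisher information model". The Python sketch below checks this numerically for three other unit-variance densities chosen by us purely for illustration (Laplace, logistic, Student t with $\nu=5$; none is named in the comment), using closed-form score functions and a midpoint rule. The known closed-form values are $1$, $\pi^2/9 \approx 1.097$, $1.25$ and $2$ respectively.

```python
import math
from statistics import NormalDist


def unit_variance_models(nu=5):
    """(pdf, score) pairs for symmetric location models scaled to variance 1.

    score(x) = d/dx log pdf(x). The Laplace, logistic and Student-t choices
    are illustrative assumptions, not models named in the comment.
    """
    phi = NormalDist(0.0, 1.0).pdf
    b = 1 / math.sqrt(2)          # Laplace: variance 2 b^2 = 1
    s = math.sqrt(3) / math.pi    # logistic: variance s^2 pi^2 / 3 = 1
    c = math.sqrt((nu - 2) / nu)  # Student t: variance c^2 nu / (nu - 2) = 1
    kt = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

    def logistic_pdf(x):
        e = math.exp(-abs(x) / s)  # symmetric form avoids overflow for large |x|
        return e / (s * (1 + e) ** 2)

    return {
        "gaussian": (phi, lambda x: -x),
        "laplace": (lambda x: math.exp(-abs(x) / b) / (2 * b),
                    lambda x: -math.copysign(1.0, x) / b),
        "logistic": (logistic_pdf, lambda x: -math.tanh(x / (2 * s)) / s),
        f"student_t{nu}": (lambda x: kt / c * (1 + (x / c) ** 2 / nu) ** (-(nu + 1) / 2),
                           lambda x: -(nu + 1) * x / (nu * c * c + x * x)),
    }


def location_fisher_information(pdf, score, lim=40.0, steps=100_000):
    """Midpoint-rule approximation of I = integral of score(x)^2 * pdf(x) over [-lim, lim]."""
    h = 2 * lim / steps
    return h * sum(score(-lim + (k + 0.5) * h) ** 2 * pdf(-lim + (k + 0.5) * h)
                   for k in range(steps))


if __name__ == "__main__":
    infos = {name: location_fisher_information(*pair)
             for name, pair in unit_variance_models().items()}
    for name, info in sorted(infos.items(), key=lambda kv: kv[1]):
        # The asymptotic variance of the location MLE from n observations is 1 / (n * I).
        print(f"{name:>10}: I = {info:.4f}")
```

Since the asymptotic variance of the maximum likelihood location estimate is $1/(nI)$, a model with larger $I$ gives a sharper, seemingly more severe test from the same data; roughly, this is the severity "imported from the model" that the Gaussian, as the minimum-information choice, avoids.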
