Department of Fish and Wildlife Sciences,
Department of Mathematics and Statistical Science,
University of Idaho
Journal Editors Be Warned: Statistics Won’t Be Contained
I heartily second Professor Mayo’s call, in a recent issue of Conservation Biology, for science journals to tread lightly on prescribing statistical methods (Mayo 2021). Such prescriptions are not likely to be constructive; the issues involved are too vast.
The science of ecology has long relied on innovative statistical thinking. Fisher himself, inventor of P values and a considerable portion of the other statistical methods used by generations of ecologists, helped ecologists quantify patterns of biodiversity (Fisher et al. 1943) and understand how genetics and evolution were connected (Fisher 1930). G. E. Hutchinson, the “founder of modern ecology” (and my professional grandfather), early on helped build the tradition of heavy consumption of mathematics and statistics in ecological research (Slack 2010). Investigators in the early days of the subfield of conservation biology saw the need for stochastic approaches to modeling rare or colonizing populations and for assessing extinction jeopardy (MacArthur and Wilson 1967, Leigh 1981, Lande and Orzack 1988, Dennis 1989, Dennis et al. 1991). Data arising from modern molecular genetics are now a cornerstone of conservation, and analyzing such data well often requires considerable statistical sophistication. Other data in ecology are highly nonstandard and require custom-made generalized linear models, generalized additive models, integrated models, state space models, structural equation models, spatial capture-recapture models… an ever-expanding list. Nonstandard data, and the very theories of ecology themselves, require the modern ecologist to master an extensive statistical arsenal (Ellison and Dennis 2010).
Lack of replicability has long been acknowledged in ecology, as ecological systems are severely heterogeneous. Ecologists turned heavily to hierarchical models of various sorts to better capture a fuller picture of the sources of variability in data. The likelihoods involved, for all but the usual normal-based random effects models, are wicked multiple integrals that for many years defied computation. The Bayesian revolution swept through ecology after the discovery in statistics that the posterior distributions for such models could be simulated with MCMC algorithms, bypassing the need to calculate likelihood functions. Most ecologists I talked to had little patience for the philosophical-scientific issues involved in the Bayesian/frequentist choice but rather were enthralled with the quantum leap in complexity and realism of the models that could be handled with these Bayesian methods. Frequentist methods were late to the party, but the development of algorithms for likelihood maximization such as data cloning (Lele et al. 2007, Lele et al. 2010) has now given investigators a real choice between frequentist and Bayesian inference for hierarchical models. The philosophical issues can no longer be ignored; the choice between frequentist and Bayesian approaches has consequential differences for the types of conclusions to be drawn from data (Mayo 2018, Lele 2020a,b).
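The core trick of data cloning can be seen in a conjugate toy problem. Below is a minimal sketch (my own illustration, not the implementation in Lele et al.), assuming a normal mean with known variance: the likelihood is raised to the K-th power (equivalently, the data set is "cloned" K times), the resulting posterior collapses onto the MLE as K grows, and K times the posterior variance recovers the frequentist variance of the MLE.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: n observations from N(mu, sigma^2), sigma known.
mu_true, sigma, n = 2.0, 1.0, 25
y = rng.normal(mu_true, sigma, n)
mle = y.mean()  # maximum likelihood estimate of mu

# Conjugate normal prior N(m0, s0^2) on mu.
m0, s0 = 0.0, 10.0

def posterior(y, K):
    """Posterior of mu when the data are cloned K times (conjugate normal model).

    Cloning multiplies the likelihood contribution by K, so the
    posterior precision and the data term both scale with K.
    """
    prec = 1 / s0**2 + K * len(y) / sigma**2          # posterior precision
    mean = (m0 / s0**2 + K * y.sum() / sigma**2) / prec
    return mean, 1 / prec                              # posterior mean, variance

for K in (1, 10, 100):
    mean, var = posterior(y, K)
    print(f"K={K:4d}  posterior mean={mean:.4f}  K*var={K * var:.5f}")
# As K grows, the posterior mean approaches the MLE, and K times the
# posterior variance approaches sigma^2 / n, the variance of the MLE.
```

In a real hierarchical model the posterior is not available in closed form, so the cloned posterior is simulated by MCMC; the same collapse onto the MLE is what makes the method work.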
It is no wonder that ecologists have long indulged in substantial introspection and questioning of statistical practice. Single papers, single papers with commentary, forums in journals, whole journal issues, and even whole journals are devoted to expounding on and debating statistical methods in ecology. The “null hypothesis” as an ecological-scientific tool rated an entire issue of The American Naturalist (November 1983).
In a contemporary example, Frontiers in Ecology and Evolution devoted a “research topic” to papers on “evidence statistics.” The evidence project seeks to extend Richard Royall’s (1997) ideas about evidence to statistical settings with unknown parameters and misspecified models and to endow the approach with a frequentist error structure useful for pre-data design and post-data inference (Dennis et al. 2019, Taper et al. 2021). The extension is accomplished with “evidence functions” (Lele 2004). The main structural departure from Neyman-Pearson (NP) hypothesis testing or Fisherian significance testing is that the concepts of evidence and frequentist error are separated.
The quality of inferences should increase with the amount of data available. This presents a problem for NP hypothesis testing if inferences are bound to error rates, as the Type I error rate (alpha) is held constant regardless of sample size. With evidence functions, by contrast, both error rates (the probabilities of misleading evidence, analogous to alpha and beta in NP testing) approach zero asymptotically as sample size increases, even when models are misspecified. Results thus far suggest that differences of consistent model selection indices (such as SIC, a.k.a. BIC) retain the properties of evidence functions. AIC differences, by contrast, have error properties similar to NP hypothesis testing: one of the probabilities of misleading evidence does not go to zero but rather approaches a constant, similar to alpha. Evidence functions compare two models; an evidence function is a point estimate of the difference of the two models’ discrepancies from the true data-generating mechanism. Interval estimates for evidence can be produced with valid coverage properties, even when models are misspecified.
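The contrasting error behavior of AIC and BIC differences can be illustrated with a small simulation. The sketch below is my own toy example, not taken from the papers cited: the true model M1 is N(0, 1), the needlessly complex rival M2 has a free mean, and twice the log-likelihood ratio is chi-squared with 1 degree of freedom under M1. AIC charges a fixed penalty difference of 2, so its rate of wrongly preferring M2 stays near P(chi^2_1 > 2) at every sample size; BIC's penalty difference of log n grows, so its error rate shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def overfit_rate(n, reps=20000):
    """Fraction of simulated datasets in which each criterion prefers
    the overparameterized model M2 (free mean) over the true model M1
    (mean zero), for data y ~ N(0, 1) with known variance 1."""
    y = rng.normal(0.0, 1.0, size=(reps, n))
    ybar = y.mean(axis=1)
    two_dll = n * ybar**2                    # 2 * log-likelihood ratio ~ chi^2_1
    aic = np.mean(two_dll > 2.0)             # AIC penalty difference: 2 per extra parameter
    bic = np.mean(two_dll > np.log(n))       # BIC penalty difference: log n per extra parameter
    return aic, bic

for n in (20, 200, 2000):
    aic, bic = overfit_rate(n)
    print(f"n={n:5d}  AIC overfit rate={aic:.3f}  BIC overfit rate={bic:.3f}")
# AIC's rate hovers near P(chi^2_1 > 2) ~ 0.157 at every n,
# while BIC's rate shrinks toward zero as n grows.
```

This is only one of the two probabilities of misleading evidence (the overfitting direction); the underfitting probability goes to zero for both criteria.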
An argument against an evidence-error project is the Likelihood Principle (LP), the concept that experiment outcomes giving equal likelihood to a parameter value must be considered equal evidence for that value. The concept requires, for instance, that 7 successes out of 20 Bernoulli trials constitute the same evidence for a particular value of the success probability regardless of whether the data came from a binomial experiment (number of trials fixed) or a negative binomial experiment (trials continue until 7 successes are attained). Mayo (2018) provides an entertaining takedown of the LP on philosophical-scientific grounds. Statistically, the variances of the two success-probability estimates would differ between the two experimental designs, and so any assessment of long-run error rates must depend on the design as well. Similarly, to consider error rates for evidence functions, the LP must necessarily be left behind.
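The binomial/negative-binomial example can be made concrete. The short sketch below (illustrative only) shows that the two likelihood functions are exactly proportional, so the LP treats the two outcomes as identical evidence, even though the stopping rules, and hence the sampling distributions used for error assessment, differ.

```python
from math import comb

import numpy as np

p = np.linspace(0.05, 0.95, 19)  # grid of candidate success probabilities

# Binomial experiment: 20 trials fixed in advance, 7 successes observed.
lik_binom = comb(20, 7) * p**7 * (1 - p)**13

# Negative binomial experiment: sample until the 7th success, which
# arrives on trial 20, so 6 successes fall among the first 19 trials.
lik_nbinom = comb(19, 6) * p**7 * (1 - p)**13

# The likelihoods share the kernel p^7 (1-p)^13 and differ only by the
# constant factor C(20,7) / C(19,6) = 20/7, so the LP deems them
# equivalent evidence about p.
print(lik_binom / lik_nbinom)  # constant across the whole grid
```

Frequentist error rates, in contrast, are computed over the outcome space fixed by the design (possible success counts for the binomial, possible trial counts for the negative binomial), which is exactly the information the LP discards.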
Journal editors can best help ecology by facilitating, promoting, and encouraging such discourse. Prescribing some fixed statistical approach (as agriculture journals once did for multiple comparisons) in the instructions to authors is likely to be ill-informed and harmful to scientific progress. The statistical landscape is growing and changing rapidly, and how statistical approaches can contribute to a particular science is best left to practitioners to sort out on the journal pages.
- Dennis B. 1989. Allee effects: population growth, critical density, and the chance of extinction. Natural Resource Modeling 3:481-538.
- Dennis B, Munholland PL, Scott JM. 1991. Estimation of growth and extinction parameters for endangered species. Ecological Monographs 61:115-143.
- Dennis B, Ponciano JM, Taper ML, Lele SR. 2019. Errors in statistical inference under model misspecification: evidence, hypothesis testing, and AIC. Frontiers in Ecology and Evolution 7:372.
- Ellison AM, Dennis B. 2010. Paths to statistical fluency for ecologists. Frontiers in Ecology and the Environment 8:362-370.
- Fisher RA. 1930. The genetical theory of natural selection. The Clarendon Press, Oxford, UK.
- Fisher RA, Corbet AS, Williams CB. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12:42-58.
- Lande R, Orzack SH. 1988. Extinction dynamics of age-structured populations in a structured environment. Proceedings of the National Academy of Sciences (USA) 85:7418-7421.
- Leigh EG. 1981. The average lifetime of a population in a varying environment. Journal of Theoretical Biology 90:213-239.
- Lele SR. 2004. Evidence functions and the optimality of the law of likelihood. In: The nature of scientific evidence: statistical, philosophical and empirical considerations, eds Taper ML, Lele SR. The University of Chicago Press, Chicago, Illinois.
- Lele SR. 2020a. Consequences of lack of parameterization invariance of non-informative Bayesian analysis for wildlife management: survival of San Joaquin kit fox and declines in amphibian populations. Frontiers in Ecology and Evolution 7:501.
- Lele SR. 2020b. How should we quantify uncertainty in statistical inference? Frontiers in Ecology and Evolution 8:35.
- Lele SR, Dennis B, Lutscher F. 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10:551–563.
- Lele SR, Nadeem K, Schmuland B. 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105:1617–1625.
- MacArthur RH, Wilson EO. 1967. The theory of island biogeography. Princeton University Press, Princeton, New Jersey.
- Mayo D. 2018. Statistical inference as severe testing: how to get beyond the statistics wars. Cambridge University Press, Cambridge, UK.
- Mayo D. 2021. The statistics wars and intellectual conflicts of interest. Conservation Biology 2021:1-3.
- Royall R. 1997. Statistical evidence: a likelihood paradigm. Chapman & Hall, London, UK.
- Slack NG. 2010. G. Evelyn Hutchinson and the invention of modern ecology. Yale University Press, New Haven, Connecticut.
- Taper ML, Lele SR, Ponciano JM, Dennis B, Jerde CL. 2021. Assessing the global and local uncertainty of scientific evidence in the presence of model misspecification. Frontiers in Ecology and Evolution 9:679155.
Brian: Thanks so much for the commentary on my editorial. Regarding “Journal editors can best help ecology by facilitating, promoting, and encouraging such discourse. Prescribing some fixed statistical approach (as agriculture journals once did for multiple comparisons) in the instructions to authors is likely to be ill-informed and harmful to scientific progress” — I’m curious about the prescriptions that were given there. I’d also be glad to understand the claim that “algorithms for likelihood maximization such as data cloning (Lele et al. 2007, Lele et al. 2010) have now given investigators a real choice between frequentist or Bayesian inference for hierarchical models. The philosophical issues can no longer be ignored; the choice between frequentist and Bayesian approaches has consequential differences in the types of conclusions to be drawn from data,” because earlier you said that was not a concern.
The only thing I might wonder about is controlling error probabilities with misspecified models, but maybe the asymptotic provisions save that.
These claims about misspecified models are somewhat problematic. I had a look at the Dennis et al. (2019) paper. What these claims apparently mean is that a true model with density g is assumed, and, when comparing densities f1 and f2, evidence claims are interpreted as claims about which of f1 or f2 is closer to the true g in terms of Kullback-Leibler divergence.
This is a legitimate thing to look at, but it doesn’t really solve the model misspecification problem in much generality. The problem is that Kullback-Leibler divergence is itself vulnerable to model misspecification.
Let’s say f1 and f2 are normal densities with means 0 and 1 and common variance σ², and g = 0.99·φ(0, σ²) + 0.01·φ(100, σ²), where φ(a, s²) denotes the normal density with mean a and variance s². Then 99% of this distribution agrees with f1, yet in Kullback-Leibler terms it is closer to f2. The vulnerability of likelihood-based inference to outliers is well known, and KL can’t get around it. Another example is that in reality data are rounded, not continuous. If the true distribution is, say, a rounded normal (i.e., a discrete distribution on values with at most 2 digits after the decimal point, obtained by rounding a normal random variable), then the log-likelihood ratios of both f1 and f2 against g are infinite, and which of the two is closer in KL is undefined, even though g may be, apart from rounding, perfectly equal to f1 or f2.
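The mixture counterexample is easy to verify numerically. Below is a minimal Monte Carlo sketch (my own illustration, with σ = 1): since KL(g, f1) − KL(g, f2) = E_g[log f2(X) − log f1(X)], a positive estimate of that expectation means g is closer in KL to f2, despite 99% of g coinciding with f1.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0

def log_phi(x, mean, s):
    """Log density of the normal distribution N(mean, s^2)."""
    return -0.5 * np.log(2 * np.pi * s**2) - (x - mean) ** 2 / (2 * s**2)

# Draw from the mixture g = 0.99 * N(0, 1) + 0.01 * N(100, 1).
n = 200_000
outlier = rng.random(n) < 0.01
x = np.where(outlier, rng.normal(100.0, sigma, n), rng.normal(0.0, sigma, n))

# KL(g, f1) - KL(g, f2) = E_g[log f2(X) - log f1(X)];
# a positive value means g is closer (in KL) to f2 = N(1, 1) than to f1 = N(0, 1).
diff = np.mean(log_phi(x, 1.0, sigma) - log_phi(x, 0.0, sigma))
print(diff)
# Closed form with sigma = 1: (2 * E_g[X] - 1) / 2 = 0.5, since E_g[X] = 1;
# the 1% outlier component alone drags the KL comparison over to f2.
```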