Monthly Archives: April 2021

April 22 “How an information metric could bring truce to the statistics wars” (Daniele Fanelli)

The eighth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

22 April 2021

TIME: 15:00-16:45 (London); 10:00-11:45 (New York, EDT)

For information about the Phil Stat Wars forum and how to join, click on this link.

“How an information metric could bring truce to the statistics wars”

Daniele Fanelli

Abstract: Both sides of debates on P-values, reproducibility, and other meta-scientific issues are entrenched in traditional methodological assumptions. For example, they often implicitly endorse rigid dichotomies (e.g. published findings are either “true” or “false”, replications either “succeed” or “fail”, research practices are either “good” or “bad”), or make simplifying and monistic assumptions about the nature of research (e.g. publication bias is generally a problem, all results should replicate, data should always be shared).

Thinking about knowledge in terms of information may clear a common ground on which all sides can meet, leaving behind partisan methodological assumptions. In particular, I will argue that a metric of knowledge that I call “K” helps examine research problems in a more genuinely “meta-“ scientific way, giving rise to a methodology that is distinct, more general, and yet compatible with multiple statistical philosophies and methodological traditions.

This talk will present statistical, philosophical and scientific arguments in favour of K, and will give a few examples of its practical applications.

Daniele Fanelli is a London School of Economics Fellow in Quantitative Methodology, Department of Methodology, London School of Economics and Political Science. He graduated in Natural Sciences, earned a PhD in Behavioural Ecology and trained as a science communicator, before devoting his postdoctoral career to studying the nature of science itself – a field increasingly known as meta-science or meta-research. He has been primarily interested in assessing and explaining the prevalence and causes of, and remedies for, problems that may affect research and publication practices, across the natural and social sciences. Fanelli helps answer these and other questions by analysing patterns in the scientific literature, using meta-analysis, regression and any other suitable methodology. He is a member of the Research Ethics and Bioethics Advisory Committee of Italy’s National Research Council, for which he developed the first research integrity guidelines, and of the Research Integrity Committee of the Luxembourg Agency for Research Integrity (LARI).


Readings: 

Fanelli D (2019) A theory and methodology to quantify knowledge. Royal Society Open Science – doi.org/10.1098/rsos.181055. (PDF)

4-page background: Fanelli D (2018) Is science really facing a reproducibility crisis, and do we need it to? PNAS – doi.org/10.1073/pnas.1708272114. (PDF)


Slides & Video Links: 


See Phil-Stat-Wars.com

*Meeting 16 of the general Phil Stat series, which began with the LSE Seminar PH500 on May 21.

Categories: Phil Stat Forum, replication crisis, stat wars and their casualties

A. Spanos: Jerzy Neyman and his Enduring Legacy (guest post)

I am reblogging a guest post that Aris Spanos wrote for this blog on Neyman’s birthday some years ago.   


A Statistical Model as a Chance Mechanism
Aris Spanos 

Jerzy Neyman (April 16, 1894 – August 5, 1981) was a Polish-American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of confidence intervals. (This article was first posted here.)

Neyman: 16 April 1894 – 5 Aug 1981

One of Neyman’s most remarkable, but least recognized, achievements was his adaptation of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Fisher’s original parametric statistical model Mθ(x) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data x0:=(x1,x2,…,xn) can be viewed as a ‘truly representative sample’ from that ‘population’:

“The postulate of randomness thus resolves itself into the question, ‘Of what population is this a random sample?’” (ibid., p. 313), underscoring that “the adequacy of our choice may be tested a posteriori.” (p. 314)

In cases where data x0 come from sample surveys, or can be viewed as a typical realization of a random sample X:=(X1,X2,…,Xn), i.e. Independent and Identically Distributed (IID) random variables, the ‘population’ metaphor can be helpful in adding some intuitive appeal to the inductive dimension of statistical inference, because one can imagine using a subset of a population (the sample) to draw inferences pertaining to the whole population.

This ‘infinite population’ metaphor, however, is of limited value in most applied disciplines relying on observational data. To see how inept this metaphor is, consider the question: what is the hypothetical ‘population’ when modeling the gyrations of stock market prices? More generally, what is observed in such cases is a certain on-going process and not a fixed population from which we can select a representative sample. For that very reason, most economists in the 1930s considered Fisher’s statistical modeling irrelevant for economic data!

Due primarily to Neyman’s experience with empirical modeling in a number of applied fields, including genetics, agriculture, epidemiology, biology, astronomy and economics, his notion of a statistical model evolved beyond Fisher’s ‘infinite populations’ in the 1930s into his frequentist ‘chance mechanisms’ (see Neyman, 1950, 1952):

Guessing and then verifying the ‘chance mechanism’, the repeated operation of which produces the observed frequencies. This is a problem of ‘frequentist probability theory’. Occasionally, this step is labeled ‘model building’. Naturally, the guessed chance mechanism is hypothetical. (Neyman, 1977, p. 99)

From my perspective, this was a major step forward for several reasons, including the following.

First, the notion of a statistical model as a ‘chance mechanism’ extended the intended scope of statistical modeling to include dynamic phenomena that give rise to data from non-IID samples, i.e. data that exhibit both dependence and heterogeneity, like stock prices.

Second, the notion of a statistical model as a ‘chance mechanism’ is not only of metaphorical value, but it can be operationalized in the context of a statistical model, formalized by:

Mθ(x) = {f(x;θ), θ ∈ Θ}, x ∈ Rⁿ, Θ ⊂ Rᵐ; m << n,

where the distribution of the sample f(x;θ) describes the probabilistic assumptions of the statistical model. This takes the form of a statistical Generating Mechanism (GM), stemming from  f(x;θ), that can be used to generate simulated data on a computer. An example of such a Statistical GM is:

Xt = α0 + α1Xt-1 + σεt,  t=1,2,…,n

This indicates how one can use pseudo-random numbers for the error term εt ~ NIID(0,1) to simulate data for the Normal AutoRegressive [AR(1)] model. One can generate numerous sample realizations, say N=100,000, of sample size n in nanoseconds on a PC.
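To make this operational point concrete, here is a minimal sketch, in Python, of how such a statistical GM can be run on a computer. It is not taken from Spanos’s post; the parameter values (α0 = 0.2, α1 = 0.5, σ = 1), the seed, and the function name simulate_ar1 are illustrative assumptions.

# Minimal sketch of the AR(1) statistical GM above (illustrative values only).
import numpy as np

def simulate_ar1(n, a0=0.2, a1=0.5, sigma=1.0, x0=0.0, rng=None):
    """One realization of X_t = a0 + a1*X_{t-1} + sigma*e_t, with e_t ~ NIID(0,1)."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.standard_normal(n)              # pseudo-random NIID(0,1) errors
    x = np.empty(n)
    prev = x0
    for t in range(n):
        prev = x[t] = a0 + a1 * prev + sigma * e[t]
    return x

# N sample realizations of size n, stored as an (N, n) array
N, n = 100_000, 100
rng = np.random.default_rng(1894)           # arbitrary seed, for reproducibility
data = np.stack([simulate_ar1(n, rng=rng) for _ in range(N)])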

Third, the notion of a statistical model as a ‘chance mechanism’ puts a totally different spin on another metaphor widely used by uninformed critics of frequentist inference. This is the ‘long-run’ metaphor associated with the relevant error probabilities used to calibrate frequentist inferences. The operationalization of the statistical GM reveals that the temporal aspect of this metaphor is totally irrelevant for frequentist inference; remember Keynes’s catchphrase “In the long run we are all dead”? Instead, what matters in practice is repeatability in principle, not over time! For instance, one can use the above statistical GM to generate the empirical sampling distribution of any test statistic, and thus render operational not only the pre-data error probabilities (the type I and type II error probabilities and the power of a test) but also the post-data probabilities associated with the severity evaluation; see Mayo (1996).
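As a rough sketch of that last point (again my own illustration, not code from the original post), one can run the statistical GM repeatedly under a null value of the AR(1) coefficient, build the empirical sampling distribution of a test statistic, and read off an empirical cutoff and the empirical power at an alternative. The OLS-based t-type statistic, the sample sizes, and the parameter values below are all assumptions chosen for illustration.

# Empirical sampling distribution of a test statistic for H0: a1 = 0 in the
# AR(1) model, obtained by running the statistical GM (illustrative sketch).
import numpy as np

def simulate_ar1(n, a0=0.2, a1=0.5, sigma=1.0, rng=None):
    """One realization of X_t = a0 + a1*X_{t-1} + sigma*e_t, e_t ~ NIID(0,1).
    Redefined here so this snippet runs on its own."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.standard_normal(n)
    x, prev = np.empty(n), 0.0
    for t in range(n):
        prev = x[t] = a0 + a1 * prev + sigma * e[t]
    return x

def ar1_t_stat(x, a1_0=0.0):
    """OLS-based t-type statistic for the AR(1) coefficient (illustrative choice)."""
    y, z = x[1:], x[:-1]
    Z = np.column_stack([np.ones_like(z), z])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    s2 = resid @ resid / (len(y) - 2)
    se_a1 = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
    return (beta[1] - a1_0) / se_a1

rng = np.random.default_rng(1977)
null_stats = np.array([ar1_t_stat(simulate_ar1(200, a1=0.0, rng=rng)) for _ in range(5000)])
alt_stats = np.array([ar1_t_stat(simulate_ar1(200, a1=0.3, rng=rng)) for _ in range(5000)])

cutoff = np.quantile(null_stats, 0.95)      # empirical one-sided 5% critical value
power = np.mean(alt_stats > cutoff)         # empirical power at a1 = 0.3
print(f"empirical cutoff: {cutoff:.2f}, empirical power at a1 = 0.3: {power:.2f}")

The same simulated distributions can, in principle, be used for post-data assessments such as the severity evaluation discussed in Mayo (1996), since they make the relevant error probabilities directly computable.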

I have restored all available links to the following references.

For further discussion on the above issues see:

Spanos, A. (2013), “A Frequentist Interpretation of Probability for Model-Based Inductive Inference,” in Synthese.

Fisher, R. A. (1922), “On the mathematical foundations of theoretical statistics,” Philosophical Transactions of the Royal Society A, 222: 309-368.

Mayo, D. G. (1996), Error and the Growth of Experimental Knowledge, The University of Chicago Press, Chicago.

Neyman, J. (1950), First Course in Probability and Statistics, Henry Holt, NY.

Neyman, J. (1952), Lectures and Conferences on Mathematical Statistics and Probability, 2nd ed. U.S. Department of Agriculture, Washington.

Neyman, J. (1977), “Frequentist Probability and Frequentist Statistics,” Synthese, 36, 97-131.

[i] He was born in an area that was then part of Russia.

Categories: Neyman, Spanos

Happy Birthday Neyman: What was Neyman opposing when he opposed the ‘Inferential’ Probabilists?


Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981). I’m posting a link to a quirky paper of his that explains one of the most misunderstood of his positions–what he was opposed to in opposing the “inferential theory”. The paper is Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making’ [i]. It’s chock-full of ideas and arguments. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. This passage arises on p. 391 of Excursion 5 Tour III of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s a link to the proofs of that entire tour. If you hear Neyman rejecting “inferential accounts”, you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance, as typically thought. Neyman always distinguished his error statistical performance conception from Bayesian and fiducial probabilisms [ii]. The surprising twist here is semantical, and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?) [iii]. You can find quite a lot on this blog by searching Birnbaum.

Note: In this article, “attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.
HAPPY BIRTHDAY NEYMAN!

What doesn’t Neyman like about Birnbaum’s advocacy of a Principle of Sufficiency S (p. 25)? He doesn’t like that it is advanced as a normative principle (e.g., about when evidence is or ought to be deemed equivalent) rather than as a criterion that does something for you, such as control errors. (Presumably it is relevant to a type of context, say parametric inference within a model.) S is put forward as a kind of principle of rationality, rather than one with a rationale in solving some statistical problem.

“The principle of sufficiency (S): If E is a specified experiment, with outcomes x; if t = t(x) is any sufficient statistic; and if E’ is the experiment, derived from E, in which any outcome x of E is represented only by the corresponding value t = t(x) of the sufficient statistic; then for each x, Ev(E, x) = Ev(E’, t) where t = t(x)… (S) may be described informally as asserting the ‘irrelevance of observations independent of a sufficient statistic’.”

Ev(E, x) is a metalogical symbol referring to the evidence from experiment E with result x. The very idea that there is such a thing as an evidence function is never explained, but to Birnbaum “inferential theory” required such things. (At least that’s how he started out.) The view is very philosophical, and it inherits much from logical positivism and logics of induction. The principle S, and also other principles of Birnbaum, have a normative character: Birnbaum considers them “compellingly appropriate”.
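To see in a toy case what S asserts (my own illustration, not an example from Neyman or Birnbaum): in n Bernoulli(θ) trials, the number of successes t = t(x) is sufficient, and an evidence function built from likelihood ratios gives the same answer whether it is fed the full outcome sequence x (experiment E) or only t (the derived experiment E’).

# Toy illustration (my own, not from the papers discussed): in a Bernoulli
# experiment, the likelihood ratio computed from the full data x equals the one
# computed from the sufficient statistic t = sum(x), so an evidence measure
# based on likelihood ratios satisfies Ev(E, x) = Ev(E', t).
from math import comb
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1])   # one outcome of E: n Bernoulli trials
n, t = len(x), int(x.sum())              # t = t(x), the sufficient statistic
theta1, theta2 = 0.5, 0.7                # two hypothesized parameter values

def lik_full(theta):                     # likelihood from the full sequence x
    return float(np.prod(theta ** x * (1 - theta) ** (1 - x)))

def lik_suff(theta):                     # likelihood from E': observe only t
    return comb(n, t) * theta ** t * (1 - theta) ** (n - t)

# Both ratios are identical: the binomial coefficient cancels.
print(lik_full(theta2) / lik_full(theta1))
print(lik_suff(theta2) / lik_suff(theta1))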

“The principles of Birnbaum appear as a kind of substitutes for known theorems”, Neyman says. For example, various authors proved theorems to the general effect that the use of sufficient statistics will minimize the frequency of errors. But if you just start with the rationale (minimizing the frequency of errors, say), you wouldn’t need these “principles” from on high, as it were. That’s what Neyman seems to be saying in his criticism of them in this paper. Do you agree? He has the same gripe concerning Cornfield’s conception of a default-type Bayesian account akin to Jeffreys. Why?

[i] I am grateful to @omaclaran for reminding me of this paper on twitter in 2018.

[ii] Or so I argue in my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, 2018, CUP.

[iii] Do you think Neyman is using “breakthrough” here in reference to Savage’s description of Birnbaum’s “proof” of the (strong) Likelihood Principle? Or is it the other way round? Or neither? Please weigh in.

REFERENCES

Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making’, Revue De l’Institut International De Statistique / Review of the International Statistical Institute, 30(1), 11-27.

Categories: Bayesian/frequentist, Error Statistics, Neyman

Intellectual conflicts of interest: Reviewers


Where do journal editors look to find someone to referee your manuscript (in the typical “double blind” review system in academic journals)? One obvious place to look is the reference list in your paper. After all, if you’ve cited them, they must know about the topic of your paper, putting them in a good position to write a useful review. The problem is that if your paper is on a topic of ardent disagreement, and you argue in favor of one side of the debates, then your reference list is likely to include those with actual or perceived conflicts of interest. After all, if someone has a strong standpoint on an issue of some controversy, and a strong interest in persuading others to accept their side, it creates an intellectual conflict of interest if that person has power to uphold that view. Since your referee is in a position of significant power to do just that, it follows that they have a conflict of interest (COI). A lot of attention is paid to authors’ conflicts of interest, but little to the intellectual or ideological conflicts of interest of reviewers. At most, the concern is with the reviewer having special reasons to favor the author, usually thought to be indicated by having been a previous co-author. We’ve been talking about journal editors’ conflicts of interest of late (e.g., with Mark Burgman’s presentation at the last Phil Stat Forum), and this brings to mind another one.

But is it true that, just because a reviewer is put in a position of competing interests (staunchly believing in a position opposed to yours, while under an obligation to provide a fair and unbiased review), their fairness in executing the latter is compromised? I surmise that your answer to this question will depend on which of two scenarios you imagine yourself in: In the first, you imagine yourself reviewing a paper that argues in favor of a position that you oppose. In the second, you imagine that your paper, which argues in favor of a view, has been sent to a reviewer with a vested interest in opposing that view.

In other words, if the paper argues in favor of a position, call it position X, and you oppose X, I’m guessing you imagine you’d have no trouble giving fair and constructive assessments of arguments in favor of X. You would not dismiss arguments in favor of X just because you sincerely oppose X. You’d give solid reasons. You’d be much more likely to question whether a reviewer, staunchly opposed to position X, will be an unbiased reviewer of your paper in favor of X. I’m not biased, but they are.

I think the truth is that reviewers with a strong standpoint on a controversial issue are likely to have an intellectual conflict of interest in reviewing a paper in favor of a position they oppose. Recall that it suffices, according to standard definitions of an individual having a COI, that reasonable grounds exist to question whether their judgments and decisions can be unbiased. (For example, investment advisors avoid recommending stocks they themselves own, to avoid a conflict of interest.) If this is correct, does it follow that opponents of a contentious issue should not serve as reviewers of papers that take an opposite stance? I say no, because an author can learn a lot from a biased review about how to present their argument in the strongest possible terms, and how to zero in on the misunderstandings and confusions underlying objections to the view. Authors will almost surely not persuade such a reviewer by means of a revised paper, but they will be in possession of an argument that may enable them to persuade others.

A reviewer who deeply opposes position X will indeed, almost certainly, raise criticisms of a paper that favors X, but it does not follow that they are not objective or valid criticisms. Nevertheless, if all the reviewers come from this group, the result is still an unbalanced and unfair assessment, especially in that–objective or not–the critical assessment is more likely to accentuate the negative. If the position X happens to be currently unpopular, and opposing X the “received” position extolled by leaders of associations, journals, and institutions, then restricting reviewers to those opposed to X would obstruct intellectual progress. Progress comes from challenging the status quo and the tendency of people to groupthink and to jump on the bandwagon endorsed by many influential thought leaders of the day. Thus it would make sense for authors to have an opportunity to point out ahead of time to journal editors–who might not be aware of the particular controversy–the subset of references with a vested intellectual interest against the view for which they are arguing. If the paper is nevertheless sent to those reviewers, a judicious journal editor should weigh very heavily the author’s retorts and rejoinders. [1]

Here’s an example from outside of academia–the origins of the Coronavirus. The president of an organization that is directly involved with and heavily supported by funds for experimenting on coronaviruses, Peter Daszak, has a vested interest in blocking hypotheses of lab leaks or lab errors. Such hypotheses, if accepted, would have huge and adverse effects on that research and its regulation. When he is appointed to investigate Coronavirus origins, he has a conflict of interest. See this post.

Molecular biologist Richard Ebright, one of the scientists who signed the Call for a Full and Unrestricted International Forensic Investigation into the Origins of COVID-19, claims “the fact that the WHO named Daszak as a member of its mission, and the fact that the WHO retained Daszak as a member of its mission after being informed of his conflicts of interest, make it clear that the WHO study cannot be considered a credible, independent investigation.” (LINK) If all the reviewers of a paper in support of a lab association come from team Daszak, the paper is scarcely being given a fair shake.

Do you agree? Share your thoughts in the comments.

[1] The problem is compounded by the fact that today there are more journal submissions than ever, and with the difficulty in getting volunteers, there’s pressure on the journal editor not to dismiss the views of referees. My guess is that anonymity doesn’t play a big role most of the time.

 

Categories: conflicts of interest, journal referees

ASA to Release the Recommendations of its Task Force on Statistical Significance and Replication

The American Statistical Association has announced that it has decided to reverse course and share the recommendations developed by the ASA Task Force on Statistical Significance and Replicability in one of its official channels. The ASA Board created this group [1] in November 2019 “with a charge to develop thoughtful principles and practices that the ASA can endorse and share with scientists and journal editors.” (AMSTATNEWS 1 February 2020). Some members of the ASA Board felt that its earlier decision not to make these recommendations public, but instead to leave the group to publish its recommendations on its own, might give the appearance of a conflict of interest between the obligation of the ASA to represent the wide variety of methodologies used by its members in widely diverse fields, and the advocacy by some members who believe practitioners should stop using the term “statistical significance” and end the practice of using p-value thresholds in interpreting data [the Wasserstein et al. (2019) editorial]. I think that deciding to publicly share the new Task Force recommendations is very welcome, especially given that the Task Force was appointed to avoid just such an apparent conflict of interest. Past ASA President Karen Kafadar noted:

Many of you have written of instances in which authors and journal editors—and even some ASA members—have mistakenly assumed this [Wasserstein et al. (2019)] editorial represented ASA policy. The mistake is understandable: The editorial was co-authored by an official of the ASA.

… To address these issues, I hope to establish a working group that will prepare a thoughtful and concise piece … without leaving the impression that p-values and hypothesis tests…have no role in ‘good statistical practice’. (K. Kafadar, President’s Corner, 2019, p. 4)

Thus the Task Force on Statistical Significance and Replicability was born. Meanwhile, its recommendations remain under wraps. The one principle mentioned in Kafadar’s JSM presentation is that there be a disclaimer on all publications, articles, and editorials authored by ASA staff, making it clear that the views presented are theirs and not the association’s. It is good that we can now count on seeing the original recommendations. Had they appeared only in a separate publication, perhaps in a non-statistics journal, we would never actually know whether we were getting to see the original recommendations or some modified version of them.

For a blogpost that provides the background to this episode, see “Why hasn’t the ASA board revealed the recommendations of its new task force on statistical significance and replicability?”

 

[1] Members of the ASA Task Force on Statistical Significance and Replicability

Linda Young, National Agricultural Statistics Service and University of Florida (Co-Chair)
Xuming He, University of Michigan (Co-Chair)
Yoav Benjamini, Tel Aviv University
Dick De Veaux, Williams College (ASA Vice President)
Bradley Efron, Stanford University
Scott Evans, The George Washington University (ASA Publications Representative)
Mark Glickman, Harvard University (ASA Section Representative)
Barry Graubard, National Cancer Institute
Xiao-Li Meng, Harvard University
Vijay Nair, Wells Fargo and University of Michigan
Nancy Reid, University of Toronto
Stephen Stigler, The University of Chicago
Stephen Vardeman, Iowa State University
Chris Wikle, University of Missouri

 


 

REFERENCES:

Kafadar, K. President’s Corner: “The Year in Review … And More to Come”, AMSTATNEWS, 1 December 2019.

“Highlights of the November 2019 ASA Board of Directors Meeting”, AMSTATNEWS 1 January 2020.

Kafadar, K. “Task Force on Statistical Significance and Replicability Created”, AMSTATNEWS 1 February 2020.

Categories: conflicts of interest
