
A topic that came up in some recent comments reflects a tendency to divorce statistical inference (bad) from statistical thinking (good), and it deserves the spotlight of a post. I always alert authors of papers that come up on this blog, inviting them to comment, and one such response, from Christopher Tong (reacting to a comment on Ron Kenett’s post), concerns this dichotomy.
Response by Christopher Tong to D. Mayo’s July 14 comment
TONG: In responding to Prof. Kenett, Prof. Mayo states: “we should reject the supposed dichotomy between ‘statistical method and statistical thinking’ which unfortunately gives rise to such titles as ‘Statistical inference enables bad science, statistical thinking enables good science,’ in the special TAS 2019 issue. This is nonsense.” [Mayo July 14 comment here.]
I am the author of the paper whose title she attacks as “nonsense”. If she had read my paper she would know that, like Kenett, I am advocating placing statistical thinking at the center of statistical teaching and practice. The dichotomy that she thinks is false exists in much of actual teaching and practice, and is one that it seems both Kenett and I are trying to undo. The title of my paper reflects the real (not the ideal) situation, and if that’s “nonsense”, then (and I would agree) much of statistical teaching and practice is nonsense. Finally I note that mine is one of only two papers in the 2019 special issue that even contains the phrase “statistical thinking” in the title. I strongly recommend the other one, which offers a concrete solution to how the “integrating” that Kenett speaks of can be done in statistics education.
The views expressed are my own.
Response by Mayo to Tong:
MAYO: Thank you for your comment. My thinking was that it would be good to alert the authors of the papers Lakens discusses, and I’m glad that you have. I have read your paper, and, however much your highly provocative title earns rewards in contexts such as the special issue in which it appears, in my view it does an enormous disservice to statistical inference to cast it as “enabling bad science”. Your paper itself, which reviews many right-headed contributions, shows that the very insights and tools that your “good statistical thinking” requires are themselves at the foundations of frequentist error statistical methodology and depend upon statistical inference methods, formal and informal. The formal tools were developed by the founders as deliberate idealizations, as exemplars, to check and improve our ordinary (pre-statistics) statistical thinking. Grasping the brilliance of how this works demands a clear understanding of the mathematical and conceptual tools.
I wrote a book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018): SIST. You can find all 16 “tours” on this blog (in final draft form) in this post: Blurbs of 16 Tours: Statistical Inference as Severe Testing; How to Get Beyond the Statistics Wars (SIST).
The notion of statistical inference developed there is very different from your sterile depiction. You talk as if practicing scientists in fields that employ statistical methods gain their first exposure to statistics at the point that they are doing applied research. This should not be true. High school students, if they are to be critical consumers of the policies and decisions that will affect them in their lives—let alone conduct research—should study statistical methods, including experimental design. Nor can researchers who use statistics, without a VERY clear understanding of statistical concepts and computations, assume that they need only think about the domain field and that good science will emerge. They should have a deep grasp of the formal methods that others will use to check their models and results. Statistical significance tests, and tests of statistical hypotheses more generally, are intimately connected to experimental design, as Fisher emphasized.
I worry that your paper warns these students off, claiming that studying statistical inference will only endanger their ability to do good science. What a relief for the students! This is one of their hardest courses, and now they can point to an article in an important journal that warns us NOT to study statistical inference.
Statistical significance tests are just one small part of statistical science, whose methods are piecemeal and cannot all be learned at one time. Fisher wrote a book, Statistical Methods and Scientific Inference; the integration of the two was there from the start.
Testing statistical assumptions is a crucial part of error statistical methods. You mention Box, but he is talking about Bayesian vs frequentist methods. Box considered that Bayesian inference gives the formal, deductive part of inference which, in his view, could enter only after the creative, inductive work of arriving at and testing a model, which he claimed requires statistical significance tests:
[S]ome check is needed on [the brain’s] pattern seeking ability, for common experience shows that some pattern or other can be seen in almost any set of data or facts. This is the object of diagnostic checks and tests of fit which, I will argue, require frequentist theory significance tests for their formal justification. (Box 1983, 57)
Yet you say “formal, probability-based statistical inference should play no role in most scientific research, which is inherently exploratory, requiring flexible methods of analysis that inherently risk overfitting”. Box disagrees, saying we need checks on such risks, and statistical significance tests provide them. Eyeballing the data won’t suffice. (I say this after having worked with Aris Spanos, an expert on testing model assumptions.) Whenever we use data to solve statistical problems we are doing statistical inference: this goes beyond the data, and thus it is inductive or ampliative. (A paper I wrote with David Cox in 2006 is called “Frequentist Statistics as a Theory of Inductive Inference”.)
There is no suggestion whatever that the significance test would typically be the only analysis reported. In fact, a fundamental tenet of the conception of inductive learning most at home with the frequentist philosophy is that inductive inference requires building up incisive arguments and inferences by putting together several different piece-meal results. Although the complexity of the story makes it more difficult to set out neatly, as, for example, if a single algorithm is thought to capture the whole of inductive inference, the payoff is an account that approaches the kind of full-bodied arguments that scientists build up in order to obtain reliable knowledge and understanding of a field. (Mayo & Cox 2006, 82.)
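To make Box’s point concrete, here is a minimal sketch (made-up data, not drawn from Box or from Tong’s paper): a flexible curve fit to pure noise can look like a discovered pattern, and a nested-model F-test is the kind of formal diagnostic check that keeps the eyeball honest.

```python
# Sketch (made-up data): a flexible fit to pure noise vs. a formal check of fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 30
x = np.linspace(0.0, 1.0, n)
y = rng.normal(size=n)                       # pure noise: there is no real pattern

def rss(degree):
    """Residual sum of squares for a polynomial fit of the given degree."""
    coef = np.polyfit(x, y, degree)
    return float(np.sum((y - np.polyval(coef, x)) ** 2))

p_small, p_big = 1, 7                        # constant model vs. degree-6 polynomial
rss_small, rss_big = rss(0), rss(6)
r_squared = 1.0 - rss_big / rss_small        # how impressive the flexible fit "looks"

# Nested-model F-test: do the extra polynomial terms explain more than chance?
F = ((rss_small - rss_big) / (p_big - p_small)) / (rss_big / (n - p_big))
p_value = stats.f.sf(F, p_big - p_small, n - p_big)
print(f"R^2 = {r_squared:.2f}, F = {F:.2f}, p = {p_value:.2f}")
```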
Perhaps some pedagogical treatments of statistical inference methods are overly formal, allowing students to just use computers to get the answer. Maybe that’s what’s behind your saying that there’s a divorce between statistical inference (bad) and statistical thinking (good). I say that computing solutions by hand provides a much deeper understanding of methods, and of where our intuitive thinking about probability and statistical inference is often badly wrong. It seems you’re missing that the key rationale for using deliberately idealized models in statistics is in order to learn from data how they fail and how to improve them. Used correctly, they serve as references for severe testing.
Of course, as you stress, “exploratory” inquiry and model building require a data dependence that would not be kosher in a predesignated “confirmatory inquiry”. But even in exploratory inquiry, we can use data both to build and to severely probe such questions as whether a given method or model ought to be modified, whether it will serve to find out what we want to know, despite approximations, etc. Moreover, in exploratory inference, there are still statistical assumptions that ought to, and can, be checked by methods with different assumptions, and by triangulating results. Fisher, Neyman and many, many others gave us mathematics to show how various designs (e.g., randomization) and remodeling of data allow “subtracting out” or compensating for misspecifications. Contemporary methods go further, but puzzlingly, you reject all such “technical fixes”.
I’m inclined to think that John Byrd had it right (in his comment on Kenett—who I do not think shares your view of statistical inference as enabling bad science):
So, I say that the reasoning underlying these [data science] approaches was given to us by Fisher, Neyman, Pearson, Deming, Cohen, Cox and others from our past. If “data science” becomes ignorant of what statistics can teach us, they will end up re-inventing these same concepts that guide error control, sampling issues, etc. Then we all get to watch a younger generation think they invented such concepts. (Link to J. Byrd July 14 full comment)
Another response to Kenett that seems right-headed is that of Christian Hennig:
Then, on the other side, statistical methodology is to quite some extent a formalisation of principles of statistical thinking, and if we want to analyse formally the implications of our thinking (and more broadly how to do it best), we generate “statistical methodology” by modelling situations (probability models) and decision making (statistical methods, model-based or not). “Errors” and “error probabilities” are then relevant again in the sense that statistical thinking can be criticised by saying, “if you apply “statistical thinking principle”, i.e., method A in artificial situation B in which we know (as we can “control” the truth when assuming models) that we should arrive at conclusion C, in fact you will quite likely arrive at conclusion D which is opposite to C”, then we have learnt something about how statistical thinking can be led astray.
I don’t really think this kind of reasoning can be easily replaced, and I claim that quality statistical thinking needs to be informed by such knowledge. (Link to C. Hennig’s July 17 comment)
There are several other comments on Kenett’s post, both before and after Tong’s, that might interest you. I invite your thoughts in the comments.
References
Box, G. E. P. (1983). An Apology for Ecumenism in Statistics, in Box, G. E. P., Leonard, T., and Wu, C.-F. (eds.), Scientific Inference, Data Analysis, and Robustness, New York: Academic Press, 51–84.
Tong, C. (2019). Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science. The American Statistician, 73(sup1), 246–261. https://doi.org/10.1080/00031305.2018.1518264



Interesting exchange between Tong and Mayo.
My piece in https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2171179
starts with this quote: “Much fine work in statistics involves minimal mathematics; some bad work in statistics gets by because of its apparent mathematical content.”
David Cox (1981), Theory and general principle in statistics, JRSS(A), 144, pp. 289-297.
I think the Tong-Mayo exchange comes down to one’s point of reference. If you see statistics as a domain meeting practical challenges, you will have a different perspective than if you approach it mathematically.
In many cases it is a dichotomous split. Cox was unique in that he could combine both perspectives in order to do “fine work”. Box was also driven by this objective and wrote “Sampling and Bayes’ Inference in Scientific Modelling and Robustness” (https://www.cs.princeton.edu/courses/archive/fall11/cos597C/reading/Box1980.pdf). That paper reached beyond the animated frequentist-Bayesian debates. And this was 1980…. To this we can add Tukey’s perspective on the future of data analysis (https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-33/issue-1/The-Future-of-Data-Analysis/10.1214/aoms/1177704711.full) and Breiman’s two cultures of statistical modeling (https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling–The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.full).
In dealing with statistics today, I believe one needs a wider perspective that does not ignore developments in data science. We developed the information quality framework to achieve this synergy of perspectives on the contributions of statistics (https://sites.google.com/site/datainfoq). It was meant to be a starting point.
We need to reach such wider platforms in these days of singularity where statistics and statistical thinking have huge opportunities. For a view of the environment that needs to be accounted for see https://hdsr.mitpress.mit.edu/pub/g9mau4m0/release/2
Discussions in this blog can make interesting contributions to this endeavor.
I completely understand the point of your calling for integration of the parts of overarching inquiries that employ data, but Tong’s position is much different. He calls for ousting all formal statistical methods as enablers of bad science. In my view, without a deep and clear understanding of how formal methods and concepts provide idealized reasoning for statistical problems, one lacks the tools for critical inference in practice. As Fisher put it:
“The purpose of randomisation . . . is to guarantee the validity of the test of significance, this test being based on an estimate of error made possible by replication”. (Fisher [1935b]1951, p. 26)
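A minimal sketch of the point in that quote (hypothetical outcome data): under randomized assignment, the reference distribution of the test statistic is generated by the randomization itself, which is what licenses the significance assessment.

```python
# Sketch (hypothetical outcomes): a randomization (permutation) test whose
# validity rests on the random assignment itself, per the Fisher quote above.
import numpy as np

rng = np.random.default_rng(3)
treated = np.array([7.1, 6.8, 8.2, 7.9, 7.5, 8.0])
control = np.array([6.9, 6.5, 7.0, 7.2, 6.7, 7.1])
observed = treated.mean() - control.mean()

pooled = np.concatenate([treated, control])
n_treated = len(treated)
reps = 10_000
count = 0
for _ in range(reps):
    perm = rng.permutation(pooled)           # re-randomize labels, as the design permits
    if perm[:n_treated].mean() - perm[n_treated:].mean() >= observed:
        count += 1
print((count + 1) / (reps + 1))              # one-sided randomization p-value
```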
Note how Stephen Senn exposes and demonstrates faulty intuitive thinking about randomization and about control using formal tools:
https://errorstatistics.com/wp-content/uploads/2018/01/deaton-cartwright-2017-understanding-and-misunderstanding-randomized-controlled-trials.pdf
https://errorstatistics.com/2020/07/20/stephen-senn-losing-control-guest-post/
Once I saw the mathematical demonstration of how our statistical thinking can go wrong in such toy examples as “The lady tasting tea”, the overarching pattern of reasoning became clear. Tong also rejects the idea that mathematical and computational methods can enter to allow learning despite shortcomings and approximations, including cross-validation and bootstrapping, because “such methods must still fail to guarantee reliable statistical inference, because they cannot eliminate model uncertainty and systematic error”. This presupposes that appealing to methods of statistical inference would only be warranted if they could function on automatic pilot. Humans are smarter than that.
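For the record, here is a minimal sketch (simulated data, not from any of the papers discussed) of what one of these computational methods, the bootstrap, actually delivers: not a guarantee of reliability, but an approximate, assumption-light picture of sampling variability that can then itself be probed.

```python
# Sketch (simulated data): a percentile bootstrap interval for a median.
# It offers an approximate, assumption-light picture of sampling variability,
# not a guarantee of reliable inference.
import numpy as np

rng = np.random.default_rng(11)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=80)   # skewed data

boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(5000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"sample median = {np.median(sample):.2f}, approx. 95% interval = ({lo:.2f}, {hi:.2f})")
```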
Tong advises that methods of statistical inference “should rarely be applied,” but his conception of what it means to “apply” them (i.e., to isolated cases where their assumptions literally hold) is very far from how they are used in good science. Granted, statistical significance testing is a small part of science, entering when there is a need to distinguish genuine effects from random error, but this is a crucial task, and it serves as part of a complex, piecemeal inquiry with an overarching logic. We obtain a general philosophy of statistical inference from formal statistical methodology, which is why philosophers of science often appeal to it. Nobody recommends ousting formal propositional logic, e.g., modus tollens, even though it captures a very small part of actual reasoning; nor would we recommend not teaching more advanced logics because they too idealize actual reasoning. The whole point of formal methods is to have reference points and idealized exemplars for illuminating and finding flaws in actual, messy inference. And if we are not making inferences from data, we’re not learning from them.
Mayo: Let me refer to your last sentence here, namely “The whole point of formal methods is to have reference points and idealized exemplars for illuminating and finding flaws in actual, messy inference. And if we are not making inferences from data, we’re not learning from them.”
Four comments on this:
For me at least, all the above involve statistical thinking…
Kenett:
You’ve shared these interesting advances with us before. I don’t dichotomize statistical thinking vs inference/method. I’m just focussed here on the accusations that statistical inference methods encourage bad science, should rarely be used, and rarely be taught. Whereas in your post you cited post-selection inference as an example of important current research, Tong rejects those methods because “such methods must still fail to guarantee reliable statistical inference, because they cannot eliminate model uncertainty and systematic error”. This presupposes that appealing to methods of statistical inference would only be warranted if they could guarantee reliable statistical inference, in and of themselves. Tong tells us: “One reviewer of this article characterized our view as “the proposed solution for imperfect variance estimation is no variance estimation,” and then asked “Is no quantification of uncertainty truly better than imperfect quantification?” I don’t think he has an adequate reply.
An article in the recent Amstat on “Statistical Thinking: A Brand Differentiator” has a box called “Data modeling with statistical thinking” which includes hypothesis testing and assumption testing. These are the methods Tong derogates as encouraging bad science.
I join Prof. Mayo in pointing readers to the comments section of Prof. Kenett’s post, where I provided an extensive reply to Prof. Mayo’s comments. I also posted additional clarifications about my 2019 paper in the comments section of Prof. Lakens’ post. For the most part, I will not repeat these here.
Unfortunately, a couple of points I already made in those comments do need to be reiterated here, as Prof. Mayo’s most recent comment above failed to take them into account. First, there is a disagreement on definitions that has caused wholly unnecessary tension between Prof. Mayo’s comments and mine. She said above, “Whenever we use data to solve statistical problems we are doing statistical inference.” In contrast, by statistical inference I mean (as noted in my 2019 paper) estimating and/or testing parameters in statistical models. Several of Mayo’s objections fall away once this is realized. Second, my paper embraced the use of model building, model criticism, and model selection during exploratory research. This is exactly the learning from data using “mathematical and computational methods” that Prof. Mayo incorrectly states I would disallow. These methods can and should be applied; this is the epitome of good science, as emphatically stated in my paper. However, there is just one caveat: estimates or tests of parameters in the “final model” do not exhibit the nominal error and coverage probabilities, due to model selection bias. This is not an opinion; it is well documented, both theoretically and via simulation, in many papers cited in my 2019 article. The Mayo-Hand paper of 2022 seems to concede this point in their Sec. 3.4, though they emphasize “p-hacking”, whereas even a completely sincere and careful investigator will not be able to avoid the problem (Gelman & Loken 2014). It is quite surprising, then, that there should be any controversy about this point at all. The “severe test” of the final model must be made on data obtained subsequent to the current model selection activity.
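To illustrate the selection effect at issue, here is a toy simulation (added as an illustration, not taken from the 2019 paper or the works cited there), simplified to reporting only the most significant of many truly null tests; the nominal 5% error rate is not achieved.

```python
# Sketch (toy simulation): report only the most significant of m truly null
# effects and the nominal 5% error rate is lost.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n, reps = 20, 30, 2000
false_positives = 0
for _ in range(reps):
    data = rng.normal(size=(m, n))                     # m candidate effects, all null
    t = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(np.abs(t), df=n - 1)
    if p.min() < 0.05:                                 # the "best" finding gets reported
        false_positives += 1
print(false_positives / reps)                          # roughly 1 - 0.95**20, about 0.64
```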
The history of science is replete with examples of inferences from data–these are just not statistical inferences, using my meaning of the phrase. A good illustration can be found in the recently published Penguin Classic, The Dawn of Modern Cosmology: From Copernicus to Newton, edited by Aviva Rothman (Penguin Random House, 2023). I listed others in the final section of my 2019 paper. Yet another example is the discovery of the “dark-matter” discrepancy that I detailed at length in the other thread.
There are conditions in which statistical inference can be fruitfully applied, as noted in my paper. A canonical example is phase III clinical trials. Thus I should have added a caveat in my paper that “Statistical inference doesn’t always enable bad science, just most of the time.”
Christopher:
Thank you for your further comment. My criticisms do not fall away, because I consider the formal methodology extremely important, whereas you say it encourages bad science.
Note how Stephen Senn exposes and demonstrates faulty intuitive thinking about randomization and about control using formal tools. The following links were left out:
The Tong-educated (or thinly educated) student will not be able to understand these demonstrations. Unfortunately, the misunderstanding of randomization (which Senn discusses) by many philosophers is directly linked to unfamiliarity with the more formal expositions of what it does. You can check the links on this blog.
As soon as I saw, as a grad student, the mathematical demonstration of how our statistical thinking can go wrong in such toy examples as “The lady tasting tea”, the overarching pattern of reasoning became clear to me, and I could see how mathematical statistics was the key to solving central problems of induction that philosophers of science had long been agonizing over.
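For readers who have not seen it, here is the tea-tasting calculation in miniature (standard textbook numbers, not code from any paper discussed here): the exact null distribution is hypergeometric, and the tail-area computation disciplines intuition about what counts as impressive.

```python
# Sketch (standard textbook example): Fisher's lady tasting tea.
# 8 cups, 4 milk-first; she must identify the 4 milk-first cups.
from math import comb

total = comb(8, 4)   # 70 equally likely guesses under the null of no discrimination

def p_at_least(k):
    """P(at least k of her 4 choices are correct) under random guessing."""
    return sum(comb(4, j) * comb(4, 4 - j) for j in range(k, 5)) / total

print(p_at_least(4))   # 1/70, about 0.014: a perfect score is surprising under the null
print(p_at_least(3))   # 17/70, about 0.24: merely 3 correct is weak evidence
```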
Tong also rejects the idea that mathematical and computational methods can enter to allow learning despite shortcomings and approximations, including cross-validation and bootstrapping, because “such methods must still fail to guarantee reliable statistical inference, because they cannot eliminate model uncertainty and systematic error”. This presupposes that appealing to methods of statistical inference would only be warranted if they could function on automatic pilot. Humans are smarter than that.
Research on dark matter, as I understand it, certainly does involve distinguishing genuine signals from noise using statistics.*
You have not tackled the majority of the arguments in my comment, so I won’t repeat them.**
*The very first link I found not behind a paywall:
Link: report18w5095.pdf
Most importantly, the vast amount of background needed, from HEP physics, gravitational waves, Higgs discovery, and much more, all hinged on statistically distinguishing signals and noise.
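As a rough sketch of the signal/noise bookkeeping involved (my own illustration, with a hypothetical number of search windows): the HEP “5-sigma” convention is a tail-area requirement, and the look-elsewhere effect is an explicit multiplicity adjustment.

```python
# Sketch (hypothetical number of search windows): the "5-sigma" discovery
# convention as a tail probability, plus a crude look-elsewhere adjustment.
from scipy import stats

local_p = stats.norm.sf(5.0)                 # one-sided p for a 5-sigma excess, ~2.9e-7
n_windows = 100                              # hypothetical independent mass bins searched
global_p = 1 - (1 - local_p) ** n_windows    # rough global (look-elsewhere) p-value
print(local_p, global_p)
```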
**As I note in the preface of SIST, “Methods of statistical inference become relevant primarily when effects are neither totally swamped by noise, nor so clear cut that formal assessment of errors is relatively unimportant.” But even when a formal assessment isn’t needed, the overall logic of scientific inference and learning from error are well captured in error statistical methodology.
A comment by HEP physicist Robert Cousins replying to Gelman on the latter’s last guest post may be of interest.
https://errorstatistics.com/2024/08/18/andrew-gelman-guest-post-trying-to-clear-up-a-misunderstanding-about-decision-analysis-and-significance-testing/#comment-266295
Anyone who doubts how much statistical inference and methods interact with current physics, see phystat:
https://phystat.github.io/Website/seminars/
For a fantastic talk physicist Robert Cousins gave which is chock full of current foundational issues, see https://www.cmu.edu/dietrich/statistics-datascience/stamps/events/webinars/webinar-archive/spring-21.html . You can scroll down to his talk, slides and video.
Thank you again for entertaining my thoughts. Sadly I did not respond to the majority of Mayo’s comments because they are aimed at a straw person, not my 2019 paper. For instance, no criticism of “formal methods” (other than estimating and testing parameters in statistical models) appears anywhere in the paper (I searched for this phrase in my paper and couldn’t find it). On the contrary, many “formal methods” are embraced and encouraged throughout my paper (e.g., lasso, loess, intention to treat, etc.). To take the example of Senn’s randomization post, such methods are strongly encouraged in Sec. 7.2 of my paper, including discussion of block randomization (top of p. 253, right column). There is no daylight between me and the first Senn post above. The second Senn post discusses adjustment for baseline response. I don’t disagree with that post, but it is incomplete. There is a constrained longitudinal data analysis approach due to Liang & Zeger (2000) that can be viewed as competitive with the ANCOVA approach discussed by Senn. One comparative investigation (with good references) can be found here: https://www.tandfonline.com/doi/full/10.1080/10543406.2011.550113 , although others continued to study this topic and publish their findings later.
The discussions in Senn’s posts are best understood in the context of phased clinical trials, the field that Senn worked in for many decades. Here, statistical inferences are not usually considered definitive enough for regulatory decision making until Phase III, placing such inferences at the end of a learn-and-confirm framework with comprehensive prespecification of the statistical model at the final stage. This is a setting in which my paper argued that statistical inference may be used properly, because researcher degrees of freedom (and consequent opportunities to overfit) are minimized at the last stage. This is clearly stated in the 4th bullet at the bottom of p. 256, left column.
Mayo’s comments on physics are wholly compatible with mine. My earlier comment was on the discovery of the dark matter discrepancy in the 1970s, not the current search for what dark matter could be (the topic of the paper Mayo cited above). The example I gave illustrates my point, hers illustrates her point, and there is no actual contradiction. Physicists and astronomers have their own mechanisms for guarding against the garden of forking paths. For example, there was the requirement (used in the discovery of the Higgs boson) that statistical procedures “must be well understood, well defined, and fixed in advance” (van Dyk, 2014). This prespecification plays the same role as it does in Phase III trials in preventing the use of “researcher degrees of freedom”. Another is the ability of physicists to generate hypotheses about one class of experiment based on theories developed from other classes of experiment, rather than just massaging the same data set over and over again. A canonical example is the discovery of energy quantization. Einstein took the concept from Planck’s model of blackbody radiation (energy quantization being regarded by Planck as a purely instrumental assumption needed to arrive at a theory compatible with the data, though he never actually carried out a statistical “curve fit” – the story is highly instructive but better told by others). In contrast with Planck, Einstein took the concept seriously as a possible principle of nature, and used it to explain the photoelectric effect and other phenomena. The methods used by Planck and Einstein were certainly “formal” (based on mathematically formulated theories of physics) but not “statistical”, to reinforce Kenett’s point that “some formal methods are outside the realm of statistics.”
The exchange between Gelman and Cousins is excellent. HEP physics is one area where a point null hypothesis (e.g., are neutrinos massless?) actually has physical interpretability and incredible consequences. This is unlike the usual situation in many other fields where, as Tukey quipped, the null hypothesis is always wrong “in some decimal place”. In a clinical trial or A/B test, we know a priori that the two groups receive different treatments, so of course they are different — what matters is by how much they differ, how precisely we know that, and whether this is clinically or practically relevant. Bear in mind all the safeguards against the optimism principle used by physicists, and their constant urge to get new and better data of different kinds, which contradicts the (as you say) sterile use of statistical methods in many other fields (what Nelder called the Cult of the Isolated Study). Like those working in the context of phased clinical trials, physicists who take the precautions above are doing their best to protect themselves from the garden of forking paths, and can fruitfully use statistical inference. Though the physics examples are not discussed in my paper, they are compatible with its ideas, as is the Genome Wide Association Studies (GWAS) community’s requirement for both a discovery and a validation phase of research, before publication can be considered. The rest of the scientific community needs to catch up with these three exemplar fields in order for statistical inference to “enable good science” in their areas.
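A minimal sketch of that point (made-up data and a hypothetical minimal clinically important difference): report the estimated difference and its precision, and compare them to what matters practically, rather than stopping at a p-value.

```python
# Sketch (made-up data, hypothetical clinical threshold): estimate the
# difference, its precision, and compare to what is practically relevant.
import numpy as np

rng = np.random.default_rng(7)
a = rng.normal(loc=10.0, scale=2.0, size=200)    # control group outcomes
b = rng.normal(loc=10.4, scale=2.0, size=200)    # treatment group outcomes

diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
ci = (diff - 1.96 * se, diff + 1.96 * se)        # approximate 95% CI (large samples)

mcid = 0.5                                       # hypothetical minimal clinically important difference
print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), exceeds MCID? {ci[0] > mcid}")
```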
Christopher:
I think we have now circled around these points sufficiently.
You appear, in your recent comments, to be retreating from the position you urge forcefully in your paper. That is all to the good; indeed, that is the only reason I took the time to highlight this discussion. However, even your caveat is, in my view, quite unsatisfactory. You now say: “I should have added a caveat in my paper that ‘Statistical inference doesn’t always enable bad science, just most of the time.’” Were that so, abandoning statistical inference would be warranted (the “weak repeated sampling principle”). But the actual manner in which statistical inference methods serve to critically appraise conjectures is very far from your caricatures. Fraud-busting, new and old, attests to this.
It is only by divorcing statistical inference from critical statistical thinking that one could imagine that statistical inference methods encourage a focus on an isolated study. Fisher always emphasized that we’re not interested in an isolated statistically significant effect; it must be shown that one can bring about effects that rarely fail to be statistically significant before saying you have evidence of a genuine phenomenon. Finding flaws in initial attempts, however, is a crucial key toward inferences that are warranted with greater severity.
You aver that I am misstating your view because:
“For instance, no criticism of ‘formal methods’ (other than estimating and testing parameters in statistical models) appears anywhere in the paper.”
That’s exactly the basis of my criticism, because I would deny that estimating and testing parameters in statistical models enables bad science. That a method can be misused does not entail that the method encourages misuse. Nor ought statistical inference methods be divorced from something called statistical thinking. Your paper declares that “formal, probability-based statistical inference should play no role in most scientific research” and also disparages mathematical and computational methods developed to adjust or compensate for multiple testing and various selection effects, since “such methods must still fail to guarantee reliable statistical inference, because they cannot eliminate model uncertainty and systematic error”. But guaranteed reliability and elimination of uncertainty are not required to warrant the use of methods in scientific inquiry.
You say (in your comment) that in section 3.4 Hand and I agree with you. That would be so if the thesis of your paper were “Misuses of statistical inference enable bad science”. In 3.4 we defend adjusting for model selection, which you disparage. Our paper underscores the damage done to practice by abandoning statistical significance.
As a last point: the “abandon significance” call is a call for abandoning thresholds. It is notable that you are prepared to conclude that a given statistical inference is vitiated by dint of model uncertainty and systematic error. In so doing you are applying a statistical threshold that you choose for counting a case as “bad science” (or unwarranted inference). So it appears you endorse the use of thresholds, without which there is no basis for such criticism. What we should worry about, in my view, are those methods that are insensitive to violations of error probabilities (as with those that deny that data dredging and multiplicity alter evidence).
Mayo, D. G. (2020). “Significance Tests: Vitiated or Vindicated by the Replication Crisis in Psychology?” Review of Philosophy and Psychology 12: 101-120. DOI https://doi.org/10.1007/s13164-020-00501-w
Dr. Tong,
Thank you for your thought-provoking contributions to the discourse on statistical methods in science. I appreciate your emphasis on elevating statistical thinking in scientific practice and education, which is indeed crucial. However, I wish to address the contention that statistical inference primarily enables bad science, as suggested in the title of your 2019 paper.
Firstly, it’s important to recognize that statistical inference, when applied correctly, is a powerful tool that aids in the understanding and interpretation of data within a framework that controls for error. The foundational methodologies of statistical inference—hypothesis testing, estimation of parameters, and model validation—are essential for advancing scientific knowledge by providing a structured way to quantify uncertainty and test theories rigorously.
Your concern seems to focus on the misapplication and potential misuse of statistical inference, which can indeed lead to misleading conclusions if not properly managed. However, this does not inherently fault statistical inference as a discipline but rather highlights the need for robust training and education in both statistical thinking and methodology. It’s the application that is at fault, not the tool itself.
Moreover, statistical inference and statistical thinking are not mutually exclusive but are complementary. The former provides the formal tools and methods that, when used with good judgment (statistical thinking), lead to rigorous scientific conclusions. A clear understanding of both aspects is necessary to navigate the complexities of real-world data and to foster innovation and discovery in science.
Statistical significance tests, for instance, are often criticized for their misuse in science, particularly in fields with a high reliance on p-values. However, these tests are not just isolated tools but part of a broader analytical process that includes experimental design, data collection, and data analysis—all framed within the context of statistical inference. When understood and applied within this comprehensive framework, they enhance the reliability of scientific findings rather than detract from them.
To truly improve scientific research, we must advocate for better statistical education that emphasizes both the theoretical underpinnings and practical applications of statistical methods. Encouraging a dismissive stance towards statistical inference might inadvertently discourage rigorous analysis and lead to an underappreciation of the role of formal statistical methods in scientific discovery.
In conclusion, while I agree with your call for more emphasis on statistical thinking, I respectfully disagree with the view that statistical inference is a net detractor in scientific endeavors. Instead, I advocate for a more integrated approach where statistical thinking and inference coexist and reinforce each other to improve the quality and integrity of scientific research.
Sincerely,
Miodrag