Author Archives: Mayo

Dennis Lindley’s “Philosophy of Statistics”

Philosopher’s Stone

Yesterday’s slight detour [i] presents an opportunity to (re)read Lindley’s “Philosophy of Statistics” (2000) (see also an earlier post).  I recommend the full article and discussion. There is actually much here on which we agree.

The Philosophy of Statistics

Dennis V. Lindley

The Statistician (2000) 49:293-319

Summary. This paper puts forward an overall view of statistics. It is argued that statistics is the study of uncertainty. The many demonstrations that uncertainties can only combine according to the rules of the probability calculus are summarized. The conclusion is that statistical inference is firmly based on probability alone. Progress is therefore dependent on the construction of a probability model; methods for doing this are considered. It is argued that the probabilities are personal. The roles of likelihood and exchangeability are explained. Inference is only of value if it can be used, so the extension to decision analysis, incorporating utility, is related to risk and to the use of statistics in science and law. The paper has been written in the hope that it will be intelligible to all who are interested in statistics.

Around eight pages in we get another useful summary:

Let us summarize the position reached.

(a) Statistics is the study of uncertainty.

(b) Uncertainty should be measured by probability.

(c) Data uncertainty is so measured, conditional on the parameters.

(d) Parameter uncertainty is similarly measured by probability.

(e) Inference is performed within the probability calculus, mainly by equations (1) and (2) (301).

Continue reading

Categories: Statistics | Tags: , , , | 50 Comments

Is Particle Physics Bad Science?

I suppose[ed] this was somewhat of a joke from the ISBA, prompted by Dennis Lindley, but as I [now] accord the actual extent of jokiness to be only ~10%, I’m sharing it on the blog [i].  Lindley (according to O’Hagan) wonders why scientists require so high a level of statistical significance before claiming to have evidence of a Higgs boson.  It is asked: “Are the particle physics community completely wedded to frequentist analysis?  If so, has anyone tried to explain what bad science that is?”

Bad science?   I’d really like to understand what these representatives from the ISBA would recommend, if there is even a shred of seriousness here (or is Lindley just peeved that significance levels are getting so much press in connection with so important a discovery in particle physics?)

Well, read the letter and see what you think.

On Jul 10, 2012, at 9:46 PM, ISBA Webmaster wrote:

Dear Bayesians,

A question from Dennis Lindley prompts me to consult this list in search of answers.

We’ve heard a lot about the Higgs boson.  The news reports say that the LHC needed convincing evidence before they would announce that a particle had been found that looks like (in the sense of having some of the right characteristics of) the elusive Higgs boson.  Specifically, the news referred to a confidence interval with 5-sigma limits.

Now this appears to correspond to a frequentist significance test with an extreme significance level.  Five standard deviations, assuming normality, means a p-value of around 0.0000005.  A number of questions spring to mind.

1.  Why such an extreme evidence requirement?  We know from a Bayesian  perspective that this only makes sense if (a) the existence of the Higgs  boson (or some other particle sharing some of its properties) has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme.  Neither seems to be the case, so why  5-sigma?

2.  Rather than ad hoc justification of a p-value, it is of course better to do a proper Bayesian analysis.  Are the particle physics community completely wedded to frequentist analysis?  If so, has anyone tried to explain what bad science that is? Continue reading
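The figure quoted in the letter (five standard deviations corresponding to a p-value of around 0.0000005) can be checked directly against the standard Normal tail. The sketch below is only an illustration under the letter's own normality assumption; the function name `tail_p_value` is mine, not anything from the letter or the LHC analyses. It reports both the one-sided and the two-sided tail area, since the letter does not say which it has in mind.

```python
from statistics import NormalDist


def tail_p_value(n_sigma, two_sided=False):
    """Tail area of a standard Normal beyond n_sigma standard deviations.

    Hypothetical helper for illustration; assumes exact normality.
    """
    # Use the lower tail at -n_sigma, which equals the upper tail at +n_sigma
    # by symmetry and avoids subtracting a number very close to 1.
    one_tail = NormalDist().cdf(-n_sigma)
    return 2.0 * one_tail if two_sided else one_tail


one_sided = tail_p_value(5.0)
two_sided = tail_p_value(5.0, two_sided=True)
print(f"one-sided 5-sigma tail area: {one_sided:.3g}")
print(f"two-sided 5-sigma tail area: {two_sided:.3g}")
```

The two-sided value is the one closer to the "around 0.0000005" quoted in the letter; the one-sided value is roughly half that.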

Categories: philosophy of science, Statistics | Tags: , , , , , | 11 Comments

PhilStatLaw: Reference Manual on Scientific Evidence (3d ed) on Statistical Significance (Schachtman)

A quick perusal of the “Manual” on Nathan Schachtman’s legal blog shows it to be chock full of revealing points of contemporary legal statistical philosophy.  The following are some excerpts; read the full blog here.   I make two comments at the end.

July 8th, 2012

Nathan Schachtman

How does the new Reference Manual on Scientific Evidence (RMSE3d 2011) treat statistical significance?  Inconsistently and at times incoherently.

Professor Berger’s Introduction

In her introductory chapter, the late Professor Margaret A. Berger raises the question of the role statistical significance should play in evaluating a study’s support for causal conclusions:

“What role should statistical significance play in assessing the value of a study? Epidemiological studies that are not conclusive but show some increased risk do not prove a lack of causation. Some courts find that they therefore have some probative value, 62 at least in proving general causation. 63”

Margaret A. Berger, “The Admissibility of Expert Testimony,” in RMSE3d 11, 24 (2011).

This seems rather backwards.  Berger’s suggestion that inconclusive studies do not prove lack of causation seems nothing more than a tautology.  And how can that tautology support the claim that inconclusive studies “therefore” have some probative value? This is a fairly obvious logically invalid argument, or perhaps a passage badly in need of an editor.

…………

Chapter on Statistics

The RMSE’s chapter on statistics is relatively free of value judgments about significance probability, and, therefore, a great improvement upon Berger’s introduction.  The authors carefully describe significance probability and p-values, and explain:

“Small p-values argue against the null hypothesis. Statistical significance is determined by reference to the p-value; significance testing (also called hypothesis testing) is the technique for computing p-values and determining statistical significance.”

David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in RMSE3d 211, 241 (3d ed. 2011).  Although the chapter confuses and conflates Fisher’s interpretation of p-values with Neyman’s conceptualization of hypothesis testing as a dichotomous decision procedure, this treatment is unfortunately fairly standard in introductory textbooks.

Kaye and Freedman, however, do offer some important qualifications to the untoward consequences of using significance testing as a dichotomous outcome: Continue reading

Categories: Statistics | Tags: , , , , | 9 Comments

Stephen Senn: Randomization, ratios and rationality: rescuing the randomized clinical trial from its critics

Stephen Senn
Head of the Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS), Luxembourg

An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example, is what John Worrall has to say:

Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence in evidence-based medicine? Philosophy of Science 2002; 69: S316-S330: see page S324)

It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.

The second point is that in the absence of a treatment effect, where randomization has taken place, the statistical theory predicts probabilistically how the variation in outcome between groups relates to the variation within. Continue reading

Categories: Statistics | Tags: , , , , , , | 28 Comments

Metablog: Up and Coming

Dear Reader: Over the next week, in addition to a regularly scheduled post by Professor Stephen Senn, we will be taking up two papers[i] from the contributions to the special topic: “Statistical Science and Philosophy of Science: Where Do (Should) They Meet in 2011 and Beyond?” in Rationality, Markets and Morals: Studies at the Intersection of Philosophy and Economics.

I will attempt a (daring) deconstruction of Professor Wasserman’s paper[ii] and at that time will invite your “U-Phils” for posting around a week after (<1000 words).  I will be posting comments by Clark Glymour on Sir David Hendry’s paper later in the week. So you may want to study those papers in advance.

The first “deconstruction” (“Irony and Bad Faith, Deconstructing Bayesians 1”) may be found here / https://errorstatistics.com/2012/04/17/3466/; for a selection of both U-Phils and Deconstructions, see https://errorstatistics.com/2012/04/17/3466/

D. Mayo

P.S. Those who had laughed at me for using this old trusty typewriter were asking to borrow it last week when we lost power for 6 days and their computers were down.


[i] *L. Wasserman, “Low Assumptions, High Dimensions”. RMM Vol. 2, 2011, 201–209;

D. Hendry, “Empirical Economic Model Discovery and Theory Evaluation”. RMM Vol. 2, 2011, 115–145.

[ii] Assuming I don’t chicken out.

Categories: Metablog, Philosophy of Statistics, U-Phil | Tags: , | Leave a comment

Vladimir Cherkassky Responds on Foundations of Simplicity

I thank Dr. Vladimir Cherkassky for taking up my general invitation to comment. I don’t have much to add to my original post[i], except to make two corrections at the end of this post.  I invite readers’ comments.

Vladimir Cherkassky

As I could not participate in the discussion session on Sunday, I would like to address several technical issues and points of disagreement that became evident during this workshop. All opinions are mine, and may not be representative of the “machine learning community.” Unfortunately, the machine learning community at large is not very much interested in the philosophical and methodological issues. This breeds a lot of fragmentation and confusion, as evidenced by the existence of several technical fields: machine learning, statistics, data mining, artificial neural networks, computational intelligence, etc.—all of which are mainly concerned with the same problem of estimating good predictive models from data.

Occam’s Razor (OR) is a general metaphor in the philosophy of science, and it has been discussed for ages. One of the main goals of this workshop was to understand the role of OR as a general inductive principle in the philosophy of science and, in particular, its importance in data-analytic knowledge discovery for statistics and machine learning.

Data-analytic modeling is concerned with estimating good predictive models from finite data samples. This is directly related to the philosophical problem of inductive inference. The problem of learning (generalization) from finite data had been formally investigated in VC-theory ~ 40 years ago. This theory starts with a mathematical formulation of the problem of learning from finite samples, without making any assumptions about parametric distributions. This formalization is very general and relevant to many applications in machine learning, statistics, life sciences, etc. Further, this theory provides necessary and sufficient conditions for generalization. That is, a set of admissible models (hypotheses about the data) should be constrained, i.e., should have finite VC-dimension. Therefore, any inductive theory or algorithm designed to explain the data should satisfy VC-theoretical conditions. Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , , | 12 Comments

Comment on Falsification

The comment box was too small for my reply to Sober on falsification, so I will post it here:

I want to understand better Sober’s position on falsification. A pervasive idea to which many still subscribe, myself included, is that the heart of what makes inquiry scientific is the critical attitude: that if a claim or hypothesis or model fails to stand up to critical scrutiny it is rejected as false, and not propped up with various “face-saving” devices.

Now Sober writes: “I agree that we can get rid of models that deductively entail (perhaps with the help of auxiliary assumptions) observational outcomes that do not happen.  But as soon as the relation is nondeductive, is there ‘falsification’?”

My answer is yes, else we could scarcely retain the critical attitude for any but the most trivial scientific claims. While at one time philosophers imagined that “observational reports” were given, and could therefore form the basis for a deductive falsification of scientific claims, certainly since Popper, Kuhn and the rest of the post-positivists, we recognize that observations are error prone, as are appeals to auxiliary hypotheses. Here is Popper: Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , , | 11 Comments

Elliott Sober Responds on Foundations of Simplicity

Here are a few comments on your recent blog about my ideas on parsimony.  Thanks for inviting me to contribute!

You write that in model selection, “‘parsimony fights likelihood,’ while, in adequate evolutionary theory, the two are thought to go hand in hand.”  The second part of this statement isn’t correct.  There are sufficient conditions (i.e., models of the evolutionary process) that entail that parsimony and maximum likelihood are ordinally equivalent, but there are cases in which they are not.  Biologists often have data sets in which maximum parsimony and maximum likelihood disagree about which phylogenetic tree is best.

You also write that “error statisticians view hypothesis testing as between exhaustive hypotheses H and not-H (usually within a model).”  I think that the criticism of Bayesianism that focuses on the problem of assessing the likelihoods of “catch-all hypotheses” applies to this description of your error statistical philosophy.  The General Theory of Relativity, for example, may tell us how probable a set of observations is, but its negation does not.  I note that you have “usually within a model” in parentheses.  In many such cases, two alternatives within a model will not be exhaustive even within the confines of a model and of course they won’t be exhaustive if we consider a wider domain.

Continue reading

Categories: philosophy of science, Statistics | Tags: , , , | 13 Comments

More from the Foundations of Simplicity Workshop*

*See also earlier posts from the CMU workshop here and here.

Elliott Sober has been writing on simplicity for a long time, so it was good to hear his latest thinking. If I understood him, he continues to endorse a comparative likelihoodist account, but he allows that, in model selection, “parsimony fights likelihood,” while, in adequate evolutionary theory, the two are thought to go hand in hand. Where it seems needed, therefore, he accepts a kind of “pluralism”. His discussion of the rival models in evolutionary theory and how they may give rise to competing likelihoods (for “tree taxonomies”) bears examination in its own right, but being in no position to accomplish this, I shall limit my remarks to the applicability of Sober’s insights (as my notes reflect them) to the philosophy of statistics and statistical evidence.

1. Comparativism:  We can agree that a hypothesis is not appraised in isolation, but to say that appraisal is “contrastive” or “comparativist” is ambiguous. Error statisticians view hypothesis testing as between exhaustive hypotheses H and not-H (usually within a model), but deny that the most that can be said is that one hypothesis or model is comparatively better than another, among a group of hypotheses that is to be delineated at the outset. There’s an important difference here. The best-tested of the lot need not be well-tested!

2. Falsification: Sober made a point of saying that his account does not falsify models or hypotheses. We are to start out with all the possible models to be considered (hopefully including one that is true or approximately true), akin to the “closed universe” of standard Bayesian accounts[i], but do we not get rid of any as falsified, given data? It seems not.

Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , , , | 3 Comments

PhilStatLaw: “Let’s Require Health Claims to Be ‘Evidence Based'” (Schachtman)

I see that Nathan Schachtman has had many interesting posts during the time I was away.  His recent post endorses the idea of “a hierarchy of evidence”–but philosophers of “evidence-based” medicine generally question or oppose it, at least partly because of disagreement as to where to place RCTs in the hierarchy.  What do people think?

Litigation arising from the FDA’s refusal to approve “health claims” for foods and dietary supplements is a fertile area for disputes over the interpretation of statistical evidence.  A ‘‘health claim’’ is ‘‘any claim made on the label or in labeling of a food, including a dietary supplement, that expressly or by implication … characterizes the relationship of any substance to a disease or health-related condition.’’ 21 C.F.R. § 101.14(a)(1); see also 21 U.S.C. § 343(r)(1)(A)-(B).

Unlike the federal courts exercising their gatekeeping responsibility, the FDA has committed to pre-specified principles of interpretation and evaluation. By regulation, the FDA gives notice of standards for evaluating complex evidentiary displays for the ‘‘significant scientific agreement’’ required for approving a food or dietary supplement health claim.  21 C.F.R. § 101.14.  See FDA – Guidance for Industry: Evidence-Based Review System for the Scientific Evaluation of Health Claims – Final (2009).

If the FDA’s refusal to approve a health claim requires pre-specified criteria of evaluation, then we should be asking ourselves why the federal courts have failed to develop a set of criteria for evaluating health effects claims as part of their Rule 702 (“Daubert”) gatekeeping responsibilities.  Why, close to 20 years after the Supreme Court decided Daubert, can lawyers make “health claims” without having to satisfy evidence-based criteria?

Read the rest.

Categories: philosophy of science, Statistics | Tags: , , , | Leave a comment

Further Reflections on Simplicity: Mechanisms

To continue with some philosophical reflections on the papers from the “Ockham’s razor” conference, let me respond to something in Shalizi’s recent comments (http://cscs.umich.edu/~crshalizi/weblog/). His emphasis on the interest in understanding processes and mechanisms, as opposed to mere prediction, seems exactly right. But he raises a question that seems to me simply answered (on grounds of evidence):  If “a model didn’t seem to need” a mechanism, it is left out; but why?

“It’s this, the leave-out-processes-you-don’t-need, which seems to me the core of the Razor for scientific model-building. This is definitely not the same as parameter-counting, and I think it’s also different from capacity control and even from description-length-measuring (cf.), though I am open to Peter persuading me otherwise. I am not, however, altogether sure how to formalize it, or what would justify it, beyond an aesthetic preference for tidy models. (And who died and left the tidy-minded in charge?) The best hope for such justification, I think, is something like Kevin’s idea that the Razor helps us get to the truth faster, or at least with fewer needless detours. Positing processes and mechanisms which aren’t strictly called for to account for the phenomena is asking for trouble needlessly.”

But it is easy to see that if a model M is adequate for data x regarding an aspect of a phenomenon (i.e., M had passed reasonably severe tests with x), then a model M’ that added an “unnecessary” mechanism would have passed with very low severity, or, if one prefers, M’ would be very poorly corroborated.  To justify “leaving-out-processes-you-don’t-need” then, the appeal is not to aesthetics or heuristics but to the severity or well-testedness of M and M’.

Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , | 4 Comments

Deviates, Sloths, and Exiles: Philosophical Remarks on the Ockham’s Razor Workshop*

Picking up the pieces…

My flight out of Pittsburgh has been cancelled, and as I may be stuck in the airport for some time, I will try to make a virtue of it by jotting down some of my promised reflections on the “simplicity and truth” conference at Carnegie Mellon (organized by Kevin Kelly). My remarks concern only the explicit philosophical connections drawn by (4 of) the seven non-philosophers who spoke. For more general remarks, see blogs of: Larry Wasserman (Normal Deviate) and Cosma Shalizi (Three-Toed Sloth). (The following, based on my notes and memory, may include errors/gaps, but I trust that my fellow bloggers and sloggers, will correct me.)

First to speak were Vladimir Vapnik and Vladimir Cherkassky, from the field of machine learning, a discipline I know of only formally. Vapnik, of the Vapnik-Chervonenkis (VC) theory, is known for his seminal work here. Their papers, both of which addressed directly the philosophical implications of their work, share enough themes to merit being taken up together.

Vapnik and Cherkassky find a number of striking dichotomies in the standard practice of both philosophy and statistics. They contrast the “classical” conception of scientific knowledge as essentially rational with the more modern, “data-driven” empirical view:

The former depicts knowledge as objective, deterministic, rational. Ockham’s razor is a kind of synthetic a priori statement that warrants our rational intuitions as the foundation of truth with a capital T, as well as a naïve realism (we may rely on Cartesian “clear and distinct” ideas; God does not deceive; and so on). The latter empirical view, illustrated by machine learning, is enlightened. It settles for predictive successes and instrumentalism, views models as mental constructs (in here, not out there), and exhorts scientists to restrict themselves to problems deemed “well posed” by machine-learning criteria.

But why suppose the choice is between assuming “a single best (true) theory or model” and the extreme empiricism of their instrumental machine learner? Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , | 14 Comments

Promissory Note

Dear Reader:
After a month of traveling, I’m soon to return to home port; then it’s just a ferry back to Elba. I promise to post (hopefully by Monday) some philosophical reflections on the past few days at the Ockham’s Razor conference, here at CMU (see post from June 12, 2012), and catch up on your comments/e-mails. I am to present Sunday (tomorrow) at 9 a.m.

Categories: Metablog | Tags: , | Leave a comment

The Error Statistical Philosophy and The Practice of Bayesian Statistics: Comments on Gelman and Shalizi

The following is my commentary on a paper by Gelman and Shalizi, forthcoming (some time in 2013) in the British Journal of Mathematical and Statistical Psychology* (submitted February 14, 2012).
_______________________

The Error Statistical Philosophy and the Practice of Bayesian Statistics: Comments on A. Gelman and C. Shalizi: Philosophy and the Practice of Bayesian Statistics**
Deborah G. Mayo

  1. Introduction

I am pleased to have the opportunity to comment on this interesting and provocative paper. I shall begin by citing three points at which the authors happily depart from existing work on statistical foundations.

First, there is the authors’ recognition that methodology is ineluctably bound up with philosophy. If nothing else “strictures derived from philosophy can inhibit research progress” (p. 4). They note, for example, the reluctance of some Bayesians to test their models because of their belief that “Bayesian models were by definition subjective,” or perhaps because checking involves non-Bayesian methods (4, n4).

Second, they recognize that Bayesian methods need a new foundation. Although the subjective Bayesian philosophy, “strongly influenced by Savage (1954), is widespread and influential in the philosophy of science (especially in the form of Bayesian confirmation theory),” and while many practitioners perceive the “rising use of Bayesian methods in applied statistical work,” (2) as supporting this Bayesian philosophy, the authors flatly declare that “most of the standard philosophy of Bayes is wrong” (2 n2). Despite their qualification that “a statistical method can be useful even if its philosophical justification is in error”, their stance will rightly challenge many a Bayesian.

Continue reading

Categories: Statistics | Tags: , , , , | Leave a comment

G. Cumming Response: The New Statistics

Prof. Geoff Cumming [i] has taken up my invite to respond to “Do CIs Avoid Fallacies of Tests? Reforming the Reformers” (May 17th), reposted today as well. (I extend the same invite to anyone I comment on, whether it be in the form of a comment or full post).   He reviews some of the complaints against p-values and significance tests, but he has not here responded to the particular challenge I raise: to show how his appeals to CIs avoid the fallacies and weakness of significance tests. The May 17 post focuses on the fallacy of rejection; the one from June 2, on the fallacy of acceptance. In each case, one needs to supplement his CIs with something along the lines of the testing scrutiny offered by SEV. At the same time, a SEV assessment avoids the much-lampooned uses of p-values–or so I have argued. He does allude to a subsequent post, so perhaps he will address these issues there.

The New Statistics

PROFESSOR GEOFF CUMMING [ii] (submitted June 13, 2012)

I’m new to this blog—what a trove of riches! I’m prompted to respond by Deborah Mayo’s typically insightful post of 17 May 2012, in which she discussed one-sided tests and referred to my discussion of one-sided CIs (Cumming, 2012, pp 109-113). A central issue is:

Cumming (quoted by Mayo): as usual, the estimation approach is better

Mayo: Is it?

Lots to discuss there. In this first post I’ll outline the big picture as I see it.

‘The New Statistics’ refers to effect sizes, confidence intervals, and meta-analysis, which, of course, are not themselves new. But using them, and relying on them as the basis for interpretation, would be new for most researchers in a wide range of disciplines—that for decades have relied on null hypothesis significance testing (NHST). My basic argument for the new statistics rather than NHST is summarised in a brief magazine article (http://tiny.cc/GeoffConversation) and radio talk (http://tiny.cc/geofftalk). The website www.thenewstatistics.com has information about the book (Cumming, 2012) and ESCI software, which is a free download.

Continue reading

Categories: Statistics | Tags: , , , , , , , | 5 Comments

Repost (5/17/12): Do CIs Avoid Fallacies of Tests? Reforming the Reformers

The one method that enjoys the approbation of the New Reformers is that of confidence intervals (See May 12, 2012, and links). The general recommended interpretation is essentially this:

For a reasonably high choice of confidence level, say .95 or .99, values of µ within the observed interval are plausible, those outside implausible.

Geoff Cumming, a leading statistical reformer in psychology, has long been pressing for ousting significance tests (or NHST[1]) in favor of CIs. The level of confidence “specifies how confident we can be that our CI includes the population parameter µ” (Cumming 2012, p. 69). He recommends prespecified confidence levels .9, .95 or .99:

“We can say we’re 95% confident our one-sided interval includes the true value. We can say the lower limit (LL) of the one-sided CI…is a likely lower bound for the true value, meaning that for 5% of replications the LL will exceed the true value.” (Cumming 2012, p. 112)[2]

For simplicity, I will use the 2-standard deviation cut-off corresponding to the one-sided confidence level of ~.98.

However, there is a duality between tests and intervals (the intervals containing the parameter values not rejected at the corresponding level with the given data).[3]

“One-sided CIs are analogous to one-tailed tests but, as usual, the estimation approach is better.”

Is it?   Consider a one-sided test of the mean of a Normal distribution with n iid samples, and known standard deviation σ, call it test T+.

H0: µ ≤ 0 against H1: µ > 0, and let σ = 1.

Test T+ at significance level .02 is analogous to forming the one-sided (lower) 98% confidence interval:

µ > M – 2(1/√n),

where M, following Cumming, is the sample mean (thereby avoiding those x-bars). M – 2(1/√n) is the lower limit (LL) of a 98% CI.
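The duality between test T+ and the one-sided interval can be sketched numerically. This is a minimal illustration assuming, as in the text, a Normal mean with known σ = 1 and a 2-standard-deviation cut-off; the function names and the values M = 0.5, n = 25 are hypothetical, chosen only to have something to compute. It also shows that the exact one-sided level for a cut-off of 2 is a little under .98.

```python
from math import sqrt
from statistics import NormalDist


def lower_limit(sample_mean, n, sigma=1.0, cutoff=2.0):
    """Lower limit LL = M - cutoff * sigma / sqrt(n) of the one-sided CI."""
    return sample_mean - cutoff * sigma / sqrt(n)


def rejects_null(sample_mean, n, sigma=1.0, cutoff=2.0, mu0=0.0):
    """Test T+: reject H0 (mu <= mu0) when M exceeds mu0 by `cutoff` standard errors."""
    return sample_mean > mu0 + cutoff * sigma / sqrt(n)


# Exact one-sided confidence level for a 2-standard-deviation cut-off.
confidence_level = NormalDist().cdf(2.0)

# Hypothetical data summary, for illustration only.
m, n = 0.5, 25
ll = lower_limit(m, n)
reject = rejects_null(m, n)

print(f"confidence level for cut-off 2: {confidence_level:.4f}")
print(f"LL = {ll:.2f}, T+ rejects H0: {reject}")
```

With these numbers the lower limit lies above 0 and T+ rejects H0; in general the two go together, since T+ rejects exactly when the lower limit exceeds the null value 0.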

Central problems with significance tests (whether of the N-P or Fisherian variety) include: Continue reading

Categories: Statistics | Tags: , , , | Leave a comment

Scratch Work for a SEV Homework Problem

Someone wrote to me asking to see the scratch work for the SEV calculations.  (See June 14 post, also LSE problem set.)  I’ll just do the second one:

What is the Severity with which (μ<3.29) passes the test T+ in the case where  σx = 2?  We have that the observed sample mean M is 1.4, so

SEV (μ < 3.29) = P( test T+ yields a result that fits the 0 null less well than the one you got (in the direction of the alternative); computed assuming μ as large as 3.29)

SEV(μ < 3.29) = P(M > 1.4; μ > 3.29) > P(Z > (1.4 - 3.29)/2)* = P(Z > -1.89/2) = P(Z > -.945) ~ .83

*We calculate this at the point μ = 3.29, since the SEV would be larger for greater values of μ.
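The scratch work above can be reproduced in a few lines. This is only a sketch under the post's assumptions (M is Normal with standard deviation σx, and the calculation is done at the boundary point μ = 3.29); the function name `severity_upper_bound` is mine.

```python
from statistics import NormalDist


def severity_upper_bound(observed_mean, mu_bound, sigma_x):
    """SEV(mu < mu_bound), evaluated at mu = mu_bound.

    This is P(M > observed_mean; mu = mu_bound) for M ~ Normal(mu, sigma_x).
    """
    z = (observed_mean - mu_bound) / sigma_x
    return 1.0 - NormalDist().cdf(z)


# Second case of test T+: observed mean 1.4, sigma_x = 2.
sev_second = severity_upper_bound(1.4, 3.29, 2.0)
print(f"SEV(mu < 3.29) = {sev_second:.2f}")
```

Passing `sigma_x=1.0` instead gives the corresponding assessment for the first case.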

That’s quite a difference from the power calculation of .5, calculated in the usual way of a detectable discrepancy size (DDS) analysis.

QUESTIONS?

NEW PROBLEM: You want to make an inference that passes with high SEV, say you want  SEV(μ < μ’) = .99, with the same (statistically insignificant) outcome you got from the second case of test T+ as before (σx = 2).  What value for μ’ can you infer μ < μ’ with a SEV of .99?
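For readers who want to check their answer, one way to set the problem up is to invert the SEV formula. Assuming, as above, that SEV(μ < μ’) = P(M > 1.4; μ = μ’) = Φ((μ’ - 1.4)/σx), where Φ is the standard Normal distribution function, the required bound is μ’ = 1.4 + σx·Φ⁻¹(.99). The sketch below does that inversion; the function name is hypothetical.

```python
from statistics import NormalDist


def upper_bound_for_severity(observed_mean, sigma_x, target_sev):
    """Solve SEV(mu < mu_prime) = target_sev for mu_prime.

    Uses SEV(mu < mu_prime) = Phi((mu_prime - observed_mean) / sigma_x).
    """
    z = NormalDist().inv_cdf(target_sev)
    return observed_mean + z * sigma_x


mu_prime = upper_bound_for_severity(1.4, 2.0, 0.99)

# Round-trip check: plug mu_prime back into the severity calculation.
check = 1.0 - NormalDist().cdf((1.4 - mu_prime) / 2.0)
print(f"mu' = {mu_prime:.2f}, SEV(mu < mu') = {check:.2f}")
```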

Categories: Statistics | Tags: , | 5 Comments

Answer to the Homework & a New Exercise

Debunking the “power paradox” allegation from my previous post. The authors consider a one-tailed Z test of the hypothesis H0: μ ≤ 0 versus H1: μ > 0: our Test T+.  The observed sample mean is M = 1.4 and in the first case σx = 1, and in the second case σx = 2.

First case: The power against μ = 3.29 is high, .95 (i.e. P(Z > 1.645; μ = 3.29) = 1 - Φ(-1.645) = .95), and thus the DDS assessor would take the result as a good indication that μ < 3.29.

Second case: For σx = 2, the cut-off for rejection would be 0 + 1.65(2) = 3.30.

So, in the second case (σx = 2) the probability of erroneously accepting H0, even if μ were as high as 3.29, is .5!  (i.e. P(Z ≤ 1.645; μ = 3.29) = Φ(1.645 - (3.29/2)) ~ .5.)  Although p1 < p2[i] the justifiable upper bound in the first test is smaller (closer to 0) than in the second!  Hence, the DDS assessment is entirely in keeping with the appropriate use of error probabilities in interpreting tests. There is no conflict with p-value reasoning.
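The power figures for the two cases, and the p-values in the footnote, can be checked with a short sketch. It assumes the set-up of the post (M is Normal with standard deviation σx, rejection when M exceeds 1.645σx); the function names `power` and `p_value` are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(mu_alt, sigma_x, z_cutoff=1.645, mu0=0.0):
    """P(M > cut-off; mu = mu_alt) for test T+, with M ~ Normal(mu, sigma_x)."""
    cutoff = mu0 + z_cutoff * sigma_x
    return 1.0 - STD_NORMAL.cdf((cutoff - mu_alt) / sigma_x)


def p_value(observed_mean, sigma_x, mu0=0.0):
    """One-sided p-value P(M >= observed_mean; mu = mu0)."""
    return 1.0 - STD_NORMAL.cdf((observed_mean - mu0) / sigma_x)


power_first = power(3.29, 1.0)    # sigma_x = 1
power_second = power(3.29, 2.0)   # sigma_x = 2
p1 = p_value(1.4, 1.0)
p2 = p_value(1.4, 2.0)

print(f"power against mu = 3.29: first {power_first:.2f}, second {power_second:.2f}")
print(f"p-values: p1 = {p1:.3f}, p2 = {p2:.3f}")
```

The probability of erroneously accepting H0 when μ = 3.29 is one minus the power, so about .05 in the first case and .5 in the second.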

NEW PROBLEM

The DDS power analyst always takes the worst case of just missing the cut-off for rejection. Compare instead

SEV(μ < 3.29) for the first test, and SEV(μ < 3.29) for the second (using the actual outcomes as SEV requires).


[i] p1= .081 and p2 = .242.

Categories: Statistics | Tags: , , , | 6 Comments

CMU Workshop on Foundations for Ockham’s Razor

Carnegie Mellon University, Center for Formal Epistemology:

Workshop on Foundations for Ockham’s Razor

All are welcome to attend.

June 22-24, 2012

Adamson Wing, Baker Hall 136A, Carnegie Mellon University

Workshop web page and schedule

Contact:  Kevin T. Kelly (kk3n@andrew.cmu.edu)

Rationale:  Scientific theory choice is guided by judgments of simplicity, a bias frequently referred to as “Ockham’s Razor”. But what is simplicity and how, if at all, does it help science find the truth? Should we view simple theories as means for obtaining accurate predictions, as classical statisticians recommend? Or should we believe the theories themselves, as Bayesian methods seem to justify? The aim of this workshop is to re-examine the foundations of Ockham’s razor, with a firm focus on the connections, if any, between simplicity and truth.

Speakers:

Categories: Announcement, philosophy of science | Tags: , | Leave a comment

U-Phil: Is the Use of Power* Open to a Power Paradox?

* to assess Detectable Discrepancy Size (DDS)

In my last post, I argued that DDS type calculations (also called Neymanian power analysis) provide needful information to avoid fallacies of acceptance in the test T+; whereas, the corresponding confidence interval does not (at least not without special testing supplements).  But some have argued that DDS computations are “fundamentally flawed” leading to what is called the “power approach paradox”, e.g., Hoenig and Heisey (2001).

We are to consider two variations on the one-tailed test T+: H0: μ ≤ 0 versus H1: μ > 0 (p. 21).  Following their terminology and symbols:  The Z value in the first, Zp1, exceeds the Z value in the second, Zp2, although the same observed effect size occurs in both[i], and both have the same sample size, implying that σ1 < σ2.  For example, suppose σx1 = 1 and σx2 = 2.  Let observed sample mean M be 1.4 for both cases, so Zp1 = 1.4 and Zp2 = .7. They note that for any chosen power, the computable detectable discrepancy size will be smaller in the first experiment, and for any conjectured effect size, the computed power will always be higher in the first experiment.

“These results lead to the nonsensical conclusion that the first experiment provides the stronger evidence for the null hypothesis (because the apparent power is higher but significant results were not obtained), in direct contradiction to the standard interpretation of the experimental results (p-values).” (p. 21)

But rather than showing the DDS assessment to be “nonsensical”, or revealing any direct contradiction with the interpretation of p-values, this just demonstrates something nonsensical in their interpretation of the two p-value results from tests with different variances.  Since it’s Sunday night and I’m nursing[ii] overexposure to rowing in the Queen’s Jubilee boats in the rain and wind, how about you find the howler in their treatment? (Also please inform us of articles pointing this out in the last decade, if you know of any.)

______________________

Hoenig, J. M. and D. M. Heisey (2001), “The Abuse of Power: The Pervasive Fallacy of Power Calculations in Data Analysis,” The American Statistician, 55: 19-24.

 


[i] The subscript indicates the p-value of the associated Z value.

[ii] With English tea and a cup of strong “Elbar grease”.

Categories: Statistics, U-Phil | Tags: , , , , , | 7 Comments
