Little Bit of Logic (5 mini problems for the reader)

Little bit of logic (5 little problems for you)[i]

Deductively valid arguments can readily have false conclusions! Yes, deductively valid arguments allow drawing their conclusions with 100% reliability but only if all their premises are true. For an argument to be deductively valid means simply that if the premises of the argument are all true, then the conclusion is true. For a valid argument to entail  the truth of its conclusion, all of its premises must be true.  In that case the argument is said to be (deductively) sound.

Equivalently, using the definition of deductive validity that I prefer: A deductively valid argument is one where, the truth of all its premises together with the falsity of its conclusion, leads to a logical contradiction (A & ~A).

Show that an argument with the form of disjunctive syllogism can have a false conclusion. Such an argument take the form (where A, B are statements): Continue reading

Categories: Error Statistics | 22 Comments

Mayo Slides Meeting #1 (Phil 6334/Econ 6614, Mayo & Spanos)

Slides  Meeting #1 (Phil 6334/Econ 6614: Current Debates on Statistical Inference and Modeling (D. Mayo and A. Spanos)

 

Categories: Phil6334/ Econ 6614 | Leave a comment

Excerpt from Excursion 4 Tour IV: More Auditing: Objectivity and Model Checking

4.8 All Models Are False

. . . it does not seem helpful just to say that all models are wrong. The very word model implies simplification and idealization. . . . The construction of idealized representations that capture important stable aspects of such systems is, however, a vital part of general scientific analysis. (Cox 1995, p. 456)

 A popular slogan in statistics and elsewhere is “all models are false!”  Is this true? What can it mean to attribute a truth value to a model? Clearly what is meant involves some assertion or hypothesis about the model – that it correctly or incorrectly represents some phenomenon in some respect or to some degree. Such assertions clearly can be true. As Cox observes, “the very word model implies simplification and idealization.”  To declare, “all models are false”  by dint of their being idealizations or approximations, is to stick us with one of those  “all flesh is grass”  trivializations (Section 4.1). So understood, it follows that all statistical models are false, but we have learned nothing about how statistical models may be used to infer true claims about problems of interest. Since the severe tester’s goal in using approximate statistical models is largely to learn where they break down, their strict falsity is a given. Yet it does make her wonder why anyone would want to place a probability assignment on their truth, unless it was 0? Today’s tour continues our journey into solving the problem of induction (Section 2.7). Continue reading

Categories: Statistical Inference as Severe Testing | 3 Comments

Protected: Participants in 6334/6614 Meeting place Jan-Feb

This content is password protected. To view it please enter your password below:

Categories: SIST | Enter your password to view comments.

6334/6614: Captain’s Library: Biblio With Links

Mayo and A. Spanos
PHIL 6334/ ECON 6614: Spring 2019: Current Debates on Statistical Inference and Modeling

Bibliography (this includes a selection of articles with links; numbers 1-15 after the item refer to seminar meeting number.)

See Syllabus (first) for class meetings, and the page PhilStat19 menu up top for other course items.

Achinstein (2010). Mill’s Sins or Mayo’s Errors? (E&I: 170-188). (11)

Bacchus, Kyburg, & Thalos (1990). Against Conditionalization, Synthese (85): 475-506. (15)

Barnett (1999). Comparative Statistical Inference (Chapter 6: Bayesian Inference), John Wiley & Sons. (1), (15)

Begley & Ellis (2012) Raise standards for preclinical cancer research. Nature 483: 531-533. (10)

Continue reading

Categories: SIST | 1 Comment

(Full) Excerpt of Excursion 4 Tour I: The Myth of “The Myth of Objectivity”

A month ago, I excerpted just the very start of Excursion 4 Tour I* on The Myth of the “Myth of Objectivity”. It’s a short Tour, and this continues the earlier post.

4.1    Dirty Hands: Statistical Inference Is Sullied with Discretionary Choices

If all flesh is grass, kings and cardinals are surely grass, but so is everyone else and we have not learned much about kings as opposed to peasants. (Hacking 1965, p.211)

Trivial platitudes can appear as convincingly strong arguments that everything is subjective. Take this one: No human learning is pure so anyone who demands objective scrutiny is being unrealistic and demanding immaculate inference. This is an instance of Hacking’s “all flesh is grass.” In fact, Hacking is alluding to the subjective Bayesian de Finetti (who “denies the very existence of the physical property [of] chance” (ibid.)). My one-time colleague, I. J. Good, used to poke fun at the frequentist as “denying he uses any judgments!” Let’s admit right up front that every sentence can be prefaced with “agent x judges that,” and not sweep it under the carpet (SUTC) as Good (1976) alleges. Since that can be done for any statement, it cannot be relevant for making the distinctions in which we are interested, and we know can be made, between warranted or well-tested claims and those so poorly probed as to be BENT. You’d be surprised how far into the thicket you can cut your way by brandishing this blade alone. Continue reading

Categories: objectivity, SIST | Leave a comment

New Course Starts Tomorrow: Current Debates on Statistical Inference and Modelings: Joint Phil and Econ

I will post items on a new PhilStat Spring 19 page on this blogI

Categories: Announcement | Leave a comment

A letter in response to the ASA’s Statement on p-Values by Ionides, Giessing, Ritov and Page

I came across an interesting letter in response to the ASA’s Statement on p-values that I hadn’t seen before. It’s by Ionides, Giessing, Ritov and Page, and it’s very much worth reading. I make some comments below. Continue reading

Categories: ASA Guide to P-values, P-values | 7 Comments

Mementos from Excursion 4: Objectivity & Auditing: Blurbs of Tours I – IV

Excursion 4: Objectivity and Auditing (blurbs of Tours I – IV)

 

.

Excursion 4 Tour I: The Myth of “The Myth of Objectivity”

Blanket slogans such as “all methods are equally objective and subjective” trivialize into oblivion the problem of objectivity. Such cavalier attitudes are at odds with the moves to take back science The goal of this tour is to identify what there is in objectivity that we won’t give up, and shouldn’t. While knowledge gaps leave room for biases and wishful thinking, we regularly come up against data that thwart our expectations and disagree with predictions we try to foist upon the world. This pushback supplies objective constraints on which our critical capacity is built. Supposing an objective method is to supply formal, mechanical, rules to process data is a holdover of a discredited logical positivist philosophy.Discretion in data generation and modeling does not warrant concluding: statistical inference is a matter of subjective belief. It is one thing to talk of our models as objects of belief and quite another to maintain that our task is to model beliefs. For a severe tester, a statistical method’s objectivity requires the ability to audit an inference: check assumptions, pinpoint blame for anomalies, falsify, and directly register how biasing selection effects–hunting, multiple testing and cherry-picking–alter its error probing capacities.

Keywords

objective vs. subjective, objectivity requirements, auditing, dirty hands argument, phenomena vs. epiphenomena, logical positivism, verificationism, loss and cost functions, default Bayesians, equipoise assignments, (Bayesian) wash-out theorems, degenerating program, transparency, epistemology: internal/external distinction

 

Excursion 4 Tour II: Rejection Fallacies: Whose Exaggerating What?

We begin with the Mountains out of Molehills Fallacy (large n problem): The fallacy of taking a (P-level) rejection of H0 with larger sample size as indicating greater discrepancy from H0 than with a smaller sample size. (4.3). The Jeffreys-Lindley paradox shows with large enough n, a .05 significant result can correspond to assigning H0 a high probability .95. There are family feuds as to whether this is a problem for Bayesians or frequentists! The severe tester takes account of sample size in interpreting the discrepancy indicated. A modification of confidence intervals (CIs) is required.

It is commonly charged that significance levels overstate the evidence against the null hypothesis (4.4, 4.5). What’s meant? One answer considered here, is that the P-value can be smaller than a posterior probability to the null hypothesis, based on a lump prior (often .5) to a point null hypothesis. There are battles between and within tribes of Bayesians and frequentists. Some argue for lowering the P-value to bring it into line with a particular posterior. Others argue the supposed exaggeration results from an unwarranted lump prior to a wrongly formulated null.We consider how to evaluate reforms based on bayes factor standards (4.5). Rather than dismiss criticisms of error statistical methods that assume a standard from a rival account, we give them a generous reading. Only once the minimal principle for severity is violated do we reject them. Souvenir R summarizes the severe tester’s interpretation of a rejection in a statistical significance test. At least 2 benchmarks are needed: reports of discrepancies (from a test hypothesis) that are, and those that are not, well indicated by the observed difference.

Keywords

significance test controversy, mountains out of molehills fallacy, large n problem, confidence intervals, P-values exaggerate evidence, Jeffreys-Lindley paradox, Bayes/Fisher disagreement, uninformative (diffuse) priors, Bayes factors, spiked priors, spike and slab, equivocating terms, severity interpretation of rejection (SIR)

 

Excursion 4 Tour III: Auditing: Biasing Selection Effects & Randomization

Tour III takes up Peirce’s “two rules of inductive inference”: predesignation (4.6) and randomization (4.7). The Tour opens on a court case transpiring: the CEO of a drug company is being charged with giving shareholders an overly rosy report based on post-data dredging for nominally significant benefits. Auditing a result includes checking for (i) selection effects, (ii) violations of model assumptions, and (iii) obstacles to moving from statistical to substantive claims. We hear it’s too easy to obtain small P-values, yet replication attempts find it difficult to get small P-values with preregistered results. I call this the paradox of replication. The problem isn’t P-values but failing to adjust them for cherry picking and other biasing selection effects. Adjustments by Bonferroni and false discovery rates are considered. There is a tension between popular calls for preregistering data analysis, and accounts that downplay error probabilities. Worse, in the interest of promoting a methodology that rejects error probabilities, researchers who most deserve lambasting are thrown a handy line of defense. However, data dependent searching need not be pejorative. In some cases, it can improve severity. (4.6)

Big Data cannot ignore experimental design principles. Unless we take account of the sampling distribution, it becomes difficult to justify resampling and randomization. We consider RCTs in development economics (RCT4D) and genomics. Failing to randomize microarrays is thought to have resulted in a decade lost in genomics. Granted the rejection of error probabilities is often tied to presupposing their relevance is limited to long-run behavioristic goals, which we reject. They are essential for an epistemic goal: controlling and assessing how well or poorly tested claims are. (4.7)

Keywords

error probabilities and severity, predesignation, biasing selection effects, paradox of replication, capitalizing on chance, bayes factors, batch effects, preregistration, randomization: Bayes-frequentist rationale, bonferroni adjustment, false discovery rates, RCT4D, genome-wide association studies (GWAS)

 

Excursion 4 Tour IV: More Auditing: Objectivity and Model Checking

While all models are false, it’s also the case that no useful models are true. Were a model so complex as to represent data realistically, it wouldn’t be useful for finding things out. A statistical model is useful by being adequate for a problem, meaning it enables controlling and assessing if purported solutions are well or poorly probed and to what degree. We give a way to define severity in terms of solving a problem.(4.8) When it comes to testing model assumptions, many Bayesians agree with George Box (1983) that “it requires frequentist theory of significance tests” (p. 57). Tests of model assumptions, also called misspecification (M-S) tests, are thus a promising area for Bayes-frequentist collaboration. (4.9) When the model is in doubt, the likelihood principle is inapplicable or violated. We illustrate a non-parametric bootstrap resampling. It works without relying on a theoretical  probability distribution, but it still has assumptions. (4.10). We turn to the M-S testing approach of econometrician Aris Spanos.(4.11) I present the high points for unearthing spurious correlations, and assumptions of linear regression, employing 7 figures. M-S tests differ importantly from model selection–the latter uses a criterion for choosing among models, but does not test their statistical assumptions. They test fit rather than whether a model has captured the systematic information in the data.

Keywords

adequacy for a problem, severity (in terms of problem solving), model testing/misspecification (M-S) tests, likelihood principle conflicts, bootstrap, resampling, Bayesian p-value, central limit theorem, nonsense regression, significance tests in model checking, probabilistic reduction, respecification

 

Where you are in the Journey 

Categories: SIST, Statistical Inference as Severe Testing | 2 Comments

Excerpt from Excursion 4 Tour II: 4.4 “Do P-Values Exaggerate the Evidence?”

getting beyond…

Excerpt from Excursion 4 Tour II*

 

4.4 Do P-Values Exaggerate the Evidence?

“Significance levels overstate the evidence against the null hypothesis,” is a line you may often hear. Your first question is:

What do you mean by overstating the evidence against a hypothesis?

Several (honest) answers are possible. Here is one possibility:

What I mean is that when I put a lump of prior weight π0 of 1/2 on a point null H0 (or a very small interval around it), the P-value is smaller than my Bayesian posterior probability on H0.

More generally, the “P-values exaggerate” criticism typically boils down to showing that if inference is appraised via one of the probabilisms – Bayesian posteriors, Bayes factors, or likelihood ratios – the evidence against the null (or against the null and in favor of some alternative) isn’t as big as 1 − P. Continue reading

Categories: SIST, Statistical Inference as Severe Testing | 1 Comment

January Invites: Ask me questions (about SIST), Write Discussion Analyses (U-Phils)

.

ASK ME. Some readers say they’re not sure where to ask a question of comprehension on Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP)–SIST– so here’s a special post to park your questions of comprehension (to be placed in the comments) on a little over the first half of the book. That goes up to and includes Excursion 4 Tour I on “The Myth of ‘The Myth of Objectivity'”. However,I will soon post on Tour II: Rejection Fallacies: Who’s Exaggerating What? So feel free to ask questions of comprehension as far as p.259.

All of the SIST BlogPost (Excerpts and Mementos) so far are here.

.

WRITE A DISCUSSION NOTE: Beginning January 16, anyone who wishes to write a discussion note (on some aspect or issue up to p. 259 are invited to do so (<750 words, longer if you wish). Send them to my error email.  I will post as many as possible on this blog.

We initially called such notes “U-Phils” as in “You do a Philosophical analysis”, which really only means it’s an analytic excercize that strives to first give the most generous interpretation to positions, and then examines them. See the general definition of  a U-Phil.

Some Examples:

Mayo, Senn, and Wasserman on Gelman’s RMM** Contribution

U-Phil: A Further Comment on Gelman by Christian Hennig.

For a whole group of reader contributions, including Jim Berger on Jim Berger, see: Earlier U-Phils and Deconstructions

If you’re writing a note on objectivity, you might wish to compare and contrast Excursion 4 Tour I with a paper by Gelman and Hennig (2017): “Beyond subjective and objective in Statistics”.

These invites extend through January.

Categories: SIST, Statistical Inference as Severe Testing | Leave a comment

SIST* Blog Posts: Excerpts & Mementos (to Dec 31 2018)

Surveying SIST Blog Posts So Far

Excerpts

  • 05/19: The Meaning of My Title: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars
  • 09/08: Excursion 1 Tour I: Beyond Probabilism and Performance: Severity Requirement (1.1)
  • 09/11: Excursion 1 Tour I (2nd stop): Probabilism, Performance, and Probativeness (1.2)
  • 09/15: Excursion 1 Tour I (3rd stop): The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon (1.3)
  • 09/29: Excursion 2: Taboos of Induction and Falsification: Tour I (first stop)
  • 10/10: Excursion 2 Tour II (3rd stop): Falsification, Pseudoscience, Induction (2.3)
  • 11/30: Where are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3
  • 12/01: Neyman-Pearson Tests: An Episode in Anglo-Polish Collaboration: Excerpt from Excursion 3 (3.2)
  • 12/04: First Look at N-P Methods as Severe Tests: Water plant accident [Exhibit (i) from Excursion 3]
  • 12/11: It’s the Methods, Stupid: Excerpt from Excursion 3 Tour II  (Mayo 2018, CUP)
  • 12/20: Capability and Severity: Deeper Concepts: Excerpts From Excursion 3 Tour III
  • 12/26: Excerpt from Excursion 4 Tour I: The Myth of “The Myth of Objectivity” (Mayo 2018, CUP)
  • 12/29: 60 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 tour II.

Mementos, Keepsakes and Souvenirs

  • 10/29: Tour Guide Mementos (Excursion 1 Tour II of How to Get Beyond the Statistics Wars)
  • 11/8:   Souvenir C: A Severe Tester’s Translation Guide (Excursion 1 Tour II)
  • 10/5:  “It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based” (Keepsake by Fisher, 2.1)
  • 11/14: Tour Guide Mementos and Quiz 2.1 (Excursion 2 Tour I Induction and Confirmation)
  • 11/17: Mementos for Excursion 2 Tour II Falsification, Pseudoscience, Induction
  • 12/08: Memento & Quiz (on SEV): Excursion 3, Tour I
  • 12/13: Mementos for “It’s the Methods, Stupid!” Excursion 3 Tour II (3.4-3.6)
  • 12/26: Tour Guide Mementos From Excursion 3 Tour III: Capability and Severity: Deeper Concepts

*Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Mayo, CUP 2018).

Categories: SIST, Statistical Inference as Severe Testing | Leave a comment

Mayo-Spanos Summer Seminar PhilStat: July 28-Aug 11, 2019: Instructions for Applying Now Available

INSTRUCTIONS FOR APPLYING ARE NOW AVAILABLE

See the Blog at SummerSeminarPhilStat

Categories: Announcement, Error Statistics, Statistics | Leave a comment

Midnight With Birnbaum (Happy New Year 2018)

 Just as in the past 7 years since I’ve been blogging, I revisit that spot in the road at 9p.m., just outside the Elbar Room, look to get into a strange-looking taxi, to head to “Midnight With Birnbaum”. (The pic on the left is the only blurry image I have of the club I’m taken to.) I wonder if the car will come for me this year, as I wait out in the cold, now that Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (STINT) is out. STINT doesn’t rehearse the argument from my Birnbaum article, but there’s much in it that I’d like to discuss with him. The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics (and cognate methods). 2018 was the 60th birthday of Cox’s “weighing machine” example, which was the basis of Birnbaum’s attempted proof. Yet as Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. Maybe in 2019? Anyway, the cab is finally here…the rest is live. Happy New Year! Continue reading

Categories: Birnbaum Brakes, strong likelihood principle | Tags: , , , | Leave a comment

You Should Be Binge Reading the (Strong) Likelihood Principle

 

.

An essential component of inference based on familiar frequentist notions: p-values, significance and confidence levels, is the relevant sampling distribution (hence the term sampling theory, or my preferred error statistics, as we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the strong likelihood principle (SLP). To state the SLP roughly, it asserts that all the evidential import in the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data.

SLP (We often drop the “strong” and just call it the LP. The “weak” LP just boils down to sufficiency)

For any two experiments E1 and E2 with different probability models f1, f2, but with the same unknown parameter θ, if outcomes x* and y* (from E1 and E2 respectively) determine the same (i.e., proportional) likelihood function (f1(x*; θ) = cf2(y*; θ) for all θ), then x* and y* are inferentially equivalent (for an inference about θ).

(What differentiates the weak and the strong LP is that the weak refers to a single experiment.)
Continue reading

Categories: Error Statistics, Statistics, strong likelihood principle | 1 Comment

60 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 Tour II (Mayo 2018, CUP)

.

2018 marked 60 years since the famous weighing machine example from Sir David Cox (1958)[1]. It’s one of the “chestnuts” in the exhibits of “chestnuts and howlers” in Excursion 3 (Tour II) of my new book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST). It’s especially relevant to take this up now, just before we leave 2018, for reasons that will be revealed over the next day or two. So, let’s go back to it, with an excerpt from SIST (pp. 170-173).

Exhibit (vi): Two Measuring Instruments of Different Precisions. Did you hear about the frequentist who, knowing she used a scale that’s right only half the time, claimed her method of weighing is right 75% of the time?

She says, “I flipped a coin to decide whether to use a scale that’s right 100% of the time, or one that’s right only half the time, so, overall, I’m right 75% of the time.” (She wants credit because she could have used a better scale, even knowing she used a lousy one.)

Basis for the joke: An N-P test bases error probability on all possible outcomes or measurements that could have occurred in repetitions, but did not. Continue reading

Categories: Birnbaum, Statistical Inference as Severe Testing, strong likelihood principle | 2 Comments

Excerpt from Excursion 4 Tour I: The Myth of “The Myth of Objectivity” (Mayo 2018, CUP)

.

Tour I The Myth of “The Myth of Objectivity”*

 

Objectivity in statistics, as in science more generally, is a matter of both aims and methods. Objective science, in our view, aims to find out what is the case as regards aspects of the world [that hold] independently of our beliefs, biases and interests; thus objective methods aim for the critical control of inferences and hypotheses, constraining them by evidence and checks of error. (Cox and Mayo 2010, p. 276)

Whenever you come up against blanket slogans such as “no methods are objective” or “all methods are equally objective and subjective” it is a good guess that the problem is being trivialized into oblivion. Yes, there are judgments, disagreements, and values in any human activity, which alone makes it too trivial an observation to distinguish among very different ways that threats of bias and unwarranted inferences may be controlled. Is the objectivity–subjectivity distinction really toothless, as many will have you believe? I say no. I know it’s a meme promulgated by statistical high priests, but you agreed, did you not, to use a bit of chutzpah on this excursion? Besides, cavalier attitudes toward objectivity are at odds with even more widely endorsed grass roots movements to promote replication, reproducibility, and to come clean on a number of sources behind illicit results: multiple testing, cherry picking, failed assumptions, researcher latitude, publication bias and so on. The moves to take back science are rooted in the supposition that we can more objectively scrutinize results – even if it’s only to point out those that are BENT. The fact that these terms are used equivocally should not be taken as grounds to oust them but rather to engage in the difficult work of identifying what there is in “objectivity” that we won’t give up, and shouldn’t. Continue reading

Categories: Error Statistics, SIST, Statistical Inference as Severe Testing | 4 Comments

Tour Guide Mementos From Excursion 3 Tour III: Capability and Severity: Deeper Concepts

Excursion 3 Tour III:

A long-standing family feud among frequentists is between hypotheses tests and confidence intervals (CIs). In fact there’s a clear duality between the two: the parameter values within the (1 – α) CI are those that are not rejectable by the corresponding test at level α. (3.7) illuminates both CIs and severity by means of this duality. A key idea is arguing from the capabilities of methods to what may be inferred. CIs thereby obtain an inferential rationale (beyond performance), and several benchmarks are reported. Continue reading

Categories: confidence intervals and tests, reforming the reformers, Statistical Inference as Severe Testing | Leave a comment

Capability and Severity: Deeper Concepts: Excerpts From Excursion 3 Tour III

Deeper Concepts 3.7, 3.8

Tour III Capability and Severity: Deeper Concepts

 

From the itinerary: A long-standing family feud among frequentists is between hypotheses tests and confidence intervals (CIs), but in fact there’s a clear duality between the two. The dual mission of the first stop (Section 3.7) of this tour is to illuminate both CIs and severity by means of this duality. A key idea is arguing from the capabilities of methods to what may be inferred. The severity analysis seamlessly blends testing and estimation. A typical inquiry first tests for the existence of a genuine effect and then estimates magnitudes of discrepancies, or inquires if theoretical parameter values are contained within a confidence interval. At the second stop (Section 3.8) we reopen a highly controversial matter of interpretation that is often taken as settled. It relates to statistics and the discovery of the Higgs particle – displayed in a recently opened gallery on the “Statistical Inference in Theory Testing” level of today’s museum. Continue reading

Categories: confidence intervals and tests, Statistical Inference as Severe Testing | 2 Comments

Summer Seminar PhilStat: July 28-Aug 11, 2019 (ii)

First draft of PhilStat Announcement

 

Categories: Announcement, Error Statistics | 5 Comments

Blog at WordPress.com.