I’m cleaning away some cobwebs around my old course notes, as I return to teaching after 2 years off (since I began this blog). The change of technology alone over a mere 2 years (at least here at Super Tech U) might be enough to earn me techno-dinosaur status: I knew “Blackboard” but now it’s “Scholar” of which I know zilch. The course I’m teaching is supposed to be my way of bringing “big data” into introductory critical thinking in philosophy! No one can be free of the “sexed up term for statistics,” Nate Silver told us (here and here), and apparently all the college Deans & Provosts have followed suit. Of course I’m (*mostly*) joking; and it was my choice.

Anyway, the course is a nostalgic trip back to *critical thinking*. I’m stepping back from the grown-up metalogic and advanced logic I usually teach, hop-skipping over baby logic, whizzing past toddler and infant logic… and arriving at something akin to what R.A. Fisher dubbed “the study of the embryology of knowledge” (1935, 39) (a kind of ‘fetal logic’?), which, in its very primitiveness, actually demands a highly sophisticated analysis. In short, it’s turning out to be the same course I taught nearly a decade ago (but with a new book and new twists)! But my real point is that the hodge-podge known as “critical thinking,” were it seriously considered, requires getting to grips with some very basic problems that we philosophers, with all our supposed conceptual capabilities, have left unsolved. (I am alluding to Gandenberger’s remark.) I don’t even think philosophers are working on the problem (these days). (Are they?)

I refer, of course, to our inadequate understanding of how to relate deductive and inductive inference, assuming the latter to exist (which I do)—whether or not one chooses to call its study a “logic”[i]. [That is, even if one agrees with the Popperians that the only logic is deductive logic, there may still be such a thing as a critical scrutiny of the approximate truth of premises, without which no inference is ever detached even from a deductive argument. This is also required for Popperian corroboration or well-testedness.]

We (and our textbooks) muddle along with vague attempts to see inductive arguments as more or less parallel to deductive ones, only with probabilities someplace or other. I’m not saying I have easy answers, I’m saying I need to invent a couple of new definitions in the next few days that can at least survive the course. Maybe readers can help.

______________________

I view ‘critical thinking’ as developing methods for critically evaluating the (approximate) truth or adequacy of the premises which may figure in deductive arguments. These methods would themselves include both deductive and inductive or “ampliative” arguments. Deductive validity is a matter of form alone, and so philosophers are stuck on the idea that inductive logic would have a formal rendering as well. But this simply is not the case. Typical attempts are arguments with premises that take overly simple forms:

If all (or most) J’s were observed to be K’s, then the next J will be a K, at least with a probability p.

To evaluate such a claim (essentially the rule of *enumerative induction*) requires context-dependent information (about the nature and selection of the K and J properties, their variability, the “next” trial, and so on). Besides, most interesting ampliative inferences are to generalizations and causal claims, not mere predictions to the next J. The problem isn’t that an algorithm couldn’t evaluate such claims, but that the evaluation requires context-dependent information as to *how the ampliative leap can go wrong*. Yet our most basic texts speak as if potentially warranted inductive arguments are like potentially sound deductive arguments, *more or less*. But it’s not easy to get the “more or less” right, for any given example, while still managing to say anything systematic and general. That is essentially the problem…
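To make the worry concrete, here is one classical, purely formal rendering of the rule: Laplace’s rule of succession, which the post itself does not discuss. Assuming a uniform prior on the unknown chance that a J is a K, it assigns probability (s + 1)/(n + 2) to the next J being a K after s of n observed J’s were K’s. The sketch below (the function name is hypothetical) only illustrates what a context-free formula looks like: none of the context questions listed above ever enters it.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: P(next J is a K) after `successes`
    of `trials` observed J's were K's.

    Assumes a uniform prior on the unknown chance; nothing about what J and
    K are, how the J's were selected, or how variable K is enters the formula.
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# A 20-for-20 record gets the same number whatever J and K happen to be.
print(rule_of_succession(20, 20))  # 21/22
```

The identical output for “swans/white” and for any other pair of properties is exactly the context-blindness at issue.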

______________________

The age-old definition of argument that we all learned from Irving Copi still serves: a group of statements, one of which (the conclusion) is claimed to follow from one or more others (the premises) which are regarded as supplying evidence for the truth of that one. This is written:

$P_1, P_2, \ldots, P_n \,/\, \therefore C$.

In a deductively valid argument, if the premises are all true then, necessarily, the conclusion is true. To use the “⊨” (double turnstile) symbol[ii]:

$P_1, P_2, \ldots, P_n \models C$.

Does this mean:

$P_1, P_2, \ldots, P_n \,/\, \therefore \text{necessarily } C$?

No, because we do not detach “necessarily C”, which would suggest C was a necessary claim (i.e., true in all possible worlds). “Necessarily” qualifies “⊨”, the very relationship between premises and conclusion:

It’s logically impossible to have all true premises and a false conclusion, on pain of logical contradiction.

We should see it (i.e., deductive validity) as qualifying *the process of “inferring,”* as opposed to the “inference” that is detached: the statement placed to the right of “⊨”. A valid argument is a procedure of inferring that is 100% reliable, in the sense that if the premises are all true, then 100% of the time the conclusion is true.

*Deductively Valid Argument: Three equivalent expressions:*

(D-i) If the premises are all true, then necessarily, the conclusion is true.

(I.e., if the conclusion is false, then (necessarily) one of the premises is false.)

(D-ii) It’s (logically) impossible for the premises to be true and the conclusion false.

(I.e., to have the conclusion false with the premises true leads to a logical contradiction, A & ~A.)

(D-iii) The argument maps true premises into a true conclusion with 100% reliability.

(I.e., if the premises are all true, then 100% of the time the conclusion is true).
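For propositional arguments, (D-ii) and (D-iii) can be checked mechanically by running through a truth table. The sketch below is a minimal illustration (function names are hypothetical): it collects every row in which all premises are true and reports the share of those rows in which the conclusion is also true. Validity is the case where that share is 1, i.e., no row has true premises and a false conclusion.

```python
from itertools import product
from typing import Callable, Dict, List

Valuation = Dict[str, bool]
Formula = Callable[[Valuation], bool]


def premise_rows(premises: List[Formula], atoms: List[str]) -> List[Valuation]:
    """All rows of the truth table in which every premise is true."""
    rows = []
    for values in product([True, False], repeat=len(atoms)):
        row = dict(zip(atoms, values))
        if all(p(row) for p in premises):
            rows.append(row)
    return rows


def reliability(premises: List[Formula], conclusion: Formula, atoms: List[str]) -> float:
    """Share of premise-true rows where the conclusion is true (cf. D-iii).

    For an invalid argument this is just a count over valuations, not an
    empirical probability of anything.
    """
    rows = premise_rows(premises, atoms)
    if not rows:
        return 1.0  # no row makes all premises true: vacuously valid
    return sum(conclusion(row) for row in rows) / len(rows)


def is_valid(premises: List[Formula], conclusion: Formula, atoms: List[str]) -> bool:
    """Valid iff no row has all premises true and the conclusion false (cf. D-ii)."""
    return reliability(premises, conclusion, atoms) == 1.0


# Modus ponens: A -> B, A / therefore B
mp = [lambda v: (not v["A"]) or v["B"], lambda v: v["A"]]
print(is_valid(mp, lambda v: v["B"], ["A", "B"]))  # True
# Affirming the consequent: A -> B, B / therefore A
ac = [lambda v: (not v["A"]) or v["B"], lambda v: v["B"]]
print(is_valid(ac, lambda v: v["A"], ["A", "B"]))  # False
```

Note that nothing in this check appeals to how the premises came to be true; that is the sense in which deductive validity is “a matter of form alone.”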

(Deductively) Sound argument: deductively valid + premises are true/approximately true.

All of this is baby logic; but with so-called inductive arguments, terms are not so clear-cut. (“Embryonic logic” demands, at times, more sophistication than grown-up logic.) But maybe the above points can help…

________

With an inductive argument, the conclusion goes beyond the premises. So it’s logically possible for all the premises to be true and the conclusion false.

Notice that if one *had* characterized deductive validity as

(a) $P_1, P_2, \ldots, P_n \models \text{necessarily } C$,

then it would be an easy slide to seeing inductively inferring as:

(b) $P_1, P_2, \ldots, P_n \models \text{probably } C$.

But (b) is wrongheaded, I say, for the same reason (a) is. Nevertheless, (b) (or something similar) is found in many texts. We (philosophers) should stop forcing ampliative inference into the deductive mould. So, here I go trying out some decent parallels:

In all of the following, “true” will mean “true or approximately true”.

*An inductive argument (to inference C) is strong or potentially severe only if any of the following (equivalent claims) hold [iii]*

(I-i) If the conclusion is false, then very probably at least one of the premises is false.

(I-ii) It’s improbable that the premises are all true while the conclusion false.

(I-iii) The argument leads from true premises to a true conclusion with high reliability (i.e., if the premises are all true, then 100(1 − α)% of the time, for some small α, the conclusion is true).
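A simulation can illustrate what (I-iii) asks of a simple statistical rule. The sketch below assumes a hypothetical rule, “from n independent trials, infer that the chance of success lies within δ of the observed proportion,” and makes its premises (independent trials with a constant chance p) true by construction. It then estimates how often the conclusion comes out true; one minus that rate is the (I-ii) probability of true premises with a false conclusion. The rule, function name, and numbers are illustrative assumptions, and the simulation says nothing about whether the premises hold in any real case.

```python
import random


def rule_reliability(p: float, n: int, delta: float, runs: int = 2000, seed: int = 1) -> float:
    """Estimated share of runs in which the rule 'infer: the chance lies within
    `delta` of the observed proportion' yields a true conclusion.

    Premises assumed (and built into the simulation): n independent trials,
    each with the same chance p of success.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        successes = sum(rng.random() < p for _ in range(n))
        if abs(successes / n - p) <= delta:  # is the inferred claim true?
            hits += 1
    return hits / runs


print(rule_reliability(p=0.3, n=400, delta=0.05))  # large sample
print(rule_reliability(p=0.3, n=20, delta=0.05))   # small sample
```

The same rule should come out highly reliable with 400 trials and much less so with 20, which is one way of seeing that the reliability of an ampliative rule rests on features of the case rather than on form alone.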

To get the probabilities to work, the premises and conclusion must refer to “generic” claims of a given type, but this is the case for deductive arguments as well (else their truth values wouldn’t vary from case to case). However, the basis for the (I-i) through (I-iii) requirement, in any of its forms, will not be formal; it will demand a contingent or empirical ground. Even after these are grounded, the approximate truth of the premises will be required. Otherwise, the argument is only *potentially severe*. (This is parallel to viewing a valid deductive argument as potentially sound.)

We get the following additional parallel:

Deductively unsound argument:

Denial of (D-i), (D-ii), or (D-iii): it’s logically possible for all its premises to be true and the conclusion false.

OR

One or more of its premises are false.

Inductively weak inference: insevere grounds for C

Denial of (I-i), (I-ii), or (I-iii): Premises would be fairly probable even if C is false.

OR

Its premises are false (not true to a sufficient approximation)

There’s still some “winking” going on, and I’m sure I’ll have to tweak this. What do you think?

Fully aware of how the fuzziness surrounding inductive inference has non-trivially (adversely) influenced the entire research program in philosophy of induction, I’ll want to rethink some elements from scratch, this time around….

______________

So I’m back in my Thebian palace high atop the mountains in Blacksburg, Virginia. The move from looking out at the Empire State Building to staring at endless mountain ranges is… calming.[iv]

**References:**

[i] I do, following Peirce, but it’s an informal, not a formal, logic (using the terms strictly).

[ii] The double turnstile denotes the “semantic consequence” relationship; the single turnstile, the syntactic (deducibility) relationship. But some students are not so familiar with “turnstiles”.

[iii]I intend these to function equivalently.

[iv] Someone asked me, “What’s the biggest difference you find in coming to the rural mountains from living in NYC?” I think the biggest contrast is the amount of space. It’s not just that I live in a large palace; there’s also the tremendous width of the grocery aisles: 3 carts wide rather than 1.5 carts wide. I hate banging up against carts in NYC, but this feels like a major highway!

Copi, I. (1956). *Introduction to Logic*. New York: Macmillan.

Fisher, R.A. (1935). *The Design of Experiments*. Edinburgh: Oliver & Boyd.


Note: I caught a number of errors in earlier drafts, so please use only the version posted after 11 p.m. Thursday, Aug. 22, 2013. Thanks.

It disappoints me that philosophers seemingly are not working on the very basic problems you hint at in the beginning of this post (with the reference to Gandenberger’s remark).

Here is one comment I have pertaining to the parallels you draw between deductive and inductive arguments: Classifying inductive inferences as strong or weak makes it seem more related to measuring strength of evidence, which seems closer to epistemology than to (formal, first-order) logic. Hence, a question I have is: was this apparent tie to epistemology intended when you created the parallels between deductive arguments and inductive inferences (toward the very end of this post)? Also, may I ask you to clarify what counts as a “sufficient approximation” (to the truth of the premises)?

Hi Nicole: Generally the premises are at best approximately true; I intended to signal the denial of that, although mere “false” will do. Of course there “are ties” between inferring and epistemology: we know most things by inferring, right? Evaluating inductive inferring has never been a dichotomous classification, as deductive validity is, so I don’t really get your question.

One of the occupational hazards of being a philosopher: having to teach an intro-level course about a topic on which there is not even a somewhat defensible party line.

You might find it helpful to draw on Ronald Giere’s *Understanding Scientific Reasoning* if you haven’t done so already. It provides a practical approach that (no surprise) fits well with error statistics.

I think this is a really interesting post. I’m not sure what my position is on the relation between deductive and inductive logic — I’d need to think about it A LOT more — but here are some preliminary thoughts after having read your post:

My first thought is pretty embryonic. It’s that it seems like a key difference between deductive reasoning and inductive reasoning will have something to do with the fact that inductive reasoning contains as a basic component what some have called “material inference”, which, as I understand it, can be construed as just the kind of inference expressed by a simple hypothetical like ‘If A then B’. It’s a direct (and informal) move from one (or more than one) proposition to another. It seems like ampliative induction is a kind of material inference (or something).

My second thought is to agree with you that both deductive and inductive reasoning should be construed as ‘P1, P2,…Pn ⊨ C’. At least one of the problems with construing inductive reasoning as ‘P1, P2,…Pn ⊨ probably C’ is that in reasoning inductively, we’re sometimes sure that the conclusion follows (though not formally) from the premises. The only reason to add ‘probably’ in our formulation of inductive reasoning is if we’re thinking from the perspective of deductive logic. The conclusion may not follow logically necessarily (i.e., formally) from the premises, but that doesn’t mean that the conclusion only probably follows from the premises. I think the goal of inductive reasoning is to get to reasoning processes in which the conclusion does necessarily follow from the premises, though not formally. (I think that I’m just pointing to the distinction between “logical necessity” and “physical necessity”.)

Matt: Thanks for sharing your embryonic first thoughts! (I’m glad in general, by the way, to see some phil grad students jumping into the comments.)

“it seems like a key difference between deductive reasoning and inductive reasoning will have something to do with the fact that inductive reasoning contains as a basic component what some have called ‘material inference’”.

I take it that’s the gist of my saying it’s context dependent. However, given that one can add a hypothetical to any argument to make it valid, to stop at that just turns the task into assessing the truth of the additional premise. That’s what I think the role of (something like) critical thinking is. Its advantage over merely assessing each and every premise from scratch is exactly that: to provide general (although not purely syntactical or formal) principles/methods for evaluating types of “If A then B” claims (or other types). Just saying it’s material (perhaps along the lines of Norton) doesn’t go very far in giving a way to appraise inductive arguments systematically. And I’m not prepared to give that up entirely, nor need we. (That’s where statistical science enters, at least for a group of cases.)

“We’re sometimes sure that the conclusion follows (though not formally) from the premises”. But why is that “a problem” for viewing deductive validity using |= (which has nothing to do with how sure you are)?

I find your next point very interesting though: that we want inductive arguments to corroborate (i.e., license with severity in my sense) claims with “physical necessity”. But does that cash out in terms of my I-i, ii, or iii? (perhaps with the approx. truth of prems?)

Greg: Of course I have known Giere (and his books) very well for many years. He’s a fellow exile (the only philosopher whose view was even close to mine way back when):

https://errorstatistics.com/2011/10/22/the-will-to-understand-power-neymans-nursery/

He was the person who helped me more than anyone else for the first decade or so that I was in philosophy; and he’s a good friend as well.

I once wrote a short revision of a section of *Understanding Scientific Reasoning* so that the statistical computations for a difference between two means were included. That’s because in Giere’s previous edition, he had a rule of thumb based on checking whether the two confidence intervals overlapped. I showed him how far off this rule of thumb can be, so he included my computation in his next edition. Apparently when it came around to a further edition, the publisher said to kick it out, I guess because it was too technical: that little baby computation!
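The computation Mayo wrote for Giere’s book isn’t reproduced here; the sketch below is only a generic normal-theory illustration of why the overlap rule of thumb can mislead. It assumes two independent estimates with known standard errors, and the numbers and function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
CRIT = STD_NORMAL.inv_cdf(0.975)  # about 1.96, for two-sided 95% intervals


def ci95(mean: float, se: float) -> tuple:
    """Two-sided 95% normal-theory confidence interval."""
    return (mean - CRIT * se, mean + CRIT * se)


def intervals_overlap(m1: float, se1: float, m2: float, se2: float) -> bool:
    """The rule of thumb: do the two 95% intervals overlap?"""
    lo1, hi1 = ci95(m1, se1)
    lo2, hi2 = ci95(m2, se2)
    return lo1 <= hi2 and lo2 <= hi1


def two_sample_p(m1: float, se1: float, m2: float, se2: float) -> float:
    """Two-sided p-value for H0: equal means (independent estimates, known SEs)."""
    z = (m2 - m1) / (se1 ** 2 + se2 ** 2) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


# Hypothetical numbers: two group means, each with standard error 0.7.
m1, m2, se = 10.0, 12.5, 0.7
print(intervals_overlap(m1, se, m2, se))  # True: the rule of thumb says "no difference"
print(two_sample_p(m1, se, m2, se))       # about 0.012: significant at the 0.05 level
```

With equal standard errors the intervals stop overlapping only when the difference exceeds about 2 × 1.96 ≈ 3.92 SEs, while the two-sample test rejects at about 1.96 × √2 ≈ 2.77 SEs, so the overlap rule is systematically conservative.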

But anyway, as much as Giere and I largely agree on things, he goes a bit too far in holding an N-P-style behavioristic justification for all cases. Naturally, I’d cash out the conditions I gave (in my post) to be relevant inferentially–for the severity of the particular inference.

There comes a time–in my case, it was already before EGEK–that one realizes there’s no party line that will do the job, and that no one can or will solve the problem. The choice is: buy the party line or be ready to forge one’s own path (even if it means exile)!

(1) I now don’t think there is such a thing as “inductive inference,” particularly nothing that could be filled in to make it deductive. The step from data to a generalization or a higher level theoretical hypothesis is a matter of deciding whether to accept the result as close enough to true for one’s purposes. And purposes, of course, involve values, most basically, the cost, in one’s context, of not accepting a hypothesis that is close to true or accepting one that is far from the truth. So considering a hypothesis is very different in the context of approving a drug for mass consumption vs. for an ongoing scientific investigation. There is no “universal” context.

(2) Ron: Great to hear from you! But I don’t understand what you wrote actually. Of course any inference can be made deductive by adding an appropriate premise, so that can’t be what you mean. And of course there’s context dependence, which is why I stress it’s not formal. Say it’s scientist Perrin making an inference about molecules from Brownian motion experiments only after finding the same estimate on 13 distinct phenomena over all those years, or pick a favorite example. He would first need to make one of your inferences to how “far or close the hypothesis is to true”, in order to then apply the decision criteria. So you don’t get rid of inference that way. Standards of evidence can be more or less stringent for any reasons you like, but determining that a standard is approximately met (e.g., that the estimate of Avogadro’s is not further off than such and such) has to be prior to deciding if it’s close enough for a given purpose or what have you. Right?

And I take it you grant we have to have means by which to scrutinize Giere’s lax (or Mayo’s stringent) values for a given purpose.

(3) Deborah,

You can’t determine “that the estimate of Avogadro’s is not further off than such and such.” The most you have is something like a power function giving the probabilities “that the estimate of Avogadro’s is not further off than such and such.” Going further requires a decision. But let’s get back to teaching scientific reasoning.

When I first wrote USR, I began with baby deductive logic and then fashioned inductive analogs of standard deductive forms, particularly modus tollens. Students then were to construct and evaluate arguments for or against hypotheses relative to given data. Over time I concluded that constructing explicit arguments, with premises and a conclusion, was a worthless exercise. It obscured the science. So I wrote the third edition eliminating the logic chapter and replacing the arguments with a simple decision tree, a crucial branch of which was the consideration of severity (though not by that name). This puts the focus on the point of gathering evidence or designing experiments, to decide whether to accept or reject a hypothesis relative to given purposes, which might be just advancing the science. So the issue is not so much “inference” vs “decision” as approaching scientific reasoning using the apparatus of logic (arguments, premisses, conclusions) or focusing directly on what needs to be decided: is the hypothesis good enough to take seriously or not? This goes against the tradition of “critical thinking,” which is all about argument analysis. Science is more structured than everyday reasoning.

Ron! Leaving your philosophical queries aside for the moment, I decided (!) that if you shared another comment I’d mention to everyone how you came to my rescue in Sept 2011 when I was non-trivially injured in San Fran on the way to a conference we were both going to (on experiment and modeling). I couldn’t walk at all, and poor Ron spent hours in the emergency room with me, helping to wheel me around and push me in a wheelchair. Remember? I was an invalid the entire time! And Ron was an angel!

https://errorstatistics.com/2011/09/20/a-highly-anomalous-event/

(4) Ron: You seem to be taking the Neyman behavioristic philosophy to an extreme, but I don’t think you mean it. In any event I agree with Fisher who said that Neyman was only giving us his preferred way of speaking (i.e., data are for inductive behavior, not for inductive learning or inference). If you want to call every evidential judgment a kind of decision, it scarcely matters. Unless, of course, you take it seriously.

I take it you would not allow just any “utilities” and “interests” to influence the interpretation of the data for finding things out (e.g., GM crops threaten x’s religious beliefs, so it is appropriate for x to interpret data on potential risks as serious no matter what). “So considering a hypothesis is very different in the context of approving a drug for mass consumption vs. for an ongoing scientific investigation”. Right, but who except maybe extreme social constructivists would equate the two? It’s the ability to see the difference that cuts against the view that it’s all social negotiation, subjective interests, might makes right. My own view is that not much of science involves accepting and rejecting hypotheses, and the dichotomized view of tests as “accept/reject” machines has caused a lot of damage. I think scientists primarily care to figure out what, if anything, we might learn from the data. Policymaking is a different ball game, but I insist on holding policymakers accountable: put in the utilities, fine, but there had better be some evidence-based assessment.

Logic & Stat: Maybe we should take advantage of the foray into logic here. I am well aware that the logic vs set-theory divide can result in philosophers speaking in a language somewhat unfamiliar to statisticians. But we know that things can be equivalently expressed in logic or in set theory, so it shouldn’t be a big deal. Now a simple logical truth is that

A = (A & (B V ~B)).

I am using = for logical equivalence between statements, being without any Elbian help on the blog (they seem to have gone on holiday). It should be three bars (≡). This says that if you conjoin a tautology and A, the resulting sentence is logically equivalent to A. “&” is the “and” operator, “V” is “or,” and “~” is “not.” If A is true, then (A & (B V ~B)) is true. And if A is false, then (A & (B V ~B)) is false. The two statements always have the same truth value.
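The equivalence can be checked row by row, exactly as in the argument just given. A minimal sketch (the helper name is hypothetical): two formulas are logically equivalent iff they agree on every row of the truth table.

```python
from itertools import product


def equivalent(f, g, n_atoms: int) -> bool:
    """True iff f and g agree on every row of the truth table."""
    return all(f(*row) == g(*row) for row in product([True, False], repeat=n_atoms))


def lhs(A: bool, B: bool) -> bool:
    return A


def rhs(A: bool, B: bool) -> bool:
    return A and (B or not B)  # A & (B V ~B)


print(equivalent(lhs, rhs, 2))  # True
```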

Say I wanted to express that a statement C adds no content or no information for purposes of drawing an inference from A. I might express this as:

“To conjoin C to A is inferentially the same as conjoining a tautology to A”

(A & C) = (A & (B V ~B))

Or I might say conjoining C is the same as conjoining an utter irrelevancy to A, for purposes of the inference from A. Perhaps the reader can guess where I’m going with this?
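Without presuming where the comment is headed, one standard fact is relevant: (A & C) is equivalent to (A & (B V ~B)), and hence to A, exactly when A entails C, i.e., when C is true in every row where A is true. The sketch below checks both conditions for two sample choices of C; the helper names and the sample formulas are illustrative assumptions.

```python
from itertools import product

ROWS = list(product([True, False], repeat=2))  # all valuations of (A, B)


def equivalent(f, g) -> bool:
    """f and g have the same truth value in every row."""
    return all(f(A, B) == g(A, B) for A, B in ROWS)


def entails(f, g) -> bool:
    """f |= g: g is true in every row where f is true."""
    return all(g(A, B) for A, B in ROWS if f(A, B))


def A_only(A: bool, B: bool) -> bool:
    return A


def A_and_taut(A: bool, B: bool) -> bool:
    return A and (B or not B)  # A & (B V ~B)


for name, C in [("A V B", lambda A, B: A or B), ("B", lambda A, B: B)]:
    conj = lambda A, B, C=C: A and C(A, B)  # A & C
    print(name, equivalent(conj, A_and_taut), entails(A_only, C))
# A V B True True
# B False False
```

In both cases the two columns agree, as the fact predicts: conjoining C “adds nothing” to A just when C is already a consequence of A.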

(Here below, I’m drawing on a quick email exchange with Deborah Mayo yesterday.)

In the post it is said that

(*) An inductive argument is strong if and only if, if the conclusion is false, then very probably one of the premises is false.

When reading this, I immediately thought that (*) could be turned into the following comparative variation:

(**) The inductive argument from (the conjunction of) premise(s) E to conclusion H is stronger than [as strong as] the inductive argument from E to H* if and only if Pr(E|not-H) is lower than [equal to] Pr(E|not-H*).

I find (**) fairly sensible on independent grounds (see, e.g., here) and I also thought Mayo would concur. She said she was skeptical, however, which puzzles me a bit. For, after all, if one is willing to see 1 – Pr(E|not-H) as a measure of severity, and relies on severity to explicate inductive strength, then (**) readily follows. (No?)

I would love to know what people think about this.

(Super interesting post, anyway!)

Vincenzo: Thanks for this. No, that’s not a measure of severity.

P(E|not-H) looks to be the “Bayesian catchall factor”.

Vincenzo: Suppose I am tasked by a group of football coaches to ensure that a specially made coin that will be used to determine the first possession at the start of a game is fair. I give the coin to my graduate student and have her flip the coin in the same manner as done at game time 1000 times. My null hypothesis is that the coin is perfectly fair, with probability of heads =0.5. Experiment is performed and we see heads 480 times. I estimate that the probability of heads 480 times or less is approx 0.10. What is the probability of not fair? I do not know. I go back to the coaches and tell them that the coin cannot be said to be unfair by my experiment and ask them how far off should it be before we disqualify it? They tell me that they hoped for 50% heads in use, but would be willing to accept as low as 45% before disqualification (or as high as 55%). I can see that the severity for the inference that my result is too high to support an alternate hypothesis of 0.45 as the prob of heads is approx 0.97, which is quite strong. Now, I report to the coaches that the coin meets their standard for fairness. Do you see a place for the probability of not-H for the hypothesis of fairness? Any relation between not-H and severity? I do not.
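The numbers in John’s example can be reproduced from the binomial distribution. The sketch below follows the usual error-statistical convention for the two severity claims: SEV(p > 0.45) is the probability of a result no larger than the observed 480 if p were 0.45, and SEV(p < 0.55) is the probability of a result no smaller than 480 if p were 0.55. The helper names are hypothetical, and with discrete data the choice of ≤ versus < shifts the values slightly.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


n, heads = 1000, 480
p_value = binom_cdf(heads, n, 0.5)                 # P(480 or fewer heads | fair coin)
sev_above_045 = binom_cdf(heads, n, 0.45)          # SEV(p > 0.45): P(X <= 480 | p = 0.45)
sev_below_055 = 1 - binom_cdf(heads - 1, n, 0.55)  # SEV(p < 0.55): P(X >= 480 | p = 0.55)
print(round(p_value, 3), round(sev_above_045, 3), round(sev_below_055, 3))
```

The first two values should land near the 0.10 and 0.97 reported in the comment; the third is essentially 1, so both edges of the coaches’ 45% to 55% band are passed with high severity.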

Well, if the coin is biased with less than 45% Pr of heads (or more than 55%, for that matter), then observing 480 heads (on those experimental conditions) was unlikely, namely, Pr(E|not-H) was (relatively) low. That would support the inference to the conclusion (H) that the coin meets that standard for fairness. As far as I understand it at all, this is just how statement (*), which I took from the post, would apply to this example. Am I missing something?

Vincenzo: I think our difference centers on how you can define P(E|not-H). I do not see how you can correctly define it. What would the setup be with the numbers in the example?

John: Right, yes, pretty sure we would disagree there.

But, let me add, this was not meant to be my point. It could now be framed as follows, instead: *if* there is no acceptable way to assess Pr(E|not-H) as (relatively) low in your example, *then* either (i) principle (*), which attracted my attention in the original post, does not make sense of the inductive step in that example, or else (ii) I just fail to understand what (*) means. (I take (ii) as a very real possibility, by the way.)

Vincenzo: I think the problem is the concept of “not-H.” When you focus on that, such as trying to assign a number to it, it really is not there. It is a catchall and as such is not legitimately operationalized. Dropping not-H does away with (**).

John: Concerning your example, dropping not-H does away with (*), too, right? It’s a pity, but so be it 🙂

Vincenzo: I think that * survives intact. Again, I think you have to try to operationalize not-H to see that it cannot help clarify any of this. The problem is you have incorporated not-H into your argument. In my simple example, the hypothesis is that the coin is fair and there are specific premises that speak to the qualities of the coin (balanced), expectations for probability of heads, and the nature of the experiment. If the coin is unfair, then at least one of the premises must be problematic. I can arrive at this realization without an ill-defined concept like P(E|not-H). If my results suggest the coin is fair, what I care about is the severity of the test that leads me to that conclusion, which I calculated against a relevant benchmark in the example.
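One way to see what “operationalizing not-H” would involve: in the coin example, not-H (“the chance of heads falls outside 45% to 55%”) is composite, so Pr(E | p) takes different values across its members, and a single number for Pr(E | not-H) requires weighting those members. The weightings below are purely hypothetical; nothing in the thread specifies them, and the helper names are likewise illustrative.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))


def prob_e_given_mixture(k: int, n: int, weights: dict) -> float:
    """Pr(E | not-H) under a chosen weighting of the alternatives in not-H.

    `weights` maps an alternative chance of heads to its weight; the weights
    are an extra input that the data and the test setup do not supply.
    """
    total = sum(weights.values())
    return sum(w * binom_pmf(k, n, p) for p, w in weights.items()) / total


n, heads = 1000, 480           # E: exactly 480 heads in 1000 flips
near = {0.44: 1, 0.56: 1}      # hypothetical: weight on alternatives just outside the band
far = {0.30: 1, 0.70: 1}       # hypothetical: weight on grossly biased coins
print(prob_e_given_mixture(heads, n, near))
print(prob_e_given_mixture(heads, n, far))
```

The two outputs differ by many orders of magnitude, so the setup alone does not fix a value for Pr(E | not-H). The severity computation in John’s example instead evaluates specific alternatives at the edges of the band (p = 0.45 and p = 0.55).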

Vincenzo: The bottom line is that any “base” measure of “confirmation” or “support” or the like will at most provide for the measure of “fit” or “misfit” (with some claim H) in a severity assessment. When we ask about error probabilities, we treat the fit measure as a random variable*. This lets us pick up on things like stopping rules, ad hoc procedures to generate good fits, cherry picking and the like. The “not-H” is associated with a specific claim (e.g., p(heads) is at least .6, or independence is violated). If there’s a high probability for so good a fit with H, even if not-H is true, then the test H has passed has low severity. We do not have to delineate all possible ways a hypothesis could fail in order to evaluate a given claim, unlike with the typical Bayesian catchall factor.

With more general claims outside the formalism of statistics, these assessments are more informal.

*Think of how a P-value measure can be viewed as a statistic taking on values with different probabilities under various hypotheses.
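To illustrate the footnote, the sketch below simulates many repetitions of a one-sided test of a fair coin and records the P-value from each run, so the P-value is treated as a statistic with its own sampling distribution. The normal approximation, sample size, number of runs, seed, and function names are all assumptions made for brevity.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_sided_p(heads: int, n: int) -> float:
    """Approximate P(X <= heads) for a fair coin (normal approximation
    with a continuity correction)."""
    z = (heads + 0.5 - n / 2) / (n / 4) ** 0.5
    return STD_NORMAL.cdf(z)


def p_value_draws(true_p: float, n: int = 100, runs: int = 5000, seed: int = 7) -> list:
    """One P-value per simulated experiment of n flips with chance true_p of heads."""
    rng = random.Random(seed)
    draws = []
    for _ in range(runs):
        heads = sum(rng.random() < true_p for _ in range(n))
        draws.append(one_sided_p(heads, n))
    return draws


for true_p in (0.5, 0.4):
    draws = p_value_draws(true_p)
    share_small = sum(p <= 0.05 for p in draws) / len(draws)
    print(true_p, share_small)  # how often P <= 0.05 under each hypothesis
```

Under the fair coin, P-values at or below 0.05 should be rare (at most about 5% of runs); under a coin with a 40% chance of heads they should be common. That difference in behavior across hypotheses is what the error probabilities describe.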

John, Deborah: Thanks for your replies. Again, no doubt, in general terms, the disagreement branches out in various interesting directions. However, I still think that my point was quite simpler (and possibly less engaging) than all this.

(1) Quote from John’s example: “I report to the coaches that the coin meets their standard for fairness”, namely, that the probability of heads is between 45% and 55%.

(2) I take this to be the conclusion of an inductive inference, and label it “H”.

(3) Then I look at Deborah’s statement (*): for the inductive argument to be strong (enough), it must be the case that, if the conclusion (H) is false (i.e., if not-H is true), then some part of the premises / evidence E must probably (enough) be false.

(4) So, for the argument at issue in John’s example (see 1 above) to be strong, Pr(E|not-H) must have been low (enough).

(5) But, John says, Pr(E|not-H) would not be defined in this set up, and so one can not say it was low (enough).

CONCLUSION: One can not say that the inductive inference to H in John’s example was strong on the basis of (*).

The argument above is pretty much deductive in nature. (1), (3), and (5) are plain exegesis; (2) and (4) seem uncontroversial notation and terminology to me. So then: what is it that you disagree with?

Vincenzo: you say that 4) is uncontroversial, but I do not agree, as stated previously. What is the setup and numeric value of P(E|not-H)? Perhaps that will move us forward.

John: No, the truth of (4) does not require a settled value for Pr(E|not-H). There’s only one little thing happening in (4): I’m encoding in a string of symbols Deborah’s phrase “if the conclusion is false, then … one of the premises etc.”

Let me try this way: would you explain how the inference to H in your example (“the coin meets the coaches’ standard for fairness”) would be sanctioned by (*)? Namely, how would you show that, given the falsity of *this* conclusion, some of the evidence would probably (enough) be false? [That’s what (*) requires. Otherwise, that example and (*) must part.]

Vincenzo: I read your (**) as assigning probability values to the catchall hypotheses. This is not the same approach as inferring that if my results tell me that the coin is fair, yet it is not, then there is probably a problem with my premises: perhaps my setup for the statistical test was incorrect for the purpose, or the experiment was executed inappropriately (the student tried to drop the coin heads up each time), or perhaps a very unlikely yet possible outcome was observed despite the statistic being right and the experiment executed properly. This is an everyday problem in the research world.