**Just as in the past 5 years since I’ve been blogging, I revisit that spot in the road at 11p.m., just outside the Elbar Room, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. (The pic on the left is the only blurry image I have of the club I’m taken to.) I wonder if the car will come for me this year, given that my Birnbaum article has been out since 2014… The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics (and cognate methods). Yet as Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. Maybe in 2017? Anyway, it’s 6 hrs later here, so I’m about to leave for that spot in the road… If I’m picked up, I’ll add an update at the end.**

You know how in that (not-so) recent Woody Allen movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Woolf? He is impressed when his work earns their approval and he comes back each night in the same mysterious cab… Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve ~~2011~~ ~~2012~~, ~~2013~~, ~~2014~~, ~~2015~~, 2016) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i] There are a couple of brief (12/31/14 & 15) updates at the end.

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics. I happen to be writing on your famous argument about the likelihood principle (LP). (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WCP.[ii] Sorry,…I know it’s famous…

BIRNBAUM: Well, I shall happily invite you to take any case that violates the LP and allow me to demonstrate that the frequentist is led to inconsistency, provided she also wishes to adhere to the WCP and sufficiency (although less than S is needed).

ERROR STATISTICIAN: Well I happen to be a frequentist (error statistical) philosopher; I have recently (2006) found a hole in your proof,..er…well I hope we can discuss it.

BIRNBAUM: Well, well, well: I’ll bet you a bottle of Elba Grease champagne that I can demonstrate it!

ERROR STATISTICAL PHILOSOPHER: It is a great drink, I must admit that: I love lemons.

BIRNBAUM: OK. (A waiter brings a bottle, they each pour a glass and resume talking). Whoever wins this little argument pays for this whole bottle of vintage Ebar or Elbow or whatever it is Grease.

ERROR STATISTICAL PHILOSOPHER: I really don’t mind paying for the bottle.

BIRNBAUM: Good, you will have to. Take any LP violation. Let x’ be 2-standard deviation difference from the null (asserting m = 0) in testing a normal mean from the fixed sample size experiment E’, say n = 100; and let x” be a 2-standard deviation difference from an optional stopping experiment E”, which happens to stop at 100. Do you agree that:

(0) For a frequentist, outcome x’ from E’ (fixed sample size) is NOT evidentially equivalent to x” from E” (optional stopping that stops at n)

ERROR STATISTICAL PHILOSOPHER: Yes, that’s a clear case where we reject the strong LP, and it makes perfect sense to distinguish their corresponding p-values (which we can write as p’ and p”, respectively). The searching in the optional stopping experiment makes the p-value quite a bit higher than with the fixed sample size. For n = 100, data x’ yields p’= ~.05; while p” is ~.3. Clearly, p’ is not equal to p”, I don’t see how you can make them equal.
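(An aside for readers who want to check the gap between p’ and p” themselves. The sketch below is only an illustration under stated assumptions: a normal mean with known standard deviation 1, a two-sided 2-standard deviation cutoff, a look at the data after every observation, and p” read as the probability under the null of crossing that cutoff at some point up to n = 100. The function name `crossing_probabilities` and all of its parameters are hypothetical, not taken from any paper.)

```python
import random


def crossing_probabilities(n_max=100, z_crit=2.0, n_sims=20000, seed=1):
    """Estimate two error probabilities under the null (mean 0, sd 1).

    fixed:    P(|Z_n| >= z_crit) with n fixed in advance at n_max.
    optional: P(|Z_k| >= z_crit for some k <= n_max), i.e. we look after
              every observation and would stop at the first crossing.
    """
    rng = random.Random(seed)
    fixed_hits = 0
    optional_hits = 0
    for _ in range(n_sims):
        total = 0.0
        crossed = False
        for k in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            # Z_k = (sample mean) / (standard error) = total / sqrt(k)
            if abs(total) >= z_crit * k ** 0.5:
                crossed = True
        if crossed:
            optional_hits += 1
        # The fixed-sample test only looks once, at n_max.
        if abs(total) >= z_crit * n_max ** 0.5:
            fixed_hits += 1
    return fixed_hits / n_sims, optional_hits / n_sims


if __name__ == "__main__":
    p_fixed, p_optional = crossing_probabilities()
    print(f"fixed n=100:        {p_fixed:.3f}")
    print(f"optional stopping:  {p_optional:.3f}")
```

(Under these assumptions the optional stopping figure should come out several times larger than the fixed-sample one, which is the point of premise (0). The exact value depends on how often one looks and on how p” is defined.)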

BIRNBAUM: Suppose you’ve observed x’, a 2-standard deviation difference from E’. You admit, do you not, that this outcome could have occurred as a result of a different experiment? It could have been that a fair coin was flipped where it is agreed that heads instructs you to perform E’ and tails instructs you to perform the optional stopping test E”, and you happened to get heads, and then performed the experiment E’ (with n = 100) and obtained your 2-standard deviation difference x’.

ERROR STATISTICAL PHILOSOPHER: Well, that is not how I got x’, but ok, it could have occurred that way.

BIRNBAUM: Good. Then you must grant further that your result could have come from a special experiment I have dreamt up, call it a BB-experiment. In a BB- experiment, if the outcome from the experiment you actually performed has an outcome with a proportional likelihood to one in some other experiment not performed, E”, then we say that your result has an “LP pair”. For any violation of the strong LP, the outcome observed, let it be x’, has an “LP pair”, call it x”, in some other experiment E”. In that case, a BB-experiment stipulates that you are to report x’ as if you had determined whether to run E’ or E” by flipping a fair coin.

(They fill their glasses again)

ERROR STATISTICAL PHILOSOPHER: You’re saying that if my outcome from the fixed sample size experiment E’ has an “LP pair” in the (optional stopping) experiment I did not perform, then I am to report x’ as if the determination to run E’ was by flipping a fair coin (which decides between E’ and E”)?

BIRNBAUM: Yes, and one more thing. If your outcome had actually come from the optional stopping experiment E”, it too would have an “LP pair” in the experiment you did not perform, E’. Whether you actually observed x’ from E’, or x” from E”, you are to report it as x’ from E’.

ERROR STATISTICAL PHILOSOPHER: So let’s see if I understand a Birnbaum BB-experiment: whether my observed 2-standard deviation difference came from E’ or E” the result is reported as x’, as if it came from E’, and as a result of this strange type of a mixture experiment.

BIRNBAUM: Yes, or equivalently you could just report x*: my result is a 2-standard deviation difference and it could have come from either E’ (fixed sampling, n = 100) or E” (optional stopping, which happens to stop at the 100th trial). That’s how I sometimes formulate a BB-experiment.

ERROR STATISTICAL PHILOSOPHER: You’re saying in effect that if my result has an LP pair in the experiment not performed, I should act as if I accept the strong LP and just report its likelihood; so if the likelihoods are proportional in the two experiments (both testing the same mean), the outcomes are evidentially equivalent.

BIRNBAUM: Well, but since the BB- experiment is an imagined “mixture” it is a single experiment, so really you only need to apply the weak LP which frequentists accept. Yes?

ERROR STATISTICAL PHILOSOPHER: But what is the sampling distribution in this imaginary BB- experiment? Suppose I have Birnbaumized my experimental result, just as you describe, and observed a 2-standard deviation difference in a fixed sample size experiment E’. How do I calculate the p-value within a Birnbaumized experiment?

BIRNBAUM: I don’t think anyone has ever called it that.

ERROR STATISTICAL PHILOSOPHER: I just wanted to have a shorthand for the operation you are describing, there’s no need to use it, if you’d rather I not. So how do I calculate the p-value within a BB-experiment?

BIRNBAUM: You would report the overall p-value, which would be the average over the sampling distributions: (p’ + p”)/2

Say p’ is ~.05, and p” is ~.3; whatever they are, we know they are different, that’s what makes this a violation of the strong LP (given in premise (0)).
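(Written out, with the fair coin giving each experiment weight one half and using the rough illustrative values just mentioned. The label $p_{BB}$ is only shorthand introduced here for the p-value reported within the BB-experiment.)

```latex
p_{BB} \;=\; \tfrac{1}{2}\,p' + \tfrac{1}{2}\,p''
       \;\approx\; \tfrac{1}{2}(0.05) + \tfrac{1}{2}(0.3)
       \;=\; 0.175
```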

ERROR STATISTICAL PHILOSOPHER: So you’re saying that if I observe a 2-standard deviation difference from E’, I do not report the associated p-value p’, but instead I am to report the average p-value, averaging over some other experiment E” that could have given rise to an outcome with a proportional likelihood to the one I observed, even though I didn’t obtain it this way?

BIRNBAUM: I’m saying that you have to grant that x’ from a fixed sample size experiment E’ could have been generated through a BB- experiment.

*My this drink is sour! *

ERROR STATISTICAL PHILOSOPHER: Yes, I love pure lemon.

BIRNBAUM: Perhaps you’re in want of a gene; never mind.

I’m saying you have to grant that x’ from a fixed sample size experiment E’ could have been generated through a BB-experiment. If you are to interpret your experiment as if you are within the rules of a BB experiment, then x’ is evidentially equivalent to x” (is equivalent to x*). This is premise (1).

ERROR STATISTICAL PHILOSOPHER: But this is just a matter of your definitions, it is an analytical or mathematical result, so long as we grant being within your BB experiment.

BIRNBAUM: True, (1) plays the role of the sufficiency assumption, but one need not even appeal to this, it is just a matter of mathematical equivalence.

By the way, I am focusing just on LP violations, therefore, the outcome, by definition, has an LP pair. In other cases, where there is no LP pair, you just report things as usual.

ERROR STATISTICAL PHILOSOPHER: OK, but p’ still differs from p”; so I still don’t see how I’m forced to infer the strong LP which identifies the two. In short, I don’t see the contradiction with my rejecting the strong LP in premise (0). (Also we should come back to the “other cases” at some point….)

BIRNBAUM: Wait! Don’t be so impatient; I’m about to get to step (2). Here, let’s toast to the new year: “To Elbar Grease!”

ERROR STATISTICAL PHILOSOPHER: To Elbar Grease!

BIRNBAUM: So far all of this was step (1).

ERROR STATISTICAL PHILOSOPHER: Oy, what is step 2?

BIRNBAUM: STEP 2 is this: Surely, you agree, that once you know from which experiment the observed 2-standard deviation difference actually came, you ought to report the p-value corresponding to that experiment. You ought NOT to report the average (p’ + p”)/2 as you were instructed to do in the BB experiment.

This gives us premise (2a):

(2a) outcome x’, once it is known that it came from E’, should NOT be analyzed as in a BB- experiment where p-values are averaged. The report should instead use the sampling distribution of the fixed sample test E’, yielding the p-value, p’ (.05).

ERROR STATISTICAL PHILOSOPHER: So, having first insisted I imagine myself in a Birnbaumized, I mean a BB-experiment, and report an average p-value, I’m now to return to my senses and “condition” in order to get back to the only place I ever wanted to be, i.e., back to where I was to begin with?

BIRNBAUM: Yes, at least if you hold to the weak conditionality principle WCP (of D. R. Cox)—surely you agree to this.

(2b) Likewise, if you knew the 2-standard deviation difference came from E”, then

x” should NOT be deemed evidentially equivalent to x’ (as in the BB experiment); the report should instead use the sampling distribution of the optional stopping test E”. This would yield p-value p” (~.3).

ERROR STATISTICAL PHILOSOPHER: So, having first insisted I consider myself in a BB-experiment, in which I report the average p-value, I’m now to return to my senses and allow that if I know the result came from optional stopping, E”, I should “condition” on E” and report p”.

BIRNBAUM: Yes. There was no need to repeat the whole spiel.

ERROR STATISTICAL PHILOSOPHER: I just wanted to be clear I understood you.

BIRNBAUM: So you arrive at (2a) and (2b), yes?

ERROR STATISTICAL PHILOSOPHER: OK, but it might be noted that unlike premise (1), premises (2a) and (2b) are not given by definition, they concern an evidential standpoint about how one ought to interpret a result once you know which experiment it came from. In particular, premises (2a) and (2b) say I should condition and use the sampling distribution of the experiment known to have been actually performed, when interpreting the result.

BIRNBAUM: Yes, and isn’t this weak conditionality principle WCP one that you happily accept?

ERROR STATISTICAL PHILOSOPHER: Well the WCP is defined for actual mixtures, where one flipped a coin to determine if E’ or E” is performed, whereas, you’re requiring I consider an imaginary Birnbaum mixture experiment, where the choice of the experiment not performed will vary depending on the outcome that needs an LP pair; and I cannot even determine what this might be until after I’ve observed the result that would violate the LP?

BIRNBAUM: Sure, but you admit that your observed x’ could have come about through a BB-experiment, and that’s all I need. Notice

(1), (2a) and (2b) yield the strong LP!

Outcome x’ from E’ (fixed sample size n) is evidentially equivalent to x” from E” (optional stopping that stops at n).

ERROR STATISTICAL PHILOSOPHER: Clever, but your “proof” is obviously unsound; and before I demonstrate this, notice that the conclusion, were it to follow, asserts p’ = p”, (e.g., .05 = .3!), even though it is unquestioned that p’ is not equal to p”, that is because we must start with an LP violation (premise (0)).

BIRNBAUM: Yes, it is puzzling, but where have I gone wrong?

(The waiter comes by and fills their glasses; they are so deeply engrossed in thought they do not even notice him.)

ERROR STATISTICAL PHILOSOPHER: There are many routes to explaining a fallacious argument. Here’s one. What is required for STEP 1 to hold is the denial of what’s needed for STEP 2 to hold:

Step 1 requires us to analyze results in accordance with a BB- experiment. If we do so, true enough we get:

premise (1): outcome x’ (in a BB experiment) is evidentially equivalent to outcome x” (in a BB experiment):

That is because in either case, the p-value would be (p’ + p”)/2

Step 2 now insists that we should NOT calculate evidential import as if we were in a BB- experiment. Instead we should consider the experiment from which the data actually came, E’ or E”:

premise (2a): outcome x’ (in a BB experiment) is/should be evidentially equivalent to x’ from E’ (fixed sample size): its p-value should be p’.

premise (2b): outcome x” (in a BB experiment) is/should be evidentially equivalent to x” from E” (optional stopping that stops at n): its p-value should be p”.

If (1) is true, then (2a) and (2b) must be false!

If (1) is true and we keep fixed the stipulation of a BB experiment (which we must to apply step 2), then (2a) is asserting:

The average p-value (p’ + p”)/2 = p’ which is false.

Likewise if (1) is true, then (2b) is asserting:

the average p-value (p’ + p”)/2 = p” which is false

Alternatively, we can see what goes wrong by realizing:

If (2a) and (2b) are true, then premise (1) must be false.

In short your famous argument requires us to assess evidence in a given experiment in two contradictory ways: as if we are within a BB- experiment (and report the average p-value) and also that we are not, but rather should report the actual p-value.

I can render it as formally valid, but then its premises can never all be true; alternatively, I can get the premises to come out true, but then the conclusion is false—so it is invalid. In no way does it show the frequentist is open to contradiction (by dint of accepting S, WCP, and denying the LP).
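(For readers who prefer symbols, here is one way to lay out the clash. It is a sketch of the reasoning above, writing $p(\cdot)$ for the p-value assigned to an outcome; that notation is introduced here only for convenience.)

```latex
\begin{align*}
\text{(1), within BB:}\quad
  & p(x') = p(x'') = \tfrac{1}{2}\,(p' + p'') \\
\text{(2a), holding BB fixed:}\quad
  & \tfrac{1}{2}\,(p' + p'') = p' \;\Longleftrightarrow\; p'' = p' \\
\text{(2b), holding BB fixed:}\quad
  & \tfrac{1}{2}\,(p' + p'') = p'' \;\Longleftrightarrow\; p' = p'' \\
\text{(0), LP violation:}\quad
  & p' \neq p''
\end{align*}
```

(So (1), (2a) and (2b) can hold together only if $p' = p''$, which premise (0) rules out.)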

BIRNBAUM: Yet some people still think it is a breakthrough (in favor of Bayesianism).

ERROR STATISTICAL PHILOSOPHER: (update 12/31/14) I have a much clearer exposition of what goes wrong in your argument in this published paper from 2010. However, there were still several gaps, and lack of a clear articulation of the WCP. In fact, I’ve come to see that clarifying the entire argument turns on defining the WCP. Have you seen my 2014 paper in *Statistical Science?*

BIRNBAUM: Yes I have seen it, very clever! Your Rejoinder to some of the critics is gutsy, to say the least. Congratulations!

ERROR STATISTICAL PHILOSOPHER: Thanks, but look I *must* ask you something.

BIRNBAUM: I do follow your blog. I might even have a palindrome for your December contest!

ERROR STATISTICAL PHILOSOPHER: Wow! I can’t believe you read my blog, but look I *must* ask you something before you leave this year.

*sudden interruption by the waiter*

WAITER: Who gets the tab?

BIRNBAUM: I do. To Elbar Grease!

ERROR STATISTICAL PHILOSOPHER: **To Elbar Grease! Happy New Year!**

ADD-ONS (12/31/13, 14, 15 & 19):

ERROR STATISTICAL PHILOSOPHER: I have one quick question, Professor Birnbaum, and I swear that whatever you say will be just between us, I won’t tell a soul. In your last couple of papers, you suggest you’d discovered the flaw in your argument for the LP. Am I right?

BIRNBAUM: Savage, you know, never got off my case about remaining at “the half-way house” of likelihood, and not going full Bayesian. Then I wrote the review about the Confidence Concept as the one rock on a shifting scene…

ERROR STATISTICAL PHILOSOPHER: Yes, but back to my question, you disappeared before answering last year…I just want to know…

WAITER: We’re closing now; shall I call you a taxicab?

BIRNBAUM: Yes.

ERROR STATISTICAL PHILOSOPHER: ‘Yes’, you discovered the flaw in the argument, or ‘yes’ to the taxi?

MANAGER: We’re closing now; I’m sorry you must leave.

ERROR STATISTICAL PHILOSOPHER: We’re leaving; I just need him to clarify his answer….

*Large group of people bustle past.*

Prof. Birnbaum…? Allan? **Where did he go?** (oy, not again!)

**Link to complete discussion:**

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). *Statistical Science* 29 (2014), no. 2, 227-266.

[i] Many links on the strong likelihood principle (LP or SLP) and Birnbaum may be found by searching this blog.

[ii] By the way, Ronald Giere gave me numerous original papers of yours. They’re in files in my attic library. Some are in mimeo, others typed…I mean, obviously for that time that’s what they’d be…now of course, oh never mind, sorry.

Happy New Years! And thank you for posting one of my favorite New Year traditions!

Gigi: Thanks for all of your incredible support and help through the year–thanks to you I’m able to keep up even exiled in a remote place like Elba!

I just re-read Birnbaum’s 1977 Synthese article.

I still struggle to understand the evidential interpretation of NP methods and in particular the relation to repetitions.

Can you remind me what difference it makes to the attempt to interpret NP evidentially whether the repetitions are real or hypothetical?

Om: Not sure if you mean mine or his. Both would appeal to hypothetical repetitions, although you could simulate them if you wanted. My inferential interpretation is rooted in the Severity Principle. For starters, if a method had little or no capability of finding specified flaws in a claim H, then finding none fails to provide good evidence for H. We eschew such gambits as cherry picking, selective reporting, and p-hacking because they make it easy to find an impressive-looking effect, even if such results are common under chance (a spurious p-value). It’s not a matter of a long run high error rate, though you’d get that if you kept it up; it’s a matter of the poor job in the case at hand. Or, since you like Popper, as do I, data count as evidence for H only if the process can be represented as a failed attempt to falsify H. Statistical tests may be used to cash out Popper’s requirement.

Birnbaum’s confidence concept is as far as he got in providing an evidential interpretation: essentially, a method should have a high probability of correctly accepting, and a low probability of incorrectly falsifying, claims. As he left it, it’s too performance oriented, and doesn’t hold in many cases. Pratt discusses it in that Synthese 1977 journal–the one in which Birnbaum presents it. But Birnbaum is right to suggest that some such intuition is the rock in a changing scene. He just didn’t go far enough, in my judgment.

Thanks.

The idea that we shouldn’t use methods that have a high probability of being misleading under repeated use seems generally reasonable – though of course depends on the details of how this is supposed to work in practice.

(There is probably an inherent ‘fragility’ or no-free-lunch component to most general principles – they can depend very sensitively on how they are phrased and implemented).

On the other hand, the idea that we can generally take methods that are well-calibrated in hypothetical repeated trials and use them to draw meaningful inferences in e.g. single real trials is much more problematic to me.

I know you’ve attempted to get towards this idea, and Birnbaum somewhat attempts to do this, but I’m still not really comfortable with it.

Instead, I would be much more confident in results from actually applying a hypothetically reliable method many times and seeing the results properly ‘converge’ to something stable. I’m sure most people would.

Of course you could possibly ‘go meta’ again and call this a single instance of an even more reliable method (i.e. reliable in the sense of hypothetical repetitions of the actual repetitions of a hypothetically reliable method…). The question still seems to me to be what is doing the work here?

The fact that ‘hypothetical repetitions of actual repetitions’ are more convincing than ‘hypothetical repetitions of hypothetical repetitions’ suggests to me (as does ordinary intuition of course!) that the ‘actual’ part is the most important part.

Basically I find myself back to the (fairly common) view that the decision-making/behavioural interpretation of NP methods is conceptually straightforward but not directly relevant to evidential goals in principle (and perhaps to decision-making goals in practice), while evidential concepts are desirable but difficult to formulate (and perhaps especially so within NP theory).

Om: Good long run performance doesn’t suffice. Silly methods can satisfy that. The assessment must be relevant to characterizing the capability of the particular method in relation to unearthing mistakes in a particular inference being entertained. It also must be post-data. If, for example, your method (say with cherry-picking) makes it extremely easy to find some impressive-looking effect, even when it’s spurious, then I deny you’ve provided evidence for a “real effect” in the particular case at hand. That is the severity principle. I aver that we always reason this way (using a cluster of different forms), and that formal statistics only enters (or should) when hypothetical frequencies are relevant to characterizing the capacities of methods to unearth mistaken interpretations of data. These are contexts where stat models offer a repertoire of ways to represent the “mistakes” and where the sampling distributions provide assessments of severity. When this is not the case, more qualitative (and possibly much stronger) methods should be used, generally invoking theory.

‘Good long run performance doesn’t suffice’

I agree (in reference to drawing particular conclusions).

This leaves one with the choice of – following Neyman and the ‘inductive behaviour’ long run interpretation essentially eschewing knowledge of the particular case, or making it relevant to the case at hand. As you are well aware, of course.

Some do the former – e.g. Wasserman basically endorses the behavioural interpretation (see e.g. ‘All of Statistics’). This means he can make any ‘inference’ in the particular case as long as it holds in the long run average sense. He states something like this re confidence intervals – you don’t require repetitions of the same setup, just application of the same method to any setup. Consequence – no guarantees for the particular setup.

I know you want to improve the confidence approach to apply to the particular case but I simply can’t see how to use NP-style methods to do it, unless you require a minimum number of actual repetitions of the ‘same’ case. Perhaps you do require this but I can’t see where.

More broadly there seems to me to be some sort of ‘qualitative’ change between n=1 and n=infinity of actual measurements of the same system, where you can claim some validity to the case at hand, but again I’m still not clear on where you require (at least qualitatively) some minimum ‘number’ of repetitions of the same case.

In a sense, I suppose I’m looking for something like a philosophical interpretation of asymptotic methods based on actual repetitions on the same system. I guess a philosophical translation of some form of the law of large numbers.

Are you familiar with Robert Batterman’s philosophical work on asymptotics?

A review of his book-

http://ndpr.nd.edu/news/23089-the-devil-in-the-details-asymptotic-reasoning-in-explanation-reduction-and-emergence/

Something like this set of ideas seems relevant given the large presence of asymptotic arguments in statistics.

Om: I don’t understand why you’re not hearing what I’m saying. I just provided a data dependent use of error probabilities that is just what’s needed for qualifying the warrant of a specific inference. Go slower over what I’ve said.

‘Data dependent’ is pretty ambiguous to me. An observed pvalue is data dependent but I don’t see how it gets you to where you want to get.

Om: Never said that sufficed; that would be silly.

Didn’t mean to imply you did, just that ‘post data error prob’ can mean a few things, some of which aren’t sufficient.

I’m just trying to nail down what would be sufficient and how the actual vs hypothetical repetition thing comes in.

Om: It’s the form of justification that matters. If you say it’s justified to infer H because I used a ‘method’ that is actually used for all different problems that gets it right k% of all times it’s used, it would be a pure performance appeal. The method could be flipping a coin to decide whether to use a reliable scale or a biased scale, and still get overall k% of the time correct.
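(A toy version of the two-scales example, for anyone who wants to see the numbers. Everything specific in it is an assumption made for illustration: the true weight, the tolerance for counting a reading as “correct”, the bias and spread of each hypothetical scale, and the function name `scale_mixture`.)

```python
import random


def scale_mixture(n_sims=50000, true_weight=100.0, tol=1.0, seed=2):
    """Flip a fair coin to pick a scale, weigh once, and score the reading.

    Returns the overall proportion of readings within `tol` of the true
    weight, plus the same proportion separately for each scale.
    """
    rng = random.Random(seed)
    # Hypothetical instruments, given as (bias, standard deviation).
    scales = {"reliable": (0.0, 0.5), "biased": (1.5, 0.5)}
    used = {name: 0 for name in scales}
    correct = {name: 0 for name in scales}
    for _ in range(n_sims):
        name = "reliable" if rng.random() < 0.5 else "biased"
        bias, sd = scales[name]
        reading = true_weight + bias + rng.gauss(0.0, sd)
        used[name] += 1
        if abs(reading - true_weight) <= tol:
            correct[name] += 1
    overall = sum(correct.values()) / n_sims
    conditional = {name: correct[name] / used[name] for name in scales}
    return overall, conditional


if __name__ == "__main__":
    overall, conditional = scale_mixture()
    print(f"overall correct rate: {overall:.3f}")
    for name, rate in conditional.items():
        print(f"  given the {name} scale: {rate:.3f}")
```

(The overall rate is a single respectable-looking number sitting between the two conditional rates, and it says little about how far to trust a reading once you know which scale produced it.)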

Of course.

My point is that NP methods are uniquely vulnerable to such criticisms. There have been various attempts to introduce relevant conditioning to avoid these critiques but the fit of NP methods (naturally long run) with particular conditioning remains somewhat unclear to me.

Also note that the perspective of ‘inferring H’ is very decision-theoretic.

Om: 2 things: First, inferring is the process of arriving at a conclusion based on premises. I wouldn’t call all of logic decision theory, that would make the term nearly vacuous.

Second, thinking about my example, would it matter if the method’s instantiations were actual (meaning?) or hypothetical? Would the problem with them be the same or different? Why?

Re decision theory.

Fair enough.

But NP is still about ‘drawing an inference’ rather than ‘representing’ (‘information’ or ‘evidence’). And the rest of the machinery still seems most naturally presented in decision theoretic language.

You often mention Cox, but he frequently refers to NP as a distinct, largely decision-theoretic approach (most of his books have a section on NP where he says ‘we don’t take the NP route here’).

And yes I know you wrote some things together with Cox. And yes, I think it could be possible to give confidence arguments an evidential interpretation (hence why I’m asking) but I just don’t see how the NP setup/language helps clarify this. For example I can’t think of a time/place where Cox mentions power, especially not with any enthusiasm. Why/why not?

Re conditioning, actual and hypothetical.

Yes, conditioning makes the difference here, of course. So both actual and hypothetical have a problem – actual repetitions of an irrelevant system shouldn’t matter. But again, appropriate conditioning is typically more problematic for the NP approach than any other.

And once you have a fixed number of actual repetitions of an appropriately conditioned system, what extra work do hypothetical repetitions do? I’m not denying it’s possible to provide justification, I’m just wondering what that looks like.

Om: Yes, “drawing inferences” is just what we do in logic, and it’s why Fisher (1955) could rightly say about Neyman’s calling it an “activity” or “action” nothing more than his particular preferred use of words.

Cox uses power for planning and didn’t start to use post-data or (what I call) “attained power” until our (2006) paper. On the other hand, as I pointed out to him, he had already said somewhere that sensitivity is measured by treating the P-value as a statistic and computing Pr(P mu’ (in a 1-sided test). So it’s really nothing new for the Fisherian logic.

On your other points, please see:

ch%207%20cox%20&%20mayo.pdf

OK.

Cox has written a lot since 2006, including comments on foundations, but still hasn’t mentioned power or severity etc much at all. Why not?

I guess I’ll wait til your book and see if that convinces me.

BTW – To use Fisher (1955) in support of your position is odd to me. What about the rest of the sentences? Surely this counts as ‘cherry picking evidence’?!

RE logic – a real pet peeve of mine is the tendency to try to formulate areas of study like science, mathematics, statistics etc in purely or largely logical terms.

Representing things, coming up with the proper characterisations/definitions etc is at least as important as ‘drawing inferences’.

I find a focus on logical language has similar limitations to a focus on decision theoretic language, though of course they have their place in appropriate contexts.

This might be more of a personal preference but partially explains my distaste for the NP aesthetic.

Fisher’s simple:

‘we may learn…at what level it would have been doubtful…doing this we have a genuine measure of confidence’

is fair enough to me (and is basically what Cox states in eg his principles book).

But this also comes after a lot of denigration of ‘errors of the second kind’ etc etc. So one is still open to this being an ‘isolated’ or fluke result.

He then says

‘What we look forward to in science is further data’.

And this seems to me to be the best way to address this latter concern.

Om: It’s in his 2006 book, I figured that was good enough along with our papers. He might think it’s mainly for interpreting the case of what he calls “a full family of models”–as with N-P tests, but actually it’s applicable to all the members of his taxonomy (as I show in my book).

I don’t know what comments on foundations by him you mean.

I thought I alluded to Fisher 1955 in connection with the use of the term “action” for making an inference, not in support of my view.

Could you provide the relevant quote from his 2006 book?

In my reading this is another case where he explicitly states he has little use for NP methodology.