A high sample size is good, but one that is too high can yield pathologically powerful tests. Robustness is good, but not the coarseness of very weak assumptions; we want to find things out, but not trivial truths. To me, the idea of very high power goes against the Popperian grain of wanting to block spurious effects and ad hoc saves. Did your teacher say the Bayesians have no trade-offs?

By the way, I deliberately invented severity in such a way that it would always be good (to have passed a severe test). When I first started I didn’t define it that way. Everything else can change, but if you assess severity correctly, swapping out hypotheses as needed, it will always be good.

Actually, the setup was presented to me when I was a PhD student by a Bayesian, who knew the right answer, and who used it in order to demonstrate that the frequentists use misleading terminology. (He didn’t win me over, but he had some kind of point.)

Severity replaces the pre-designated cut-off cα with the observed d0. Thus we obtain the same (but more custom-tailored) result while remaining in the Fisherian tribe, as seen in the frequentist Evidential Principle FEV(ii) (in Mayo and Cox 2010).

*And don't get me started on shpower analysis (though I'll post on this soon).

Some of the notation is garbled in the initial comment; I’ll see what this looks like.

C. If the test’s power to detect an alternative µ’ to µ is high, then a statistically nonsignificant x is good evidence for µ (or good evidence favoring µ over µ’).

But (as you know) it is easy to construct counterexamples in which P > 0.05 and yet, by all common evidence measures such as likelihood ratios or P-value functions, the evidence favors µ’ over µ.
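A minimal numeric sketch of such a counterexample (the numbers here are illustrative, not drawn from the original discussion): take a one-sided z-test of µ = 0 against µ’ = 3 in standard-error units, which has power above 0.9, and observe a result just below the cutoff.

```python
from scipy import stats

mu0, mu1 = 0.0, 3.0   # null and alternative, in standard-error units (illustrative)
alpha = 0.05
crit = stats.norm.ppf(1 - alpha)        # one-sided cutoff, about 1.645

# power to detect mu1: P(Z > crit) when the true mean is mu1 -- about 0.91
power = 1 - stats.norm.cdf(crit - mu1)

x = 1.6                                  # observed value, just below the cutoff
p_value = 1 - stats.norm.cdf(x - mu0)    # about 0.055: not significant at 0.05

# yet the likelihood ratio favors mu1 over mu0 (about 1.35)
lr = stats.norm.pdf(x - mu1) / stats.norm.pdf(x - mu0)

print(f"power={power:.3f}, p={p_value:.3f}, LR(mu1:mu0)={lr:.2f}")
```

So the test has high power against µ’, the result is nonsignificant, and yet the likelihood ratio favors µ’ over µ, exactly the combination that claim C rules out.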

How would you describe these high-power fallacies in the “tribal” terms you used above?

If you think of Bayesian analysis in terms of two representations: a probability model for how Nature determined the unknown parameter(s), generating the values of those parameters, and then a probability model for generating the observations given those parameter values (the data-generating model), you will notice they have to build two rather ambitious representations, to say the least.

This is standard Bayes, but with a proper prior that is supposed to reflect background information (educated guesses) about how such parameter values would be generated by Nature (and is in no way a literal model!).

With randomization into, say, two groups, one treated and one not, if the treatment has no effect (and the trial is perfectly run), the representations can be exactly the same for both groups. Now you will additionally have to add representations (prior and data model) for the unknown treatment effect in the treated group – but stick with no effect for now. Then, as things “cancel” out in the comparison (what you are most interested in), it turns out the representations can be wrong in lots of ways that don’t matter much.
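A small simulation can illustrate the cancellation (a sketch under assumed numbers, not O’Rourke’s own example): give both arms the same badly misspecified outcome distribution, randomize, and check that the comparison behaves well anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 50, 2000
diffs, rejections = [], 0
for _ in range(reps):
    # Outcomes are heavily skewed (lognormal), so a normal data model would be
    # wrong -- but with no treatment effect, both arms share the same wrong
    # representation.
    y = rng.lognormal(mean=0.0, sigma=1.0, size=2 * n)
    idx = rng.permutation(2 * n)        # randomize units into two arms
    a, b = y[idx[:n]], y[idx[n:]]
    diffs.append(a.mean() - b.mean())
    rejections += stats.ttest_ind(a, b).pvalue < 0.05

mean_diff = float(np.mean(diffs))
rate = rejections / reps
print(f"mean difference: {mean_diff:.3f}")   # close to 0
print(f"rejection rate:  {rate:.3f}")        # close to the nominal 0.05
```

The misspecification is shared by both arms, so it largely cancels in the comparison: the group difference is centered at zero and the test’s error rate stays near nominal.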

Without randomization, the representations have to be different, reflecting pre-existing group differences, and there is less tolerance for things that are wrong. An example of something wrong that would not matter is missing a pre-existing difference that does not affect the outcome – but which differences affect the outcome is usually something we don’t know.

Now the representations/(prior and data model) for the unknown treatment effect might be very wrong (e.g. an additive effect when it’s non-additive or varies), but this is also true in a frequentist approach – the 95% CI coverage usually is only close to true when there is no effect. In fact, that is one of my diagnostic checks of whether a biostatistician actually understands randomized clinical trials – whether they think the CI will have 95% coverage for the true treatment effect.

Keith O’Rourke

Here’s the reference, or one of them, I had in mind:

In his attempt to give a Bayesian justification, Larry Wasserman wrote on his blog: “Without randomization, we can indeed derive a posterior for θ but it is highly sensitive to the prior. This is just a restatement of the non-identifiability of θ. With randomization, the posterior is much less sensitive to the prior.”

https://normaldeviate.wordpress.com/2013/06/09/the-value-of-adding-randomness/

> understanding enough of what he [Peirce] meant to dig for gold in his work.

Probably the best strategy.

> the animation?

I was just thinking of what to do; G. Cumming has already done a fair bit along these lines: http://andrewgelman.com/2015/07/21/a-bad-definition-of-statistical-significance-from-the-u-s-department-of-health-and-human-services-effective-health-care-program/?replytocom=229520#respond

I don’t like the “bad hammer, good screwdriver” language, but they are worth looking at.

Keith O’Rourke

If this is not so, we can consider two different cases.

1) The Bayesian has to analyse data that have been collected by somebody else

2) The Bayesian has to analyse data that have been collected by themselves

In the first case I think that the value should be obvious. If randomisation has been employed, and if the Bayesian knows this, then a) it is not necessary for him or her to make the mind of the experimenter part of the model, and b) even though randomisation is not always perfectly optimal, it is very rarely bad, whereas lots of non-random allocations are terrible. For example, it has been suggested by a Bayesian philosopher that it would be just as good to label the two treatments in a trial A and B and let the patient choose. But any true Bayesian would know that, unless one’s prior is a perfect spike on the probability of patients choosing A being exactly 0.5, such a scheme is less efficient than randomisation.
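The efficiency point can be checked directly (a sketch with assumed numbers: n = 100 patients, unit outcome variance). The variance of the difference-in-means estimator is σ²(1/nA + 1/nB), minimized at a 50:50 split, so any choice probability away from 0.5 costs precision on average.

```python
import numpy as np

def expected_variance(p_choose_A, n=100, sigma=1.0, reps=100_000, seed=0):
    """Average variance of the difference-in-means estimator when each of
    n patients independently picks treatment A with probability p_choose_A."""
    rng = np.random.default_rng(seed)
    n_A = rng.binomial(n, p_choose_A, size=reps)
    ok = (n_A > 0) & (n_A < n)           # drop splits where one arm is empty
    n_A = n_A[ok]
    return float(np.mean(sigma**2 * (1 / n_A + 1 / (n - n_A))))

v_even = expected_variance(0.5)       # patients choose 50:50 on average
v_skew = expected_variance(0.7)       # patients favour A
v_forced = 1.0 * (1 / 50 + 1 / 50)    # forced equal allocation: exactly 0.04

print(v_forced, v_even, v_skew)
```

Even self-selection with a true 50:50 choice probability loses a little to forced equal allocation, because the realized split is random (Jensen’s inequality on the convex function 1/nA + 1/nB), and the loss grows as the choice probability drifts from 0.5.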

So having to allow for possible stupid behaviour on the part of others is the price you pay for analysing non-randomised experiments.

In the second case it is less obvious. The danger, however, is that your inferential superego has its noble modelling intentions undermined by its experimenting id. One might be in two minds as to whether this is really a problem, but then one might really be in two minds, in which case it is.

The arguments I’ve heard (for Bayesians defending randomization) have to do with controlling bias in priors or being more convincing in a group, as opposed to in private research. Clearly, its role in validating significance tests isn’t high on their list, so anyway, I was hoping you could clarify the Bayesian justification, at least among the “prominent exceptions” you have in mind.

Here are some references (most of which you undoubtedly already know), off the top of my head:

First, of course, there’s Fisher in DOE, which is all about testing (“every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis”, and so on).

Thomas Cook and David DeMets have a fabulous book from 2008 called Introduction to Statistical Methods in Clinical Trials, in the Preface of which they lay out their three underlying principles for randomized trials. Their second principle is that “RCTs are primarily hypothesis testing instruments. While inference beyond simple tests of the [study] hypotheses is clearly essential for a complete understanding of the results, we note that virtually all design features of an RCT are formulated with hypothesis testing in mind. … Even in the simplest situations, however, estimation of a ‘treatment effect’ is inherently model-based, dependent on implicit model assumptions, and the most well conducted trials are subject to biases that require that point estimates and confidence intervals be viewed cautiously.” Personally, I completely agree with the quoted text.

Then there’s Oscar Kempthorne’s 1977 paper in Journal of Statistical Planning and Inference, pp. 1-25, called “Why Randomize?” Or his 1979 paper in Sankhya pp. 115-145, called “Sampling Inference, Experimental Inference, and Observational Inference.”

David Freedman has papers on regression models (http://www.stat.berkeley.edu/~census/neyregr.pdf), logistic regression models (http://www.stat.berkeley.edu/~census/neylogit.pdf), and proportional hazards regression models (chapter 11 of his posthumously published book Statistical Models and Causal Inference) in experimental studies.

There’s this paper, which is one of my favorites: http://www.ncbi.nlm.nih.gov/pubmed/?term=groundhog+day+cause+and+effect

There’s this one, which is also a good paper: http://www.ncbi.nlm.nih.gov/pubmed/8220408

Hope these are helpful!

http://errorstatistics.com/2012/07/09/stephen-senn-randomization-ratios-and-rationality-rescuing-the-randomized-clinical-trial-from-its-critics/