I do allow for the possibility that, just as statistics arose out of the hot air that was eugenics, some useful methods/theory may yet arise from this big data stuff.

Within a statistical model, all of the evidence in the data relevant to a parameter of interest is contained in the relevant likelihood function.

Which principle would you say is the one “as we know it”?

Given a null hypothesis and a test statistic T, I’d say that the test statistic implicitly defines a class of alternatives, namely those distributions for which the probability of rejecting the null using T is larger than it is under the null itself. So one could be Neyman–Pearsonian and start from an alternative to derive a T, but one could also choose T first, which then implicitly defines the “direction” against which the null hypothesis is tested – in other words, the alternative. So for me the concept of an alternative is very important for understanding what a test does, even though it doesn’t have to be used for Neyman–Pearson optimality considerations.

I wonder whether Fisher would have agreed with me on this. He might have objected to my affection for the concept of an alternative, but it may be that his real issue was with power optimization, which I haven’t invoked here, and not so much with the plain idea that any test tests against a certain specific “alternative direction”, and that it is useful to think about what that might be in order to choose an appropriate test. (This, I’d think, is Stephen’s concern when thinking about odds ratios vs. risk differences – they point in slightly different alternative directions, one of which may be less appropriate than the other.)
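To make the “implicit alternative” point concrete, here is a small simulation of my own (a toy one-sided z-type example, not from the discussion): the statistic T = √n·x̄ rejects far more often under a positive mean shift than under the null, while a negative shift rejects *less* often than the null – so the negative shift is not in the class of alternatives that this T implicitly defines.

```python
import numpy as np

rng = np.random.default_rng(0)
n, crit = 30, 1.645  # approx. one-sided 5% normal critical value

def reject_rate(draw, reps=20000):
    """Fraction of simulated samples where T = sqrt(n) * xbar exceeds crit."""
    x = draw((reps, n))
    t = np.sqrt(n) * x.mean(axis=1)  # T built assuming unit variance
    return (t > crit).mean()

null_rate = reject_rate(lambda s: rng.normal(0.0, 1.0, s))   # close to 0.05
pos_rate  = reject_rate(lambda s: rng.normal(0.5, 1.0, s))   # high power: in T's 'direction'
neg_rate  = reject_rate(lambda s: rng.normal(-0.5, 1.0, s))  # below 0.05: not an implicit alternative
print(null_rate, pos_rate, neg_rate)
```

Choosing a two-sided T instead would pull the negative shift back into the implicit alternative class, which is exactly the sense in which the choice of T fixes the alternative direction.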

In philosophical discussions, maybe. But no way in practice; at least one factor in this, I think, is the use of MCMC methods to do the calculations. These handle posteriors from smooth priors with relative ease, but with spiked priors, of the sort that lead naturally to Bayes factors, implementation is far trickier. Gelman et al.’s book on Bayesian Data Analysis largely ignores Bayes factors.

While I accept most of your discussion, I’m still not sure whether p-value concepts (or N-P concepts) have any advantages for estimation (though of course they can be used) over other methods.

The idea of using p-value-style ideas for model checking seems more acceptable, though (again, there may be reasonable alternative strategies).

First, I think Stephen put it best here http://errorstatistics.com/2015/02/19/stephen-senn-fishers-alternative-to-the-alternative-2/#comment-117798 – no one has convinced anyone else that they know how to deal with nuisance parameters adequately.

No end to vagueness and confusion about the LP, but most everyone, I think, considers it to involve the full likelihood function. When they say sufficiency, as in sufficient statistics, they mean they can generate the full likelihood function from those statistics (that’s the definition of sufficiency these days).
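As a toy illustration of that definition of sufficiency (my own sketch, assuming a normal model with known variance): the log-likelihood rebuilt from the sufficient statistic (x̄, n) alone matches the full-data log-likelihood up to an additive constant, i.e. it is the same likelihood function.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=50)  # iid N(mu, 1), sigma known
n, xbar = len(x), x.mean()

mus = np.linspace(1.0, 3.0, 201)

# Full-data log-likelihood (dropping the constant -n/2 * log(2*pi))
ll_full = np.array([-0.5 * np.sum((x - m) ** 2) for m in mus])

# Log-likelihood rebuilt from the sufficient statistic (xbar, n) alone
ll_suff = -0.5 * n * (xbar - mus) ** 2

# The two differ only by a constant, so they define the same likelihood *function*
diff = ll_full - ll_suff
print(np.allclose(diff, diff[0]))
```

The constant gap is -½Σ(xᵢ - x̄)², which does not depend on μ and so carries no evidence about it.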

To get outside deductive logic and really get at induction (which should be what is obviously wrong with most advertised versions of Bayesian statistics), one needs something more than the posterior. Induction needs to be a process of getting less wrong about representations, not just less wrong about parameter values within a given representation. Mayo may wish to call this error statistical, but I would prefer something more like pragmatistic induction or induction of induction.

Mike Evans has some material you might find interesting here http://www.utstat.toronto.edu/mikevans/ and, for the LP in particular, http://projecteuclid.org/euclid.ejs/1382706342 .

A very high-level take on Mayo’s and Mike’s approaches to the LP might be usefully characterized as Mayo – “The implications are silly, the assumptions must be wrong” and Mike – “The fully explicated assumptions are silly, therefore any implication will be silly”.

Enjoy the intellectual adventure.

I just meant that I haven’t carried out detailed calculations comparing severity to other things myself. Being lazy, basically.

I agree that Bayes factors seem to be currently popular, but I think that this is a bad idea for the reasons described. In terms of visible Bayesians – Gelman has openly argued against Bayes factors based on what seem to be similar intuitions about these issues. Judging from a cursory reading of his blog, Robert (X’ian) also seems to be moving away from Bayes factors, towards more posterior-style summaries of tests.

I disagree about the lack of Bayesian foundations for non-Bayes-factor measures – I think Bayes factors are only a (generally speaking) poor approximation to what would be ‘proper’ Bayesian inference using a full posterior.

For an underlying model that captures the ‘true process’, it then seems that a p-value is basically a check of the ‘measurement error’ part.

One alternative idea that comes to mind is to include the error distribution parameters as unknowns in the model (e.g. the standard deviation for a simple normal, zero mean case) and estimate them based on the data using the standard Bayes approach for obtaining the marginals of all unknowns.

If one knows the ‘true’ measurement error, then one can compare the estimated error distribution with the known distribution (say visually at first – one could also use other metrics) to see whether they are consistent.

If the ‘true process’ is mischaracterised as well, then there could, for example, be unusual visible correlations between the measurement-error estimate and the process-model estimate, indicating that they are trying to compensate for one another. Having the various marginals would be very useful for this. This again is assessed as part of model checking.
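A minimal sketch of the estimation step described above (my own toy example, assuming a zero-mean normal error model, a flat prior on the standard deviation, and a simple grid posterior in place of MCMC):

```python
import numpy as np

rng = np.random.default_rng(2)
true_sigma = 1.5
y = rng.normal(0.0, true_sigma, size=100)  # zero-mean 'measurement error' data

# Grid posterior for sigma under y_i ~ N(0, sigma^2) with a flat prior on sigma
sigmas = np.linspace(0.5, 3.0, 500)
dsig = sigmas[1] - sigmas[0]
log_post = -len(y) * np.log(sigmas) - np.sum(y ** 2) / (2.0 * sigmas ** 2)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dsig  # normalize so the grid posterior integrates to 1

post_mean = (sigmas * post).sum() * dsig
print(post_mean)  # should land near true_sigma = 1.5
```

With the ‘true’ error scale known, the check is then whether the posterior for sigma concentrates near it; a posterior sitting well away from the known value would be the flag described above.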

So in general the p-value idea makes sense to me as part of ‘self-consistency’ checks. But I also think from a practical (maybe theoretical?) point of view it helps to separate the phases into

– assume a model and do the estimation

– check for self-consistency of model assumptions

This amounts to the error-statistics approach as emphasised by Spanos, and the Box/Gelman etc. approach.

p-value-style reasoning seems most useful/important as part of the ‘self-consistency’ or ‘inductive premise’ part, but likelihood/Bayes does seem useful for the estimation part in complex cases. Box basically said all this years ago. Frequentist p-values aren’t the only way of carrying out consistency checks, however – many Bayesian or information-theoretic ideas have been proposed as well.

The use of p-values or other sample-space-dependent reasoning in the *estimation* phase (*in addition to* the self-consistency phase) appears to amount to adding ‘sample-space/data noise’ into a hierarchical model – why not just include it in a hierarchical Bayes model?
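The two-phase scheme above can be sketched with a toy posterior predictive check (my own example, not from the discussion: deliberately heavy-tailed data are fitted with a fixed-variance normal model, and the check uses the statistic T(y) = max|y|):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.standard_t(df=3, size=50)  # heavy-tailed data, deliberately mischaracterised below
n = len(y)

# Phase 1: estimation under an assumed N(mu, 1) model
# (flat prior on mu gives mu | y ~ N(ybar, 1/n))
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=4000)

# Phase 2: self-consistency check via a posterior predictive p-value for T(y) = max|y|
t_obs = np.abs(y).max()
t_rep = np.array([np.abs(rng.normal(m, 1.0, size=n)).max() for m in mu_draws])
p = (t_rep >= t_obs).mean()
print(p)  # a small p flags that the assumed error model misses the heavy tails
```

The estimation phase here is purely likelihood/Bayes; the sample-space reasoning only enters in the second, checking phase – which is the separation argued for above.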

On Bayes factors: I see them as more popular than posteriors. When they’re given up, it tends to mean the person is not doing anything recognizably Bayesian, nothing with Bayesian foundations.

Of course – obviously I’m not up on all the history and subtleties.

A few points, though:

– Based solely on these quotes Fisher seems to be putting forward the view that – given/conditional on/assuming the validity of a model – the subsequent estimation can/should be carried out based on the whole likelihood function.

– Whether this means extra criteria/operations – e.g. some integration/differentiation or whatever over the sample space – should be used or not is not clear, again based only on these quotes. Some people (e.g. Fraser) seem to have looked at how one can use continuity in the model relation between parameter changes and changes in observables to include additional local sample-space-style information in likelihood-style estimation, but this is a bit beyond me for now. It may relate to Fisher’s fiducial ideas?

– How to combine/compare likelihoods from different experiments/different models may be subtle (Keith’s thesis appears to look at this question) because of the dependence of the likelihood-based estimation on the validity of the model.

– However, the ‘Law of Likelihood’ can be considered misguided from a purely Likelihoodist view even in a pure estimation context and without any frequentist considerations. In particular because it encourages what amounts to simple point-against-point or ‘first derivative’ style comparisons using the likelihood.

– One could supplement the ‘Law’ with another similarly ‘local’ concept such as consideration of the second derivative of the log-likelihood as a measure of ‘variance’ or ‘information’ or even ‘confidence’ at a point. This would (?) naturally lead to the construction of ‘likelihood intervals’, without any frequentist considerations but which converge to classical confidence intervals under certain assumptions.

– In general, however, if one is following the Likelihoodist paradigm it is probably better to consider the whole likelihood function (again, restricting attention to estimation/a valid model) when considering how the ‘evidence’ is distributed over parameter/hypothesis space, rather than any additional/simplistic ‘Law of likelihood’ or similar.

– My personal reading/translation of the general statement of the severity principle, putting aside its specific frequentist implementation (e.g. any operations over sample space), is that it encourages (locally-speaking) considering the sort of ‘higher-order derivatives’ or ‘higher-order counterfactuals’ in parameter/hypothesis space mentioned above. The more global implementation would presumably involve some sort of integral of the ‘fit-measure’ over the alternative hypotheses, and this seems to be the case (I haven’t looked at any detailed calculations).

– These ideas apply similarly to Bayes factors – they are potentially bad for essentially the same reasons that point-against-point likelihood ratios are bad. That is, they ignore the ‘full’ information in the posterior. E.g., speaking locally, they involve no consideration of ‘curvature’ or ‘variance-like’ measures (or any other higher-order derivatives). This can again be corrected on purely Bayesian or Likelihoodist-style considerations – use the whole posterior/likelihood instead!

– Again I don’t know much, but from what I’ve seen it seems that many Bayesians are advocating against Bayes factors for basically the above reasons, and are moving towards replacing uses of Bayes factors with consideration of full posterior distributions. Again, this need not be (and doesn’t seem to be) motivated by any frequentist reasons, but rather by a desire to ‘use the full posterior information’.
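The curvature-based likelihood interval mentioned in the bullets above can be sketched with a toy binomial example of my own (not a calculation from the discussion): the interval built from the second derivative of the log-likelihood at the maximum is close to an interval read off the whole likelihood function, since this log-likelihood is nearly quadratic.

```python
import numpy as np

# Binomial example: k successes in n trials
n, k = 100, 37
phat = k / n

# Curvature at the maximum: observed information I = -d^2 logL/dp^2 at phat
obs_info = n / (phat * (1 - phat))
half = 1.96 / np.sqrt(obs_info)
wald = (phat - half, phat + half)  # coincides with the classical Wald interval

# Whole-likelihood alternative: all p whose likelihood is within a factor 1/8 of the max
ps = np.linspace(0.001, 0.999, 9999)
loglik = k * np.log(ps) + (n - k) * np.log(1 - ps)
inside = ps[loglik >= loglik.max() - np.log(8)]
lik_interval = (inside.min(), inside.max())
print(wald, lik_interval)
```

Neither calculation invokes any sample-space operation, yet the curvature-based interval reproduces a familiar confidence-interval form – which is the convergence point made above.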
