Agree, but want to point out simple two stage simulation can be used here instead of MCMC.

1. Draw the unknown parameter from prior distribution

2. Draw sample from data generating distribution with that parameter value from 1.

Only keep draws where the simulated P-value is (approximately) equal to 0.047.

The kept distribution will be a sample from the posterior.

It is called ABC (http://en.wikipedia.org/wiki/Approximate_Bayesian_computation) and I have used it with some limited success in teaching people with limited mathematics.
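A minimal sketch of this two-stage rejection scheme (the N(0, 2) prior, the sample size of 20, and the acceptance tolerance are all illustrative choices, and a z-test with known sigma = 1 stands in for the t test to keep the example short):

```python
import math
import random

random.seed(1)
n, target_p, tol = 20, 0.047, 0.005  # illustrative choices
kept = []

def p_two_sided(xs):
    """Two-sided z-test p-value for H0: mu = 0, sigma known to be 1."""
    z = (sum(xs) / len(xs)) * math.sqrt(len(xs))
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

for _ in range(100_000):
    mu = random.gauss(0, 2)                       # 1. draw parameter from the prior
    xs = [random.gauss(mu, 1) for _ in range(n)]  # 2. simulate data given that value
    if abs(p_two_sided(xs) - target_p) <= tol:    # keep only p-values near 0.047
        kept.append(mu)

# `kept` is (approximately) a sample from the posterior of mu,
# given that the test came out with p near 0.047
```

The tolerance trades accuracy against acceptance rate; tightening it gives a better approximation to the posterior but discards more draws.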

I have a comment at the bottom giving details and references http://magazine.amstat.org/blog/2013/10/01/algebra-and-statistics/

Thank you for restoring my hope that there are still some sane voices out there. To repeat your key points:

“the likelihood under the alternative (hard – because a mixture of likelihoods corresponding to each distinct value under the alternative).

3) In practice you can’t do this except by coming off the fence. You need a prior distribution for all values of the parameter (including the single value under the null).

4) You get very different values depending on what you assume”

So how do they propose to get the likelihood under the alternative? Can they take it to be the alternative giving max likelihood, as some do? Or must they “come off the fence”?

1) If you want to calculate a Bayesian posterior distribution you should use the exact P-value

2) If proceeding the way you propose you need to calculate the likelihood under the null (easy if a precise hypothesis) and the likelihood under the alternative (hard – because a mixture of likelihoods corresponding to each distinct value under the alternative).

3) In practice you can’t do this except by coming off the fence. You need a prior distribution for all values of the parameter (including the single value under the null).

4) You get very different values depending on what you assume

5) If doing a simulation, yes you do need to retain only those that are (to a standard of precision to be determined) equal to P=0.047

6) This is not a good way to do it. Have you considered MCMC?

I was asking your opinion about the following case. One wishes to interpret the result of a single experiment that gives P = 0.047. To help with this, you simulate 100k t tests. Do you then look only at those that come out with, say, 0.045 < P < 0.05, or do you look at all results that give P < 0.05, when assessing things like the fraction of false positives?
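The difference between the two conditioning choices can be sketched in a few lines. Everything here is a made-up illustration: 50% of experiments are given a real effect, groups of 16, effect size 1, and a z-test with known sigma stands in for the t test.

```python
import math
import random

random.seed(2)
n, delta = 16, 1.0  # per-group size and true effect size (illustrative)

def p_two_sided(xs, ys):
    """Two-sample, two-sided z-test p-value, sigma known to be 1."""
    z = (sum(ys) / len(ys) - sum(xs) / len(xs)) / math.sqrt(2 / n)
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

sig = {True: 0, False: 0}   # counts of p < 0.05, split by whether the effect is real
band = {True: 0, False: 0}  # counts of 0.045 < p < 0.05

for _ in range(100_000):
    real = random.random() < 0.5  # half the experiments have a real effect
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(delta if real else 0, 1) for _ in range(n)]
    p = p_two_sided(xs, ys)
    if p < 0.05:
        sig[real] += 1
        if p > 0.045:
            band[real] += 1

frac_false_all = sig[False] / (sig[False] + sig[True])      # condition on p < 0.05
frac_false_band = band[False] / (band[False] + band[True])  # condition on p near 0.047
```

Under these made-up settings the fraction of false positives among results with p just under 0.05 comes out several times larger than among all results with p < 0.05, which is why the conditioning choice matters.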

Although I certainly wouldn’t want to defend Schnall’s statistical inference (it falls down on error-statistical grounds), it’s puzzling that this author sees no problem with ignoring error probabilities and using a spiked prior of .5. I don’t think this will help social psych to become more replicable.

We discussed the “crisis of replicability” in psych recently: http://errorstatistics.com/2014/06/30/some-ironies-in-the-replication-crisis-in-social-psychology-1st-installment/

“Some statisticians would say that, once you have observed, say, P = 0.047, that is part of the data so we should not include the ‘or less than’ bit. That is indisputable if we are trying to interpret the meaning of a single test that comes out with P = 0.047. To interpret this we need to see what happens in an imaginary series of experiments that all come out with P near to 0.05”

That’s easily investigated by simulated t tests (page 8 of http://arxiv.org/abs/1407.5296).

Now for me the question becomes how to implement this approach in the problems that come up for me day-to-day. While they don’t have a monopoly, Bayesian-oriented statisticians do seem to be taking the lead in pursuing and popularising new tools to tackle things like inference for differential equations or individual-based models, and the ‘uncertainty quantification’ thereof. And given that one can view Bayesian inference as a process of constructing inverse functions through regularisation procedures, it becomes much less objectionable in principle.

As C. Glymour says in your and Spanos’ book:

‘Bayesian statistics is one thing (often a useful thing, but not germane to my topic); Bayesian epistemology is something else.’

I would have to be very stubborn (even more than I am!) to refuse to use practical tools they provide, if applicable.

As you say, though, Duhemian problems are important. Some further questions for me then become:

- Can Bayesian statisticians handle their Duhemian problems in practice, e.g. by breaking from Bayesian epistemology if they need to?

- Can Frequentist statisticians handle their inverse problems and Duhemian problems in practice as well as Bayesians can?

- If yes to both, how important is it to choose between them (again, in practice) and are there other choices to make instead?

Probably my answers would be yes, yes, and not much/yes (all qualified)… On the other hand, I can see how it would be interesting philosophically, and maybe for the development of new approaches, to understand what’s going on once everyone starts breaking from their purported principles. I would definitely be interested to read a philosophical account anyway.

*As a strange aside I came to the rough form of my view in the previous comment while trying to understand when to use proof by contradiction in mathematics. Initially I didn’t like it, but I came to understand it as a method for solving inverse-style problems, that is, for reconciling the directions of desired inference and available calculation when these don’t naturally align (and when constructing an inverse is too hard). I also like the quote from G. Polya:

‘Both “reductio ad absurdum” and indirect proof are effective tools of discovery which present themselves naturally to an intent mind. Nevertheless, they are disliked by a few philosophers and many beginners; satirical people and tricky politicians do not appeal to everybody.’

Regarding philosophical issues, inference and inverse problems:

I’ve replied below under one of Stephen’s comments as I believe he’s captured the way I think about the general issues of inference quite well. Also, my only other comment on this blog, from a little while ago, was actually on that last Popper post you linked! I should say that, despite some of my own quirky views which may disagree with yours (see below!), I do buy many of the points you’ve raised here and in your books and you’ve certainly helped my understanding of Popper, Peirce, Duhem etc. I’ll have to ask some more questions when I get a chance. For now I’ve just left my own rambling opinion, as is standard on the internet.

Regarding overuse of p-values:

I’m no expert, but it seems there are some good, influential voices, such as Wasserman and Gelman (encouragingly, I can think of quite a few more!), who are defending appropriate use of p-values while also encouraging other things, like exploratory data analysis, increased modelling, and new ways of representing data, as complementary (or often more important) tools. While more non-statisticians entering the ‘data analysis’ field does bring some dangers (which should certainly be pointed out by the likes of yourself and Stephen), it also appears to bring an increased desire for visualisation and informative representations of data.

So, having more of a ‘feel’ for the data and using more representative models seems, I think, to be a nice way of balancing out too much ‘blind’ testing and use of strawman hypotheses.

We don’t assign degrees of belief to the auxiliaries (to solve Duhem)—they are either falsified directly, or, much more cleverly, we find a way to distinguish their effects. (As in the example I often mention in relation to tests of GTR: a mirror distortion does not look like a deflection effect, or a shadow effect, or a corona effect, etc.; or, in cooking, a “too much salt” error is distinct from a “too much water” error.) Thus, we can rule out many by distinguishing their error properties (usually through deliberate probes), and what we cannot distinguish, we usefully report, e.g., we cannot distinguish gravitational waves from stellar dust, or however that recent case went. Of course, experimental planning can do a great deal to control these sources ahead of time, e.g., via randomization.

This is precisely how I see it! Give or take a misinterpretation on my behalf…

My interpretation:

Both are attempting to solve the ‘inverse problem’ of going from data to model, i.e. inference. This problem is inverse with respect to the ‘forward problem’ of going from model to data, i.e. standard probability calculations. Funnily enough, Wasserman actually has this as Figure 1 in ‘All of Statistics’.

There are (broadly speaking) two ways of solving an inverse problem: actually attempt to construct the inverse operator to your forward operator, giving you an ‘inverse probability’ calculation (Bayes) for your parameters, or repeatedly solve the forward problem for different parameter values and reason by contradiction (à la Popper, Fisher) to eliminate values that give mismatches.

Confusingly, as Stephen says, the Bayesian is attempting to tackle the inverse problem ‘directly’ in the sense of actually constructing the inverse mapping; the frequentist is reasoning ‘indirectly’ in the same sense that proof by contradiction is indirect reasoning.

This is also why the Bayesian requires a ‘prior’ or ‘regularisation’: the inverse function is in general non-unique or non-existent without further assumptions. There are many problems where introducing these assumptions directly in order to construct the inverse doesn’t seem a terrible idea; however, I think the more general of the two approaches is the ‘proof by contradiction’ method, essentially for the same reasons Popper gave when he discussed the problem of induction. Combinations of the two approaches don’t seem that unnatural in practice, however, since mathematicians and scientists frequently employ both direct and indirect arguments and calculations in their work.
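A toy illustration of that non-uniqueness point, with entirely made-up numbers: one equation in two unknowns has infinitely many solutions, and a Tikhonov penalty (the regularisation analogue of a zero-mean Gaussian prior) singles one out.

```python
# Ill-posed 'forward' model: y = x1 + x2, i.e. A = [1, 1].
# Many x reproduce any observed y, so a unique inverse does not exist.
# Minimising |Ax - y|^2 + lam * |x|^2 has the closed form
# x = A^T (A A^T + lam)^(-1) y, which for A = [1, 1] reduces to:
def regularised_inverse(y, lam):
    return [y / (2.0 + lam), y / (2.0 + lam)]

x1, x2 = regularised_inverse(2.0, 1e-6)
# (2, 0) and (0, 2) also satisfy x1 + x2 = 2, but the regulariser
# picks out the minimum-norm solution x1 = x2 = 1
```

Sending the penalty `lam` to zero recovers the minimum-norm solution exactly; a different penalty (a different ‘prior’) would pick a different member of the solution set, which is the sense in which the assumptions do the work.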

So to re-emphasise: the Bayesian solution to the inverse problem (inference) is direct/deductive since they are trying to directly construct the inverse function (an inverse probability calculation for parameters given data), while the Frequentist approach is indirect/analogous to proof by contradiction since they retain the forward calculation direction (probability calculations given parameters) and try to ‘learn from error’.

Which finally brings us to the extra wrinkle that Mayo introduces to tackle the whole Duhem issue (for the Frequentist case) – actual proof by contradiction must be replaced by a sort of probabilistic contradiction. In this sense the proof by contradiction then becomes an inductive move, further complicating everything!

My two cents.

For Meng see http://andrewgelman.com/wp-content/uploads/2014/06/2014_MeNiRe.pdf

(which has references to Evans)

First, I think that you are confusing two things:

1) P-values

2) significance

If your point is that you don’t like the common use of P < 0.05 as ‘significant’ then use a different threshold yourself for significance, or, better still, avoid the label. Failing that, use a different system altogether, but please explain what. As I have pointed out before, in drug regulation the type I error for registration is set much lower (arguably at 1/1600, since two phase III trials are each required to be significant with the effect in the right direction: (1/40) × (1/40) = 1/1600).

Second, whether or not you like this, model selection methods that are less stringent than P < 0.05 are commonly in use. I invite you to check out AIC and BIC, and I would be interested to know whether, in your own work using maximum likelihood, you have ever had to choose between simpler and more complex models, and if so how.
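For what it’s worth, the stringency point can be seen in a small sketch (the data, group sizes, and effect size below are made up; the AIC is just the usual 2k - 2 ln L with the Gaussian MLE variance plugged in):

```python
import math
import random

random.seed(3)

def gaussian_aic(residual_ss, n, k):
    # AIC = 2k - 2 ln L for a Gaussian model with MLE variance
    sigma2 = residual_ss / n
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return 2 * k - 2 * loglik

# made-up data: two groups of 30 with a modest mean difference
n = 30
xs = [random.gauss(0.0, 1) for _ in range(n)]
ys = [random.gauss(0.4, 1) for _ in range(n)]
both = xs + ys
grand = sum(both) / len(both)
mx, my = sum(xs) / n, sum(ys) / n

# simple model: one common mean (k = 2: mean and variance)
ss_simple = sum((v - grand) ** 2 for v in both)
# complex model: separate group means (k = 3)
ss_complex = sum((v - mx) ** 2 for v in xs) + sum((v - my) ** 2 for v in ys)

aic_simple = gaussian_aic(ss_simple, 2 * n, 2)
aic_complex = gaussian_aic(ss_complex, 2 * n, 3)
# AIC prefers the extra parameter whenever the likelihood-ratio chi-square
# exceeds 2, i.e. roughly whenever the corresponding p-value (1 df) is
# below about 0.157 rather than below 0.05
```

So a routine AIC comparison between nested models is already operating at a laxer standard than the conventional 5% threshold.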

Third, I want to make it quite clear that I reject entirely this statement of yours: "If, as I think is true, that you accept that a screening test that gives 50% false positives is not going to be helpful." Such a test, depending on circumstances, could be extremely useful. For example, if we had a simple test that, when it labeled someone as being about to develop Ebola, was right 50% of the time, this could be extremely useful in disease control. There are many other circumstances one could think of, and there is simply no such general rule defensible in terms of practical decision making.

Fourth, for reasons I have given before, the 'crisis of replication' cannot be solved by changing the threshold for 'significance'.

I’ve also explained my position further.

As I say, the numbers aren’t the problem; it’s the fact that what it means for a “hypothesis” to be true can’t shift within a given computation (at least not without falling into the ‘fallacy of probabilistic instantiation’).

A couple of relevant posts:

http://errorstatistics.com/2012/04/28/3671/

Related papers:

Mayo, D. G (1997a), “Response to Howson and Laudan,” Philosophy of Science 64: 323-333.

http://www.phil.vt.edu/dmayo/personal_website/(1997) Response to Howson and Laudan.pdf

Mayo, D. G. (1997b), “Error Statistics and Learning from Error: Making a Virtue of Necessity,” in L. Darden (ed.) Supplemental Issue PSA 1996: Symposia Papers, Philosophy of Science 64, S195-S212.

http://www.phil.vt.edu/dmayo/personal_website/(1997) Error Statistics and Learning from Error Making a Virtue of Necessity.pdf