A. Spanos Probability/Statistics Lecture Notes 7: An Introduction to Bayesian Inference (4/10/14)
If the data are a typical realization from a particular stochastic process, they will lie in the typical set with high (sampling) probability. The set of models ruled out with high severity is therefore exactly the set for which the sample does not lie in the typical set. Does anything more need to be said about M-S testing?
Corey: if one could operationalize the joint distribution of all possible statistical models that could have given rise to these particular data, then your idea would be a step in the right direction. However, no such joint distribution can be operationalized, and thus M-S testing needs to be applied in a piecemeal fashion to different subsets of model assumptions, cleverly grouped in ways that will secure a reliable diagnosis of what systematic statistical information in the data the statistical model in question fails to account for.
Spanos: “The joint distribution of a set of statistical models” — that’s an oddly Bayesian turn of phrase…
Fact remains, for any *particular* model under consideration, you can identify the typical set and then see if the data lie within it.
Corey: the first thing one needs to know about M-S testing is that it constitutes testing outside the boundaries of the “particular” model in question. The null is that model and the alternative is the set of all other possible models that could have generated the particular data.
Spanos: I am indeed aware that M-S testing is aimed at securing the auxiliary assumptions upon which the primary inference rests. But within any given family of models, one can examine the location of the data relative to the typical set for any given parameter value. If the observed data are not in the typical set corresponding to any possible parameter value, then by definition the entire family is inadequate according to the M-S testing criterion that the data must appear to be a typical realization of the posited stochastic process. To put it another way, the hypothesis “no member of the family is the ‘true’ distribution” has passed a severe test.
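As a minimal sketch of the check being traded back and forth here, take the simplest case of an i.i.d. Normal model: by the asymptotic equipartition property, the sample’s average negative log-density should be close to the model’s differential entropy. The tolerance `eps` below is an illustrative choice, not a calibrated test threshold.

```python
# Typical-set check for an i.i.d. N(mu, sigma^2) model: is the sample's
# average negative log-density within eps of the differential entropy?
import numpy as np
from scipy.stats import norm

def in_typical_set(x, mu, sigma, eps=0.1):
    avg_neg_loglik = -norm.logpdf(x, loc=mu, scale=sigma).mean()
    entropy = 0.5 * np.log(2 * np.pi * np.e * sigma**2)  # differential entropy of N(mu, sigma^2)
    return abs(avg_neg_loglik - entropy) < eps

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)
print(in_typical_set(x, 0.0, 1.0))  # True: the data look typical of the posited model
print(in_typical_set(x, 0.0, 3.0))  # False: the wrong scale is ruled out
```

Scanning this check over the whole parameter space would operationalize the “entire family is inadequate” criterion, at least for this toy family.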
This presentation of the notion of admissibility is far from charitable. The motivating idea behind admissibility is that it’s desirable to switch from an inadmissible estimator to one that dominates it — and provided that the loss function is deemed relevant, that seems hard to argue with. I find that the presentation is not clear on the fact that admissibility was only ever meant to filter out certain “bad” estimators from consideration, and was never intended to be a *sufficient* condition for a “good” estimator. (This point vitiates the “crystal ball estimator” argument; your arguments questioning the relevance of canned loss functions like squared-error-loss — or indeed, any loss function — are better, albeit not dispositive.)
Also, since the notion of admissibility only explicitly references the risk function, i.e., a sample-space expectation, you might want to make it clear why Bayesians would care about it. It’s far from obvious that the risk-Pareto-optimal set of estimators is risk-equivalent to the set of (generalized) Bayes estimators.
Corey: I was very careful in the exposition to refer to admissibility as a “minimal” property, which coincides with your claim that “admissibility was only ever meant to filter out certain “bad” estimators from consideration”! My argument is that, in addition to using the wrong definition of the MSE, admissibility should not be used as a minimal property for frequentist point estimators because it is bad at filtering out “bad” estimators. My example illustrates that point estimators do not come any worse than the crystal ball estimator (which ignores the data completely and is also inconsistent), but admissibility did not “filter it out”! In that sense, admissibility is not a pertinent criterion for filtering out “bad” estimators because it allowed in the worst possible estimator but kicked out many consistent estimators. Indeed, consistency, not admissibility, is the pertinent minimal property for frequentist point estimators. As you know, there are many examples of admissible but inconsistent estimators in the statistics literature, including the famous James-Stein estimator.
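A small Monte Carlo sketch, with made-up numbers, of why the crystal ball estimator slips through the admissibility filter: at the guessed parameter value its risk is exactly zero, so no estimator can dominate it everywhere.

```python
# Risk comparison: sample mean vs. a "crystal ball" estimator that always
# returns the same fixed guess, for a few true values of theta.
import numpy as np

rng = np.random.default_rng(1)
n, reps, guess = 20, 20_000, 0.0

for theta in (0.0, 0.5, 2.0):
    x = rng.normal(theta, 1.0, size=(reps, n))
    mse_mean = np.mean((x.mean(axis=1) - theta) ** 2)  # approx sigma^2 / n
    mse_ball = (guess - theta) ** 2                    # zero at theta = guess
    print(f"theta={theta}: MSE(mean)={mse_mean:.4f}, MSE(ball)={mse_ball:.4f}")
```

At theta = 0 the crystal ball “wins”, which is all admissibility needs; everywhere else it is arbitrarily bad.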
Aris: I deny that admissibility is worse at filtering out “bad” estimators than consistency: consistency is a tail event, so there are literally an infinity of consistent estimators that are terrible for any practical sample size. My point: these kinds of objections aren’t worth crediting because they rest on uncharitable technicalities.
You seem to be overlooking the idea that we can ask for more than one “minimal” property. (The word “desideratum” does have a plural form!) Given two consistent estimators, one of which dominates the other with respect to “wrongly” defined MSE, are you *really* going to assert that there’s no reason to prefer the dominant one?
(There *is* literally an infinity. Yeesh.)
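A toy construction along the lines gestured at above: an estimator that ignores the data below an arbitrary (and absurd) sample-size threshold is consistent, yet useless at every practical sample size.

```python
# Consistent but practically worthless: returns a constant unless the
# sample is absurdly large, after which it defers to the sample mean.
import numpy as np

def silly_estimator(x, threshold=10**6):
    return x.mean() if len(x) > threshold else 42.0

rng = np.random.default_rng(2)
print(silly_estimator(rng.normal(0.0, 1.0, size=10_000)))  # 42.0, no matter the data
```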
Aris: I’m not following your logic. If a criterion is meant to be necessary but not sufficient, the fact that it does not invalidate the crystal ball estimator does not make it not pertinent.
There’s no free lunch – unbiasedness comes at a cost with respect to the bias-variance tradeoff. By elevating unbiasedness and neglecting risk, it’s arguable that the emphasis on unbiased estimation has literally done more harm than good in statistical practice.
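A quick worked instance of that tradeoff, with made-up numbers for a Normal mean: the biased shrinkage estimator c·x̄ has MSE c²σ²/n + (1 − c)²θ², which undercuts the unbiased sample mean’s σ²/n whenever θ is small relative to the noise.

```python
# Bias-variance tradeoff for a shrunken Normal mean, c * xbar.
sigma2, n, theta, c = 1.0, 10, 0.2, 0.8
mse_unbiased = sigma2 / n                               # variance only: 0.1
mse_shrunk = c**2 * sigma2 / n + (1 - c)**2 * theta**2  # 0.064 + 0.0016 = 0.0656
print(mse_unbiased, mse_shrunk)
```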
I think Larry W’s adage applies – “if we elevate lessons from toy examples into grand principles we will be led astray.”
Aris: A bit late, but I also want to correct the claim that the James-Stein estimator is inconsistent. Whether the estimator is consistent is a property of the way in which one assumes the amount of data grows without bound. For concreteness, suppose that the data are an n-by-m matrix with the elements of each column sharing a mean parameter. Your claim of inconsistency rests on fixing n = 1 and letting m grow without bound, but of course no estimator is consistent under that assumption. If you allow n to grow without bound too, then consistency is recovered.
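A sketch of the regime described above, assuming an n-by-m Gaussian array with unit noise and the positive-part James-Stein estimator applied to the column means: with m fixed and n growing, the estimate converges on the true mean vector.

```python
# James-Stein (positive-part) shrinkage of m column means from an
# n-by-m Gaussian array; the error shrinks as n grows with m fixed.
import numpy as np

def james_stein(xbar, var_of_mean):
    m = len(xbar)
    shrink = 1.0 - (m - 2) * var_of_mean / np.sum(xbar**2)
    return max(shrink, 0.0) * xbar

rng = np.random.default_rng(3)
theta = np.array([1.0, -0.5, 0.3, 2.0, 0.0])  # m = 5 true column means
for n in (1, 100, 10_000):
    x = rng.normal(theta, 1.0, size=(n, len(theta)))
    est = james_stein(x.mean(axis=0), 1.0 / n)
    print(n, np.max(np.abs(est - theta)))  # max error drops toward zero
```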