Nate Silver describes “How we’re forecasting the primaries” using confidence intervals. Never mind that the estimates are a few weeks old, and put entirely to one side any predictions he makes or will make. I’m only interested in this one interpretive portion of the method, as Silver describes it:
In our interactive, you’ll see a bunch of funky-looking curves like the ones below for each candidate; they represent the model’s estimate of the possible distribution of his vote share. The red part of the curve represents a candidate’s 80 percent confidence interval. If the model is calibrated correctly, then he should finish within this range 80 percent of the time, above it 10 percent of the time, and below it 10 percent of the time. (My emphasis.)
OK. But when we look up the link for “confidence interval,” Silver’s construal seems to fall squarely within (what is correctly described there as) the incorrect way to interpret intervals.
How to Interpret Confidence Intervals
Suppose that a 90% confidence interval states that the population mean is greater than 100 and less than 200. How would you interpret this statement?
Some people think this means there is a 90% chance that the population mean falls between 100 and 200. This is incorrect. Like any population parameter, the population mean is a constant, not a random variable. It does not change. The probability that a constant falls within any given range is always 0.00 or 1.00.
The confidence level describes the uncertainty associated with a sampling method. Suppose we used the same sampling method to select different samples and to compute a different interval estimate for each sample. Some interval estimates would include the true population parameter and some would not. A 90% confidence level means that we would expect 90% of the interval estimates to include the population parameter; A 95% confidence level means that 95% of the intervals would include the parameter; and so on.
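The glossary’s repeated-sampling reading is easy to exhibit in a short simulation. Here is a minimal sketch in Python (the normal population, the known sigma, and every number are my own illustrative assumptions, nothing from Silver’s model):

```python
# Minimal sketch of the repeated-sampling reading of a 90% confidence
# interval for a population mean. All numbers (true mean 150, sigma 30,
# n = 25) are made up for illustration; sigma is treated as known, so
# the z-interval applies.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, sigma, n, trials = 150.0, 30.0, 25, 10_000
half_width = stats.norm.ppf(0.95) * sigma / np.sqrt(n)  # z * standard error

covered = 0
for _ in range(trials):
    xbar = rng.normal(true_mean, sigma, size=n).mean()
    covered += (xbar - half_width <= true_mean <= xbar + half_width)

print(f"coverage over {trials} intervals: {covered / trials:.3f}")
# Prints roughly 0.900: the 90% describes the method's long-run
# performance. Any single realized interval either contains 150.0 or it
# doesn't, so for that fixed interval the probability is 1 or 0.
```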
Everything in the glossary passage, with the possible exception of the boldface portion (“The confidence level describes the uncertainty associated with a sampling method”), is quite clear: the probability characterizes the performance of the estimation method. What I’m wondering is how Silver’s glossary definition underwrites his claim:
If the model is calibrated correctly, then he should finish within this range 80 percent of the time, above it 10 percent of the time, and below it 10 percent of the time.
“This range” would seem to refer to a particular interval estimate, but then Nate’s interpretation is that there’s an 80% chance that the true parameter (here, the candidate’s vote share) falls between the specific lower and upper bounds, a 10% chance that it falls above them, and a 10% chance that it falls below. Yet this is just what his glossary definition of confidence intervals correctly calls an incorrect construal.
What do you think’s going on?
Even though his construal violates (or appears to violate) his own warning about incorrect interpretations, the boldface portion of the definition is equivocal: it alludes to what I call a “rubbing off” construal of a method’s error probability. If the particular inference, here an interval estimate, is a (relevant) instance of a method that is correct with probability (1 – α), then the (1 – α) “rubs off” on the particular estimate. What’s supposed to “rub off” according to Silver?
Following Silver’s glossary, we might try on the idea that it’s “the degree of uncertainty” (in the interval) that’s rubbing off. A more common construal is in terms of degree of “confidence”. (Should we prefer one to the other? Both are vague and informal.) Neither quite warrants his construal. The severity construal of a confidence level allows it to qualify how well (or poorly) tested various claims are. But this too differs from the illicit probabilistic instantiation Silver reports.
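For contrast, consider what Silver’s calibration claim would amount to if read, consistently with his glossary, as a claim about the method: over many forecast/outcome pairs, about 80 percent of outcomes should land inside the 80 percent bands, 10 percent above, and 10 percent below. Here is a minimal sketch with fabricated data (the normal forecast distributions and every number are my assumptions, not FiveThirtyEight’s model):

```python
# Fabricated stand-in data: 500 races, each with a central forecast mu
# and a spread sd; the 80% band runs from the 10th to the 90th percentile.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z90 = stats.norm.ppf(0.90)          # ~1.2816

mu = rng.uniform(10, 50, size=500)  # hypothetical central vote-share forecasts (%)
sd = rng.uniform(2, 6, size=500)    # hypothetical forecast spreads
lower, upper = mu - z90 * sd, mu + z90 * sd
# Outcomes drawn from the forecast distributions themselves, so this
# toy "model" is calibrated by construction.
outcome = rng.normal(mu, sd)

print(f"within: {np.mean((outcome >= lower) & (outcome <= upper)):.2f}")
print(f"above:  {np.mean(outcome > upper):.2f}")
print(f"below:  {np.mean(outcome < lower):.2f}")
# Roughly 0.80 / 0.10 / 0.10 across the batch: a property of the
# forecasting method that does not transfer an 80% probability to any
# one realized band.
```

Even in this best case, the 80 percent characterizes the batch of forecasts; nothing licenses reading it as the probability that a given candidate’s vote share lies in his particular red band.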
[i] See, for example, “Duality: Confidence intervals and the severity of tests”.
[ii] “The model is calibrated correctly,” I take it, refers to the model assumptions being approximately met by the data in the case at hand.