Chance, rational beliefs, decision, uncertainty, probability, error probabilities, truth, random sampling, resampling, opinion, expectations. These are some of the concepts we bandy about by giving various interpretations to mathematical statistics, to statistical theory, and to probabilistic models. But are they real? The question of “ontology” asks about such things, and given the “Ontology and Methodology” conference here at Virginia Tech (May 4, 5), I’d like to get your thoughts (for possible inclusion in a Mayo-Spanos presentation).* Also, please consider attending**.

Interestingly, I’ve noticed that the posts that have garnered the most comments have touched on philosophical questions about the nature of the entities and processes behind statistical idealizations (e.g., https://errorstatistics.com/2012/10/18/query/).

1. When an interpretation is supplied for a formal statistical account, its theorems may well turn out to express approximately true claims, and the interpretation may be deemed useful, but this does not mean the concepts give correct descriptions of reality. The interpreted axioms and inference principles are chosen to reflect a given philosophy, or set of intended aims: roughly, to use probabilistic ideas (i) to control error probabilities of methods (Neyman-Pearson, Fisher), or (ii) to assign and update degrees of belief, actual or rational (Bayesian). But this does not mean the account’s adherents have to take seriously the realism of all the concepts generated. In fact, we often (on this blog) see supporters of various stripes of frequentist and Bayesian accounts running far away from taking their accounts literally, even as those interpretations are, or at least were, the basis and motivation for the development of the formal edifice (“we never meant this literally”). But are these caveats of the same order? Or do some threaten the entire edifice of the account?

Starting with the error statistical account, recall Egon Pearson in his “Statistical Concepts in Their Relation to Reality” (Pearson 1955) making it clear to Fisher that the business of controlling erroneous actions in the long run (acceptance sampling in industry and 5-year plans) only arose with Wald, and was never really part of the original Neyman-Pearson tests (declaring that the behaviorist philosophy was Neyman’s, not his). The paper itself may be found here. I was interested to hear Neyman’s arch-opponent, Bruno de Finetti, remark (quite correctly; see Mayo 2005) that the expression “inductive behavior…that was for Neyman simply a slogan underlining and explaining the difference between his, the Bayesian and the Fisherian formulations” became, with Abraham Wald’s work, “something much more substantial” (de Finetti 1972, 176).

Granted, it has not been obvious to people just how to interpret N-P tests “evidentially” or “inferentially”; that has been the subject of my work over many years. But there always seemed to me to be enough hints and examples to see what was intended: a statistical hypothesis H assigns probabilities to possible outcomes, and the warrant for accepting H as adequate (for an error statistician) is in terms of how well corroborated H is: how well H has stood up to tests that would have detected flaws in H, at least with very high probability. So the grounds for holding or using H are error statistical. The control and assessment of error probabilities may be used inferentially to determine the capabilities of methods to detect the adequacy or inadequacy of models, and to express the extent of the discrepancies that have been identified. We also employ these ideas to detect gambits that make it too easy to find evidence for claims, even when the claims have been subjected to weak tests and biased procedures. A recent post is here.
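In the simplest case, “how well H has stood up to tests that would have detected flaws” can be put in numbers. The sketch below is a minimal illustration under stated assumptions, not a general implementation: it assumes IID Normal data with known σ and a one-sided test of H0: μ ≤ μ0 against μ > μ0 that has rejected H0. It then scores the claim μ > μ1 by the probability of a less impressive result than the one observed, were μ only as large as μ1. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def severity_mu_greater(xbar_obs, mu1, sigma, n):
    """Severity for the claim mu > mu1, assuming IID Normal data with known sigma
    and a one-sided test of H0: mu <= mu0 vs mu > mu0 that rejected H0.

    Computed as P(Xbar <= xbar_obs; mu = mu1): the probability of a result less
    impressive than the one observed, were mu no larger than mu1.
    """
    se = sigma / n ** 0.5
    return NormalDist(mu=mu1, sigma=se).cdf(xbar_obs)

# Hypothetical numbers: sigma = 2 and n = 100 (standard error 0.2), observed mean 0.4
for mu1 in (0.0, 0.2, 0.4, 0.6):
    print(mu1, round(severity_mu_greater(0.4, mu1, sigma=2.0, n=100), 3))
```

With these numbers the claim μ > 0 comes out with severity of about 0.98, while μ > 0.6 gets only about 0.16. The same data warrant small discrepancies far better than large ones.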

The account has never professed to supply a unified logic, or any kind of logic for inference. The idea that there was a single rational way to make inferences was ridiculed by Neyman (whose birthday is April 16).

2. Proposed (“we never meant this literally”) withdrawals from the Bayesian interpretations do not seem so innocuous. Perhaps some will say this just shows my bias. Let me grant that the popular idea of interpreting prior probability distributions as non-subjective, in some sense or other, is not so radical (though I’d still want to know how to interpret posteriors and why). But what we usually see now is some blurring of the two: touting the advantage of Bayesian methods because they incorporate background beliefs, while also advertising “conventional” (default, reference, or “objective”) priors as having minimal influence on inference. [1] See “Grace and amen Bayesianism” within this deconstruction. Also relevant: “Irony and Bad Faith: Deconstructing Bayesians.”

Perhaps the most popular view nowadays regards the prior as some kind of uninterpreted mathematical construct, merely serving to get a posterior. Some of these same Bayesians advocate “testing” the prior, but this is hard to grasp if we do not know what the priors are intended to be, or to stand for. Then there are those Bayesians, perhaps a radical (but influential) subgroup, who deny the machinery of updating by Bayes’s theorem altogether. In Gelman (2011), from our RMM special topic:

“Our key departure from the mainstream Bayesian view (as expressed, for example, [in Wikipedia]) is that we do not attempt to assign posterior probabilities to models or to select or average over them using posterior probabilities. Instead, we use predictive checks to compare models to data and use the information thus learned about anomalies to motivate model improvements.” (p. 71).

In Gelman and Robert (2013), we hear that a major source of Bayesian criticism comes from assuming “that Bayesians actually seem to believe their assumptions rather than merely treating them as counters in a mathematical game.” (p. 3) This comes as a surprise to those of us who thought the Bayesians really meant it. So what is the game being played?

[W]e make strong assumptions and use subjective knowledge in order to make inferences and predictions that can be tested by comparing to observed and new data (see Gelman and Shalizi, 2012, or Mayo, 1996 for a similar attitude coming from a non-Bayesian direction). (p. 3)

So maybe some kind of “non-Bayesian checking of Bayesian models” would offer a more promising foundation, at least for Gelman’s brand of “Bayesian falsificationism” (Gelman 2011). See my 2013 comments on Gelman and Shalizi [2]. On the face of it, any inference, whether to the adequacy of a model (for a given purpose) or to a posterior probability, can be said to be warranted just to the extent that the inference has withstood severe testing: testing with a high probability of having found flaws were they present. *The ontology matters less than the epistemology.*

Thus the severity idea could conceivably illuminate what’s going on with Gelman’s model checking; I find the idea promising, but do not really know what he thinks.

But to pursue such an avenue still requires reckoning with a fundamental issue at the foundations of Bayesian method: the interpretation of and justification for the prior probability distribution. Error statisticians use idealizations, but they are tightly constrained by the need for error probabilities, in a statistical model, to approximate the actual ones, even if only hypothetical, or checked by simulation. We are modeling real processes, not knowledge of processes.

Gelman and Robert (2013) allow:

“that many Bayesians over the years have muddied the waters by describing parameters as random rather than fixed. Once again, for Bayesians as much as for any other statistician, parameters are (typically) fixed but unknown. It is the knowledge about these unknowns that Bayesians model as random” (p. 4).

Bayesians will …assign a probability distribution to a parameter that one could not possibly imagine to have been generated by a random process, parameters such as the coefficient of party identification in a regression on vote choice, or the overdispersion in a network model, or Hubble’s constant in cosmology. There is no inconsistency in this opposition once one realizes that priors are not reflections of a hidden “truth” but rather evaluations of the modeler’s uncertainty about the parameter. (p. 3)

The choice, of course, is not between modeling a “hidden ‘truth’” and modeling “the modeler’s uncertainty”. Actually, in the majority of the examples I have seen, it seems better to imagine the parameter being generated by a random process. On the other hand, “the modeler’s uncertainty about the parameter” is one of the most unclear parts of Bayesian modeling. It is not that we can’t see measuring the degree of evidence, corroboration, severity of test, or the like, that is accorded a claim about a fixed parameter. We can and do. It is just that those measures will not be well represented as posterior or prior probabilities, obeying the probability calculus.

Possibly an idea I once proposed, a variation on a view held by the frequentist Reichenbach, can work (EGEK 1996, ch. 4). Reichenbach suggested that scientists might eventually be able to assess the relative frequency with which a given type of hypothesis or theory is true. This might provide it with a frequentist probability assignment. I don’t see how one could get such a relative frequency (or rather, I can see many different reference sets that could be used), nor why knowing such quantities would be useful in appraising the evidence for a *given* hypothesis H. My variation (ch. 4, “Duhem, Kuhn, and Bayes,” pp. 120–4) is to consider the relative frequency with which evidence of a certain strength (e.g., passing k tests with increasingly impressive error probabilities) is generated, despite H being false. This is attainable. But that, of course, takes us to an error probabilistic assessment!

Maybe this style of Bayesianism doesn’t need a clear ontology so long as it’s got a clear epistemology. But does it?***

**What do readers think?**

***To see the full list of speakers: “Ontology and Methodology” conference. Actually our presentation will likely take a different tack, but I still want to hear your thoughts.**

****Registration is free, but required, by April 20-25.**

***I should say right off (for those who do not know) that my work is not in metaphysics, but on philosophical problems about inductive-statistical inference, experiment and evidence. My colleague (and co-conference organizer) Ben Jantzen is the “ontology” guy, and the third colleague involved, Lydia Patton, does O & M as well as HPS.

For further references, see those within posts and papers linked here, or search this blog.

De Finetti, B. (1972), Probability, Induction, and Statistics: The Art of Guessing. NY, Wiley.

Gelman, A. (2011). Induction and deduction in Bayesian data analysis. *Rationality, Markets and Morals (RMM)* 2, 67–78.

Gelman, A. and Shalizi, C. (2012). Philosophy and the practice of Bayesian statistics (with discussion). *British Journal of Mathematical and Statistical Psychology (BJMSP)*. Article first published online 24 Feb 2012.

Gelman, A. and Robert, C. (2013). Not only defended but also applied: The perceived absurdity of Bayesian inference. http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf

Kass, R. E. and Wasserman, L. (1996). The selection of prior distributions by formal rules. *Journal of the American Statistical Association* 91, 1343–1370.

Mayo, D. G. (1996). *Error and the Growth of Experimental Knowledge* [EGEK]. Chicago: University of Chicago Press.

_____ (2005). Evidence as passing severe tests: Highly probable vs. highly probed hypotheses. In P. Achinstein (Ed.), *Scientific Evidence* (pp. 95-127). Baltimore: Johns Hopkins University Press.

_____ (2011). Statistical science and philosophy of science: Where do/should they meet in 2011 (and beyond)? *Rationality, Markets and Morals (RMM)* 2, Special Topic: Statistical Science and Philosophy of Science, 79–102.

_____ (2013). Comments on A. Gelman and C. Shalizi: Philosophy and the practice of Bayesian statistics. *British Journal of Mathematical and Statistical Psychology*, forthcoming.

Mayo, D. and Cox, D. (2010). Frequentist statistics as a theory of inductive inference. In D. Mayo and A. Spanos (Eds.), *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science* (pp. 247–275). Cambridge: Cambridge University Press. This paper appeared in *The Second Erich L. Lehmann Symposium: Optimality*, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, 247–275.

Mayo, D. and Spanos, A. (2011). Error statistics. In P. Bandyopadhyay and M. Forster (Volume Eds.); D. M. Gabbay, P. Thagard and J. Woods (General Eds.), *Philosophy of Statistics: Handbook of Philosophy of Science* Vol. 7 (pp. 1–46). The Netherlands: Elsevier.

Pearson, E. S. (1955). Statistical concepts in their relation to reality. *Journal of the Royal Statistical Society B* 17, 204–207.

Senn, S. (2011). You may believe you are a Bayesian but you are probably wrong. *Rationality, Markets and Morals (RMM)* 2, Special Topic: Statistical Science and Philosophy of Science, 48-66.

Deborah, your discussion seems to me to be question-begging, and too remote from the problems that scientific examples present for the questions that are begged. You write, e.g., that “a statistical hypothesis assigns probabilities to outcomes…” But what is being assigned? It’s not like a color or a weight, or any physical property about which there is a physical law relevant to the assigned probability (quantum theory is a long way from most statistical hypotheses). It’s not degree of belief. It’s not limiting frequencies (the statistical theory uses countable additivity in most places). Further, look at social science. The statistical hypothesis of any political science claim makes a sample inference when providing p values and standard errors from some set of nations and economies. Where is the distribution of nations and economies these probabilities come from?

Only two views seem to me possible. Spanos’s/Popper’s: probability is a mysterious propensity that swishes through and around physical systems. Or my view.

C

To Clark: I don’t see how the discussion is question-begging; rather, that “a statistical hypothesis assigns probabilities to outcomes…” seems to emphasize the (restricted or limited) scope of statistical methods, at least as they were originally intended. That is, it’s important to accurately draw the boundary between the research questions for which it is appropriate to use statistical methods and the scientific questions/problems for which statistical methods may not be the appropriate tools. Yet this boundary is rarely talked about, at least from my (limited) exposure to statistics, computer science, and philosophy. Hence, the discussion may be ‘remote’ from current, popular scientific problems; but that is *because* statistical methods have been considerably extended and applied to a variety of research areas *without* much careful thought or reflection on the foundations for such statistical methods. Moreover, I think it could be that many researchers use statistical methods in ways that these methods were not meant to be used: it is not implausible that some researchers have gone too far in extending the scope of statistical methods.

Clark: Hi Clark! Great to hear from you. However, I don’t understand your comment on a few counts. Here’s one: I don’t see any question-begging. We are talking about a statistical hypothesis in a model. By definition it assigns probabilities to values of a random variable, say Y, and data y are used to learn about Y’s distribution, an idealized representation of an aspect of the data-generating process. I’m not sure about your example: “statistical hypothesis of any political science claim makes a sample inference when providing p values and standard errors from some set of nations and economies”. What is the sample inference about these nations/economies? Well, maybe it’s something like: such-and-such policy decreases income by some amount. I don’t know; it’s your example. But if they do make the inference from data y, based on p-values, the null hypothesis would give the statistic T(Y) a probability distribution: P(T > t; H0). Otherwise there is no p-value. Relative frequencies will do. I didn’t know Spanos believed in propensities, but we do talk about capabilities of tests.

Would you also regard other cases of applied mathematics as distant from science? Here’s a fanciful example for you:

https://errorstatistics.com/2013/03/27/higgs-analysis-and-statistical-flukes-part-2/
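To make “P(T > t; H0)… Relative frequencies will do” concrete, here is a minimal sketch assuming IID Normal data with known σ and the one-sided statistic T(Y) = √n(Ȳ − μ0)/σ; the function names and numbers are illustrative, not from the thread. The exact tail area under H0 and the relative frequency of T > t among samples simulated under H0 should agree up to simulation error.

```python
import random
from statistics import NormalDist, fmean

def z_statistic(sample, mu0, sigma):
    # T(Y) = sqrt(n) * (Ybar - mu0) / sigma, for IID Normal data with known sigma
    return len(sample) ** 0.5 * (fmean(sample) - mu0) / sigma

def p_value_exact(t):
    # P(T > t; H0), using the fact that T ~ N(0, 1) under H0
    return 1.0 - NormalDist().cdf(t)

def p_value_simulated(t, mu0, sigma, n, reps=20_000, seed=1):
    # Relative frequency with which samples generated under H0 give T > t
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if z_statistic(sample, mu0, sigma) > t:
            hits += 1
    return hits / reps

t_obs = 1.96
print(p_value_exact(t_obs))
print(p_value_simulated(t_obs, mu0=0.0, sigma=1.0, n=25))
```

The exact value is about 0.025; the simulated relative frequency differs from it only by Monte Carlo noise.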

Clark: Although I’m very hesitant to comment on blogs about issues that require precision and good notation using symbols, let me try to delineate some of the issues you raise very briefly.

First, R. A. Fisher’s metaphor of the data being ‘a sample from a hypothetical (infinite) population’, which served in the early stages (1920s and 1930s) of model-based inductive inference, did provide a certain intuition for the notion of a ‘representative sample’ (a set of IID random variables). However, that metaphor is inept and misleading for model-based frequentist modeling and inference more generally, for a variety of reasons I discuss elsewhere, including non-IID data. The viewpoint I articulated in my Synthese (2011) paper [it will eventually appear in the next issue] is that the notion of a statistical model Mθ(x) is more appropriately viewed as a stochastic generating mechanism stemming from a particular parameterization of the stochastic process {X(t), t=1,2,…} underlying the data x0; I give credit to Cramér and Neyman. What renders a particular Mθ(x) relevant/appropriate is not whether Spanos or Glymour can think of a population that would render data x0 a representative sample, but whether Mθ(x) would render the data x0 a ‘truly typical realization’ thereof. Moreover, the ‘typicality’ is testable vis-à-vis data x0; do data x0 exhibit the chance regularities assumed by Mθ(x)? If not, the model is inappropriate. The ontology of Mθ(x) is a bit more involved, but I will discuss it at the O&M conference.

Second, in light of the above viewpoint, the distribution of the sample f(x;θ) attributes probabilities to all legitimate events defined via well-behaved (Borel) functions of the sample X = (X1, X2, …, Xn). These functions include estimators, test statistics and predictors, whose sampling distributions are determined by f(x;θ). Hence, in the case of a test statistic d(X), the relevant error probabilities are derived from its sampling distribution evaluated under hypothetical reasoning: different hypothesized values of θ. In estimation, the relevant sampling distributions are evaluated under factual reasoning: the true state of nature, whatever that happens to be.

Third, I’m not sure how the model-based perspective I articulated in the Synthese paper relates to Popper’s propensity interpretation. The only overlap I can see is that he also assumes some form of a chance set-up. Beyond that, I can see numerous differences.
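The second point in Spanos’s comment, that a test’s error probabilities come from the sampling distribution of d(X) evaluated at different hypothesized values of θ, can be sketched for the simplest case. This assumes IID Normal data with known σ and a hypothetical statistic d(X) = √n(X̄ − θ0)/σ, rejecting when d(X) > c. One formula then gives the type I error probability at θ = θ0 and the power at every other θ.

```python
from statistics import NormalDist

def rejection_probability(theta, theta0, sigma, n, c):
    """P(d(X) > c; theta) for d(X) = sqrt(n) * (Xbar - theta0) / sigma.

    Under theta, d(X) ~ N(sqrt(n) * (theta - theta0) / sigma, 1), so this is the
    type I error probability when theta == theta0 and the power otherwise.
    """
    shift = n ** 0.5 * (theta - theta0) / sigma
    return 1.0 - NormalDist(mu=shift, sigma=1.0).cdf(c)

c = NormalDist().inv_cdf(0.95)  # cut-off for a 0.05-level one-sided test
for theta in (0.0, 0.1, 0.2, 0.3, 0.5):
    print(theta, round(rejection_probability(theta, 0.0, sigma=1.0, n=100, c=c), 3))
```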

“What renders a particular Mθ(x) relevant/appropriate is not whether Spanos or Glymour can think of a population that would render data x0 a representative sample, but whether Mθ(x) would render the data x0 a ‘truly typical realization’ thereof. Moreover, the ‘typicality’ is testable vis-à-vis data x0; do data x0 exhibit the chance regularities assumed by Mθ(x)? If not, the model is inappropriate.”

Spanos, I couldn’t agree more. It should be noted, though, that this exact same reasoning can be used on priors or for the probability of any singular event. If your weight is modeled N(180,10), then the “truly typical realizations” of this model will give you a weight of approximately 180 +/- 15 lbs. If your weight actually lies in this interval, then “typicality” is satisfied. Moreover, it’s possible to objectively know things like this before learning the true weight, thereby giving a great deal of confidence in the “model”. I’m certain, for example, that your weight is between 0 and “the weight of the earth” even though I’ve never met you, so your true weight will satisfy “typicality” for the model N(0, weight of the earth). “Typicality” is both testable and sometimes knowable without knowing the exact value of x0.

Interestingly, whether you do this for a parameter or for data x0, there are both inherent objective and subjective aspects to this. It’s objective because it’s testable, but it’s subjective because it’s not unique. There are lots of models for which x0 will satisfy the “typicality” condition.

More here: http://www.entsophy.net/blog/?p=70 (what you call “typicality” I called the “truthfulness condition”).

Entsophy: I appreciate the comment, but “typicality” in selecting Mθ(x) has nothing to do with the particular numbers in data x0, or the particular parameter values. Instead, it refers to the chance regularities the data exhibit. For instance, when I decide that my statistical model is an AR(1), X(t) = a0 + a1 X(t−1) + u(t), typicality refers to whether the data x0 satisfy the probabilistic assumptions underlying the stochastic process {X(t), t=1,2,…,n,…} of which the AR(1) is a parameterization, i.e., whether x0 constitute a typical realization of a Normal, Markov and stationary process.
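A crude sketch of what checking one of these assumptions against data might look like. The parameter values and helper names are hypothetical, and only one check is used (lag-1 autocorrelation) rather than a full battery of misspecification tests. Data simulated from a Normal AR(1) are plainly atypical of an IID model: their lag-1 autocorrelation falls far outside the rough ±2/√n band expected under independence. The residuals from a least-squares AR(1) fit show no such dependence. Normality and stationarity would need checks of their own.

```python
import random
from statistics import fmean

def simulate_ar1(a0, a1, sigma, n, seed=0):
    # X(t) = a0 + a1*X(t-1) + u(t), with u(t) IID N(0, sigma^2); start at the stationary mean
    rng = random.Random(seed)
    x = [a0 / (1.0 - a1)]
    while len(x) < n:
        x.append(a0 + a1 * x[-1] + rng.gauss(0.0, sigma))
    return x

def lag1_autocorrelation(z):
    # Sample lag-1 autocorrelation r1 of a sequence
    m = fmean(z)
    num = sum((z[t] - m) * (z[t - 1] - m) for t in range(1, len(z)))
    den = sum((v - m) ** 2 for v in z)
    return num / den

def fit_ar1(x):
    # Least-squares regression of X(t) on X(t-1); returns (a0_hat, a1_hat, residuals)
    y, prev = x[1:], x[:-1]
    my, mp = fmean(y), fmean(prev)
    a1 = (sum((p - mp) * (v - my) for p, v in zip(prev, y))
          / sum((p - mp) ** 2 for p in prev))
    a0 = my - a1 * mp
    resid = [v - a0 - a1 * p for p, v in zip(prev, y)]
    return a0, a1, resid

x = simulate_ar1(a0=1.0, a1=0.7, sigma=1.0, n=500)
band = 2 / len(x) ** 0.5  # rough 95% band for r1 if the data were IID
a0_hat, a1_hat, resid = fit_ar1(x)
print("r1 of data:", round(lag1_autocorrelation(x), 2), "IID band: +/-", round(band, 3))
print("a1_hat:", round(a1_hat, 2), "r1 of residuals:", round(lag1_autocorrelation(resid), 3))
```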

It’s trivial to say that pretty much every time series ever *ISN’T* an AR(1) process of the form you suggested. It may be sufficiently like such a process that certain results you get are close enough to exact that you don’t care about the errors, but it isn’t going to be exactly such a process, unless maybe you’re talking about purely simulated data in a computer; and even then it’s technically probably only a 64-bit floating-point approximation thereof, and certainly going to be a discrete approximation using fewer than N bits, where N is a not very large integer.

In other words, we need "satisfies the probabilistic assumptions" to be much more well defined if we want "typicality" to be defined by that.

Entsophy's point seems to be that ultimately to determine whether something "satisfies the probabilistic assumptions" we will either need a large repeatable sample and a computable accept/reject test, or we will need something rather weaker if we are to work with whatever smaller dataset we actually have.

Entsophy: It’s important to distinguish nonuniqueness from “subjectivity”. Data/evidence do not uniquely determine a full theory in a domain, because what is known at any time is always incomplete. It would be odd to claim that this introduces subjectivity.

The nonuniqueness doesn’t introduce subjectivity, but if you choose a uniquely specified class of models which excludes other models which might be part of the nonunique set of models that can explain the data pretty well, then *this choice* introduces subjectivity.

So when someone says that they will model a measurement error as a normal random variable, when it could pretty well be modeled by, say, a beta random variable between definite fixed bounds, this choice of the normal is subjective.

Slightly off topic, but . . . I think that one of the worst ideas that’s entered into mainstream statistics is the idea of getting uncertainty statements by inverting hypothesis tests. This is an idea that can work in some cases; what bothers me is that people are often trained to believe that this is the most fundamental justification for inference. It gets people all tangled up.

Andrew: Thanks for your comment. I didn’t know people were trained to regard this as the most fundamental justification for inference. Do you mean “confidence distributions”? Or do you favor those? I think if one takes confidence intervals as offering claims about what outcomes would be expected with given probability, assuming various discrepancies, then it can work. This reminds me, Andrew, that you’ve said when a pivotal exists, our approaches would not differ, but I never got clear on what that case looks like for you.
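Since this exchange turns on test inversion, here is a minimal sketch of it in the textbook case. It assumes IID Normal data with known σ and a two-sided z-test; names and numbers are hypothetical. Collect every hypothesized mean μ0, over a fine grid, that the level-α test does not reject, and the result is the familiar interval x̄ ± z·σ/√n.

```python
from statistics import NormalDist

N01 = NormalDist()

def two_sided_p(xbar, mu0, sigma, n):
    # p-value of the two-sided z-test of H0: mu = mu0 (IID Normal data, known sigma)
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - N01.cdf(abs(z)))

def interval_by_inversion(xbar, sigma, n, alpha=0.05, lo=-10.0, hi=10.0, steps=20_001):
    # Keep every grid value mu0 that the level-alpha test does NOT reject
    step = (hi - lo) / (steps - 1)
    kept = [lo + i * step for i in range(steps)
            if two_sided_p(xbar, lo + i * step, sigma, n) > alpha]
    return min(kept), max(kept)

xbar, sigma, n = 1.3, 2.0, 25
z = N01.inv_cdf(0.975)
print(interval_by_inversion(xbar, sigma, n))
print((xbar - z * sigma / n ** 0.5, xbar + z * sigma / n ** 0.5))  # closed-form interval
```

The two intervals agree to within the grid spacing of 0.001. Here the inversion is exact because the z-statistic is pivotal; in other models the correspondence may only be approximate.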

“Bayesians will …assign a probability distribution to a parameter that one could not possibly imagine to have been generated by a random process.”

Well I think that’s a bit too strong: http://arxiv.org/abs/astro-ph/0302131 ! OTOH, as an inferencing world-tube, unable to self-locate in ‘the’ multiverse and with access only to (some of) the information available on my causal past spacetime subset, I couldn’t agree more that “the ontology matters less than the epistemology”.

Paul: Well clearly the two are interrelated (onto-meth) but it isn’t always clear just how. I do think the different statistical philosophies grow out of some underlying ontological assumptions, at least initially. It might be useful to examine them.

Thanks for the reply! Personally, I’ve found it more or less decisively useful: it’s exactly that sort of examination which leads me to adopt ontological assumptions about myself and my relationship to an external world like those I described in the previous comment. Those assumptions then inform my statistical philosophical preferences (Jaynesian – and psi-epistemic¹ if QM interpretations are included).

¹ http://mattleifer.info/2011/11/20/can-the-quantum-state-be-interpreted-statistically/

Paul: I’ve never understood the attraction to Jaynesian metaphysics.

It is worth noting that George Box, who died only a few days ago, was a great Bayes-frequentist synthesizer. His 1980 read paper to the Royal Statistical Society outlines his philosophy on the subject. His approach was Bayesian estimation conditional on an assumed model, plus frequentist model checking. I have some reservations about this mix, but I also have to recognise that it seems to work rather well in practice. George Box was not only a great theoretical statistician but also a great practical one. Box and Tiao is years ahead of its time.

Stephen: Box’s call for eclecticism continues to be influential. However, he made some remarks, of direct relevance to our topic, that could mislead. For instance: “the confirmatory stage of an iterative investigation…will typically occupy, perhaps, only the last 5 per cent of the experimental effort. The other 95 per cent—the wondering journey that has finally led to that destination—involves as I have said, many heroic subjective choices (what variables? What levels? What scales?, etc. etc….Since there is no way to avoid these subjective choices…why should we fuss over subjective probability?”(Box 1983, 70)

The move appears to go from the fact that models require lots of discretionary choices, to models are objects of belief, to model appraisal/statistical inference are about subjective probability—or that they might as well be.

Since Clark Glymour commented on this post, I might note that this reminds me of what he calls mistaking “epiphenomena (degrees of belief) for phenomena (content).” (Glymour 2010, 335)

Box, Leonard, and Wu 1983, “Scientific Inference, Data Analysis, and Robustness”.

Glymour 2010, “Explanation and Truth” in Error and Inference (Mayo and Spanos eds).

Sorry for being late in this discussion… it’s an interesting issue.

My first reaction is to note, once more, that when discussing such issues it is helpful to distinguish the interpretation of probability from the choice and interpretation of inferential techniques.

As far as I see it, regarding the sampling model, Gelman and many contemporary Bayesians are frequentists when it comes to the interpretation of probability. What this means is discussed by Spanos above, although I don’t completely agree. He wrote:

“What renders a particular Mθ(x) relevant/appropriate is not whether Spanos or Glymour can think of a population that would render data x0 a representative sample, but whether Mθ(x) would render the data x0 a ‘truly typical realization’ thereof.”

I think that the “truly typical” criterion alone won’t do. Every precise outcome of a normal distribution, say, has probability zero and is as such atypical (and on the other hand any set with nonzero probability deemed atypical, i.e., rejection regions of misspecification tests, will happen at some point under the assumed model). What is typical is defined by choice of alternative model and test statistic, and what kind of population we think of is important for this choice.
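The point in this paragraph can be illustrated by simulation. The sketch below generates data exactly from an assumed IID N(0, 1) model and applies two hypothetical level-0.05 checks. One flags an extreme sample mean; the other flags an extreme largest |X_i|, with a cut-off that is exact under the model. Each flags about 5% of samples even though the model is true, and they rarely flag the same samples. Which realizations count as “atypical” depends on the statistic chosen.

```python
import random
from statistics import NormalDist, fmean

N01 = NormalDist()

def mean_check(sample, alpha=0.05):
    # Flags samples whose standardized mean is extreme under IID N(0, 1)
    z = len(sample) ** 0.5 * fmean(sample)
    return abs(z) > N01.inv_cdf(1 - alpha / 2)

def outlier_check(sample, alpha=0.05):
    # Flags samples whose largest |X_i| is extreme; under IID N(0, 1),
    # P(max |X_i| > c) = 1 - (2*Phi(c) - 1)^n, solved here for c
    n = len(sample)
    c = N01.inv_cdf((1 + (1 - alpha) ** (1 / n)) / 2)
    return max(abs(v) for v in sample) > c

def rejection_rates(n=20, reps=20_000, seed=2):
    # Fraction of model-generated samples flagged by each check, and by both
    rng = random.Random(seed)
    a = b = both = 0
    for _ in range(reps):
        s = [rng.gauss(0.0, 1.0) for _ in range(n)]
        fa, fb = mean_check(s), outlier_check(s)
        a += fa
        b += fb
        both += fa and fb
    return a / reps, b / reps, both / reps

print(rejection_rates())
```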

Back to “Gelman-Bayesianism”: the key problem, it seems to me, is that if the sampling model has a frequentist interpretation and the parameter prior doesn’t, it is not clear how one can justify a use of the probability calculus that treats both as probabilities “of the same kind”. A possible way out is to say that the prior is a purely mathematical device used to construct a method of inference about the frequentist sampling distribution. One would then have to evaluate the results using criteria that make sense from a frequentist point of view (such as observed prediction quality, or error probabilities; probably involving prior weights of models if some errors seem to be more important/relevant than others), and if the results are good, why not? It wouldn’t allow one to interpret the posterior probabilities as proper, well-defined probabilities, though. Gelman may not care much, but I think many Bayesians would.

I think, though, that if one wants to “test the prior”, one can’t really say that “it’s an artificial mathematical device only”.

Probably, such Bayesians would need other, more frequentist, criteria than looking at posterior probabilities for such checks.

Christian: Thanks for your comment, I know this is up your alley. I will write more later. I totally agree with the problem of how the probabilities can “be of the same kind” in Gelman-style Bayesianism. Nor is it enough to be a “dualist” about probabilities, or a pluralist even. If it’s going to be frequencies, then we get to a possible empirical Bayes or Reichenbachian option, with some of the consequences I mention. I do often hear that these priors are equivalent to more data of some kind. How does that work?

Mayo: The equivalence of (some) priors to more data works like this. Suppose we’re (OK, suppose I am) doing an analysis with an “informative” conjugate prior P1 and a data set D1. The hyperparameter space is usually an open subset of Euclidean space, and the boundary of the hyperparameter space corresponds to parameter settings which make the prior distribution improper, but “just barely so” in some sense. If you’ll excuse the extreme abuse of notation, for my posterior distribution P2 = P1 + D1, there will be some just-barely-improper conjugate prior P0 and some data set D0 such that P2 = P0 + (D0 + D1). So, in an entirely Bayesian sense, P1 = P0 + D0. That is, if I had started with prior P0 and augmented data set D0 + D1, I would have obtained exactly the same posterior as I did with my informative prior P1 and actual data set D1, so my prior distribution P1 is in some sense equivalent to the base prior P0 plus the extra data D0.
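Corey’s identity can be checked concretely in the Beta–Binomial case. There the conjugate prior is Beta(α, β), and the just-barely-improper base prior P0 has α = β = 0. A minimal sketch with hypothetical numbers: updating an informative Beta(3, 7) prior with 12 successes and 8 failures gives exactly the same posterior as starting from Beta(0, 0) and updating on 3 pseudo-successes and 7 pseudo-failures before the actual data. As Corey notes, this is an identity within the Bayesian framework.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    # Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta(alpha + s, beta + f)
    return alpha + successes, beta + failures

# P2 = P1 + D1: informative prior Beta(3, 7), actual data 12 successes and 8 failures
p2 = beta_binomial_update(3, 7, 12, 8)

# P0 + (D0 + D1): improper base prior Beta(0, 0), pseudo-data D0 (3 successes,
# 7 failures), then the same actual data D1
p0_plus_d0 = beta_binomial_update(0, 0, 3, 7)
p2_via_pseudo_data = beta_binomial_update(*p0_plus_d0, 12, 8)

print(p2, p2_via_pseudo_data)  # both (15, 15)
print("posterior mean:", p2[0] / sum(p2))
```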

Corey: Thanks for this sketch relating an (actual) posterior P2 to the posterior that would have resulted with additional (non-actual) data D0 + (non-actual) prior P0. Is D0 more data from the type of experiment from which D1 arose? Even if so, this would not show P2 = D0 + more data (from the type of experiment giving rise to D1). I need to check some of the texts where I read the purported claim, which I don’t have here.

Mayo: Yes, D0 is more of the same kind of data (typically IID from a distribution in the exponential family). You can find that same claim in Wikipedia. (In that article, the base prior P0 corresponds to alpha = beta = 0.)

Why “purported”? What I’ve given is a mathematical identity within the Bayesian framework, and of course it’s meaningless outside that framework. (You’ve left out the base prior in your statement of the claim…)

Christian: Just to comment very briefly on a couple of other points not noted in my earlier remark.

On the “possible way out”, I don’t see how the prior can serve as “an artificial mathematical device only” to obtain a frequentist sampling distribution generated by the alleged fixed parameter value(s). I think there’s a mix-up of notions that has never been sorted out. I try (to some extent) to sort it out in the chapter I’m currently writing/struggling with. But the goal (mine) isn’t to define probability.

I agree with you about distinguishing (a) the interpretation of probability from (b) the choice and interpretation of inferential techniques, although that view, which we both hold, is atypical. (That’s one of the reasons I don’t find the label “frequentist” very useful.) Still, (a) and (b) are obviously related.

“I don’t see how the prior can serve as “an artificial mathematical device only” to obtain a frequentist sampling distribution generated by the alleged fixed parameter value(s).”

You’re right. What I meant was not to use the prior in some way to more or less directly obtain a frequentist sampling distribution, but rather that the result of a Bayesian method (including the prior) could be interpreted in such a way that it can be evaluated in a frequentist way by evaluating predictions, computing the MSE of a posterior mode estimator, evaluating a decision rule obtained from the method by means of frequentist error probabilities etc.

For example, regarding optimal tests, fixing the type I error probability and then minimising the type II error probability is only one possible optimality criterion (and in many situations it can’t be optimised uniformly). Minimising errors averaged (in some hopefully well-motivated way) over the parameter space is a legitimate criterion, too.
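To illustrate the point with a toy case (entirely hypothetical: a simple-vs-simple test of H0: mu = 0 against H1: mu = 1 for a normal mean with known sigma = 1 and n = 4), the sketch below compares the Neyman-Pearson test that fixes the type I error at 0.05 with the test that minimises the equal-weight average of the two error probabilities. With equal weights and equal variances, that average is minimised by placing the cut-off at the midpoint of the two means.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def error_rates(threshold, n, mu0=0.0, mu1=1.0, sigma=1.0):
    """Type I and II error probabilities of 'reject H0 if the sample mean exceeds threshold'."""
    se = sigma / sqrt(n)
    alpha = 1 - Z.cdf((threshold - mu0) / se)  # P(reject H0 | mu = mu0)
    beta = Z.cdf((threshold - mu1) / se)       # P(accept H0 | mu = mu1)
    return alpha, beta

n = 4
se = 1.0 / sqrt(n)

# Neyman-Pearson: fix alpha = 0.05; the cut-off follows from it.
np_alpha, np_beta = error_rates(0.0 + Z.inv_cdf(0.95) * se, n)

# Equal-weight average error: minimised at the midpoint (mu0 + mu1) / 2.
avg_alpha, avg_beta = error_rates(0.5, n)

print(f"NP:      alpha={np_alpha:.3f}  beta={np_beta:.3f}  mean={(np_alpha + np_beta) / 2:.3f}")
print(f"average: alpha={avg_alpha:.3f}  beta={avg_beta:.3f}  mean={(avg_alpha + avg_beta) / 2:.3f}")
```

The averaged criterion accepts a larger type I error in exchange for a smaller average error; which trade-off is "optimal" depends on the criterion chosen, not on the data.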

One aspect of the ontology of statistics that no one seems to have mentioned so far is the extent to which postulating a parametric statistical model commits one to the existence of “real properties” that the parameters represent. I doubt most practical statisticians think about this, but it is an old issue, one that in part drove de Finetti to try to eliminate reference to parameters in statistical prediction through representation theorems. My sense is that he was somewhat successful, but in a rather limited domain: there is a wide range of these theorems for various types of exchangeable data, but not much beyond that. Do any other readers know about the state of the art in this niche, particularly whether there are representation theorems for classes of statistical models with marked degrees of complexity (esp. non-exchangeability)?

Sam: Thanks for your comment. In one post, we had a very long discussion (in the comments) on this issue with some part-time(?) de Finettians (e.g., https://errorstatistics.com/2012/10/18/query/).

I haven’t reread it recently, but I recall my own “take away” message was something like: he does not get away from statistical generalizations, but replaces one kind that is relevant (for prediction and explanation) with one that, even where operational*, is of dubious relevance to finding things out.

*I don’t even think one learns about one’s own beliefs through private reflections on proposed bets, but the main thing is that, to be relevant, I’d still need to connect them to a reliable procedure–so I don’t get anywhere, at least when it comes to the aims of learning about the world.

But I, of course, am a frequentist in exile. There are many (including some who have commented on this post) who feel warmly toward, and know much more about, DF–and/or his latest incarnations. So I’ve no doubt he’s succeeded in casting a little bit of a spell over many who are inclined towards idealistic/constructivist philosophies…and even a couple of fairly hard-nosed contributors to this blog. You can search this blog if interested.

I am not sure if much can be said about non-exchangeable probability specifications… I don’t think exchangeability should be thought of as limited….

Recent applied work using the Aldous-Hoover representation for row-column exchangeable arrays is:

James Robert Lloyd, Peter Orbanz, Zoubin Ghahramani, and Daniel M. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In Advances in Neural Information Processing Systems 26, pages 1-9, Lake Tahoe, California, USA, December 2012.

There are other papers that do the same sort of modelling without specific reference to the Aldous-Hoover theorem.

I think the usual subjective Bayesianism can be regarded as combining ideas 1, 2 and 3 (and usually 4) of the following:

1) A focus on real things, real observations and real decisions.

2) Use of decision theoretic (subjective) probability to deal with 1).

3) The use of conditioning to obtain predictive distributions in the fully specified case. The use of the fundamental theorem of prevision to obtain imprecise predictive specifications in the imprecise case. (…and I don’t know what in the approximate case…)

4) Modelling exchangeable sequences by giving parameters distributions (the de Finetti representation) or possibly some other representation (e.g. Aldous-Hoover).

I like this combination, but it seems that by adopting a non-standard collection of these ideas, new statistical philosophies could be formulated… including non-standard choices of probability interpretation and inferential tools… So, Christian and Deborah, I agree with you here.

Objective Bayes uses 3 and 4 and to some degree 1.

Computing a p-value under an exchangeable specification uses 4 and maybe 3. (perhaps related to Gelman Bayes)

The use of Bayes nets uses 1, 2 and 3. This to me counts as subjective Bayes, but isn’t statistics in the usual sense.

Some frequentist predictive methods use 1 only. http://en.wikipedia.org/wiki/Prediction_interval

I would like to understand these methods better, I expect them to be quite limited….

The standard frequentist response is (I think) to reject all of these ideas.

David:

Well, it isn’t clear that it even successfully represents real decisions, not mine anyway*, but I was mainly considering the use of statistical models for inference/learning/finding things out broadly. If the tool is ill suited for finding out approximately what is the case (as regards a given problem), why would it be desirable to base decisions on it? (I think even Bayesians, most of them, distinguish inference and decision.)

*One familiar kind of “decision” problem for me would be deciding to buy/sell a stock. Even there, I would always weigh the extent to which a conjectured move has been stringently tested, what errors I have not been able to scrutinize, etc.–not my subjective beliefs/hopes. There are other factors which are at odds with subjective Bayesian decision-making, though not immune to a more relevant formalization. Having said all that, finding things out is distinct from, but basic even for, decision-making. Delayed in airport, so expect errors.

Describing your approach to the decision of deciding to buy (or sell) a stock could be very illuminating. I get the impression that you are applying your error-statistical method.

I’d like to better understand a few things:

1. What does it mean to stringently test ‘a conjectured move’? What is being tested? Why would a test (of whatever that might be) be applicable to this decision?

2. How do errors that you have not been able to scrutinize enter into the decision?

3. Does academic research demonstrating the efficiency of the stock market enter into your decision?

Paul: It would be great fun to try to answer these, but don’t have time just now. Perhaps remind me at some point….

Deborah,

I think the stock example of personal decision making is a good one. To make it more precise, it may be useful to imagine writing an algorithmic trading program, as it is the joint distribution of prices that is of interest. If you are suggesting subjective Bayes is not appropriate here, it seems you are adopting quite a radical position: presumably you are rejecting some assumption on which the usual axiomatisation of decision making under uncertainty is built. Which one?

In this case, I don’t see how there is anything relevant other than your judgements about the joint distribution of the stock price. The stock example seems particularly difficult for a frequentist because there is no repetition, even more so than in the usual sense of “you can’t step into the same river twice”.

I don’t think inference and decision making are very different. I would view inference as more detailed than decision making, i.e. inference involves judgements of what the future may bring and gives guidance for decision making for any set of possible decisions or any utility function. I see things the other way around: if a tool is inappropriate for decision making, why would you judge it useful for finding things out?

David: I would have hoped that, if readers have gleaned anything from this blog, it is that frequentist statistical inference in science does not depend on repetitions of the same event, repeating hypotheses or stepping into the same river twice. On the contrary, it’s a method for learning about this world, not a relative frequency of worlds or even repetitions of the inference process.* When they learned about the Higgs, to take a recent example, by considering the probability of a 5 sigma difference, the p-value, as usual, lets us assess whether a hypothesis about a statistical fluke was well-probed, which magnitudes were ruled out, etc. If someone reports the accuracy of a scale, do you really deny it can inform about your weight?
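For concreteness, a sketch of the tail probability behind the "5 sigma" standard, assuming (as is conventional in such reports) a one-sided tail of a standard normal test statistic; the function name is illustrative:

```python
from math import erfc, sqrt

def one_sided_p(z):
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

p_5sigma = one_sided_p(5.0)
print(f"{p_5sigma:.3g}")  # 2.87e-07
```

That is roughly 1 in 3.5 million: the probability, computed under the "statistical fluke" hypothesis, of a difference at least that large.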

*I say “on the contrary” because the arguments Bayesians give for their various accounts seem to rest on claims that in a long enough series of continual updating (assuming non-extreme priors, and ~ iid samples) there would be convergence in posterior probabilities, or other behavioristic, asymptotic guarantees.

Paul: I’ve sketched severity in this post and you can look up severe testing on this blog. Basically, a claim has passed a severe test to the extent that the ways it could be in error have been well-probed and found absent. Here a conjectured move in stock price could be the “hypothesis”. Information about what is the case is obviously relevant to decisions having to do with what will occur. However, as I say, there are many other “facts” that enter.

As for errors I haven’t been able to scrutinize: not having bothered to check certain details, e.g., about some of the items in a company’s new pipeline, alters the extent to which any inference or conjecture about price targets may be off.

Although my constructivist leanings give me some affinity with de Finetti’s subjectivist approach, I think (re David’s point #1) that what is required there is not as real as de Finetti and others try to tell us. In the vast majority of situations the betting games required for eliciting the prior are not really played, and if they are, they are usually totally artificial, and little can be done to validate elicited priors satisfactorily before observing the data (and afterwards it’s too late according to the original setup). Note that I use the term “prior” here in de Finetti’s sense, incorporating all aspects of the model set up before data, including what is usually referred to as the “sampling distribution”.

I don’t see much less idealism in the construction of a subjective prior (postulating that this corresponds to some “real” characteristic of the individual’s thinking) than in postulating a frequentist parametric model as a “mental image” of a real data generating process, although I wouldn’t use the wording “finding out what is the case” for inference using such a model.

Actually bundling phenomena into something that can be called “real data generating process” is already a mental construct but that’s not a reason not to work with it. Whatever is called “real” can be described as a construction of a human observer, and subjectivist Bayes can’t get around that either.

Christian,

If subjective Bayes is viewed as the following paradigm then I agree with you.

1. elicit a high dimensional prior distribution, using betting scenarios of the joint distribution of both the data that you want to use and the unknown quantity that the utility of your decision depends upon.

2. condition on the data in order to produce a predictive distribution.

3. compute expected utility of decisions.

i.e. because of the sheer scale of the task, it is not possible to carry out step 1. in a reasonable way that would achieve performance guarantees for step 3.

So, I am actually a critic of the usual fully specified conditional view of subjective Bayes as a foundational argument… although powerful algorithms have been derived by using convenience priors to formulate step 1.

On the other hand, a more modest version might be possible, such as:

1. assess decision preferences for hypothetical and real decisions. Try to be articulate, but do not attempt to fully specify the underlying joint subjective probabilities.

2. test to see if there is any incoherence in the implicit underlying subjective probabilities; if so, consider how the decision preferences need to be changed to remove the incoherence, and go back to step 1.

In my eyes, useful probability, i.e. decision theoretic probability, is and can only be subjective probability. If I want to yield subjective probabilities as output, I must be prepared to specify them as inputs. The problem I see that frequentists and objective Bayesians have is that they want probability to conform to two criteria: the first is to act as a driver for decisions (otherwise it would be useless), the second is either frequency or symmetry. It seems to me that attempts to do this compromise the decision theoretic properties of probability.

David: I disagree. I cannot see how one wants to make decisions that are not based on knowledge/reliable inference. It totally perplexes me. I don’t even want to know what I believe so much as how I can be misled by beliefs.

Nor would I view relevant error probabilities in terms of symmetry.

David: I agree that it makes sense to apply subjective probabilities in a modest fashion as you outline. But such modesty in my view should also prevent us from making universalist statements such as “useful probability can only be subjective probability”, because being modest also means acknowledging that the foundations are not as convincing in practice as they may seem theoretically (Mayo has different views on the latter anyway), given that in practice the approach cannot be carried out in its “pure” form. We won’t get anything “uncompromised” with any approach.

Deborah: I don’t really know how to respond, except to say maybe we need to agree to disagree once again.

As I mention in a couple of places, if probability is used to guide decision making then that is called subjective probability. I see two possible ways out of this for a frequentist or objective Bayesian:

1) Deny that they want the probabilistic output to be used in order to drive decision making in the real world. This seems to be the most popular route for frequentists.

2) Argue that subjective probability is the output, but not the input. This seems to be the most popular route for objective Bayesians.

David: If you want your decisions to be based on evidence of what’s the case, be it science or stocks, then you’d better have an account that can teach you about what is the case about the question at issue, even if approximately. What is at issue is whether a given account actually ought to be considered relevant for decisions (you cannot start out assuming it is simply because you say so), and in my book (and my portfolio), if the evidence isn’t indicative of what is the case, then it’s a disaster for determining actions.

Christian: Having followed this blog, I think you’ll agree that there has been nothing in the way of clarifying the meaning, or supporting the use, of “subjective probabilities”. On the contrary, we’ve seen extensive renouncing of all principles that at one time were held as core for subjectivists. For just one example:

https://errorstatistics.com/2012/04/15/3376/

Likewise, we have seen the failure of extensive elicitation procedures (search the Jim Berger posts). Hence the popularity of “conventional” Bayesianism, with all its questionable and shifting meanings.

Mayo: Thanks for linking the post again; indeed I missed that one (although I don’t miss much here, I think).

Anyway, I think the problem here is a general problem with “general principles”. Many people want something they can stick to generally (philosophy and science are full of such people), and such people tend to argue that “the only way of seeing this is so-and-so”, which it usually isn’t. So it’s not too difficult for intelligent people to come up with examples in which these general principles will lead to nonsense. That’s all right; I totally agree that nobody *has to be* a de Finetti-style subjectivist. However, one can also approach such principles with a more open-minded and constructive attitude, asking, “what’s in them that I can use, or from which I can learn something?” (And then I won’t use them where they lead to nonsense and I’ll dispute any universalist claim in their favour.) *For me* there’s enough in subjective Bayes that I can get its spirit, and that I understand how some principles that sometimes lead to nonsense can still constructively be used. The “meaning” is indeed based on the principles, but I think that one doesn’t have to subscribe to them in full generality to get it and apply it on a case-wise basis. (Whether the problems with the principles are strong enough to say “not for me at all” seems to me a rather individual matter.)

Admittedly, in practice I use it quite rarely because I don’t very often think that I need it.

Deborah: that post was interesting, but the comment thread from the start went in a crazy direction, so I didn’t comment at the time… but I did appreciate it.

I don’t think saying “useful probability is subjective probability” is a universalist statement; rather, it is a tautology. If probability is useful then by definition it is subjective. The key question is whether probability can be both useful and also conform universally to some other criterion, e.g. frequency; the answer seems (to me) to be no. Any conflict between the subjective interpretation of probability and some other criterion must be resolved in favor of the former (or probability will lose its decision theoretic motivation and hence no longer be useful).

Equally, I don’t see any need for modesty about the standard axiomatisation of decision making under uncertainty that leads to subjective Bayesian statistics, as this is the solution to an important historical problem. I do, however, see a need to recognise that arbitrarily fine subjective probability specifications, as suggested in many places, e.g. Lindley (2000), are not viable in practice.

Christian, your suggestion of a case-wise basis seems to me to amount to a call for eclecticism. I agree very much with Deborah when in previous posts she has raised the important question “who or what is doing the work”. How can such a judgement be made? Using e.g. cross-validation to assess a statistical procedure is in fact advocating the use of statistics to assess statistical methods. This circular reasoning is reminiscent of Hume’s problem that justifying induction using induction “begs the question”. So I see something unsatisfactory about elevating this reasoning too much.

I think from a “purist Bayesian” point of view, whenever somebody expresses a set of decision preferences without any implicit incoherence, this is, in a broad sense, compatible with subjective Bayes, i.e. subjective Bayes happens all the time. I don’t object to the use of the output from algorithms or procedures with non-Bayesian origins to guide decision making; however, once a decision has been advocated, a subjective Bayesian interpretation of the analysis is also valid. It is then possible to audit the constraints on the implicit subjective probabilities, although, granted, this is trivial in the common case where explicitly only one decision preference has been made.

A lot of the Bayes vs non-Bayes debate seems to be centred around the question: are fully specified conditional probability methods the only thing that we need? I don’t really relate to this debate. Yes, these methods are optimal, but only under completely impractical circumstances. Because of the compromised specification, these methods can be viewed as procedures on a similar footing to other non-Bayesian procedures.

If an eclectic practitioner runs these procedures on a data set, perhaps evaluating them with hold-out data, and uses them to assess decision preferences, then, as Deborah asks, “who or what is doing the work”? If real decisions are being made applying to real observables, then (to me) this is a partially specified subjective Bayes essentially by definition.

Anyway that is how I see it…

All of this, I might add, is immaterial unless technical improvements can be established by the clarity that the philosophical position brings. I do believe there is room for this. In particular, more formal procedures for auditing the subjective probabilities asserted on the basis of procedures can be developed using the fundamental theorem of probability (I have started work on this).
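As a minimal sketch of the kind of audit meant here (not the author's procedure, which the comment does not describe): de Finetti's fundamental theorem says that, given coherent probabilities for some events, the coherent probabilities for a further event form an interval, found in general by linear programming over the atoms. In the smallest case, given only P(A) and P(B), the interval for P(A and B) has a closed form (the Fréchet bounds), and an asserted value outside it is incoherent. The function names are hypothetical.

```python
def conjunction_bounds(p_a, p_b):
    """Coherent interval for P(A and B) given only P(A) = p_a and P(B) = p_b.

    With x = P(A and B), the four atoms have probabilities x, p_a - x,
    p_b - x and 1 - p_a - p_b + x; all must be non-negative.
    """
    if not (0.0 <= p_a <= 1.0 and 0.0 <= p_b <= 1.0):
        raise ValueError("probabilities of events must lie in [0, 1]")
    return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)

def is_coherent(p_a, p_b, p_ab, tol=1e-12):
    """Audit an asserted P(A and B) against the coherent interval."""
    lower, upper = conjunction_bounds(p_a, p_b)
    return lower - tol <= p_ab <= upper + tol

lower, upper = conjunction_bounds(0.6, 0.7)
print(round(lower, 10), round(upper, 10))  # 0.3 0.6
print(is_coherent(0.6, 0.7, 0.2))          # False: below the lower bound
print(is_coherent(0.6, 0.7, 0.45))         # True
```

With more events the interval no longer has a simple closed form, which is where the general linear-programming version of the theorem comes in.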

A more difficult question involving both philosophy and practice is the use of approximate conditioning methods. The fact that an approximation is employed means these methods will, under some circumstances, show some of the difficulties that non-conditional frequentist methods show. This is in my view the most exciting topic in philosophy of statistics. Some related work is Bayesian Monte Carlo (Bayes numerical methods), the debate between Sims and Wasserman, and Bayes linear methods (Goldstein).

David R: If one calls subjective probability (whatever it is, no one has been able to say) degrees of personal credibility or the like, then the supposed relevance attained by latching on to the word “probability” (which is useful) vanishes. You wrote: “Any conflict between the subjective interpretation of probability and some other criteria and must be resolved in favor of the former (or probability will lose its decision theoretic motivation and hence no longer be useful).” This is a non sequitur. It is somewhat like saying food no longer nourishes unless it’s made up of degrees of belief (about nourishment). The usefulness of probability and probabilistic inference as regards decisions stems from links to relative frequency (which, if correctly used, can inform about what is the case, and therefore can inform about what is best to do), even though I deny we’re in need of assigning degrees of probability to hypotheses.

Firstly, a point of agreement: I would also deny we’re in need of assigning degrees of probability to hypotheses, as I hope would be obvious from previous discussion.

I am in a way surprised that you don’t seem to have a handle on subjective probability, but this is probably unfair as many who call themselves Bayesians also have trouble. Rather than debate you further I will have a stab at defining it (also I didn’t understand the analogy in your response).

Subjective probability is a marginal rate of substitution. More intuitively probability marks points of indifference to decision making in a utility free way. I concede that it is not possible to completely separate the concepts.

If A is a binary variable which represents the outcome of an unknown experiment, then, assuming a linear utility for money (not too unreasonable for small values), imagine someone asks you the following questions and you answer…

you are asked: would you prefer $0.5 or $1 if A=1 and $0 otherwise?

you answer $0.5

you are asked: would you prefer $0.3 or $1 if A=1 and $0 otherwise?

you answer $1 if A=1 and $0 otherwise

you are asked: would you prefer $0.4 or $1 if A=1 and $0 otherwise?

you answer indifferent. Therefore your subjective probability P(A=1) is 0.4. Subjective probability of course has a direct decision theoretic interpretation.
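The question-and-answer procedure above is, in effect, a search for an indifference price. A minimal sketch, assuming linear utility for money as stated and a respondent whose answers are consistent (the function names and the simulated respondent are hypothetical), automates it by bisection; its first question is the same $0.5 offer as in the dialogue:

```python
def elicit_probability(prefers_sure_amount, tol=1e-3):
    """Bisect for the sure amount c at which the respondent is indifferent
    between $c and a ticket paying $1 if A=1 and $0 otherwise.

    prefers_sure_amount(c) returns True if the respondent takes the sure $c.
    Under linear utility for money, the indifference point is P(A=1).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        c = (lo + hi) / 2
        if prefers_sure_amount(c):
            hi = c  # $c beats the ticket, so the indifference point is below c
        else:
            lo = c  # the ticket beats $c, so the indifference point is above c
    return (lo + hi) / 2

def respondent(c):
    """Hypothetical respondent matching the dialogue: indifferent at $0.40."""
    return c > 0.4

estimate = elicit_probability(respondent)
print(round(estimate, 2))  # 0.4
```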

This example is from Frank Lad’s book. Subjective Bayes is then the study of consistent expressions of decision preferences of usually (very) large numbers of events and/or decisions.

To get a sense of the scale of fully specified probability: if a binary experiment is carried out N times and exchangeability is assumed, N+1 subjective probability assertions are required; if exchangeability is not assumed, 2^N probability assertions are required. This is why I don’t support fully specified subjective Bayesian probability such as Lindley (2000).
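The count can be checked directly. Under exchangeability, every sequence with the same number of successes must get the same probability, so the N+1 probabilities for "k successes out of N" determine the whole joint distribution over all 2^N sequences (in each case one number is fixed by the others, because they must sum to 1). A sketch with N = 3 and hypothetical values:

```python
from itertools import product
from math import comb

def joint_from_counts(count_probs):
    """Expand an exchangeable specification into the full joint distribution.

    count_probs[k] = P(exactly k successes in N binary trials), k = 0..N.
    Exchangeability gives each of the C(N, k) sequences with k successes
    probability count_probs[k] / C(N, k).
    """
    n = len(count_probs) - 1
    return {
        seq: count_probs[sum(seq)] / comb(n, sum(seq))
        for seq in product((0, 1), repeat=n)
    }

counts = [0.1, 0.2, 0.3, 0.4]  # N + 1 = 4 assertions for N = 3
joint = joint_from_counts(counts)
print(len(counts), len(joint))  # 4 8
```

Without exchangeability each of the 2^N entries of `joint` would have to be asserted separately, which is what makes full specification impractical for even moderate N.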

David: Your statement “If probability is useful then by definition it is subjective” looks very confident but I have to admit that I don’t see the reasoning behind it.

How do you see whether something is “useful”? Do you define usefulness in the very restricted way saying that the only way something can be useful is if it is measured useful by decision theory plugging in probabilities for future outcomes?

Are you seriously claiming that nothing else can be useful?

Probably you rather say that if people do something else and it is useful, something more-or-less equivalent can be written down using subjectivist decision theory. That’s a tough case to make, I think, and in any case doing the latter would probably not add much use.

Regarding eclecticism, I’m very much in favour of people understanding as clearly as possible the implications of what they are doing, and this means that throwing together different interpretations in the same application is often no good. However, I don’t see why one cannot use different interpretations of probability for different applications.

The question “what is doing the work” is of course interesting, but here, too, I’m rather sceptical about universalist statements and would want to look at specific examples.

I’m wary of claims of the kind “whenever people make real decisions based on uncertainty, they at least implicitly use Bayes”. Empirically it is well known that this is not true, and even normatively, situations can be constructed in which “playing Bayesian” is clearly not the best you can do (such as betting against real opponents who can be influenced psychologically).

In my mind I am making such a modest claim that I am surprised anyone wants to argue. At the same time, I have probably omitted some important caveats….

I am indeed defining useful in a restricted way. I consider useful to be the ordering of decisions as applied to observables. This is exclusively (as I see it) the study of subjective Bayesian decision theory. I see it as a fundamental difficulty with non-subjective statistics that (by definition) these theories do not study the ordering of decision preferences.

…I am sure there is room to define “useful” in a more general way giving latitude to argue here…

> I’m wary of claims of the kind “whenever people make real decisions based on uncertainty, they at least implicitly use Bayes”.

I am not claiming that they are computing a conditional probability; I am simply stating that they are expressing decision preferences, which place constraints on underlying subjective probabilities… these decision preferences can then be studied using subjective Bayesian theory… I don’t see much latitude to argue with this, do you?

A quick example: I am willing to believe that a support vector machine produces good classification boundaries, subject to and calibrated by obtaining reasonable results on a hold-out set. I think that a statement of this form can be made precise with subjective probability, and that such statements, when considered in combination for a number of boundaries, are non-trivial and worthy of study. As mentioned above, this study would focus on checking coherence (rather than conditioning). Moreover, I think people are willing in practice to make real subjective probability statements in a setting such as this. Finally, I don’t see how it is possible for the decision theoretic preferences expressed to have an origin in VC theory or any other frequentist theory; decision preferences are by their nature uniquely subjective Bayesian statements.

> even normatively situations can be constructed in which “playing Bayesian” it’s clearly not the best you can do (such as betting against real opponents who can be influenced psychologically).

I am unaware of any counterexamples to subjective Bayesian decision theory in a normative sense. I have read fairly widely; if you think you know one, I am happy to look at it. If betting against a real opponent contains psychological elements, I don’t see why this cannot be incorporated into a decision theoretic set-up.

A quick further comment:

Firstly thanks to everyone for an interesting discussion!

There are aspects of what I am saying that are quite dogmatic, e.g. I was quite dogmatic about the meaning of useful. I am also dogmatic about the importance of observables and coherence.

I am quite liberal in other respects. I am liberal about probability specification. I see the practical role of conditioning as limited. I do not defend the fully specified subjective Bayesian argument.

I have some small private doubts about some small aspects of the things that I have said dogmatically. It’s difficult to list all of these without interrupting the argument in an unreasonable way. I have listed some, though.

I consider the position that I argue for to be the traditional operational subjective position, with one small contribution. I think it is commonplace for people to judge (subject to some conditions) the output of algorithms/procedures/estimators to be of practical use in decision making. I would argue that the operational subjective theory can be, and is, “wrapped around” such methods. I put this forward as a tentative answer to “what is doing the work”.

I agree that it is better to be eclectic than to settle into the wrong position.

I am more interested in understanding this better, than winning the argument.

There is some criticism here and elsewhere about the Bayesian position. Very little of it applies to the operational subjective position. I appreciate the difficulty of arguing with Bayesians when they argue so much with themselves. That said, I think a serious attempt at finding shortcomings in the Bayesian position should consider the operational subjective theory.

David: As a constructivist, I don’t believe in the “existence” of things such as “subjective probability” and “quantitative utility” unless somebody constructs them. You can postulate the existence of “underlying subjective probabilities” where they are not explicitly elicited, but that’s metaphysical and I won’t buy it. People have dealt with uncertainty and done useful things before they had probability calculus and decision theory.

One can say that people who use probability calculus always make some kind of decisions, be it whether to publish a certain finding or not (and people, in general, make decisions all the time), but I don’t see this as an argument in favour of a specific way of formalising decision making under uncertainty that has its benefits in some situations and doesn’t work in others.

I have done a lot of statistical advisory work and have hardly ever (and never in science) come across a situation in which people were happy to quantify utilities in the way required for straightforward application of decision theory. Even in commercial applications (where things look particularly easy in this respect at first sight because people earn or lose money) this is the exception rather than the rule. Now you can claim that quantitative utility still exists but it’s too complex (or people are too stupid) to specify precisely, but that’s metaphysics – what does it help? It is of course also possible to make some bold (wrong) assumptions in order to get a quantitative utility one can work with, but my interpretation of this is not “we approximate the true utility” but rather “we adapt our problem to a formal theory that we want to use because it wouldn’t apply without such adaptation”. Pretty much the same applies to prior probability assessments.

Is this useful or not (more useful than using other interpretations of probability and making statements that don’t rely on quantification of utility)? It depends on how convincing the elements that we need to construct look in the specific situation. That’s where my eclecticism comes from.

Regarding counter examples, I’m not going to go to the library for comment 48 under a blog posting, but if I recall it correctly, there is some stuff in Walley’s book on imprecise probabilities. Mayo (if she still follows this) will probably have more references for you.

The “betting” example I was referring to I at some point figured out myself but afterwards I found remarks in the literature indicating that this is known (was it Walley? I don’t know anymore). The idea is that if you have reason to believe that your betting opponent has certain strong beliefs and a preference for certain types of bets, you may get more money out of him on average if you offer something that technically allows him to create a Dutch book against you than if you are coherent (of course this depends on precisely what you think your opponent thinks).
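To make the sure-loss mechanism behind the Dutch book argument concrete, here is a minimal sketch (an illustration of my own, not taken from the literature mentioned above). It assumes the simplest de Finetti-style set-up: an agent posts a price for a ticket paying 1 if event A occurs and a price for a ticket paying 1 if A does not occur, and the opponent may buy tickets from or sell tickets to the agent at those prices. The function name `dutch_book_gains` is hypothetical.

```python
def dutch_book_gains(price_a: float, price_not_a: float) -> dict:
    """Agent's net gain in each outcome when an opponent exploits the posted prices.

    Tickets: one pays 1 if A occurs, the other pays 1 if A does not occur.
    The opponent chooses whether to sell both tickets to the agent or buy
    both from the agent, at the agent's own prices.
    """
    total = price_a + price_not_a
    if total > 1:
        direction = 1    # agent is made to buy both tickets, paying `total`
    elif total < 1:
        direction = -1   # agent is made to sell both tickets, receiving `total`
    else:
        direction = 0    # prices sum to 1: no sure-loss book on this pair
    gains = {}
    for a_occurs in (True, False):
        ticket_a = 1.0 if a_occurs else 0.0
        ticket_not_a = 0.0 if a_occurs else 1.0
        payout = ticket_a + ticket_not_a  # always exactly 1, whatever happens
        gains["A" if a_occurs else "not A"] = direction * (payout - total)
    return gains


if __name__ == "__main__":
    # Prices summing to 1.1: the agent loses about 0.1 in either outcome.
    print(dutch_book_gains(0.6, 0.5))
```

The sketch only shows why incoherent prices are exploitable in principle; the point made in the comment is that, against an opponent with particular beliefs and preferences, posting such prices may still pay off on average.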

Another situation is that if you specify a prior probability assignment and later change your mind and start to believe, for whatever reason, that what you did in the beginning isn’t good anymore, it can be shown that (again depending on what exactly the assignments are and how your belief has changed) it is better to adapt your remaining probability assignments for future events in a way incoherent with some of the original but still open bets than to stick to your initial prior assignments. This, again, is somewhere in the literature, at least indirectly (I’m quite sure it’s in one of Phil Dawid’s many papers exploring the implications of the Bayesian setup).

You may say that if beliefs are changed later in a non-Bayesian way, this means that the prior specification was wrong already. Again, that’s metaphysics. What was specified was specified and at the time there was no other.

#48 (Christian): Thanks so much for the extremely thoughtful comment. I will study it later, it’s Ontometh Conference time!

Thanks Christian for adding comment 48. I will try to respond briefly.

On metaphysics and constructionism:

The method of eliciting subjective probabilities is always to look at somebody’s decision preferences. In some situations you see this as legitimate and in others as metaphysics; I don’t understand why or how you draw the distinction.

Equally to me it is not metaphysics to use partial and imprecise probability and utility specifications. The imprecision allows us to be inarticulate, and usually we are very inarticulate. The precise theory in my mind has nearly no applications. I agree fully with your comments on the difficulties with the precise theory.

On eclecticism:

I am eclectic in the sense that if something seems useful to you, use it. My suggestion is that if some methods seem useful to you, then you can state this as a set of decision preferences and audit its consistency using Bayesian decision theory. … perhaps this qualifies as constructing subjective probabilities…

I think the criterion that most (eclectic) practical people use when assessing a statistical model is of the form “did it work in the field?”; hold-out and cross-validation are used as the next best proxies. This procedure uses how well it worked, or how well it would have worked, as a proxy for how well you expect it to work. The last of these is a subjective Bayesian statement. My recommendation is to wrap subjective Bayes around an eclectic view about statistical methods.

On subjective probability:

Subjective probability is a technical concept with a precise meaning. Yes, there are some difficulties, e.g. linearity of utility. Other concepts of probability have at least as many difficulties.

Sometimes the word “subjective” is used in the colloquial sense rather than the technical sense in order to bash subjective Bayesian theory with a cheap shot. For example to conflate subjective probability with hopes or biases is extremely misleading. Prof Mayo is a bit prone to do this. (Christian, I know you don’t).

My statement that subjective probability is useful probability was in part provoked by these comments. The adjective “subjective” invites throwaway criticism. I was attempting to outline the advantages of using subjective probability, including its explicit connection to real decisions. To say that subjective probability is useful probability is a universalist statement, and such statements are hard to prove. Beyond that, I don’t see an argument against it though…

Counterexamples:

I will reread Walley and Dawid when I get a chance.

In terms of the psychology example, I think your argument is simply an application of Bayesian decision theory rather than a counterexample. You are simply showing that the bets you offer to an opponent differ from your subjective probabilities, even to the extent that you might offer bets with a Dutch book (I assume you put zero probability on the possibility that the opponent will exploit this).

Your other examples relate to temporal coherence. I don’t think temporal coherence is necessary to develop a useful operational subjective theory. The concept of “updating” probability, I think belongs to less pure forms of Bayesianism. It is an interesting area though…

I am therefore skeptical of the claim that subjective Bayes “doesn’t work in some situations”.

… anyway thanks again for the discussion and thoughtful comments….

Sorry David, I do not see why calling subjective or personalistic probability personalistic is “a cheap shot” (as opposed to an accurate shot). As the subjectivist Lindley made clear, the very idea of right or wrong is meaningless for a personalist, e.g., 1976 “Bayesian Statistics”, in Harper and Hooker (p. 359). See also his paper in this blog:

https://errorstatistics.com/2012/07/12/dennis-lindleys-philosophy-of-statistics/

David: I have no problems with “subjective probability is useful probability”, but rather with the word “only” in front of it, which you omitted this time.

Regarding metaphysics and legitimate application: what I’d qualify as metaphysics is to assume the existence of some underlying “true” subjective probability and to use this as an argument even in situations where nothing to this effect has been observed. Probably this idea is already present in the word “elicitation”, and I’d really rather say that subjective probabilities, where applied in the proper technical sense, are constructed. There is nothing illegitimate about that in my opinion. It may well be useful, but its use in science relies on the ability to convince others that the prior decisions were valid and reasonable.

I didn’t apply the term “metaphysical” to any application of Bayesian statistics in situations where the prior has been properly justified as a formalisation of someone’s belief (although whether anybody else has to accept the results is a different story).

What I think is metaphysical are sentences such as: “This procedure uses how well it worked or how well it would have worked as a proxy for how well you expect it to work. The last of these is a subjective Bayesian statement.” How so? Where is Bayes’ theorem in it? How can you check coherence? How can you see subjective probabilities in people doing something completely different from Bayesian statistics merely from the fact that they make decisions and arrive at some statement about what they expect/think/believe? As long as you don’t require their preferences to be formally Bayesian, you can have no idea whether they actually are.

“My recommendation is to wrap subjective Bayes around an eclectic view about statistical methods.”

Fair enough, but I don’t see what we get out of it. Perhaps you could illustrate the “added value” with an example? At the moment it looks like “OK, people do other stuff, but if I make a lot of effort, I can reformulate it as Bayes in order to illustrate my universalist claim” – a) I’m not sure if you really can and b) even if you can, we could well live without it, couldn’t we?

“In terms of the psychology example I think your argument is simply an application of Bayesian decision theory rather than a counter example. You are simply showing that the bets you offer to an opponent differ from your subjective probabilities even to the extent that you might offer bets with a dutch book”

The thing is that your argument assumes that there is an unobserved true subjective probability behind the betting procedure, but in de Finetti, the betting procedure is the major device for elicitation, and he emphasizes operationalism, and defines coherence *in terms of the betting procedure*. To say that “there is some true coherence that we cannot see in the betting procedure” is adding a metaphysical layer. What does it refer to if not the betting itself?

Deborah,

Of course I don’t object to calling personalistic probability personalistic.

I do object to suggestions that subjective or personalistic probability is in some way connected to “hopes” or “biases”.

I tried to dig up an old quote from you, but I couldn’t find it. So I will paraphrase from memory (apologies for errors). “If a prior incorporates beliefs into the analysis then I don’t want it, if a prior represents ignorance what is the point”

This quite slick statement is, I think, a reasonable response to some Bayesian advocates. On the other hand, there is an implicit misunderstanding about the role of a model/prior, which is, or at least should be, a convenient means to represent exchangeable probability specifications.

Bayesian statistics involves the specification of very high dimensional probability distributions. The joint distribution is what is important. It is a mistake to think the role of the prior is to allow beliefs about some parameter values to be favoured over others. For example there may be more than one model/prior specification for representing the same exchangeable sequence.
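The point that more than one model/prior specification can represent the same exchangeable sequence can be illustrated with a standard textbook example (my choice, not one given in the thread): a Bernoulli model with a Beta(a, b) prior on the success probability, and a Pólya urn, which has no parameter at all. The sketch below, with hypothetical function names, computes the probability of a binary sequence both ways. Under these assumptions the two always agree, and the result depends only on the number of ones, which is what exchangeability requires.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_beta_bernoulli(seq, a: float, b: float) -> float:
    """P(seq) when theta ~ Beta(a, b) and the x_i are iid Bernoulli(theta) given theta."""
    n, s = len(seq), sum(seq)
    return exp(log_beta(a + s, b + n - s) - log_beta(a, b))


def prob_polya_urn(seq, a: float, b: float) -> float:
    """P(seq) from a Polya urn starting with weight a on 1 and weight b on 0.

    After each draw, one unit of weight is added to the colour drawn.
    No parameter or prior appears anywhere in this specification.
    """
    ones, zeros = a, b
    p = 1.0
    for x in seq:
        if x == 1:
            p *= ones / (ones + zeros)
            ones += 1
        else:
            p *= zeros / (ones + zeros)
            zeros += 1
    return p


if __name__ == "__main__":
    for seq in ([1, 1, 0], [0, 1, 1]):
        # Both specifications give 1/12 for either ordering (up to rounding).
        print(seq, prob_beta_bernoulli(seq, 1.0, 1.0), prob_polya_urn(seq, 1.0, 1.0))
```

In the first specification the prior looks like a statement about which parameter values are favoured; in the second there is nothing to favour, yet the joint distribution over sequences is identical.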

David: No I wouldn’t have said this. What I have said in various ways is something like: if a personalistic prior changes the inference, then who wants it? If it does not, then who needs it? This comes from R.A. Fisher, Kempthorne and others.

Hi Christian,

Thanks again for the reply.

In a number of places, I am unsure whether there is a genuine disagreement or whether a lack of precision in the language that I am using is the source of the problem. It seems that the main point of difference may be little more than that, in your eyes, I am overstating the case for subjective Bayes.

I think subjective Bayesian theory is uniquely the study of decision preferences. I think a credible argument can be made that this is the sole practical use of statistics. A universalist claim like this is difficult to prove; however, in my view it is also difficult to argue against. My mind is open, but I think there is good reason to think that a non-trivial universalist statement, appropriately qualified, holds. In my opinion it is an argument worth having.

It is a fact that most applications of statistics are not subjective Bayesian, and many are used to guide decision making. I find myself willing to believe that these methods are useful and am interested in applying the calculus of subjective probability and prevision to these statements, to understand their consequences.

Let me give a quick example. A neural network and a support vector machine are applied to a cancer diagnosis problem. On a test set the neural network is found to diagnose correctly 97.5% of the time; on the same test set the support vector machine is found to diagnose correctly 95.8% of the time.

What is the point of such statistics? Not much, I think, unless we find ourselves willing to assign a 0.975 subjective probability that the neural network will classify the next case correctly and a 0.958 subjective probability that the support vector machine will classify the next case correctly. If we find ourselves willing to assert these subjective probabilities, and speaking for myself I (tentatively) do, can we analyse the coherence of these probabilities? Well, we can apply the fundamental theorem of probability to ask what the probability is that the neural network and the support vector machine will agree on the next case. This is computed with a relatively simple linear problem, although the set-up is a bit involved. It turns out that the probability that the two classifiers will disagree is between 0.017 and 0.067.
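The disagreement bounds can be reproduced under one simple assumption: that the two stated accuracies are the only constraints on the joint probability of the classifiers being right or wrong on the next case (the set-up described above may contain more structure). Writing t for the probability that both are correct fixes the whole 2×2 table, so the linear problem reduces to the classical Fréchet bounds. A minimal sketch, with a hypothetical function name:

```python
def disagreement_bounds(acc_a: float, acc_b: float) -> tuple:
    """Range of P(the two classifiers disagree) compatible with their accuracies alone.

    Let t = P(both correct). The joint table is then
      both correct: t,          only A correct: acc_a - t,
      only B correct: acc_b - t, both wrong: 1 - acc_a - acc_b + t,
    and every entry must be non-negative. Disagreement = acc_a + acc_b - 2t
    is linear in t, so its extremes sit at the ends of t's feasible range.
    """
    t_min = max(0.0, acc_a + acc_b - 1.0)
    t_max = min(acc_a, acc_b)
    lower = acc_a + acc_b - 2 * t_max
    upper = acc_a + acc_b - 2 * t_min
    return lower, upper


if __name__ == "__main__":
    low, high = disagreement_bounds(0.975, 0.958)
    print(f"{low:.3f} to {high:.3f}")  # prints "0.017 to 0.067"
```

Any additional constraints can only narrow the feasible range of t, so no coherent assignment consistent with these two accuracies puts the disagreement probability below |0.975 − 0.958| = 0.017.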

Is this useful? I think it may be. If I choose to stick to these subjective assessments, I have identified a region of the input space where the classifiers disagree, and I assign low probability to the next input falling in that region. I have also implicitly specified probabilities over the input space, something that creators of classifiers normally do not think about much…

I have written a short paper (only 4 pages) outlining the philosophy and technical aspects of this calculation. You or anyone else is more than welcome to read it. I would love to publish it but, to be honest, I don’t know what to do with it at the moment.

Christian,

I wanted to respond separately to:

> but in de Finetti, the betting procedure is the major device for elicitation, and he emphasizes operationalism, and defines coherence *in terms of the betting procedure*.

It’s not the only device. In fact, while de Finetti used betting in “La prévision: ses lois logiques, ses sources subjectives”, he changed to methods closer to the ones I outline above in “Theory of Probability”. This is nicely discussed in Donald Gillies’ review of “Theory of Probability”.

The reason for the change is exactly the problem you describe. In a betting framework, game-theoretic concerns come into play, and betting is not a good elicitation device.

So I don’t believe you are pointing to a genuine counterexample.

David: Dear David,

let me start by saying that I enjoy this discussion a lot.

First posting:

Of course, in order to back up the universal claim you’d need an argument against the possible existence of uses of statistics that do not fall into the subjective Bayes category. Having nice examples showing that some uses do belong there doesn’t contribute much. However:

> I find myself willing to believe that these methods are useful and am interested in applying the calculus of subjective probability and prevision to these statements, to understand their consequences.

I think that that’s a fine project (and I’d be happy to read your paper; can you find my email address?). I’d guess that it will lead to positive illustrations of your statement but I don’t see how it could lead to the impossibility argument mentioned above.

Furthermore, in the cases in which you succeed, I may not agree about what this implies regarding the role of subjective Bayes. If something in statistics is done from a different point of view and you are able to reformulate it in subjective Bayesian terms, I could argue that a) this is an achievement not of subjective Bayes but of a different interpretation of probability, because, although you reformulated it, one might never have found it with only subjective Bayes at one’s disposal, and b) by reformulating it in terms of subjective Bayes, you actually change its content.

See this, for example:

> Not much I think, unless we find ourselves willing to assign a 0.975 subjective probability that the neural network will classify the next case correctly and a 0.958 subjective probability that the support vector machine will classify the next case correctly.

The frequentist interpretation would be: “*If* test and training data are iid, and the data we see in the future as well, then these numbers are the best estimators we have of the true correct prediction probability.” Note that I don’t have to believe that the iid assumption indeed holds for this to be useful. What I in fact believe is that iid is probably wrong but I have no clue in which way. So my subjective probabilities are definitely *not* 0.975 and 0.958, but in the absence of any better estimates I will still use their *ranking* for deciding which classifier to use.

Now you can say that therefore implicitly I have some kind of prior assignment allowing for other structures than iid to hold and again subjective Bayes could be applied (I’m quite sure I’d arrive at lower personal probabilities for both classifiers but I wouldn’t bother to elicit them because that’s probably a mess – and remember from before that I wouldn’t agree to the statement “I have a true subjective probability even without knowing it before I have bothered to construct one explicitly”). Fair enough. But this wouldn’t allow you to reconstruct the specific numbers 0.975 and 0.958 from a subjective Bayes perspective, whereas I can give a perfect frequentist explanation for what they are, despite them not being my subjective probabilities. Or you could reconstruct these numbers, but declaring them “subjective probabilities” means that they differ from those of the frequentist using the numbers with an attitude like mine.

Second posting:

Good point. I don’t have the time right now to discuss alternative devices (I have thought about at least one of them at some point in the past and one can find counterexamples there, too). Anyway, as long as you are operationalist, you have to specify a device that then *defines* what you mean. You can’t say, when confronted with problems with the device, that the device is not perfect in measuring the real thing, because there is no real thing apart from what the device is measuring. You can switch to another device, OK, I grant you that.

Hi Christian,

Thanks again for the response, I have enjoyed the exchange a lot as well. I have been delayed responding for a few reasons including (quickly) reading “A Constructivist View of the Statistical Quantification of Evidence” which is very interesting. I still don’t have a very nuanced understanding of the constructionist view but the idea is appealing.

Perhaps an issue at the heart of our disagreement is that I am perfectly happy to assume there is an underlying subjective probability even if it is unspecified. I don’t see this as being true but unknown, but rather as there being a range of values compatible with the assertions you have made so far. Perhaps you would say that I construct an imprecise probability framework (and I don’t really understand anyone objecting to that), or alternatively perhaps I am willing to indulge in metaphysics; I am really not sure. Nonetheless, I am happy to do this and would see operational elicitation devices as an imperfect way to gain access to this. An elicitation therefore involves a list of partially ordered decision preferences that can be viewed as expectations computed over unknowns. The specification of a probability or prevision such that P(kc) = kP(c) for all k is a useful special case but, I freely admit, difficult to specify in practice. There are definitely difficulties in converting decision preferences into probabilities (i.e. valid for any k), and studying decision preferences without subjective probability seems not to leave much to study….

While it could be argued that “there are uses for statistics other than the ordering of decisions”, I am not aware of any serious attempt to do this. I am therefore happy to restrict statistics to the procedures useful for ordering decision preferences. This may be viewed as a universalist statement, but I don’t think it is a particularly far-reaching or outrageous one. Implicit in this view is that I see little value in non-subjective statements. That said, if somebody wants to assert subjectively to me that p-values are useful in a given situation (where “useful” applies to real-world decisions and observables), then I would not argue, but I would see this as a subjective statement. In the example discussed above, the subjective statement that decision boundary 1 is better than decision boundary 2 might be made, even if it is not accompanied by an explicit subjective probability.

I should concede there are some differences between seeing value only in subjective statements and seeing value only in subjective probability as opposed to some other kind of probability. So I will walk away from that argument…

Tangential to the above discussion, I am surprised you find the assumption of i.i.d. conditional on the parameters objectionable. I would find that assumption pretty safe, but would be concerned about failure to condition properly on both the data and the shape of the decision boundary.

Thanks for showing interest in my note. It might be best to continue this discussion by email when we both have more time…

Cheers,

David

Dear David,

I don’t seem to be able to find your email address (at least not easily), so please send me something if you want to get in touch. I believe my address is easier to find using Google (I’m at UCL Statistical Science).