Comments get unwieldy after 100, so here’s a chance to continue the **“due to chance” discussion** in some roomier quarters. (There seem to be at least two distinct lanes being travelled.) Now one of the main reasons I run this blog is to discover potential clues to solving or making progress on thorny philosophical problems I’ve been wrangling with for a long time. I think I extracted some illuminating gems from the discussion here, but I don’t have time to write them up, and won’t for a bit, so I’ve parked a list of comments wherein the golden extracts lie (I think) over at **my Rejected Posts blog[1]**. (They’re all my comments, but as influenced by readers, so I thank you!) Over there, there’s no “return and resubmit”, but around a dozen posts have eventually made it over here, tidied up. Please continue the discussion on this blog (I don’t even recommend going over there). You can link to your earlier comments by clicking on the date.

[1] The Spiegelhalter (PVP) link is here.

Deborah, “For this issue, please put aside the special considerations involved in the Higgs case. Also put to one side, for this exercise at least, the approximations of the models.” This is a bit odd. The Higgs case has been the main example, brought up by you at the very start: I am not sure what the qualification “special” entails. And of course putting to one side problems of approximation excludes the odd person here or there who thinks, and quite rightly in my opinion, that a P-value is nothing other than a measure of approximation.

Nevertheless:

You write

(1): The probability that test T yields d(X) > d(x) under H0 is very small.

or equivalently

(1′) Pr(Test T produces d(X)>d(x); H0) ≤ p.

I like this. When writing articles I always state at the beginning that real data will be denoted by a lower case letter, x, and random variables in the strictly mathematical sense by an upper case letter X whereby I occasionally write X(Ho) to denote that X was generated under a probability measure specified by Ho. The point about (1) is that no assumptions whatsoever are made about the – if you will pardon the expression – ‘true generating mechanism’ of the data x. For me the only random object in (1) is X, x is simply given, the datum so to speak.
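For the simplest toy case, where d(x) is just |z| and H0 makes X standard normal, (1′) can be sketched in a few lines of Python (my illustration, not anything from the original comment). The simulation makes the point explicit: only X is generated under H0, while x, and hence the observed z, stays fixed, the datum.

```python
import random
from statistics import NormalDist

def p_value(z_obs):
    # (1'): Pr(d(X) > d(x); H0) for the toy case d(x) = |z|,
    # with X standard normal under H0. Only X is random here;
    # x (hence z_obs) is fixed, the datum.
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

# Cross-check by generating X under H0 while x stays fixed:
random.seed(0)
z_obs = 1.96
n_sims = 100_000
exceed = sum(abs(random.gauss(0, 1)) > z_obs for _ in range(n_sims))
p_sim = exceed / n_sims   # close to p_value(1.96), which is about 0.05
```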

How to interpret (1) or (1′)?

Interpretation Mayo:

Notice: H0 does not say the observed results are due to chance. It is just H0:μ = 0. H0 entails the observed results are due to chance, ….

Comment: Not on my reading it doesn’t: the observed results are x, not X, and the Pr in (1′) refers to X, not to x: Pr(Test T produces d(X)>d(x); H0)=\int_{d(x)}^{\infty}dP_o(u) where P_o is the distribution of X and x is fixed, the datum.

Interpretation Davies:

The data x look very different (in an important sense expressed by the statistic d) from typical data X(Ho) generated under Ho. The odd person here or there could, and quite rightly in my opinion, interpret this as saying that Ho is a poor approximation to the data x. Note that in this interpretation there is no requirement that the data x have to be repeated. In other words the interpretation is neither Bayesian nor frequentist. The probability measure defined by Ho is a poor approximation to the datum x, that is x does not look like a typical X(Ho). Now go find out why.

The use of the word ‘chance’ in your interpretation has a distinct randomness flavour and frequentist fragrance. This applies to virtually all interpretations. Here are some thoughts on randomness.

(1) The toss of a coin, random or chaotic?

Strzałko, J., Grabski, J., Stefański, A., Perlikowski, P. and Kapitaniak, T. (2008). Dynamics of coin tossing is predictable. Physics Reports, 469, 59-92.

(2) Roulette, random or chaotic?

The Newtonian Casino (Penguin Press Science)

(3) Quantum mechanics, random or chaotic?

De Broglie–Bohm theory

Richard Gill, http://www.math.leidenuniv.nl/~gill/

(4) Binary expansions, random or chaotic?

Martin-Löf, P. (1966). The definition of random sequences. Information and Control, 9, 602-619.

One (the only?) precise concept of randomness we have is that expressed in probability theory: we cannot even generate random data. We can however generate chaos, pseudo-random number generators for example, and we can to some extent even understand chaos, or at least I think I can. It turns out that chaos often ‘looks random’, in particular deterministic pseudo-random number generators can produce results that pass certain (but not all) tests of randomness. With the help of such chaotic systems we can generate data which ‘look like’ the random data we require. There are even theorems in number theory which are central limit theorems

Tenenbaum, G. (1995). Introduction to Analytic and Probabilistic Number Theory. Cambridge Studies in Advanced Mathematics 46. Cambridge University Press.

In the other direction so to speak we can analyse some chaotic systems using probability theory, for example Pollard’s rho algorithm for factorizing large numbers

https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm

Random or deterministic chaos? Who knows?
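Laurie’s point that deterministic pseudo-random number generators pass certain but not all tests of randomness can be made concrete with a classic linear congruential generator. The sketch below uses glibc-style constants (my illustrative choice): the high-order bits look balanced under a frequency test, while the lowest-order bit fails any serial test outright.

```python
def lcg_bits(seed, n, bit):
    # Classic linear congruential generator x -> (a*x + c) mod 2^31,
    # with glibc-style constants (an illustrative choice).
    a, c, m = 1103515245, 12345, 2 ** 31
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append((x >> bit) & 1)
    return out

high = lcg_bits(1, 10_000, 30)   # most significant bit
low = lcg_bits(1, 10_000, 0)     # least significant bit

# Frequency test: both bit streams are balanced. But because a and c
# are both odd, x mod 2 satisfies x_{n+1} = x_n + 1 (mod 2): the low
# bit alternates strictly, so it fails any serial or runs test.
share_high = sum(high) / len(high)
alternates = all(low[i] != low[i + 1] for i in range(len(low) - 1))
```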

Hi Laurie,

Two references on chaos, instability etc that may (or may not!) be of interest

(1) Linear vs nonlinear and infinite vs finite: An interpretation of chaos by V Protopopescu (1990). PDF: http://www.osti.gov/scitech/servlets/purl/6502672

(2) The Curse of Instability by Kuehn (2015). PDF: http://arxiv.org/pdf/1505.04334v1.pdf

BTW – Kuehn recently wrote a comprehensive introduction to the geometric perspective on dynamical systems – http://www.amazon.com/Multiple-Dynamics-Applied-Mathematical-Sciences/dp/3319123157 and was a PhD student of John Guckenheimer (https://en.wikipedia.org/wiki/John_Guckenheimer) who wrote the classic ‘Nonlinear Oscillations, Dynamical Systems and Bifurcation of Vector Fields’ with Philip Holmes. I was recently at a workshop on numerical methods for analysing dynamical systems in NZ with Guckenheimer – a very insightful person.

In my post I never mentioned the word ‘effect’ and intentionally so. All I wrote was that the model was poor, the smaller p the worse the model, and in this case one should find out why. In general there will be many possible explanations, only one of which is that there is an effect.

Sorry, replace

d(X)>d(x); H0)=\int_{d(x)}^{\infty}dP_o(u) where P_o is the distribution of X

by

d(X)>d(x); H0)=\int_{d(x)}^{\infty}dP_o^d(u) where P_o ^d is the distribution of d(X)

I’m glad Laurie has taken the plunge into the new (physical) territory, and in keeping with the two-lanes of discussion, I’ll focus on the “due to chance” business in principle 2(b). I wrote a comment that was a bit edgy last night, got too tired to post it, so here it is (even if it’s too edgy for Sunday morning):

Take a look at the P-value police’s justification for their rulings. Take #6:

6. one in 3.5 million is the likelihood of finding a false positive—a fluke produced by random statistical fluctuation

which they marked down (or at least “not so good”).

I thank Al for posting Spiegelhalter’s reply:

“I admit I might have been a bit harsh on Carl. But maybe not. In the phrase ‘Likelihood of finding a false positive’ , the object of the probability statement is ‘false positive’, which can be easily interpreted as a combination of both the observation (‘positive’) and the hypothesis (‘false’).”

https://errorstatistics.com/2016/03/12/a-small-p-value-indicates-its-improbable-that-the-results-are-due-to-chance-alone-fallacious-or-not-more-on-the-asa-p-value-doc/#comment-139830

He doesn’t seem to mind the use of likelihood, which shouldn’t be conflated with probability, but never mind that. The probability of a false positive has been standard error-probability language for donkey’s years. Now the PVP come in to re-evaluate well-trodden terms to see if a Bayesian probabilist could possibly, by any stretch, construe them as denoting a posterior, and if so, he’ll issue a fine to the frequentist.*

Now there are misinterpretations of P-values out there, but what we’re seeing these days is different. The problem we’re seeing with the P-value police (or, more generally, frequentist or error statistical police), is that terminology that has developed a clear meaning within one school is suddenly stopped on the road and questioned anew and all frequentists given tickets if a rather different school, with a different form of inference (a posterior), could misconstrue them. I will discuss some of the other lines in the list and their history later on. This is what happened even with those particle physicists who bent over backwards to use the standard terminology with great care (as some of them told me). New readings were pronounced to ticket them for using perfectly standard lines. The PVP are “Bayesianly biased” and I say, “frequentist lines matter”.

*Worse, followers of the new fad toward interpreting tests in terms of a diagnostic testing factory, where priors are “prevalences” of true nulls in an imaginary urn of nulls, and alpha and power are used to form a likelihood ratio for a Bayesian computation, tell us that p-values are being misinterpreted because they differ from the result of this new computation (which is even called frequentist). Why? Because some people decided to call the proportion of true nulls given they’re rejected at level p (or predesignated alpha), the “probability of a false positive”! Never mind that this conflicts with established meanings.
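For clarity about what is being criticized here, the diagnostic-screening computation in question can be written out in a few lines (the function name and the numbers are mine, purely illustrative). It is this posterior-style quantity that gets relabeled the “probability of a false positive”, in conflict with the established error-statistical meaning.

```python
def prob_true_null_given_rejection(prevalence_null, alpha, power):
    # Bayes' theorem applied to the event 'rejected at level alpha',
    # with nulls imagined as drawn from an urn in which a fraction
    # prevalence_null are true. This is the quantity the screening
    # approach relabels the 'probability of a false positive'.
    from_true_nulls = prevalence_null * alpha
    from_false_nulls = (1 - prevalence_null) * power
    return from_true_nulls / (from_true_nulls + from_false_nulls)

# With half the nulls true, alpha = 0.05 and power = 0.8:
result = prob_true_null_given_rejection(0.5, 0.05, 0.8)   # ~0.059
```

Note that the result differs from alpha itself, which is why the screening computation and the ordinary error probability cannot be used interchangeably.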

I thank Larry Wasserman for the term “P-value police”.

Oliver, many thanks for the references. I downloaded Protopopescu (1990) and found it very interesting. He touches on some of the comments I made, albeit at a much higher level. In particular I liked the final sentence about statistical descriptions being preferable to exact descriptions, which I related to coin tossing. He also mentions the kinetic theory of gases. Unfortunately the Wikipedia entry

https://en.wikipedia.org/wiki/Kinetic_theory_of_gases

states that the molecules are moving at random, but Protopopescu gets it right and says it is a chaotic system with about 10^23 molecules. What Wikipedia does get right is Maxwell’s contribution. (Aside: I went to the church where he was buried last September.)

… the Maxwell distribution of molecular velocities, which gave the proportion of molecules having a certain velocity in a specific range. This was the first-ever statistical law in physics.

Also along these lines is the Salem-Zygmund central limit theorem for lacunary trigonometric series

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1079068/

The sine and cosine parts are also asymptotically independent.

I know the Kuehn paper, you mentioned it in a previous post. I suspect the book would be somewhat heavy going for me as would be Guckenheimer’s. I will have a look at Chalmer’s first. The Borwein-Bailey book on Mathematics by Experimentation is also on my list. I didn’t realize that blogging was so difficult.

Hi Laurie,

Thanks for the ‘Mathematics by Experimentation’ pointer. More and more applied mathematics and engineering mathematics programs are dropping analysis requirements. On the other hand computer use is increasing.

One of my teaching goals is to (eventually) sneak analysis back in throughout the curriculum in the form of constructive numerical/computational analysis. Exploration then proof by algorithm, that sort of thing. From a first glance ‘Mathematics by Experimentation’ looks like one to add to my own ever expanding list!

I haven’t had the time to contribute to this yet but as usual I find it very stimulating, particularly Laurie’s comments, which mostly make much sense to me.

Still, I have one comment on an issue mentioned by Laurie that I have probably made in some form before.

Laurie wrote:

“Note that in this interpretation there is no requirement that the data x have to be repeated. In other words the interpretation is neither Bayesian nor frequentist. The probability measure defined by Ho is a poor approximation to the datum x, that is x does not look like a typical X(Ho).”

I wonder why we should be interested in using probability models as approximations for data if we do not want to interpret these models as related to repetitions. If I use a probability model as a model for some aspect of the world, I imagine this to be repeatable and I imagine the probabilities as related to frequencies that I’d expect to observe under repetition.

Note that this does by no means entail that I believe the model to be true (nor to be true with some epistemic probability). It just means that I temporarily adopt a perspective according to which some data in the world are taken as a realisation of a process that is, in principle, repeatable, and can therefore serve, for example, for simulations or predictions, interpreting simulation or future outcomes as “repetitions” in the model-sense of what we have already seen.

Your insight and your approach for defining adequacy can be used to assess how “good” and helpful it may be to adopt such a perspective, and of course I need to be able to get off it again if at some point some “anomaly” appears or I realise that I am led astray by the model in a different way from what I tested for in the first place.

So in this sense I’d say that I interpret the models in a frequentist manner (although this may not be “conventionally frequentist” because I’m not claiming that there is any true model out there in the real world; it’s just a tool for the human brain). If you want to interpret it neither in a frequentist nor in a Bayesian manner, how can you make use of it?

Hi Christian: My view is that hypothetical repetitions are relevant insofar as they serve to capture the capability of the method for a probe. Methods and problems are of a general type, even if this one application is entirely unique, and that’s why “the probability experimental results are due to background variability alone” (a question I raised on this post) has a clear meaning for an error statistician and doesn’t refer to posterior probabilities in the background or in H0. An example might be an RCT. Hypothetical repetitions are relevant insofar as variability (of a type) is relevant; I agree with you that if no such considerations are relevant, you’d not appeal to formal statistics.

Hi Christian,

My (perhaps unwanted) view: adequacy could equally well be defined for deterministic data and deterministic models by allowing a fixed (non-stochastic) ‘tolerance’ or ‘error’. This reflects that ‘all models are wrong’ and hence shouldn’t be judged by exact predictions even in the deterministic case.

So: a model is adequate wrt given data if it predicts that data within given tolerance criteria.

This is obviously generalisable to the cases of (stochastic models + fixed data), (deterministic models + stochastic data) or (stochastic models + stochastic data), but doesn’t appear to me to require a frequentist or Bayesian interpretation at its heart.
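This deterministic notion of adequacy fits in a couple of lines; the sketch below is mine, with illustrative names and numbers, just to pin the definition down.

```python
def adequate(predicted, observed, tol):
    # A model is adequate for the data if every prediction lands within
    # the stated tolerance of the corresponding observation.
    return all(abs(p - o) <= tol for p, o in zip(predicted, observed))

# A deterministic linear model against deterministic data:
observed = [1.02, 1.98, 3.05, 3.96]
linear_model = [1.0, 2.0, 3.0, 4.0]
```

With tolerance 0.1 the linear model is adequate for these data; with tolerance 0.01 it is not, even though nothing stochastic is in play.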

BTW – I would most typically interpret the stochastic data component as arising from a frequentist-style measurement model and stochastic models as arising more from the (structuralist) bayesian ‘uncertain parameters hence uncertain predictions’ perspective. Both aspects can be incorporated into a hierarchical bayes approach, though as above I believe that one could also give completely different interpretations to both.

Christian, I owe you this, so now I have to do it.

Stochastic models are a powerful tool for modelling the unpredictable variability we experience in the world. Examples are the toss of a coin, the roll of a die, the weight of an animal and the value of a financial asset. Stochastic models are precise mathematical models with precise definitions, including of the word ‘random’. Because of this I get the impression that some people think they are only applicable to random events. I disagree. The toss of a coin can be convincingly modelled by deterministic Newtonian mechanics. Under this model the inability to predict the toss is due to the inability to measure the initial conditions with sufficient accuracy to do the prediction, although it may well be possible to improve the prediction to a certain extent. The fact that we can analyse the toss of a coin, an extremely complex operation in terms of Newtonian mechanics, very successfully using a simple binomial model is an illustration of the strength of stochastic models, and this in spite of the fact that we don’t know why. I know of no theorems which prove that chaotic systems ‘look random’ and would be very glad if someone could give references if there are some. Not only that, in many cases we depend on this to analyse our stochastic models; I have in mind pseudo-random number generators. The very success of modelling situations using stochastic models in no way proves that what we are modelling is in fact random.
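The coin-toss point can be illustrated with a fully chaotic deterministic system. The logistic map below is my stand-in (not anything from the dynamics literature cited above): it is entirely deterministic, yet the binomial model for the ‘heads’ it produces works remarkably well.

```python
def logistic_bits(x0, n, burn=100):
    # Fully chaotic logistic map x -> 4x(1-x); record 'heads' (1)
    # whenever the state exceeds 1/2. Entirely deterministic.
    x = x0
    for _ in range(burn):
        x = 4 * x * (1 - x)
    bits = []
    for _ in range(n):
        x = 4 * x * (1 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits

bits = logistic_bits(0.123456789, 10_000)
share_heads = sum(bits) / len(bits)   # close to 1/2, like a fair coin
```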

What is the importance of repeated samples? First of all, take the repeated measurements in one sample. These will exhibit a degree of variability which looks random, now in the everyday sense of the word. Any model you use must also exhibit this variability; if the measurements were not variable you would not be using a stochastic model. So the sample you have exhibits variability, and if you only have one sample there are problems.

To be precise: for the fun of it I decided I would try to model the daily returns of the S&P 500 over about 80 years, in all about 23,000 observations. Furthermore I wanted a dynamic stationary model, to annoy all those people who claim it is not stationary. I mention two problems: the degree of volatility clustering and the sojourn times at a particular volatility level. For the S&P 500 I have, say, 70 volatility clusters, but what is the variability of this? Also the lengths of the clusters, the sojourn times: they vary from one to 1000 days (I forget the exact numbers and can’t be bothered to check, but of this order). Any other such long time series would also have experienced the 1930s, WWII, the post-war boom etc., and in this sense there is only one data set: any other such data set would exhibit similar numbers. How do I decide on the variability of the number of clusters and the variability of the sojourn times? If I had several different independent (colloquial) samples I would have an idea of the variability of the number of clusters that my model would have to provide.

If I do have several samples then I will have some idea of the variability of the variability. I can take this even further. There may be situations where this uniformity of variability holds across certain fields and not just across the measurements in my laboratory. I interpret some of Oliver’s contributions in this sense when he talks about invariance or covariance etc. In such a situation one would choose a family of models reflecting the experience of past samples and expectations for future ones.

Are you still with me? Suppose we are in the situation where our family of models has worked well in the past. You now get a new sample. What do you do? My attitude is that you take this one sample as it is and check your standard model for adequacy, and you do this every time. Although the procedure does not imply that the data are repeatable, this does not preclude it from being used on repeated data. There remain differences in interpretation: a model is an approximation and there are no true parameter values, whereas Bayesian priors are additive because they assume true parameter values, and frequentist confidence intervals cover true values.

I have several times used the copper contamination data and based the analysis on the Gaussian family of models. In particular I provisionally and speculatively identified the true quantity of copper in the water sample with mu, which in turn was based on the behaviour of the mean, the standard deviation, outliers and the Kuiper metric. This is a bad idea. If you take repeated samples, the chances are that at some point your sample will have one or more outliers, the model does not fit, and what now? An experienced analytical chemist informed us that the mean was no good, the median was better, but the mean after eliminating the outliers was better still. We ended up with Hampel’s redescending psi-function, but without having checked whether this was better than the mean after removing the outliers; not enough time.
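For reference, Hampel’s three-part redescending psi-function and a simple M-estimator of location built on it might look as follows. The tuning constants (2, 4, 8) and the data are illustrative only, not the ones used in the analysis described; the point is just that the gross outlier receives weight zero.

```python
import statistics

def hampel_psi(r, a=2.0, b=4.0, c=8.0):
    # Hampel's three-part redescending psi-function: linear up to a,
    # flat up to b, descending linearly to zero at c, zero beyond.
    s = -1.0 if r < 0 else 1.0
    r = abs(r)
    if r <= a:
        return s * r
    if r <= b:
        return s * a
    if r <= c:
        return s * a * (c - r) / (c - b)
    return 0.0

def m_location(x, iters=50):
    # Iteratively reweighted M-estimator of location; the scale is the
    # (normal-consistent) MAD and is assumed to be non-zero.
    mu = statistics.median(x)
    scale = statistics.median([abs(v - mu) for v in x]) / 0.6745
    for _ in range(iters):
        num = den = 0.0
        for v in x:
            r = (v - mu) / scale
            w = 1.0 if abs(r) < 1e-12 else hampel_psi(r) / r
            num += w * v
            den += w
        mu = num / den
    return mu

data = [1.9, 2.0, 2.1] * 3 + [10.0]   # one gross outlier
# mean(data) is 2.8, but the M-estimate stays near 2.0:
robust = m_location(data)
```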

One thing I find very strange in all this is the universal refusal to specify when a Gaussian model would be acceptable. It was the same on Andrew Gelman’s blog. Maybe everybody, with few exceptions, uses the pointing test: it looks Gaussian so it is Gaussian.

Deborah, I need a concrete example. Here is my copper in drinking water data set (real).

2.16 2.21 2.15 2.05 2.06 2.04 1.90 2.03 2.06 2.02 2.06 1.92 2.08 2.05 1.88 1.99 2.01 1.86 1.70 1.88 1.99 1.93 2.20 2.02 1.92 2.13 2.13

The legal limit is 2.2. What do you do? I have already indicated what I would do. What is wrong with it?
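For orientation only, here is what a textbook Gaussian treatment of these numbers gives (this is not Davies’ approximation-region method, just the conventional t-interval a standard course would produce): the 95% interval for mu sits well below the legal limit of 2.2.

```python
from math import sqrt
from statistics import mean, stdev

copper = [2.16, 2.21, 2.15, 2.05, 2.06, 2.04, 1.90, 2.03, 2.06, 2.02,
          2.06, 1.92, 2.08, 2.05, 1.88, 1.99, 2.01, 1.86, 1.70, 1.88,
          1.99, 1.93, 2.20, 2.02, 1.92, 2.13, 2.13]

n = len(copper)                 # 27 measurements
m = mean(copper)                # about 2.016
se = stdev(copper) / sqrt(n)
t_crit = 2.056                  # two-sided 95% quantile of t with 26 df
lo, hi = m - t_crit * se, m + t_crit * se   # interval sits below 2.2
```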

Laurie: I think you’d like Neyman on using statistical models as “picturesque” descriptions that get their justification, when they do, by enabling a connection between a sampling distribution of a statistic and an imagined underlying probability distribution. “This is a very surprising and important empirical fact” he says (in essentially those words) and whenever we find a technique such that the relative frequencies approach the mathematical probability, as given in the law of large numbers, we are allowed to say the probabilistic model adequately represents the method of carrying out the experiment. I know it’s Neyman 1952 (Lectures and Conferences), no time to look up the exact quote.

There’s an intertwining of statistical and domain-specific knowledge, and the link-up between the statistical and the actual is given by dint of statistical tests. I think it’s pretty clear that it’s the statistical connections we care about (between data and theory) and not that any particular process is literally Gaussian or whatever. You tend to talk of a process outside us, as if we come across it, when the skill seems to me to be creating the linkage by methods and questions. When it comes to inference and learning using statistics (which I would distinguish from the problem of wanting to directly statistically model a phenomenon), statistically modelled questions are typically opportunistic: the aim is to get the answer to a question (about the world) rather than directly model the world. Use different checks so that what’s learned isn’t injured much by the fact they’re all strictly false, or deliberately ensure the flaw will show up elsewhere.

On your particular data–I’ll send them on to Spanos, severe deadlines and travel loom for a few weeks.

Deborah, I don’t know what ‘hypothetical repetitions’ are and why you need them to ‘capture the capability of the method for a probe’, and I have great difficulties with “the probability experimental results are due to background variability alone”, which you claim has a clear meaning for error statisticians.

My approximation region will specify those parameter values which are consistent with the data in a well-defined sense. I gave you the copper data. Now the parameter values (mu,sigma)=(1.98,0.12) are in the approximation region. I now simulate a sample of size n=27 under the normal model with these parameter values. Unfortunately I do not know how to generate random samples, so I take recourse to some deterministic procedure which produces samples which look random but aren’t. For fun, and because the data are easily available, I am going to do this using the binary expansion of pi. The first 10 digits are 1 1 0 0 1 0 0 1 0 0, which I convert to a number between 0 and 1 by treating these digits as the binary expansion of a number between 0 and 1, namely u=1/2+1/4+0/8+…+0/2^10=0.785156250. I now take phi^-1 of this number, easily done in R as X_1=qnorm(0.785156250)=0.78972652. Now I take the second set of 10 digits, 0 0 1 1 1 1 1 1 0 1. This gives X_2=-0.68373800. I repeat the process to get X_1,…,X_27 which ‘look like’ but are not i.i.d. N(0,1). I now multiply by 0.12 and add 1.98 to get C_1,…,C_27 which look like but are not N(1.98,0.12^2). Finally I truncate them to the same precision as the original data. The resulting simulated data set is

2.07 1.89 2.03 1.98 1.95 1.76 1.91 1.87 1.94 1.87 1.78 1.74 2.08 2.02 1.85 1.98 1.99 2.01 2.07 1.63 1.77 2.22 1.94 2.06 2.01 1.99 1.81

I do this 999 times, include the real data to give 1000 data sets, hand you the lot and tell you that you can nominate 50 of the 1000 data sets and if the real data set is one of them you win a prize.
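The simulation scheme just described can be reproduced directly. The sketch below is a Python stand-in for the R described above, with `inv_cdf` playing the role of qnorm and the first 60 binary digits of pi hardcoded; with (mu, sigma)=(1.98, 0.12) it recovers the listed values 2.07, 1.89, 2.03, and so on.

```python
from math import floor
from statistics import NormalDist

# First 60 binary digits of pi (11.0010010000111111...), hardcoded.
PI_BITS = "110010010000111111011010101000100010000101101000110000100011"

def block_to_uniform(bits):
    # Treat a block of digits as the binary expansion of a number in (0,1).
    return sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits))

def simulate(mu, sigma, bits=PI_BITS, block=10):
    nd = NormalDist()
    out = []
    for k in range(0, len(bits) - block + 1, block):
        u = block_to_uniform(bits[k:k + block])  # deterministic 'uniform'
        z = nd.inv_cdf(u)                        # phi^-1, qnorm in R
        v = mu + sigma * z
        out.append(floor(v * 100) / 100)         # truncate to data precision
    return out

sim = simulate(1.98, 0.12)   # [2.07, 1.89, 2.03, 1.98, 1.95, 1.76]
```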

We wish to know whether the true amount of copper in the drinking water is within the legal limit of 2.2 (I am making up this value). We identify the true amount of copper with mu, provisionally and speculatively. The legal limit does not specify any sigma, so what do we do? The answer is that you can take any sigma you want: simulating data with mu=2.2 instead of 1.98, you will be able to pick out the real data set without any problems. Here are the simulated data for N(2.2,0.12^2):

2.29 2.11 2.25 2.20 2.17 1.98 2.13 2.09 2.16 2.09 2.00 1.96 2.30 2.24 2.07 2.20 2.21 2.23 2.29 1.85 1.99 2.44 2.16 2.28 2.23 2.21 2.03

Are my simulations under the model your ‘hypothetical repetitions’? To my way of thinking there is nothing hypothetical about them. I just do them (not strictly true, because sometimes, as in this case, the relevant probabilities can be calculated or well approximated without simulations). Why do they, the hypothetical repetitions, ‘capture the capability of the method for a probe’? How would I probe the method? I would generate other data sets, but not under the model, say with a t_10 distribution instead of a normal distribution, and still see if I can pick up the 2.2. I would add the odd outlier and see how it performs then. Such outliers are common, see the abbey data in MASS. At the latest at this point I would realize that the procedure was not good and had to be replaced by something else. It seems to me that my probes are more exacting than yours because I probe in a neighbourhood of the model, not just under the model itself.

One last point for today: my approximation regions can be empty, indicating that the model is not an adequate approximation to the data. Confidence regions are never empty because they must contain the true parameter with the specified probability. There seems to be a great deal of confusion here; see

http://andrewgelman.com/2016/01/08/why-inversion-of-hypothesis-tests-is-not-a-general-procedure-for-creating-uncertainty-intervals/

If the model holds then the approximation region will contain the true parameter values with the specified probability. If it doesn’t hold it may be very small or even empty. If one continues to regard it as a confidence region then the worse the model, the smaller the confidence region and hence the higher the precision. I tried to put forward my point of view, with some help from Christian, but with utter failure.
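A toy version of an adequacy-gated interval shows how it can come out empty. The sketch below is a crude stand-in for Davies’ actual adequacy criteria: it gates a standard interval on a Kolmogorov-Smirnov distance with the conventional 1.36/sqrt(n) cutoff (which strictly applies to fully specified nulls, not estimated parameters, so this is illustrative only).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ks_distance(x):
    # Kolmogorov-Smirnov distance between the empirical distribution
    # and a normal distribution fitted to the data.
    n, m, s = len(x), mean(x), stdev(x)
    nd = NormalDist(m, s)
    d = 0.0
    for i, v in enumerate(sorted(x), 1):
        f = nd.cdf(v)
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d

def approx_interval(x, crit=1.36):
    # Adequacy-gated interval: empty (None) when the fitted normal
    # model is not an adequate approximation to the data.
    n = len(x)
    if ks_distance(x) > crit / sqrt(n):
        return None
    m, se = mean(x), stdev(x) / sqrt(n)
    return (m - 1.96 * se, m + 1.96 * se)

good = [NormalDist().inv_cdf((i - 0.5) / 40) for i in range(1, 41)]  # normal-looking
bad = [0.0] * 20 + [1.0] * 20                                        # grossly bimodal
```

A textbook confidence interval would exist for both data sets; the gated version returns an interval for the normal-looking data and comes up empty for the bimodal data.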

Laurie: What is “the capability of the method”, be it a scale for measuring weight or a statistical argument, if not something about how it would behave when used? Cox calls this the fundamental principle of frequentist inference, or the like.

Knowing your capabilities is knowing how you’re likely to behave/react under such and such conditions, isn’t it?

Laurie: Thanks for the explanation. The question whether and to what extent your approach could be called “frequentist” is unimportant to the extent that the issue arises from different ways of using the term “frequentist” (which I tend to use in a quite general and generous manner, but I accept that one can use it in a more restricted way).

I’m pretty sure that you treat probability models as “generators of data” that potentially can generate repetitions, as you do in your simulations from the model. This is what I have the term “frequentist” refer to. I’d go further and say that if you’re choosing a model to approximate a dataset, this means that you treat the dataset, temporarily, *as if* it was generated by such a probabilistic generator (without committing to the belief that this is “really true” in some sense). You’re labelling generators as “adequate” if, according to your methodology, these generators typically could have generated data that “look like” the data in question. (Repetitions become *hypothetical* if instead of generating repetitions by simulation you use theory in order to find the probabilities such as p-values on which the adequacy assessment is based.)

As you say, nobody stops us from doing this with data that we believe were indeed generated in a deterministic manner. Using a probability model (an i.i.d. one, say) instead of a deterministic one means that for analysing the data we decide to ignore the specific conditions of a particular “run” and to treat them all as “essentially the same with only random variation”, i.e., frequentist repetitions. The proof of this pudding is in the eating, in doing whatever we want to do with the data, for example prediction. If we manage to do this at the quality we need using an i.i.d. model but we despair at the attempt of doing this using a deterministic model, the probability model has the edge.

Most of this is really not far away from Mayo’s approach, I think. Just the terminology is very different.

You declare a model as adequate if a combined test putting together p-values from several test statistics in an appropriate fashion does not reject it, and you want to design this combined test in a manner that it should have the “capability to probe”, i.e., find some relevant issues with the model, trying to put Mayo’s terminology on what you do. Am I wrong?

(Actually I haven’t missed that you wrote this: “How would I probe the method? I would generate other data sets but not under the model, say with a t_10 distribution instead of a normal distribution and still see if I can pick up the 2.2. I would add the odd outlier and see how it performs then.” This is a good and worthwhile extension but as long as we’re looking at a single model, I still think you’re not too far away from Mayo.)

Christian: I’m confident you will articulate and defend the gist of my point but in clearer terms, at least until I can read this carefully, swamped.

Christian, “I’m pretty sure that you treat probability models as ‘generators of data’ that potentially can generate repetitions, as you do in your simulations from the model”. Well, I am pretty sure that is not what I am doing; it’s not the way I think about it. I have my copper data set, and if the measurements are repeated to give a second sample, but for the same drinking-water sample, I would not be at all surprised if the second sample looked different from the first. What can happen with the second sample? The readings for the first sample were taken by an experienced and careful employee in the laboratory responsible for the analysis. The second was taken by a somewhat careless employee. The second sample contains a large outlier, the reasons for which are not clear. Suppose we are not analysing copper but some chemical which evaporates in the course of time. Too much time had elapsed between the first and second sample and the readings were different. The measurements had been taken by two different machines, one of which was badly calibrated. The original single sample of water was separated to give two samples to be analysed individually. In the course of the separation one sample was contaminated, leading to different results. Is all this to be subsumed under background variability? Actual repetitions can be completely different from hypothetical repetitions based on a model for the first data set. Just look at the data sets given by Stephen Stigler. When you look at this sort of data I simply cannot understand why simulations under a model are regarded as potentially generating real repetitions of the data.

So let us take data sets with real repetitions, namely the five data sets each of size n=20 of Michelson’s determinations of the speed of light (Table 6 of Stigler) and one data set of size n=23 (Table 7). The 0.95 approximation intervals for the speed of light, using exactly the same method as for the copper data, are (815,999), (801,909), NaN, (767,872), (783,892), (670,836), where NaN denotes an empty approximation region. Although the interval for data set 6 overlaps with the intervals for the other data sets, the approximation region for data set 6 is disjoint from those for data sets 2, 4 and 5. This is because the values for sigma are disjoint.

Given this I do not in any way treat the simulations as potentially generating real repetitions of the sample. Given the NaN for data set 3, what do I do? Many of the data sets in analytical chemistry contain outliers. Suppose your first data set has 2 outliers. You regard simulations under the model as potentially generating real repetitions of the data. How do you model the number and values of the outliers in order to take into account potential outliers in repetitions of future data sets? I know the Bayesians do this, but do the frequentists? Do you? Given the virtual impossibility of this, standards for analysing interlaboratory results essentially give up modelling the data and consider only functionals, M-location-scale functionals. I can give approximation regions for these as well. I identify the speed of light with the location part. If I do this for the Michelson data I get the following intervals: (855,977), (808,898), (834,881), (771,873), (802,865), (709,802), no NaNs. Once again for emphasis, there is no model, or rather the model is any P with largest atom less than 0.5, say 0.4. You also have the option of refusing to analyse, say if there are more than 30% outliers. There are many such functionals, but as you have a large library of actual real repetitions you can choose the one out of several that performs best over the library, much as Stigler did in his paper.

When I read discussions about P-values they seem to take place in a sanitized version of the world of statistics. There are well-defined models; there is no discussion about how well the model fits, but the fit is undoubtedly very good because it is never problematized; the repetitions are well behaved; there is no bias; there are no outliers; there is no talk of alternative models; there is no mention of regularization; there is no mention of the stability of the analysis, for example the stability of P-values under perturbations. In short, everything is as clean as it possibly can be. Peter Huber writes: ‘Clearly, by 1988 a majority of the academic statistical community still was captivated by the Neyman-Pearson and de Finetti paradigms, and the situation had not changed much by 1997’. The number of references to Fisher, Neyman, Pearson and Bayes on this blog indicates to me that this also holds here.

There are indeed points of contact with Mayo. I also think that any satisfactory manner of doing statistics must contain a Popperian element. I am also in favour of severe testing and probes. However I also think that she has both too little and too much baggage (Deborah, I take it that you are reading this). Too little because I miss the severe testing of models; too much because I think all this

H0 does not say the observed results are due to chance. It is just H0:μ = 0. H0 entails the observed results are due to chance, but that is different. Under this reading there is no fallacy.

is unnecessary. Why not just say that the observed results are inconsistent with the model based on Ho? The observed data is

1 1 0 0 1 0 0 1 0 0 … 1 1 1 0 0 1 1 1 1 1

of size n=10000. The model is binomial(1,p) with Ho: p=0.4. Ho entails that the results are due to chance. But they are not: they are the first 10000 digits in the binary expansion of pi. Why speculate about what caused the observed data (here, chance) when all you need to say is that the data are not consistent with Ho: p=0.4? For what it is worth, they are consistent with H1: p=1/2. Oliver has pointed out that you can define adequacy for deterministic data and deterministic models. One such model is defined by Ho: mu=0. You would now be prepared to reject Ho on the basis of one data set without any talk of repetitions. You can do the same with stochastic models.
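
This example can be checked directly. The sketch below extracts the binary digits of pi using Machin’s formula in integer arithmetic (any other source of the expansion would do) and computes a plain normal-approximation z statistic for Ho: p=0.4; the particular test statistic is my choice for illustration, not anything prescribed in the discussion:

```python
import math

def arctan_inv(x, one):
    # arctan(1/x) scaled by `one`, Gregory series in integer arithmetic
    total, term, k, x2 = 0, one // x, 0, x * x
    while term:
        total += term // (2 * k + 1) if k % 2 == 0 else -(term // (2 * k + 1))
        term //= x2
        k += 1
    return total

def pi_bits(n):
    # first n binary digits of pi via Machin: pi = 16 atan(1/5) - 4 atan(1/239)
    one = 1 << (n + 32)  # 32 guard bits absorb truncation error
    pi_scaled = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)
    return [int(b) for b in bin(pi_scaled)[2 : n + 2]]

bits = pi_bits(10000)
phat = sum(bits) / len(bits)
# normal-approximation z statistic for Ho: p = 0.4 under binomial(1,p)
z = (phat - 0.4) / math.sqrt(0.4 * 0.6 / len(bits))
print("relative frequency of ones:", phat, " z for p=0.4:", z)
```

The relative frequency of ones comes out close to 1/2, so the data are grossly inconsistent with p=0.4 while consistent with p=1/2, exactly as stated, even though nothing about the digits of pi is random.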

Deborah, you have no doubt sent the copper data to Spanos. Ask him to analyse the Michelson data to see how you and he would treat data with repetitions.

Laurie The fact that I’m recently (on this blog) looking at disputes and problems that are now being debated regarding statistics, and in the form they ARE being debated, doesn’t mean my philosophy of science endorses or is limited to these sterile formulations. It means only that I decided in 2004 (when I began working w/ Cox) that a philosopher of statistics ought to take note of the bizarre schizophrenia cropping up in Bayesian statistics. We had the 2010 conference in England and after dealing with the Birnbaum result (for the first time) I decided to work on this PhilStat book to take up the mix of issues in today’s refighting of old philstat wars, and a couple of new ones, within the “replication crisis”. I’ll soon go back to more general philosophy of science. I doubt I’ll be able to have any influence on saving statistics from throwing out the error statistical baby with the bad statistics bathwater. Error statistics is likely to require reinvention some time in the future.

by the way, the most controversial areas, e.g., in psych, think they get the model from randomly assigning “treatments”/”controls” to subjects in various artificial experiments. That’s how they’d answer you.

Laurie: I think you misunderstood me to some extent, which is manifest in your use of the term “real” before repetitions. Indeed I agree with you that real repetitions can be completely different and an adequate model for one dataset may not be adequate for something that is interpreted as “real repetition” (note that what constitutes a “real repetition” is always dependent on some interpretation because any “real repetition” is different in quite a number of aspects from what it is interpreted to be a repetition of). I’m *not* talking about “real” repetitions but rather about repetitions as an artificial construct of the mind to explain what the models mean.

I’m still left with the question of what approximating data by a model is good for, if this is interpreted neither in a frequentist (or related, including “propensity”) nor in a Bayesian manner.

Would you agree that adequate models could be useful for prediction? Would you agree that adequate models could be used to “simulate” lots of plausible outcomes of the modelled system so that this can produce useful information about it (like in climate simulations)?

Even if you try to estimate something like the “true copper content” or the “true speed of light”, which in reality is not defined in terms of a probability model, I wonder how the information that a certain probability model is consistent with the data (information obtained either by simulating repeated artificial datasets from the model, or by theoretically analysing what the expected outcome of an “ideal” such simulation would be) is useful for knowing something about such real physical quantities, unless you say something like “if the model is adequate, it does make some sense to think about reality as specified by the model, at least in a certain aspect (the location parameter, say)”.

(I do actually realise that “repetitions” didn’t feature in the latter sentence; I could explain what the last sentence means to me using the concept of “repetitions”, but can you do without?)

Christian, yes I did misinterpret you. Here is a long excerpt from my book, Chapter 2.15 entitled “Procedures, approximation and vagueness”.

In statistics a procedure is a set or a sequence of actions performed on a data set which together represent the analysis or part of the analysis of the data. The procedure can include exploratory data analytic tools such as visual representations as well as more formal statistical methods such as the calculation of an approximation region. It may also contain more than one analysis of the data: it may use two or more probability models, for example the gamma and log-normal distribution; it may specify possible outliers and perform the analysis both with and without the outliers. …. the statistical procedure may be part of a larger procedure that specifies not only the statistical analysis of the data but the manner in which the measurements are to be taken and the data collected. An example of such a larger procedure is \cite{DIN38402} which covers all aspects of the conduct of interlaboratory tests for water, waste water and sludge.

The word `procedure’ suggests a plurality of situations in which the procedure can be applied. This does not prevent the same definition of approximation being used in the case that more than one data set has to be analysed, but any such definition will have to take the plurality of applications into account. Here is Tukey (\cite{TUK93B}) on procedures:

\begin{quotation}

Since the formal consequences are consequences of the truth of the model, once we have ceased to give a model’s truth a special role, we cannot allow it to “prescribe” a procedure. What we really need to do is to choose a procedure, something we can be helped in by a knowledge of the behavior of alternative procedures when various models (= various challenges) apply, most helpfully models that are (a) bland (see below) or otherwise relatively trustworthy, (b) reasonable in themselves, and (c) compatible with the data.

\end{quotation}

The definitions of approximation used in a procedure will be guided by the available data sets and possible future data sets of a similar nature. Thought will have to be given to the purpose of the analysis, to setting priorities regarding the properties of the data sets and to including checks, safeguards and warnings for data sets not conforming to the usual pattern.

Mathematical statistics is a branch of applied mathematics and has its own theorems. These are often used when a procedure is constructed, the central limit theorem being one example. Theorems have precise assumptions and, assuming a theorem to be correct, precise conclusions. Sometimes the precise assumptions and conclusions cannot in principle be translated into precise statements for the formulation of a procedure. Asymptotic assumptions and conclusions are of this form. In many cases the assumptions are simply not susceptible to verification, for example the assumption that the data are i.i.d. normal. The instructions for the use of a procedure cannot have this precision. They must of necessity be vague, and not even the vagueness can be made precise. To misquote \cite{ADAM79}, we cannot have `rigidly defined areas of vagueness and uncertainty'.

See also Chapter 5.6 entitled “An attempt at an automatic procedure”. Does this help?

One final point that I have made in previous contributions. If one takes the water sample then I accept that there is a true quantity of copper in the water. I do not accept that there is a true parameter value for mu. We provisionally and speculatively identify the true amount of copper with mu. In the light of further samples it may turn out that a procedure based on the normal model and this identification is a poor one, and in fact this does turn out to be the case. That is why the identification is provisional. Any identification is speculative. The adequacy of a model depends only on the data, not on the identification. The identification of the parameters of the model with some numerical aspect of reality relates the model to reality. Thus the normal model may be adequate, but you may do better to relate the true amount of copper to the median rather than the mean.

Laurie: This is interesting and I like it, but I feel that it doesn’t fully address the questions asked earlier.

You are very detailed and thoughtful about the limits of probability modelling but as long as you take part in this enterprise at all, I think there should be more of a positive justification and motivation for it.

I can only find this implicitly. “The identification of the parameters of the model with some numerical aspect of reality relates the model to reality.” But if you do this, you rely on identifying the whole model with reality, or don’t you? How can you interpret a model parameter in a meaningful way without giving such an interpretation to the model as a whole, within which the parameter is defined?

Now again there is grave danger of misinterpretation, so I add the disclaimer I have added before: “identifying the model with reality” to me does not mean believing that reality really behaves as the model says; it rather means thinking about reality in terms of the model, in a temporary fashion, being aware that apart from the benefits of thinking about reality by “identifying” it with the model, it is also important to keep up awareness for, and analyse, what makes reality different from this model (including figuring out what other models approximate reality as well).

In any case there should be, I think, a description of how the model is to be understood when using it for “identifying it with reality”, or rather as a way of thinking about and analysing reality.

The frequentists have such a description, various varieties of Bayesians have them (although there is a tendency in much Bayesian work to ignore them), there’s the propensity interpretation of probability (which at least in certain versions – e.g., Donald Gillies – is close to what I’d call frequentist), and I wonder what yours is. My impression was always that it is at least close if not exactly frequentist; I could imagine that Gillies’s interpretation fits very well what I’d think I got implicitly from your work. But anyway, I think you should be more explicit about this yourself.

(Note that I have currently given your book to a student so I can’t check there.)

Deborah, your anti-Birnbaum argument accepts the Birnbaum assumptions on the likelihood, f(x,theta)=cg(y,theta) for all theta for some fixed number c. I see no point in including values of theta which are inconsistent with the data x and the same for y. If you restrict f(x,theta) to the theta consistent with x and g(y,theta) to those consistent with y does this alter anything in your or Birnbaum’s argument or in the likelihood principle ? I suspect it does which is why I find the arguments irrelevant. Or am I missing something here? Is there some convincing argument for retaining theta values which are inconsistent with the data?

Laurie: I don’t understand the context of retaining theta values. The LP is about evidential equivalence.

Christian, I fail to understand what you want. Give me the standard frequentist analysis of the copper data and its interpretation and what this contains that mine doesn’t.

Laurie: I have no issues with your analysis at all. The question I want to get at is about the interpretation of probability models. If a certain model (or a set of models) is shown to be adequate for a certain dataset according to your analyses, what do you think does this tell us about reality? Is it an end in itself? Is it only the parameter estimation you’re after, and what else is codified in a model is not of interest (or more generally, are you mainly interested in “procedures”, and in models only as far as they can give rise to and test procedures)? I’m asking this because you distance yourself from both frequentist and Bayesian interpretations of probability, but as you use probabilities for modelling reality, I’d think that you may have your own interpretation. (Or is it as I personally think, that you may be closer to frequentist/propensity thinking about probabilities than you’d directly want to say?)

I am curious as to whether Laurie thinks in terms of a reference class that is real or “real in the relevant aspects” when conducting statistical analyses. Identifying the reference class forces one to confront how the statistical model relates to the realities of ultimate interest, no?

Christian, I still do not understand what you want. Let me try again.

At the scientific level probability is for me a mathematical term in probability theory, as are such terms as sigma-additivity, random, independence and conditional expectation. I think this is also Andrew Gelman’s attitude; at least I read something by him which I interpreted in this manner. Probability theory has different interpretations. The Bayesian one seems to me to be one such consistent interpretation. I am not so sure about the frequentist interpretation in terms of relative frequencies, by which I mean some sort of von Mises approach. The reason for my uncertainty is that I am not sufficiently acquainted with the state of the art in this area.

I explicitly state that a model is an approximation to the data and not to some underlying real data generating mechanism. The reason for me is very simple. The underlying data generating mechanisms are so complicated, even for the simple coin toss (I gave one reference to a paper on the Newtonian mechanics of the coin toss), that modelling them is in my opinion simply illusory. Stochastic models are much too simple, independently of their interpretation, Bayesian or frequentist.

Let us take the simple coin toss done mechanically. The standard model is b(n,p). The parameter p is related to the real world through the relative frequency. The relative frequency is at the level of the data. There is no attempt either by the Bayesians or the frequentists to go to reality at the level of Newtonian mechanics. You write ‘… what do you think does this tell us about reality?’. I am at a loss. At what level am I supposed to tell you something about reality? What are the frequentists and Bayesians telling you about reality? I suppose they would answer that they are telling me something about the probability of heads for this coin. But now they use probability in a different sense, propensity perhaps. So what is the propensity for heads when a coin is tossed and the toss can be described by deterministic mechanics? Given the initial conditions the propensity is either 1 or 0. Suppose I model the binary expansion of pi by b(1,0.5). What is the propensity of a given digit? Does it make sense to do this? Well, if I choose some excerpt you won’t know. Even if I told you, I can use the model to give limits for the relative frequency for any subset without knowing this in advance. The predictions can be checked against the actual frequencies.

The word ‘random’ seems to give the greatest difficulty. It is a well-defined concept in probability theory but it is difficult to relate it to reality. In the Greenland et al. paper (number 9) one reads

‘Many problems arise because this statistical model often incorporates unrealistic or at best unjustified assumptions. This is true even for so-called “non-parametric” methods, which (like other methods) depend on assumptions of random sampling or randomization. These assumptions are often deceptively simple to write down mathematically, yet in practice are difficult to satisfy and verify, …’

In the above quotation the word ‘random’ occurs as a technical term in probability theory (often deceptively simple to write down) as well as a property of the real world (in practice difficult to satisfy and verify). In fact the conditions are impossible to satisfy and verify. They are impossible to satisfy because no one knows how to generate random numbers, and impossible to verify because you cannot distinguish such numbers from those produced by a random number generator or the binary digits of pi. The reason for the success of statistical models is that many things in this world can be well described by such models even though we have no way of knowing if the resulting data are in fact random. But we have been through this before. One can apply stochastic models to data and situations which ‘look random’, whereby this must of necessity be vague. This means that the concept random in statistical models cannot be shown to correspond to a supposed real property of the world which is also called random. All we can say is that reality can often be successfully described by mathematical models involving the technical term random.

Other components of the model can often be directly related, for example identifying mu with the actual copper content of the water. This identification can also be tested to see if it is successful, in general not. The parameter sigma, which measures the variability of the data, will relate to something in the real world without our directly being able to say what, at least not without further investigation. It may depend on the matrix of chemicals in the sample, the variability of the apparatus used and the care with which the measurements were made. An interlaboratory test is after all a test of the quality of the laboratories involved.

I suspect you will not be satisfied with this but I cannot think of anything else to write.

Laurie: Well thanks for trying. I agree pretty much with all of this (and was aware of most of it) and it is pretty close to my question, but not exactly on it, as you expected. However it is probably as good as it gets, and where it isn’t enough for my understanding of your approach and “philosophy”, I will have to fill the gap on my own, which I actually have done up to now but here I tried to check this with you.

Just to give you one more hint at what this was about: there are several Bayesian interpretations, not just one. It may be that you mean subjective Bayes according to de Finetti when you describe a “consistent interpretation”. But according to de Finetti, probabilities model some state of mind, not the world, not the data. I think that this is incompatible with your approach. Other interpretations, such as Gillies-style propensities and probably also frequentism, are I think more compatible, because they take probabilities as modelling mechanisms that generate data (there needs to be some “generous creativity” in reading von Mises to interpret him that way, but anyway). I think if you say that “reality can often successfully be described by mathematical models”, what is required is more than mathematics alone: a rule to “map” the mathematical objects onto items of perceived reality (not implying any kind of belief that reality “really” is like this), in order to understand and define what it means to say that reality is “described” by a model, and this is what the probability interpretations deliver. So in order to make such a statement, you need one, too. Or people who want to make sense of what you say need one, which is fine by me because I can find one that does the trick for me (although this may then deviate from what you have in mind).

Christian – on my naive reading it seems like the propensity view of probability is almost the exact opposite of, e.g., a deterministic chaos interpretation. How do you define propensity theory, and how do you see it connecting to stochastic models approximating data?

I’m probably wrong (I’m not really that familiar with the details of propensities) but doesn’t propensity theory imply that there is some ‘true’ random property of objects rather than, say, it arising from epistemological limitations/properties of approximate models?

Deborah, if you used the word ‘true’ instead of adequate then this particular objection would not apply. But you use the word ‘adequate’ without the slightest hint of what you mean by it. For simplicity let us take the Gaussian case. The adequacy of (mu,sigma) is tested by using several different tests to check various properties of, say, N(mu,sigma) random variables. It can happen that only a few survive (see the link I gave to Andrew Gelman’s blog). The tests make use of various test statistics which are not restricted to the mean and standard deviation. The results for the two samples x and y can be very different, not only in their values but in their sizes. Whatever, the decision is made to accept Gaussian models in both cases. At this point the ladder used to arrive at the Gaussian models is thrown away: the statistics used to decide on the Gaussian model are now completely removed and the whole of the analysis is based on the mean and standard deviation. Seems very weird to me. As far as I can see it has nothing to do with Error Statistics, which represents an approach to statistics with which I agree in principle if not in some of the detail.

John, I hope I understand you correctly. I do not think of a reference class but I do think of the nature of the data to be analysed. I give some examples.

The data consist of photon counts as a function of the angle of diffraction. The noise is Poisson for high counts but not for low counts, due to electrical interference. Of interest are the locations and widths of the peaks. To complicate matters, these are to be measured from a slowly varying baseline. Finally, the resulting peaks are to be expressed as a mixture of, say, Gaussian kernels. This is the information you have from the physicist. I cannot say when doing this that I have a reference class in mind, only the information and requirements of the person providing the data.

Residual-based localization and quantification of peaks in X-ray diffractograms

http://projecteuclid.org/euclid.aoas/1223908044

Local extremes, runs, strings and multiresolution (with discussion)

http://projecteuclid.org/euclid.aos/996986501

Sometimes I do not like the standard way of doing things. Take for example the simple two-way table. Under the usual parametrization the minimum number of interactions is four due to the requirement of column and row sums being zero. I have never understood this. Why should nature not be allowed to have an interaction in a single cell, or three interactions? One way of doing this is to minimize the number of interactions.

Interactions in the Analysis of Variance

http://www.tandfonline.com/doi/abs/10.1080/01621459.2012.726895

In this case I had no direct data I wished to analyse but of course there are examples in the paper.

Another example is to quantify the number of volatility clusters in financial time series. There is no canonical manner of doing this, but the clustering of volatility is a real phenomenon. The following paper offered a way of doing this.

Recursive estimation of piecewise constant volatilities

https://www.researchgate.net/publication/242801405_Recursive_estimation_of_piecewise_constant_volatilities

For later work in this direction with further examples see

Frick, K., Munk, A. and Sieling, H. (2014). Multiscale change-point inference (with discussion and rejoinder by the authors). Journal of the Royal Statistical Society, Series B, 76, 495-580. arXiv:1301.7212 (long version with full proofs).

So no I don’t think of reference classes just of the peculiarities of the data to be analysed.

Laurie, even if you are not explicitly considering the reference class, I am certain the physicist is. Your result will be interpreted in light of his/her understanding of how it relates to future experiments and/or the perception of some reality of a natural process, i.e. a reference class. Do you think so?

Christian, all I wrote was that de Finetti for example has a consistent interpretation of Kolmogorov’s formalized probability theory. This makes no reference to the world, just his beliefs; so what, it is still consistent. One can argue about finite versus sigma-additivity, but apart from that his theory of beliefs can be described by a Kolmogorov system. And so can betting odds. What I am not sure about is whether a von Mises approach to probability via frequencies can also be described by a Kolmogorov model.

It is a long time since I bought Gillies, but I have never thought in terms of propensities and feel no need to. As Oliver commented, it also seems to me that there is some idea of randomness behind the word propensity.

You write

Gillies-style propensities and probably also frequentism I think are more compatible, because they take probabilities as modelling mechanisms that generate data

This is, I think, why Oliver suspects some form of randomness behind propensities. It is also the impression I have. Gillies never mentioned chaos or deterministic systems. Is there a propensity interpretation of chaos?

You also write

what is required is a rule to “map” the mathematical objects on items of perceived reality (not implying any kind of belief that reality “really” is like this) in order to understand and define what it means to say that reality is “described” by a model, and this is what the probability interpretations deliver.

but I never write that reality is "described" by a model, only that the model approximates the data in a well-defined sense. To be of any use, I must map parts of this model to parts of reality. In the case of the copper data and the Gaussian model, I temporarily and speculatively identify the true amount of copper in the water with mu. More specifically, within the model I can use the best estimate of mu based on a data set generated under N(mu, sigma^2), in this case the mean of the data, and use this, via the identification of mu with the true amount of copper, to give my best estimate of the true amount of copper. Note that this is a precise mapping, as the true amount of copper is known. Since I know the true amount of copper to high accuracy, these being prepared samples, I can compare it with the mean of the data. I can also give an approximation interval for mu based on the data and interpret this as a range of reasonable values for the true amount of copper. I can then check whether the true amount lies in this interval. As far as I can see, this is a very direct mapping of one aspect of the model to reality. It yields concrete numbers which can be compared.

I also pointed out that sigma can be given an interpretation, but this is not as immediate. It will, however, be taken into account as a measure of the care with which the readings were made; interlaboratory tests are a form of quality control. But I did all this already and you were not satisfied. I was satisfied; I do not feel I need anything else, so I was, and still am, at a loss to know what this something else is.
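The mapping just described can be written out in a few lines. The actual interlaboratory copper readings are not reproduced in this thread, so every number below is invented; only the structure of the mapping is the point.

```python
import math
import statistics

# Sketch of the mapping described above, with MADE-UP numbers (the real
# interlaboratory copper data are not reproduced here).  Identify mu with
# the true amount of copper, estimate it by the mean of the readings, and
# give an approximation interval for mu which is then read as a range of
# reasonable values for the true amount.

true_copper = 2.10            # known to high accuracy: prepared samples (invented)
readings = [2.07, 2.12, 2.08, 2.14, 2.09, 2.11]   # invented lab readings

mean = statistics.mean(readings)
sd = statistics.stdev(readings)
n = len(readings)

# 95% interval under the Gaussian model; 2.571 is the 0.975 quantile of
# the t-distribution with n - 1 = 5 degrees of freedom.
half = 2.571 * sd / math.sqrt(n)
lower, upper = mean - half, mean + half

print(f"estimate of mu (and so of the true amount): {mean:.3f}")
print(f"approximation interval: [{lower:.3f}, {upper:.3f}]")
print("known true amount inside interval:", lower <= true_copper <= upper)
```

The comparison in the last line is exactly the "precise mapping" of the text: one concrete number (the known true amount) checked against another (the interval derived within the model).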

A precise mapping of theoretical concepts such as randomness, independence and i.i.d. onto reality is not possible. de Finetti has it better: he uses exchangeability, but he gets this by introspection. All you can do is to say 'this data set looks random', or 'this data set cannot be compressed', or maybe even 'this data set looks chaotic', and I see no possibility of getting beyond such vague descriptions of the data.
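The phrase 'this data set cannot be compressed' can at least be made operational in a crude way. Here is a toy check of my own, using zlib as a rough stand-in for Kolmogorov complexity (which is itself uncomputable): a highly patterned string shrinks dramatically, a random-looking one of the same length does not.

```python
import random
import zlib

# Crude stand-in for the incompressibility idea (Kolmogorov complexity
# is uncomputable, so a practical compressor is the best one can do):
# compare how well zlib compresses a periodic byte string versus a
# random-looking one of the same length.

random.seed(1)
n = 10_000
periodic = bytes(i % 7 for i in range(n))
noisy = bytes(random.getrandbits(8) for _ in range(n))

ratio = lambda b: len(zlib.compress(b, 9)) / len(b)
print(f"periodic data compresses to {ratio(periodic):.1%} of its size")
print(f"random-looking data stays near {ratio(noisy):.1%} of its size")
```

This is still only a vague description of the data, in the sense of the text: passing such a check supports 'looks random' but maps nothing precisely onto reality.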

Two side remarks:

Here is a link to a conference entitled ‘Randomness in Quantum Physics and Beyond’

http://qrandom.icfo.eu/

The Salem-Zygmund paper does actually give a concrete example of a chaotic system which looks random, in that independence and a central limit theorem hold asymptotically. The only non-zero sine and cosine terms are those with frequencies n_j where n_{j+1}/n_j > q > 1. Given two starting points x and y with |x-y| very small, the trajectories nevertheless separate after a modest number of steps, 25 say; that is, we have chaos, extreme sensitivity to the initial conditions.
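The sensitivity claim is easy to see numerically. The following is my own toy sketch, not anything from the Salem-Zygmund paper: iterate the doubling map x -> 2x mod 1, for which the frequencies n_j = 2^j are lacunary with ratio n_{j+1}/n_j = 2 > 1, and watch two starting points 1e-10 apart separate to order one, since the gap doubles at each step.

```python
# Toy illustration of extreme sensitivity to initial conditions (my own
# sketch, not from the Salem-Zygmund paper): the doubling map
# x -> 2x mod 1.  The gap between two nearby orbits doubles each step,
# so an initial separation of 1e-10 reaches order one in about 30 steps.

def doubling_orbit(x, steps):
    orbit = [x]
    for _ in range(steps):
        x = (2.0 * x) % 1.0
        orbit.append(x)
    return orbit

x_orbit = doubling_orbit(0.1, 35)
y_orbit = doubling_orbit(0.1 + 1e-10, 35)

for n in (0, 10, 20, 30, 35):
    gap = abs(x_orbit[n] - y_orbit[n])
    gap = min(gap, 1.0 - gap)          # distance on the circle
    print(f"step {n:2d}: gap = {gap:.3e}")
```

By step 30 the gap is of order 1e-10 * 2^30, roughly 0.1, so the two trajectories are by then effectively unrelated: deterministic, yet 'looking random'.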

Laurie:

“Gillies never mentioned chaos or deterministic systems.Is there a propensity interpretation of chaos?” –

This is an interesting question to which I don't know the answer. It makes me aware that when I say that a Gillies-type propensity interpretation may be compatible with your approach, I actually think of it in a way quite different from how Gillies himself thinks about it. For me it is a way to think about reality and to assign a non-mathematical meaning to the mathematical model. Gillies, however, is concerned with reality itself rather than just a certain human view of it that may be taken while acknowledging that reality is essentially different, so he would probably object to using probability models for phenomena that are usually seen as non-random, with which I have no issue.

“but I never write that reality is “described” by a model” –

In the posting before, you did.

“A precise mapping of theoretical concepts such as randomness, independence and i.i.d. onto reality is not possible.” –

That’s probably the key issue. But I think that what is required is not a mapping onto reality, which may indeed be impossible, but rather a mapping onto *ideas about reality* that define a certain point of view, one which can be temporarily adopted for, e.g., making predictions, while keeping an awareness of the gap between this point of view and reality itself. This would be an explanation of “if I think about how the data came about in terms of the approximating model, what does this mean and imply?”

John, I never asked him and he never asked me; we never talked about it. I can see that in certain circumstances a reference class is important, for example when getting data about a well-defined real population, say the incidence of bovine tuberculosis in badgers in a certain area. Then one would have to give a lot of thought to how to get and analyse the data. It would not help me in determining the copper content of drinking water. What would be your reference class in this case?