An inference need not be well warranted. My book explains these terms from scratch, and I review them on my blog. An inference is a claim arrived at & affirmed on the basis of others. The entire procedure is inferring. (SIST p. 65)

— ♕Deborah G. Mayo♕ (@learnfromerror) December 12, 2018


"Not be further from the truth"? Read Neyman Synthese 1977 again, his final word on his own theory which is about the public, objective side of statistics: behavior. Subjective elements are kept separate as inputs. The word "inference" never appears.

— Sander Greenland (@Lester_Domes) December 10, 2018


1. Claim is μ > 150.5 in light of x̄ = 152.

Hence compute the probability of disagreement under not-μ, i.e. 1 − agree, as

1 − max P(X̄ > 152; μ ≤ 150.5, σ = 1) = 0.933

2. Don’t have book on me right now, will do later.

3. As in 1 but now compute

1 − max P(X̄ > 150.2; μ ≤ 150.5, σ = 0.1) = 0.17

4. Compute 1 − max P(X̄ ≤ 151.5; μ ≥ 152) = 0.69
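These are easy to check numerically. A minimal sketch (my own code, not from SIST) for computations 1 and 4, using only the standard normal CDF built from `math.erf`; the worst case over the composite alternative is attained at its boundary, so the boundary mean is what gets plugged in:

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function (no SciPy needed)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# 1. SEV(mu > 150.5) with xbar = 152, SE = 1:
#    worst case over mu <= 150.5 is at the boundary mu = 150.5.
sev1 = norm_cdf(152, mu=150.5, sigma=1)      # P(Xbar <= 152; mu = 150.5)
print(round(sev1, 3))   # 0.933

# 4. SEV(mu <= 152) with xbar = 151.5, SE = 1:
#    worst case over mu >= 152 is at the boundary mu = 152.
sev4 = 1 - norm_cdf(151.5, mu=152, sigma=1)  # P(Xbar > 151.5; mu = 152)
print(round(sev4, 2))   # 0.69
```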

But I know Corey knows these computations. I spoze he just tried doing them standing up while doing something else because they are so easy. If you send me a revised quiz, I will happily remove your first attempt.

Attempting a quiz or other assignment earns reward cruise pts. 7 pts for a prize.

For the normal model, an outcome exactly 2SE out always gives SEV = 0.98; that’s what I had for Q1 before I noticed that I used the wrong mean.

Curious that you failed to criticize me for writing that the worst case severity is the highest probability instead of the lowest. I originally started writing my answers using (S-2)* instead of (S-2), and I missed that correction too.
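For reference, the 0.98 at a 2SE cutoff is just the standard normal CDF at 2; a one-line check (my own, assuming the normal model with known σ):

```python
from math import erf, sqrt

# P(Z <= 2) for standard normal Z: the severity attached to an
# outcome sitting exactly 2 standard errors above the test mean.
phi_2 = 0.5 * (1.0 + erf(2.0 / sqrt(2.0)))
print(round(phi_2, 2))   # 0.98
```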

1. SEV(μ > 150.5) = Pr(X_bar ≤ 152; μ = 150) = 0.93

MAYO: Correct numerical answer, but it should be:

SEV(μ > 150.5) = Pr(X_bar ≤ 152; μ = 150.5) = 0.93. Half off.

2. nah

3. In Q1 the standard error was 1; now the standard error is 0.1 and the 2SE rejection threshold is 150.2 and SEV(μ > 150.5) = 1 – tiny ε

MAYO: Incorrect. I should have added “show your work”; then you’d spot errors. However, it’s possible you read the question assuming the observed mean M0 = 152, whereas it was to be the 2SE cut-off. I added a sentence.

4. Back to SE = 1. SEV(μ ≤ 152) is the worst case (i.e., highest) probability, under μ > 152, of getting a test statistic that accords less with “μ ≤ 152” than does the observed test statistic x_bar = 151.5. So,

SEV(μ ≤ 152) = Pr(X_bar > 151.5; μ = 152) = 0.69

MAYO: Yes.

5. I’m not there yet. Let’s see… probably something to do with how severity clears up confusions and makes equivocations that paper over disagreements between various tribes unnecessary.

MAYO: Not even close.

MAYO: So it appears that you got 1 1/2 out of 5. You may wish to resubmit.

—

1. SEV(μ > 150.5) = Pr(X_bar ≤ 152; μ = 150) = 0.93

2. nah

3. In Q1 the standard error was 1; now the standard error is 0.1 and the 2SE rejection threshold is 150.2 and SEV(μ > 150.5) = 1 – tiny ε

4. Back to SE = 1. SEV(μ ≤ 152) is the worst case (i.e., highest) probability, under μ > 152, of getting a test statistic that accords less with “μ ≤ 152” than does the observed test statistic x_bar = 151.5. So,

SEV(μ ≤ 152) = Pr(X_bar > 151.5; μ = 152) = 0.69

5. I’m not there yet. Let’s see… probably something to do with how severity clears up confusions and makes equivocations that paper over disagreements between various tribes unnecessary.

—

So I have a question — a genuine question, mind, not a trick question or a rhetorical point-making one — regarding the application of severity reasoning in discrete settings. Suppose I have a measurement device that answers some binary yes/no question. The device is known to give correct answers 7 times in 10 and to give answers selected uniformly at random 3 times in 10.

The severity with which H passes a test with some test statistic, we are told, is the (worst-case, but that’s not relevant here) probability of getting a test statistic that accords less well with H than the one observed, supposing H to be false.

I have yet to see how to measure accordance in the discrete setting (perhaps it’s described in sections of the book I haven’t read yet). The question is: does accordance encompass being correct by happy chance in the 3 times in 10 that the measurement device gives a random answer? If so, severity is the proportion of times the answer is correct, 17 times in 20. But I suspect not, so I guess my real question is: is severity just the proportion of times the measurement device operates correctly?
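For what it’s worth, the 17-in-20 figure follows from total probability: the device is right whenever it operates correctly, and right half the time by luck in the random mode. An exact check (my own illustration, using `fractions` to avoid float noise):

```python
from fractions import Fraction

# Hypothetical device from the question: correct answer 7 times in 10,
# uniformly random yes/no answer 3 times in 10.
p_correct_mode = Fraction(7, 10)
p_random_mode = Fraction(3, 10)

# Overall proportion of correct answers: always right in the correct
# mode, right half the time by luck in the random mode.
p_correct = p_correct_mode + p_random_mode * Fraction(1, 2)
print(p_correct)   # 17/20
```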

Statistics deals with observations (data) in aggregate. In general, the same aggregate can arise in many ways – if you are only given the final result of an aggregation procedure, you don’t necessarily know the story about how it came about.

In other words, an aggregation procedure is a non-1-1 map from some underlying subject of study to a smaller set of numbers. Alternatively: the underlying target of interest cannot be identified with a functional of the observed data unless further info is provided.

Laurie’s EDA is an attempt to ‘disaggregate’ the data, or at least consider the extent to which the data could have come from particular aggregation procedures.

Pearl’s DAGs are formal descriptions of how the aggregates came about. E.g. given observations on (A,B), an arrow A->B indicates that the procedure first deals with A, then B, even though this ‘ordering’ info is lost from just having (A,B) pairs.
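A toy sketch of that ordering point (the variable names and the linear mechanism are my own invention, purely illustrative): a procedure obeying A->B draws A first and then B given A, yet the resulting bag of (A,B) pairs records nothing about that order.

```python
import random

random.seed(0)

def draw_pair():
    """Generative procedure matching the DAG A -> B."""
    a = random.gauss(0, 1)           # A generated first
    b = 2 * a + random.gauss(0, 1)   # B generated from A
    return (a, b)

pairs = [draw_pair() for _ in range(5)]
# The aggregate - a list of (A, B) pairs - carries no record of which
# variable was generated first; that ordering lives only in the DAG.
print(pairs[0])
```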

Personally, I think some DAG ideas can be useful even for EDA – eg you can view it as guiding a ‘disaggregation’ procedure to get at the ‘underlying’ phenomenon of interest.

I wonder

– Can common EDA procedures be described easily in DAG terms, without necessarily saying anything about causes other than talking about ‘aggregation’ processes? Tukey commented positively on Wright’s Path Diagrams at one point.

– To what extent does the EDA/aggregation idea capture counterfactual concepts? Does EDA accept that observed data is not sufficient? Does EDA only rely on informal intuitions for analysis or is a formalised ‘causal’ language helpful?

– How would an EDA analysis of (mud,rain) data go? Where would the info that ‘mud does not cause rain’ come in?

My own interests would be to say something about the so-called “new paradigm of data driven science” thought to arise in data science, AI and ML. I was drawn into that area in a session last summer on “philosophy of science and the new paradigm of data-driven science” at Columbia. There are a whole bunch of issues there that cry out for philosophical scrutiny. I nearly stopped writing the book to attend to them, but realized it was already long.

I’m also very interested in learning the latest about statistics in high-energy particle physics (HEP) and in experimental relativity. Your field, and the example you give, undoubtedly point to many other issues.

Back to sensitivity: I don’t say it’s a good thing. High sample size, for example, can lead to picking up on trivial discrepancies. Of course what counts as trivial varies by field. We don’t want to pick up on trivial flaws and approximations; in HEP physics, though, no difference is too small.

I’m mainly interested in how we can attain decent severity outside of formal statistics by building a repertoire of errors and triangulating.
