O & M Conference (upcoming) and a bit more on triggering from a participant…..

I notice that one of the contributed speakers, Koray Karaca*, at the upcoming Ontology and Methodology Conference at Virginia Tech (May 4-5) focuses his paper on triggering!  I entirely agree with the emphasis on the need to distinguish different questions at multiple stages of an inquiry or research endeavor from the design, collection and modeling of data to a series of hypotheses, questions, problems, and threats of error.  I do note a couple of queries below that I hope will be discussed at some point. Here’s part of his abstract…which may be found on the just-created O & M Conference Blog (link is also at the O&M page on this blog). Recent posts on the Higgs data analysis are here, here, and here. Kent Staley had a recent post on the Higgs as well. (For earlier Higgs discussions search this blog.)

Koray Karaca
The method of robustness analysis and the problem of data-selection at the ATLAS experiment

In the first part, I characterize and distinguish between two problems of “methodological justification” that arise in the context of scientific experimentation. What I shall call the “problem of validation” concerns the accuracy and reliability of experimental procedures through which a particular set of experimental data is first acquired and later transformed into an experimental result. Therefore, the problem of validation can be phrased as follows: how to justify that a particular set of data as well as the procedures that transform it into an experimental result are accurate and reliable, so that the experimental result obtained at the end of the experiment can be taken as valid.  On the other hand, what I shall call the “problem of exploration” is concerned with the methodological question of whether an experiment is able, either or both, (1) to provide a genuine test of the conclusions of a scientific theory or hypothesis if the theory in question has not been previously (experimentally) tested, or to provide a novel test if the theory or hypothesis in question has already been tested, and (2) to discover completely novel phenomena; i.e., phenomena which have not been predicted by present theories and detected in previous theories. Even though the problem of validation and the ways it is dealt with in scientific practice has been thoroughly discussed in the literature of scientific experimentation, the significance of the problem of exploration has not yet been fully appreciated. In this work, I shall address this problem and examine the way it is handled in the present-day high collision-rate particle physics experiments. To this end, I shall consider the ATLAS experiment, which is one of the Large Hadron Collider (LHC) experiments currently running at CERN. 
…What are called “interesting events” are those collision events that are taken to serve to test the as-yet-untested predictions of the Standard Model of particle physics (SM) and its possible extensions, as well as to discover completely novel phenomena not predicted before by any theories or theoretical models.

To read the rest of the abstract, go to our just-made-public O & M conference blog.

First let me say that I’m delighted this case will be discussed at the O&M conference, and look forward to doing so. Here are a couple of reflections from the abstract, partly on terminology. First, I find it interesting that he places “triggering” (what I alluded to in my last post as a behavioristic, pre-data, task) under “exploratory”. He may be focussed more on what occurs (in relation to this one episode anyhow) when data are later used to check for indications of anomalies for the Standard Model Higgs, having been “parked” for later analysis.  I thought the exploratory stage is usually a stage of informal or semi-formal data analysis to find interesting patterns and potential ingredients (variables, functions) for models, model building, and possible theory development.  When Strassler heard there would be “parked data” for probing anomalies, I take it his theories kicked in to program those exotic indicators. Second, it seems to me that philosophers of science and “confirmation theorists” of various sorts have focussed on when “data,” all neat and tidied up, count as supporting, confirming, falsifying hypotheses and theories.  I wouldn’t have thought the problem of data collection, modeling or justifying data was “thoroughly discussed”. It absolutely should be; it just seems all too rare. I may be wrong (I’d be glad to see references).

*Koray is a postdoctoral research fellow at the University of Wuppertal, and he knows I’m mentioning him here.

Categories: experiment & modeling

7 thoughts on “O & M Conference (upcoming) and a bit more on triggering from a participant…..”

  1. Koray wrote to say that he’s traveling, and would comment later on. In the meantime, it got me thinking about the use of “exploratory” in statistics and philosophy of science. I’ll post some thoughts.

    • Koray Karaca

      Thanks for this post, Deborah.

      As you also pointed out, in philosophical literature, it is typically assumed that experimental data are already available to experimenters in canonical forms and ready for analysis. As a result, various experimental procedures used by experimenters for data-acquisition are disregarded. In this work, I aim to characterize the methodology of experimental procedures used to select and acquire experimental data at the ATLAS experiment. The unprecedentedly high collision rate is a distinctive feature of the Large Hadron Collider (LHC) experiments, namely ATLAS and CMS, currently running at CERN. However, this presents a novel challenge to experimenters: due to technical limitations in both data storage rate and capacity, out of a million collision events only a few can be selected according to certain pre-determined criteria for further evaluation, and the rest of the collision events are irretrievably lost. So the data-acquisition process has to be highly selective, and this necessitates that what experimenters call “interesting events” are selected from among the collision events (occurring inside the LHC) during the first level of the data-acquisition process, called the “first level trigger”. The process contains two more levels of triggering (of increasing complexity and delicacy) that serve to further refine the selections made at the first level trigger. Interesting events are those collision-events that are taken to serve to test the as-yet-untested predictions (mainly the prediction of the Higgs boson) of the Standard Model (SM) of particle physics and of its possible extensions (various theories and models beyond SM), as well as to discover completely novel phenomena that have not been predicted before by any theories or theoretical models, nor detected by any previous experiments. 
In the ATLAS experiment, at the first level of data-acquisition process, the goal is to select out interesting events swamped in the background of the well-known events (from previous experiments) of the SM, which are irrelevant to the above-mentioned objectives of the ATLAS experiment. As I shall argue, the first level triggering is by no means random, but instead is carried out through a particular method of robustness that serves to justify the appropriateness (with respect to the above mentioned objectives of the ATLAS experiment) of the data-selection criteria adopted at the first level of triggering, rather than the accuracy and reliability of experimental data or experimental results. In this sense, the first level of data-acquisition at the ATLAS experiment can be described as a “goal-oriented” (or “behavioristic”) process as you suggested; the goal being to select as many interesting events as possible (for further evaluation during the stage of data-analysis) in order to fulfill the above-mentioned objectives of the ATLAS experiment. It is mainly for this reason that I find it appropriate to describe the data-acquisition process at the ATLAS experiment as “exploratory” rather than “validatory”; because it is geared, in the first place, towards enhancing both the discovery potential and the testing capacity of the ATLAS experiment, rather than towards securing the accuracy and reliability of experimental results to be obtained.

      • Koray: Thanks much for your comment. I don’t think we disagree on anything, other than perhaps terminology. This is a great case with a rapidly shifting data-analytic context. But it is not very unusual for theoretical “predictions” to be highly data dependent, and “discovered” only by way of exploring the data themselves. There is an interesting mix of theoretical planning and post-data discernment, described with unusual clarity by Matt Strassler (who was good enough to Skype me a few weeks ago to correct some points in my first Higgs post.) I don’t know whether the pre-analysis “theory studies” he calls for below were actually carried out:


        Strassler: “But in fact we find ourselves in a different situation, with a Higgs that may be in the process of being found in Phase 1, with a mass that (if it is there) is known to be close to 125 GeV/c², and for which any exotic decay (given that the decays expected in the SM must be common, to make the discovery of this SM-like Higgs possible) must be rare — at most, say, 10-15% of all Higgs decays, and perhaps much, much smaller! Knowing the Higgs’ mass makes the search easier, but the smaller maximum probability for the decay makes it harder; together they certainly make the search different, different enough that the previous studies may not apply.] So we find ourselves in the unfortunate position of not yet knowing precisely how to look for large classes of exotic decays.
        But we certainly want to know this information in advance! Otherwise, how can we be sure how best to adjust the trigger strategies? How can we try to collect the right fraction of the data starting in March 2012, if we don’t know what strategies we’re going to want to use at the time of the data analyses, some of which won’t be done until 2013?!
        I personally believe that we need to act quickly — that a number of theory studies of how to detect exotic Higgs decays are needed, now.”

        I love his urgency.

        • Koray


          In my view, what makes the ATLAS experiment a novel case for the philosophy of science is the fact that, mainly due to technological limitations, most of the collision events are missed out already during the stage of initial data-taking from the detector system. This means that if selection criteria are not appropriate to discriminate “interesting events”, then this will make it impossible for the ATLAS experiment to actually explore (in a genuine way) the theoretical predictions which it is initially aimed to test. In other words, no matter what selection criteria are applied during the stage of data-analysis, once interesting events are left out during the stage of initial data-taking, there is no (genuine) way of exploring those predictions by way of analyzing data-sets acquired on the basis of inappropriate selection criteria; because such data-sets will be irrelevant to the theoretical predictions which the experiment is initially aimed to test. This indicates that, in the case of the ATLAS experiment, the exploratory power of the experiment is already at stake during the stage of initial data-taking, and this exploration problem needs to be addressed and solved before the stage of data-analysis.

          • Koray: That was precisely my point (in my post) in calling attention to the pre-data theoretical planning: “When Strassler heard there would be “parked data” for probing anomalies, I take it his theories kicked in to program those exotic indicators.” This pre-data theorizing, I was suggesting, is somewhat at odds with the more usual idea of “exploratory” as discovering patterns in the data, collected for whatever reason (as in EDA). If anything, the pre-data work here is far more theoretical than is typical, for the reasons you note. Of course, as I allowed in my comment, there’s a lot of post-data “exploration”. Perhaps it’s just a matter of degree.

            • Koray Karaca

              I think “the usual idea of “exploratory” as discovering patterns in the data” is still there, and in the ATLAS experiment, experimenters look for those patterns indicating “new physics” in data sets. But, using your terminology, “pre-data” theoretical planning is necessary (due to the technical reasons I previously noted), and, if properly implemented, it serves to ensure that data sets also include “interesting events”. Given that 99.99…% of collision events are expected to be not interesting, inadequate pre-data theoretical planning would most likely result in the absence of data patterns indicating “new physics”, thereby undermining the discovery potential of the experiment. This suggests that in order for the usual idea of “exploratory” as discovering new patterns in data sets to work in the case of the ATLAS experiment, adequate pre-data theoretical planning is necessary.

  2. A note on “exploratory”. This is one of those terms that became very popular in the mid-70s, especially in statistics, and that trickled down to other fields including philosophy. The names that I associate with it are John Tukey (1977), with EDA (“Exploratory Data Analysis”), in statistics, and Ron Giere (1976) in philosophy, but I’m sure there are others. Many regarded EDA, even then, as a different “philosophy” of learning from data: rather than introduce a model, one would apply various procedures to the data, often graphical. “Stem and leaf” plots have been standard fare in high school for many years. Now of course everyone does graphical analysis of various sorts.

    Much more broadly, “exploratory” analysis is typically contrasted to “confirmatory”. In my doctoral dissertation (“Philosophy of Statistics”—what else?), following Giere, I distinguished the aims and (statistical) standards for exploratory inquiry as opposed to those appropriate for hypothesis testing. This relates to a key distinction often made as to when procedures are questionable or even fraudulent, as in data dredging, post-hoc subgroups, double-counting and cherry-picking. OK for explorations, not for inference or evidence.
    In this connection, we recently looked at the Tilburg Report on Stapel (April 1, 2013).

    “Many of the above verification procedures have some value in a more exploratory setting, but then their use must be reported explicitly. The results of exploration of this kind would then be confirmed in a new independent replication prior to publication”. (p. 51)
    “the article makes no mention of this exploratory method” (p. 49)

    (Report: finalreportLevelt.pdf)

    Although it seemed liberating as a grad student to have a term, “exploratory”, with which to confront all the “confirmation theory” around me, I do not really find it useful to distinguish types of inquiry like exploratory and confirmatory. I find it preferable, and more meaningful, to consider the problem of interest, whether it’s planning, collecting, modeling, or drawing inferences (or otherwise learning) from data. In a given inquiry, it is often useful to distinguish different “levels” of inquiry, but that seems different.

    Notably, there are portions of model validation and misspecification testing where there is nothing wrong with using the same data for identifying as well as testing potential violations of assumptions. It’s all in the error probabilistic (e.g., severity) assessments that are legitimate. (Readers may recall the “unit” on misspecification testing on this blog, starting with

    Giere, R. N. (1976), “Empirical probability, objective statistical methods, and scientific inquiry,” in Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, vol. 2, edited by W. Harper and C.A. Hooker, 63-101. Reidel.

    Tukey, John (1977), Exploratory Data Analysis, Addison-Wesley.
