Discrete observations and classical confidence intervals

May 31, 2010

In particle physics, experimentalists often aim to set limits on certain physical quantities, in part to verify theories. Say a theory predicts that a particle called Gobbledygook has a 10^{-8} chance of decaying into two Gooks and a 1-10^{-8} chance of decaying into three Gobbles. Often, the ratio between these two decay modes is closely related to important parameters in the theory. Experiments that try to set limits on the ratios of these decays can therefore give us an idea of the range of values in which those parameters fall. The fraction of total decays that a particular decay mode takes up is called the branching ratio of that decay mode.

These experiments proceed by creating a huge number of Gobbledygook decays, and counting the number of these decays that (say) result in two Gooks. The eventual count is therefore a discrete quantity — one cannot count a fractional number of decays. The branching ratio itself, which is what the experimenters try to set a limit on, is not a discrete quantity. So the limits that experimenters put on branching ratios are not subject to the restriction of discreteness — they can take on a range of continuous values.

In classical statistics, confidence intervals have the following significance. A 90% confidence interval means that if I carry out a large number of experiments and set a 90% confidence interval in each experiment about the quantity I’m measuring, then 90% of those confidence intervals will contain the actual value of the quantity I’m measuring. That is, classical confidence intervals say something about the expected coverage of the actual value that is generated by a particular method of constructing confidence intervals.

So let’s say I want to put an upper limit on the branching ratio of a particular decay mode. I count the number of decays of that mode in my sample of decays, n_0, and find that n_0=0. I know that the decay mode is a Poisson process with unknown true mean u_t, i.e. P(n|u_t) = u_t^n e^{-u_t} / n!. To set a 90% confidence level upper limit on u_t, I put n=0, P(n|u_t)=0.1 and solve for u_t: e^{-u_t} = 0.1, so u_t = \ln 10 \approx 2.3. This gives me the upper limit u_2 = 2.3.
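The calculation above can be sketched in a few lines of Python. This is only an illustration under the assumptions already stated (a pure Poisson count, no background, no systematic uncertainties); the function names `poisson_cdf` and `upper_limit` are my own, not anything from Cousins' paper. It solves P(N <= n_0 | u) = 0.1 for u by bisection, which reduces to e^{-u} = 0.1 when n_0 = 0.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)          # P(N = 0)
    total = term
    for k in range(1, n + 1):
        term *= mu / k            # P(N = k) from P(N = k - 1)
        total += term
    return total

def upper_limit(n_obs, cl=0.90, hi=100.0, tol=1e-10):
    """Classical upper limit: the mean mu at which P(N <= n_obs | mu) = 1 - cl."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid              # mu too small: n_obs or fewer is still too likely
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(upper_limit(0), 3))   # n_0 = 0: agrees with -ln(0.1), about 2.3
print(round(upper_limit(1), 3))   # n_0 = 1: about 3.9
```

The n_0 = 1 value is the 3.9 that shows up later in the discussion of sensitivity uncertainty.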

Up to this point, we haven’t considered uncertainties due to the experimental setup. If there are no uncertainties whatsoever, that is, if the experimental apparatus and data analysis are of infinite precision, then the above method of constructing a 90% confidence interval, if repeated, will in fact lead to 90% of the confidence intervals constructed this way covering u_t (strictly speaking at least 90%, since the counts are discrete and the coverage cannot be tuned to exactly 90% for every u_t).
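For the infinitely precise case, the coverage can be computed exactly rather than simulated: for a given true mean, add up the Poisson probabilities of every count n whose upper limit lies at or above that mean. The sketch below does this. It assumes the same no-background Poisson model, and the helper names (`coverage`, `n_max`) are hypothetical choices of mine.

```python
import math

def poisson_pmf(n, mu):
    """P(N = n) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**n / math.factorial(n)

def poisson_cdf(n, mu):
    return sum(poisson_pmf(k, mu) for k in range(n + 1))

def upper_limit(n_obs, cl=0.90):
    """Mean mu at which P(N <= n_obs | mu) = 1 - cl, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return hi

def coverage(mu_true, cl=0.90, n_max=40):
    """Probability that the reported upper limit is >= mu_true.

    n_max truncates the sum over counts; it is assumed to be far
    above mu_true so that the neglected tail is negligible.
    """
    return sum(poisson_pmf(n, mu_true)
               for n in range(n_max + 1)
               if upper_limit(n, cl) >= mu_true)

for mu in (2.28, 2.32, 3.0):
    print(mu, round(coverage(mu), 4))
```

Just above 2.3 the coverage is 1 - e^{-u_t}, very close to 90%. Just below 2.3 every possible count gives a limit that covers, so the coverage jumps to essentially 100%. That step is the discreteness that the rest of the argument turns on.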

However, no experiments have infinite precision, so we have to take uncertainties into account. But the classical 90% confidence interval we get when we take experimental uncertainties into account in fact leads (in the above example) to u_2 < 2.3, a tighter limit than the limit that an experiment with infinite precision would lead us to set! This, as Robert Cousins writes, is unacceptable since

if two experiments each find n_0=0 and have the same \hat{s}, the poorly calibrated one will report a more restrictive limit than the superbly calibrated one.

That is, we’d expect that the “more precise” experiment would allow us to place a stricter limit on the branching ratio, yet it turns out that with classical confidence intervals, the less precise experiment gives us a stricter limit!

Here’s how that happens. For the infinitely precise experiment, the 90% confidence interval is as described above. We want to measure the branching ratio R_t = u_t / s_t, where s_t is the true sensitivity of the experiment. In the infinitely precise experiment, there is no uncertainty in s_t. Thus 90% of confidence intervals about the measured branching ratio \hat{R} will cover R_t. 10% will not.

Now suppose we don’t know the true sensitivity s_t. We can only estimate it by \hat{s} \pm \sigma. Suppose \sigma = 0.1 \hat{s}. Suppose further that u_t =2.28 or u_t = 2.32, that is, u_t is close to 2.3 relative to \sigma. Then the percentage of experiments that will observe n_0 \geq 1 is very close to 90%. When we construct the confidence intervals about \hat{R} from these experiments, their upper limit will be 3.9 / \hat{s} or greater, so nearly all of the 90% will cover R_t. In the remaining 10% of experiments where n_0=0, about half of the confidence intervals will cover R_t — due to the \pm \sigma term in the sensitivity. Thus the total coverage of R_t will be approximately (90+5)%=95% — not 90%! A 90% confidence interval for the experiment with uncertainty \sigma=0.1 \hat{s}, according to Cousins, would result in an upper limit of 2.0/ \hat{s}, stricter than the 2.3 / \hat{s} that one gets in the infinitely precise experiment!
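The 95% figure can be checked with a short calculation. The sketch below makes assumptions that go beyond the text and should be read as such: \hat{s} is taken to be Gaussian around s_t with standard deviation 0.1 s_t (the text specifies \sigma = 0.1 \hat{s}, which is nearly the same thing at this level), and each experiment is assumed to report the naive limit (Poisson upper limit for n_0) / \hat{s}, ignoring \sigma. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_pmf(n, mu):
    return math.exp(-mu) * mu**n / math.factorial(n)

def poisson_cdf(n, mu):
    return sum(poisson_pmf(k, mu) for k in range(n + 1))

def upper_limit(n_obs, cl=0.90):
    """Mean mu at which P(N <= n_obs | mu) = 1 - cl, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return hi

def coverage_with_sensitivity_error(mu_true, rel_sigma=0.10, cl=0.90, n_max=40):
    """Coverage of R_t = mu_true / s_t by the naive limit upper_limit(n_0) / s_hat.

    Assumes s_hat / s_t is Gaussian with mean 1 and width rel_sigma.
    """
    total = 0.0
    for n in range(n_max + 1):
        # The limit covers R_t when s_hat / s_t <= upper_limit(n) / mu_true.
        threshold = upper_limit(n, cl) / mu_true
        p_cover = normal_cdf((threshold - 1.0) / rel_sigma)
        total += poisson_pmf(n, mu_true) * p_cover
    return total

for mu in (2.28, 2.32):
    print(mu, round(coverage_with_sensitivity_error(mu), 3))
```

Under these assumptions both values of u_t come out near 0.95: the experiments with n_0 >= 1 cover almost always, and roughly half of the n_0 = 0 experiments cover as well.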

Cousins says that this strange result is due to the discrete nature of observations in a Poisson process. I think of it intuitively this way. The discreteness of the observations means that with u_t \approx 2.3, about 10% of experiments will throw up the result n_0=0. Because of the symmetric uncertainty about \hat{s}, about half of these will cover R_t. Now, if n_0 were a continuous variable (excuse this rather dubious counterfactual), many of these instances of n_0=0 would instead be spread over a range of positive values of n_0. These instances would have limits higher than the 2.3 / \hat{s} for n_0 = 0, so fewer of them would cover R_t compared to the discrete case. Thus, the discrete nature of the observations leads to over-coverage.

Note that the occurrence of over-coverage does not depend on u_t being close to 2.3. But the effect is magnified the closer u_t is to 2.3.

Cousins uses this anomaly — that a “more precise” experiment can actually lead to less stringent limits on branching ratios — to argue that particle physicists should employ Bayesian statistics instead. But Bayesian statistics comes with its own collection of problems, the most obvious one being the need to choose a prior. This can sometimes be an “advantage”. In experimental particle physics, the Particle Data Group is a particularly important organisation. Every year, it publishes a Review of Particle Physics that is the “bible” for experimental particle physicists — among other things, it contains all the “accepted” values of physical constants and parameters relevant to particle physics. When Cousins wrote his paper, the PDG’s weighted average over experiments for the squared mass of the neutrino, with a central 68% classical confidence interval, was m^2 = (-54 \pm 30) eV^2. That is, the entire confidence interval was in an “unphysical” region! If one uses a prior that is zero for values of m^2 <0, then one can rule out such "unphysical" confidence intervals. But this still leaves the question of whether the prior for the "physical" region should be uniform in m, m^2, or something else. Cousins reports that "the consensus view settled on m^2, but the fact that the upper limit depends on this choice remains unsettling to many".
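To see how the choice of prior feeds into the limit, here is a rough numerical sketch. It assumes a Gaussian likelihood centred on the quoted average of -54 eV^2 with width 30 eV^2, and a prior that is zero for m^2 < 0 and flat in either m^2 or m on the physical region. The likelihood model, the integration grid, and the function names are all my own assumptions for illustration; this is not the procedure used by the PDG or by Cousins.

```python
import math

M2_HAT, SIGMA = -54.0, 30.0   # eV^2, the average and uncertainty quoted above

def likelihood(m2):
    """Assumed Gaussian likelihood of the measured average given a true m^2."""
    return math.exp(-0.5 * ((m2 - M2_HAT) / SIGMA) ** 2)

def upper_limit_m2(prior_in, cl=0.90, m_max=20.0, steps=200_000):
    """Bayesian upper limit on m^2 (eV^2) for a prior flat in 'm' or in 'm2'.

    The prior is zero in the unphysical region m^2 < 0. The posterior is
    integrated over m on a midpoint grid from 0 to m_max (in eV).
    """
    if prior_in not in ("m", "m2"):
        raise ValueError("prior_in must be 'm' or 'm2'")
    dm = m_max / steps
    weights = []
    for i in range(steps):
        m = (i + 0.5) * dm
        # d(m^2) = 2 m dm turns a prior flat in m^2 into a density in m.
        jacobian = 2.0 * m if prior_in == "m2" else 1.0
        weights.append(likelihood(m * m) * jacobian)
    target = cl * sum(weights)
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if running >= target:
            m = (i + 1) * dm      # upper edge of the cell where cl is reached
            return m * m
    return m_max ** 2

print("flat in m^2:", round(upper_limit_m2("m2"), 1), "eV^2")
print("flat in m:  ", round(upper_limit_m2("m"), 1), "eV^2")
```

Both limits land in the physical region, but they differ noticeably: the prior flat in m puts more weight on small masses and gives the tighter limit. That is the dependence on the prior that Cousins describes as unsettling.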

What I find most interesting about this statistical curiosity is the tensions at work in the desiderata for published limits on quantities like branching ratios. On the one hand, it would be nice to have a pithy description that is uniform for all the branching ratios listed in the Review of Particle Physics, each with a weighted average and the appropriate uncertainty associated with a standardised confidence level. That would be of great utility for those looking for a quick overview of the experimental situation, say in order to jot down some rough pen-and-paper estimates in a related calculation. On the other hand, these pithy descriptions leave out the intricacies described in Cousins’ paper, imparting a perhaps misleading objectivity to the reported values. Recall that Cousins balks at accepting a method that leads to an experiment with infinite precision being less stringent with its limits than one with finite precision. I suspect that’s because he’s acknowledging the experiment as imparting authority to its reported mean value and confidence interval in its own right, not as just another statistic in the hypothetical ensemble of experiments that together satisfy the requirements of classical confidence intervals. If one takes the ensemble point of view seriously, then it’s not clear that Cousins’ worry matters. Of course, there is a whole other question about whether we should really be thinking in terms of large ensembles of experiments in experimental particle physics, given that the difficulty and expense of such experiments ensure that we do not have such large ensembles in practice.

Cousins, R. (1995). Why isn’t every physicist a Bayesian? American Journal of Physics, 63 (5). DOI: 10.1119/1.17901


A ceaselessly gravid German philosophical cow

May 17, 2010

I have no opinion on Heidegger, but the rant on him in Thomas Bernhard’s Old Masters is a masterpiece.

Stifter in fact always reminds me of Heidegger, of that ridiculous Nazi philistine in plus-fours. Just as Stifter has totally and in the most shameless manner kitschified great literature, so Heidegger, the Black Forest philosopher Heidegger, has kitschified philosophy, Heidegger and Stifter, each one for himself and in his own way, have hopelessly kitschified philosophy and literature. Heidegger, after whom the wartime and postwar generations have been chasing, showering him with revolting and stupid doctoral theses even in his lifetime — I always visualize him sitting on his wooden bench outside his Black Forest house, alongside his wife who, with her perverse knitting enthusiasm, ceaselessly knits winter socks for him from the wool she has herself shorn from their own Heidegger sheep. I cannot visualize Heidegger other than sitting on the bench outside his Black Forest house, alongside his wife, who all her life totally dominated him and who knitted all his socks and crocheted all his caps and baked all his bread and wove all his bedlinen and who even cobbled up his sandals for him. Heidegger was a kitschy brain, Reger said, just as Stifter, but actually a lot more ridiculous than Stifter who in fact was a tragic figure unlike Heidegger, who was always merely comical, just as petit-bourgeois as Stifter, just as disastrously megalomaniac, a feeble thinker from the Alpine foothills, as I believe, and just about right for the German philosophical hotpot. For decades they ravenously spooned up that man Heidegger, more than anybody else, and overloaded their German philological and philosophical stomachs with his stuff. 
Heidegger had a common face, not a spiritual one, Reger said, he was through and through an unspiritual person, devoid of all fantasy, devoid of all sensibility, a genuine German philosophical ruminant, a ceaselessly gravid German philosophical cow, Reger said, which grazed upon German philosophy and thereupon for decades let its smart little cowpats drop on it… Heidegger has always been repulsive to me, not only the night-cap on his head and his homespun winter long-johns above the stove which he himself had lit at Todtnauberg, not only his Black Forest walking stick which he himself had whittled, in fact his entire hand-whittled Black Forest philosophy, everything about that tragicomical man has always been repulsive to me, has always profoundly repulsed me whenever I even thought of it; I only had to know a single line of Heidegger to feel repulsed, let alone when reading Heidegger, Reger said; I have always thought of Heidegger as a charlatan who merely utilized everything around him and who, during that utilization, sunned himself on his bench at Todtnauberg… His nothing without reason is the most ludicrous thing ever, Reger said. But the Germans are impressed by posturing, Reger said, the Germans have an interest in posturing, that is one of their most striking characteristics. And as for the Austrians, they are a lot worse still in all these respects. 
I have seen a series of photographs which a supremely talented woman photographer made of Heidegger, who in all of them looked like a retired bloated staff officer, Reger said; in these photographs Heidegger is just climbing out of bed, or Heidegger is climbing into bed, or Heidegger is sleeping, or waking up, putting on his underpants, pulling on his socks, taking a nip of grapejuice, stepping out of his log cabin and looking towards the horizon, whittling away at his stick, putting on his cap, taking off his cap, holding his cap in his hands, opening out his legs, raising his head, lowering his head, putting his right hand in his wife’s left hand while his wife is putting her left hand into his right hand, walking in front of his house, walking at the back of his house, walking towards his house, walking away from his house, reading, eating, spooning his soup, cutting a slice of bread (baked by himself), opening a book (written by himself), closing a book (written by himself), bending down, straightening up, and so on, Reger said. Enough to make you throw up.

