## Discrete observations and classical confidence intervals

In particle physics, experimentalists often aim to set limits on certain physical quantities, in part to verify theories. Say a theory predicts that a particle called Gobbledygook has a $10^{-8}$ chance of decaying into two Gooks and a $1-10^{-8}$ chance of decaying into three Gobbles. Often, the ratio between these two decay modes is closely related to important parameters in the theory. Experiments that try to set limits on the ratios of these decays can therefore give us an idea of the range of values in which those parameters fall. The fraction of total decays that a particular decay mode takes up is called the branching ratio of that decay mode.

These experiments proceed by creating a huge number of Gobbledygook decays, and counting the number of these decays that (say) result in two Gooks. The eventual count is therefore a discrete quantity — one cannot count a fractional number of decays. The branching ratio itself, which is what the experimenters try to set a limit on, is not a discrete quantity. So the limits that experimenters put on branching ratios are not subject to the restriction of discreteness — they can take on a range of continuous values.

In classical statistics, confidence intervals have the following significance. A 90% confidence interval means that if I carry out a large number of experiments and set a 90% confidence interval in each experiment about the quantity I’m measuring, then 90% of those confidence intervals will contain the actual value of the quantity I’m measuring. That is, classical confidence intervals say something about the expected coverage of the actual value that is generated by a particular method of constructing confidence intervals.

So let’s say I want to put an upper limit on the branching ratio of a particular decay mode. I count the number of such decays in my sample, $n_0$, and find that $n_0=0$. I know that the decay count follows a Poisson distribution with unknown true mean $u_t$, i.e. $P(n|u_t) = u_t^n e^{-u_t} / n!$. To set a 90% confidence level upper limit on $u_t$, I put $n=0, P(n|u_t)=0.1$ and solve for $u_t$. With $n=0$ this reads $e^{-u_t} = 0.1$, so $u_t = \ln 10 \approx 2.3$. This gives me the upper limit $u_2 = 2.3$.
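The calculation above can be sketched numerically. This is only an illustration under one assumption that goes beyond the $n_0=0$ case in the text: for $n_0 > 0$ I use the usual classical recipe of solving $P(n \le n_0 | u) = 0.1$ for $u$, which reduces to $e^{-u} = 0.1$ when $n_0 = 0$. The function names `poisson_cdf` and `poisson_upper_limit` are mine.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)      # P(N = 0)
    total = term
    for k in range(1, n + 1):
        term *= mu / k        # P(N = k) from P(N = k - 1)
        total += term
    return total

def poisson_upper_limit(n_obs, cl=0.90):
    """Mean mu at which P(N <= n_obs | mu) drops to 1 - cl (found by bisection)."""
    lo, hi = 0.0, 10.0 + 10.0 * n_obs   # generous bracket for the root
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid          # still too likely to see <= n_obs: limit is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n_obs in range(3):
    print(n_obs, round(poisson_upper_limit(n_obs), 2))
```

For $n_0 = 0$ the bisection should land on $\ln 10$, and for $n_0 = 1$ on roughly 3.9, the value that appears further down in the coverage argument.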

Up to this point, we haven’t considered uncertainties due to the experimental setup. If there are no uncertainties whatsoever, that is, if the experimental apparatus and data analysis are of infinite precision, then the above method of constructing a 90% confidence interval, if repeated, will in fact lead to 90% of confidence intervals constructed this way covering $u_t$.

However, no experiment has infinite precision, so we have to take uncertainties into account. But the classical 90% confidence interval we get when we take experimental uncertainties into account in fact leads (in the above example) to $u_2 < 2.3$, a tighter limit than the limit that an experiment with infinite precision would lead us to set! This, as Robert Cousins writes, is unacceptable since

if two experiments each find $n_0=0$ and have the same $\hat{s}$, the poorly calibrated one will report a more restrictive limit than the superbly calibrated one.

That is, we’d expect that the “more precise” experiment would allow us to place a stricter limit on the branching ratio, yet it turns out that with classical confidence intervals, the less precise experiment gives us a stricter limit!

Here’s how that happens. For the infinitely precise experiment, the 90% confidence interval is as described above. We want to measure the branching ratio $R_t = u_t / s_t$, where $s_t$ is the true sensitivity of the experiment. In the infinitely precise experiment, there is no uncertainty in $s_t$. Thus 90% of confidence intervals about the measured branching ratio $\hat{R}$ will cover $R_t$. 10% will not.

Now suppose we don’t know the true sensitivity $s_t$. We can only estimate it by $\hat{s} \pm \sigma$. Suppose $\sigma = 0.1 \hat{s}$. Suppose further that $u_t =2.28$ or $u_t = 2.32$, that is, $u_t$ is close to 2.3 relative to $\sigma$. Then the percentage of experiments that will observe $n_0 \geq 1$ is very close to 90%. When we construct the confidence intervals about $\hat{R}$ from these experiments, their upper limit will be $3.9 / \hat{s}$ or greater, so nearly all of the 90% will cover $R_t$. In the remaining 10% of experiments where $n_0=0$, about half of the confidence intervals will cover $R_t$ — due to the $\pm \sigma$ term in the sensitivity. Thus the total coverage of $R_t$ will be approximately (90+5)%=95% — not 90%! A 90% confidence interval for the experiment with uncertainty $\sigma=0.1 \hat{s}$, according to Cousins, would result in an upper limit of $2.0/ \hat{s}$, stricter than the $2.3 / \hat{s}$ that one gets in the infinitely precise experiment!
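The coverage argument can be checked with a small Monte Carlo. This is a rough sketch and not Cousins' own calculation. My assumptions: the estimate $\hat{s}$ is drawn from a Gaussian centred on $s_t$ with width $0.1 s_t$ (the text has $\sigma = 0.1\hat{s}$, which is nearly the same thing here), the units are chosen so that $s_t = 1$, and each pseudo-experiment reports the naive limit $u_2(n_0)/\hat{s}$. The names `coverage` and `sample_poisson` are mine.

```python
import math
import random

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def poisson_upper_limit(n_obs, cl=0.90):
    """Mean mu at which P(N <= n_obs | mu) = 1 - cl, by bisection."""
    lo, hi = 0.0, 10.0 + 10.0 * n_obs
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_poisson(rng, mu):
    """Draw a Poisson count by multiplying uniforms (fine for small mu)."""
    threshold = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def coverage(u_true, rel_sigma, n_trials=100_000, seed=1):
    """Fraction of pseudo-experiments whose limit u_2(n_0)/s_hat covers R_t."""
    rng = random.Random(seed)
    s_true = 1.0                       # assumed units: true sensitivity is 1
    r_true = u_true / s_true
    limits = {}                        # cache of u_2 for each observed count
    covered = 0
    for _ in range(n_trials):
        n_obs = sample_poisson(rng, u_true)
        s_hat = rng.gauss(s_true, rel_sigma * s_true)
        if n_obs not in limits:
            limits[n_obs] = poisson_upper_limit(n_obs)
        # R_t <= u_2 / s_hat, rewritten to avoid dividing by s_hat
        if r_true * s_hat <= limits[n_obs]:
            covered += 1
    return covered / n_trials

print("exact sensitivity:   ", coverage(2.32, 0.0))
print("10% sensitivity error:", coverage(2.32, 0.1))
```

With $u_t = 2.32$, the run with exact sensitivity should give a coverage close to 90%, while the run with the 10% uncertainty should come out noticeably higher, in line with the rough (90+5)% estimate above.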

Cousins says that this strange result is due to the discrete nature of observations in a Poisson process. I think of it intuitively this way. The discreteness of the observations means that with $u_t \approx 2.3$, about 10% of experiments will throw up the result $n_0=0$. Because of the symmetric uncertainty about $\hat{s}$, about half of these will cover $R_t$. Now, if $n_0$ were a continuous variable (excuse this rather dubious counterfactual), many of these instances of $n_0=0$ would instead be spread over a range of positive values of $n_0$. These instances would have limits higher than the $2.3 / \hat{s}$ for $n_0 = 0$, so more of them would cover $R_t$ than in the discrete case, and the limits could be tightened accordingly. Thus, the discrete nature of the observations leads to over-coverage.

Note that the occurrence of over-coverage does not depend on $u_t$ being close to 2.3. But the effect is magnified the closer $u_t$ is to 2.3.

Cousins uses this anomaly (that a “more precise” experiment can actually lead to less stringent limits on branching ratios) to argue that particle physicists should employ Bayesian statistics instead. But Bayesian statistics comes with its own collection of problems, the most obvious one being the need to choose a prior. This can sometimes be an “advantage”. In experimental particle physics, the Particle Data Group is a particularly important organisation. Every year, it publishes a Review of Particle Physics that is the “bible” for experimental particle physicists: among other things, it contains all the “accepted” values of physical constants and parameters relevant to particle physics. When Cousins wrote his paper, the PDG’s weighted average over experiments for the squared mass of the neutrino, with a central 68% classical confidence interval, was $m^2 = (-54 \pm 30)\,\mathrm{eV}^2$. That is, the entire confidence interval was in an “unphysical” region! If one uses a prior that is zero for values of $m^2 < 0$, then one can rule out such “unphysical” confidence intervals. But this still leaves the question of whether the prior for the “physical” region should be uniform in $m$, $m^2$, or something else. Cousins reports that “the consensus view settled on $m^2$, but the fact that the upper limit depends on this choice remains unsettling to many”.
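To get a feel for how much the choice of prior can matter, here is a toy calculation. It is not the PDG's or Cousins' procedure. I assume a Gaussian likelihood in $m^2$ centred on $-54\,\mathrm{eV}^2$ with width $30\,\mathrm{eV}^2$, a prior that vanishes for $m^2 < 0$, and a crude midpoint-rule integration. The names `likelihood` and `posterior_upper_limit` and the integration ranges are mine.

```python
import math

M2_HAT = -54.0   # measured central value from the text, eV^2
SIGMA = 30.0     # quoted uncertainty from the text, eV^2

def likelihood(m2):
    """Assumed Gaussian likelihood of the measurement for a true value m2."""
    return math.exp(-0.5 * ((m2 - M2_HAT) / SIGMA) ** 2)

def posterior_upper_limit(density, x_max, cl=0.90, n_steps=40_000):
    """Point below which a fraction cl of the (unnormalised) density on [0, x_max] lies."""
    h = x_max / n_steps
    weights = [density((i + 0.5) * h) for i in range(n_steps)]  # midpoint rule
    total = sum(weights)
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if running >= cl * total:
            return (i + 1) * h
    return x_max

# Prior uniform in m^2 >= 0: the posterior in m^2 is the truncated likelihood.
limit_flat_m2 = posterior_upper_limit(likelihood, x_max=400.0)

# Prior uniform in m >= 0: integrate over m, then square the limit on m.
limit_on_m = posterior_upper_limit(lambda m: likelihood(m * m), x_max=20.0)
limit_flat_m = limit_on_m ** 2

print("90% limit on m^2, prior flat in m^2:", round(limit_flat_m2, 1), "eV^2")
print("90% limit on m^2, prior flat in m:  ", round(limit_flat_m, 1), "eV^2")
```

Both limits are positive by construction, but they differ: a prior flat in $m$ puts more weight near zero than a prior flat in $m^2$, so under these assumptions it gives the tighter limit.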

What I find most interesting about this statistical curiosity is the tensions at work in the desiderata for published limits on quantities like branching ratios. On the one hand, it would be nice to have a pithy description that is uniform for all the branching ratios listed in the Review of Particle Physics, all with a weighted average and the appropriate uncertainty associated with a standardised confidence level. That would be of great utility for those looking for a quick overview of the experimental situation, say in order to jot down some rough pen-and-paper estimates in a related calculation. On the other hand, these pithy descriptions leave out the intricacies described in Cousins’ paper, imparting a perhaps misleading objectivity to the reported values. Recall that Cousins balks at accepting a method that leads to an experiment with infinite precision being less stringent with its limits than one with finite precision. I suspect that’s because he’s acknowledging the experiment as imparting authority to its reported mean value and confidence interval in its own right, not as just another statistic in the hypothetical ensemble of experiments that together satisfy the requirements of classical confidence intervals. If one takes the ensemble point of view seriously, then it’s not clear that Cousins’ worry matters. Of course, there is a whole other question about whether we should really be thinking in terms of large ensembles of experiments in experimental particle physics, given that the difficulty and expense of such experiments ensure that we do not have such large ensembles in practice.

Cousins, R. (1995). Why isn’t every physicist a Bayesian? American Journal of Physics, 63(5). DOI: 10.1119/1.17901

### One Response to Discrete observations and classical confidence intervals

1. wolfgang says:

I did not have time and patience to check the calculation, but would still like to make three remarks.

1) I don't think there is any special issue here with discrete vs. real, because the branching ratio example could easily be formulated in terms of a simple head vs. tail example (with a biased coin so that e.g. head is seen very rarely).

2) People often make mistakes about confidence intervals, when they are not careful about the idea that one tests against a (null) hypothesis.
In other words, considering an interval A < B a frequentist would (try to) rule out that the ratio is less than A with certain confidence e.g. 90% and also rule out with some confidence that the ratio is bigger than B.
If one deals with non-trivial measurement errors (as in your example) it is not immediately clear how to best 'combine' the two – and I think that is the real reason for the unexpected result of your analysis.

3) As for why (some) physicists (like me) hesitate to use Bayesian statistics, one reason (and there are several) is the danger that it blurs the line between experiment and theory.
You say it is obvious that m cannot be less than zero. The next guy is certain that your experiment *must* produce Gooks because of superstring theory and therefore wants to use a different prior from the guy who thinks that string theory is not even wrong.

One guy wants to use a uniform (uninformed) prior for m and the next guy thinks that x = exp(m) is the 'natural' variable to consider and the prior should be uniform in x.