Question marks over Corby judgement
On July 29, The Hon Mr Justice Akenhead ruled in the High Court that Corby Borough Council had been negligent in their handling of toxic waste from Corby’s reclamation of the sites of abandoned steel works.
The contamination that resulted could realistically have caused the birth defects of the 16 claimants, he said. Crucially, he found that “there was a statistically significant cluster of birth defects between 1989 and 1999” [para 884 of the judgement].
But this finding appears open to question.
Statistical significance is one of the most commonly used scientific concepts. It is also widely misunderstood and has been disputed among statisticians and epidemiologists - both in general and in specific applications - ever since its development nearly a century ago.
This case is no exception. The two epidemiological experts disagreed on whether the cluster was statistically significant but the judge “was most impressed” [para 712] by the one that concluded it was.
Fortunately the judgement provides the full data that was permitted as evidence on this point. The judge ruled that statistical significance should only be assessed using data collected to a common standard, and this focussed attention on the incidence of upper-limb defects of babies born in Corby between 1989 and 1998 compared to the rest of the former Kettering Health Authority (KHA).
There were 14 such defects in the whole of KHA over this period from 35,627 births: 6 of these were in Corby from 7,736 births, compared to 8 out of 27,891 births elsewhere in KHA [para 736]. Corby therefore had 2.7 times the rate found elsewhere in KHA.
Statistical significance measures how likely such an extreme imbalance in birth defects would be, if Corby really did have the same underlying risk as elsewhere in KHA. The first expert calculated that the “one-sided p-value” – the probability of getting such a high number in Corby by chance alone – was 0.033 (technical note: there are many different ways of calculating such a p-value; my personal preference, a conditional analysis using a mid-p-value, gives p = 0.040, very similar to that in the judgement). Since this is less than the classically accepted threshold of 0.05 (5 per cent), the first expert and the judge considered the results “significant”.
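The judgement does not set out the exact calculation behind the 0.033 figure, but the mid-p figure of 0.040 can be reproduced in a few lines of Python; a minimal sketch, assuming the conditional (binomial) analysis described in the technical note:

```python
from scipy.stats import binom

# Data accepted as evidence [para 736]
corby_births, corby_defects = 7736, 6
kha_births, kha_defects = 35627, 14        # whole of the former KHA, Corby included

# Rate ratio: Corby versus the rest of KHA
rest_births, rest_defects = kha_births - corby_births, kha_defects - corby_defects
rate_ratio = (corby_defects / corby_births) / (rest_defects / rest_births)
print(f"rate ratio: {rate_ratio:.1f}")      # ~2.7

# Conditional analysis: given 14 defects in KHA, under the null hypothesis each
# defect falls in Corby with probability 7736/35627, independently of the others.
dist = binom(kha_defects, corby_births / kha_births)

# One-sided mid-p-value: P(X > 6) plus half of P(X = 6)
mid_p = dist.sf(corby_defects) + 0.5 * dist.pmf(corby_defects)
print(f"one-sided mid-p-value: {mid_p:.3f}")  # ~0.040, as quoted above
```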
However both experts agreed that the conclusion of significance rested on a number of assumptions [para 729]. For example, the judge considered the issue of “one-sided” vs “two-sided” tests (in which we are interested in both high and low incidence), and concluded that the only interest was in excess risk and so a one-sided p-value was appropriate – without this decision the cluster would not have been judged significant.
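To see why this choice matters, one common convention (among several for discrete data) is simply to double the one-sided p-value to obtain a two-sided one, which takes the result over the 0.05 threshold:

```python
# Doubling the one-sided p-value is one common convention for a two-sided test
# (other conventions exist for discrete data).
one_sided = 0.033           # figure accepted in the judgement
two_sided = 2 * one_sided   # 0.066, above the 0.05 threshold
print(two_sided > 0.05)     # True: no longer "significant"
```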
The judgement also contains another strong caveat: the cluster is only significant if “the cases which gave rise to concerns and to the KHA investigation .. are included in tests of statistical significance” [para 729]. The judge decided these cases should be included.
But the judge, rather remarkably, did not address the closely-related point of “multiple comparisons”. Apparent clusters of events occur all the time by chance alone. The p-value of 0.033 accepted by the judge means that such a cluster will occur in around 1 in 30 areas, even though there is no background factor causing a problem.
There were around 7 million births in England and Wales between 1989 and 1998: 7,736 of them were in Corby, so there are around 900 other areas the size of Corby in England and Wales. That means that we can be confident there are around 30 areas in England and Wales that have just as much of a cluster of upper-limb defects as observed in Corby - all of these would be “false positives” as there is no underlying cause except chance variability. So how can we be sure that Corby is not one of these false positives?
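For completeness, a rough sketch of the arithmetic behind the "900 areas" and "30 false positives" figures, using the round numbers quoted above (this is an illustration, not part of the judgement):

```python
# Rough expected number of chance clusters of Corby's size in England and Wales
births_england_wales = 7_000_000           # approximate total, 1989-1998
corby_births = 7_736
p_value = 0.033                            # chance of such a cluster in any one area

n_areas = births_england_wales / corby_births           # ~900 Corby-sized areas
expected_false_positives = n_areas * p_value            # ~30 chance clusters
print(round(n_areas), round(expected_false_positives))  # 905 30
```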
The simplest method is to use two independent sets of data – one to suggest a hypothesis, and another to test it - but we have seen that this was ruled out by the judge. An alternative is to take account of how many opportunities there were for such a cluster to arise.
So when calculating a p-value for Corby in KHA, most statisticians would want to calculate the probability of getting such an extreme result in Corby or another area in KHA, essentially removing from consideration the fact that a cluster specifically in Corby led to the whole analysis.
There are a number of ways of carrying out this adjustment. One way is as follows. The rest of KHA was responsible for 27,891 births - if we divide this into 4 additional areas with 6,973 births in each area we have created 5 areas comprising KHA, of which Corby is the largest by a small margin.
Given the 14 upper-limb defects observed in the whole of KHA between 1989 and 1998, if any of the five areas had 6 or more defects, this would constitute a ‘cluster’ at least as extreme as the one observed in Corby.
A computer simulation shows that there is a 22 per cent chance of an apparent cluster of at least 6 defects arising in one of the 5 areas comprising KHA, completely by chance alone (14 per cent using a mid-p-value). So the fact that 6 of the 14 defects were concentrated in one area is not particularly surprising.
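A minimal sketch of one such simulation (the exact method is not specified above, so this is just one way of reproducing the 22 per cent figure): scatter the 14 defects across the five areas at random in proportion to their births, and count how often some area ends up with 6 or more.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five areas making up KHA: Corby plus four hypothetical areas of ~6,973 births each
births = np.array([7736, 6973, 6973, 6973, 6972])   # totals 35,627
probs = births / births.sum()
n_defects = 14                                      # total upper-limb defects in KHA

n_sims = 1_000_000
# Allocate the 14 defects to areas at random, in proportion to births
counts = rng.multinomial(n_defects, probs, size=n_sims)

# How often does at least one area show a 'cluster' of 6 or more defects?
p_cluster = (counts.max(axis=1) >= 6).mean()
print(f"chance of an apparent cluster somewhere in KHA: {p_cluster:.2f}")  # ~0.22
```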
This is a subtle point. If the reason for investigating the incidence of birth defects in Corby was because of suspicions concerning toxic waste, then the hypothesis would have been set up independent of the data, there would be no need for adjustment and the cluster would be considered statistically significant as concluded by the judge.
But in this case it appears that the cluster came first, and the possible association with toxic waste followed. If the cluster is to be used as an independent source of evidence, then it seems that an appropriate adjustment should be made. The contaminants in Corby may well have caused birth defects, but the judgement of a ‘statistically significant’ cluster appears highly arguable.
David Spiegelhalter is Winton Professor of the Public Understanding of Risk at Cambridge University
Mike O'Neill (not verified) wrote,
Thu, 13/08/2009 - 09:34
I'm not sure what simulation was used but there is an 8.8% chance that an area with 7,736 births (Corby) will have 6 or more birth defects, given that there are 14 defects per 35,627 births (KHA) and assuming all defects occur at random.
Of course, it would be better to have the national figures for births and defects to plug into the calculation.
Michael (not verified) wrote,
Tue, 01/09/2009 - 17:16
The mistakes made by the judge here appear to be fundamental. The problem though is that an expert epidemiologist must have made the same errors (as one expert epidemiologist argued that the result was statistically significant).
This epidemiologist's point of view is not discussed. I would be interested to know what it is. (Surely a trained stats professional would not argue that the data should be reused to test the hypothesis. What did he argue?)
aaron (not verified) wrote,
Fri, 05/03/2010 - 22:10
"The mistakes made by the judge here appear to be fundamental.."
Exactly! It's a really impetuous conclusion, based on a bunch of mistakes.
Anonymous (not verified) wrote,
Sat, 17/04/2010 - 13:22
Bear in mind KGH did NOT register all defects at birth as they should have, and therefore these are not included in the figures used.