Don’t count the numbers, count the spoons

The bigger a study, the better? That’s an assumption often made. But even studies that knock us out by their sheer size may be wrong.

Take, for example, a study published in The Lancet in August last year that used the entire population of Finland (5.2 million) as its control in measuring the effectiveness of drugs to treat schizophrenia.

It reached the surprising conclusion that clozapine, a drug that has been under a cloud for some time because of its side effects, was in fact the best. A Finnish team compared mortality among 66,881 schizophrenia patients against that of the whole Finnish population between 1996 and 2006. Despite its iffy reputation, clozapine was associated with a 26 per cent lower risk of dying than perphenazine, the reference drug in the study.

The result was highly statistically significant (p = 0.0045 when compared with perphenazine, and p < 0.0001 for all other antipsychotic drugs) and led the authors to suggest reassessing the restrictions on the use of clozapine. In a BMJ blog a British expert, Dr Steven Reid, concurred. “Time for a rethink, perhaps,” he concluded, “as not only is clozapine the most effective anti-psychotic we have; it may also be the safest.”

In the current issue of the International Journal of Clinical Practice, a special issue on statistics in medicine, Professor S. Nassir Ghaemi and S. B. Thommi of Tufts Medical Center in Boston take a different view. They argue that despite the size of the sample and the apparent solidity of the findings, the study is an example of “confounding by indication”, otherwise known as selection bias.

Given the long-established cardiovascular risks of clozapine, it is likely to have been withheld from patients already at high cardiovascular risk. “If we give high-risk drugs to low-risk patients, and low-risk drugs to high-risk patients, we will get reasonably good and similar outcomes,” the authors argue. “This is good medical practice, not an argument for the inherent ‘safety’ of clozapine.”
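The mechanism is easy to demonstrate. In the toy simulation below (a sketch with invented numbers, nothing to do with the actual Finnish data), a drug that adds nine percentage points of mortality risk to everyone who takes it nonetheless appears slightly safer than no drug at all, simply because prescribers steer it away from high-risk patients:

```python
# Toy simulation of confounding by indication. All numbers are invented
# for illustration; none come from the Finnish study.
import random

random.seed(1)

N = 100_000
treated = treated_deaths = 0
untreated = untreated_deaths = 0

for _ in range(N):
    baseline_risk = random.uniform(0.0, 0.20)            # 10-year risk of death
    gets_drug = baseline_risk < 0.05                     # withheld from high-risk patients
    risk = baseline_risk + (0.09 if gets_drug else 0.0)  # drug adds 9 points of risk
    died = random.random() < risk
    if gets_drug:
        treated += 1
        treated_deaths += died
    else:
        untreated += 1
        untreated_deaths += died

print(f"mortality on the drug:  {treated_deaths / treated:.1%}")      # ~11.5%
print(f"mortality off the drug: {untreated_deaths / untreated:.1%}")  # ~12.5%
# Despite adding 9 percentage points of risk to everyone who takes it,
# the drug looks marginally safer, purely because of who was given it.
```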

The longest and largest prospective study of the risks of clozapine, begun in 1992, showed 9 per cent cardiovascular mortality in a sample with a mean age of 36.5 years, followed for a decade. In such a young group of people, one would expect little or no cardiovascular mortality.

Applied to the Finnish study, these results imply that clozapine would kill more people than it saved. The study found that clozapine reduced deaths by suicide by 75 per cent over a decade; set that against a 9 per cent cardiovascular mortality and the balance comes out badly. In terms of numbers needed to treat, roughly 50 patients must be treated to prevent one suicide, while only about 10 need be treated to cause one cardiovascular death. So for every suicide prevented, roughly five cardiovascular deaths are caused.
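Spelled out as arithmetic (the 2 per cent absolute reduction in suicide deaths is implied by the number needed to treat of 50, rather than quoted directly from the study):

\[
\mathrm{NNT}_{\text{suicide}} = \frac{1}{0.02} = 50, \qquad
\mathrm{NNH}_{\text{cardiovascular}} = \frac{1}{0.09} \approx 11 \;(\text{call it } 10)
\]

\[
\frac{\mathrm{NNT}}{\mathrm{NNH}} = \frac{50}{10} = 5 \ \text{cardiovascular deaths for every suicide prevented.}
\]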

In addition, the Finnish study failed to correct for marital status, substance abuse, socioeconomic status and other social variables, all of which may affect suicide rates. In the authors’ opinion, then, the study comes nowhere near proving clozapine safe: “Good evidence exists about its dangers, and these need to be taken into account when other evidence is adduced regarding its benefits, especially in the setting of epidemiological studies full of confounding bias, even if the entire nation of Finland is involved”.

The authors give another good example of confounding: a study that appeared to show that three months’ treatment with an antidepressant after a stroke greatly reduced mortality over the succeeding nine years. This was an RCT, so it ought to have been immune to confounding, but nine-year mortality was not the primary end point; that had been to show that antidepressants improved depression three months after a stroke.

The nine-year mortality was a post-hoc outcome, decided long after the original study was complete. There were 104 patients, 50 of whom died during follow-up and 54 of whom lived. For the conclusion to be valid, all possible differences between the two groups ought to have been corrected for, but they were not. The stroke-study authors wrongly assumed that if factors such as obesity, hypertension, diabetes and lung disease did not differ significantly between those who died and those who did not, they could be ignored.
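To see why that assumption is unsafe, consider how little power a sample of 104 gives. In the hypothetical table below (invented figures, not the trial’s data), diabetes is markedly more common among those who died, yet the difference comes nowhere near statistical significance:

```python
# Hypothetical 2x2 table: diabetes by nine-year outcome in 104 patients.
# The figures are invented for illustration; they are not the trial's data.
from scipy.stats import chi2_contingency

#            diabetic  non-diabetic
table = [[15, 35],   # died  (50 patients, 30% diabetic)
         [10, 44]]   # lived (54 patients, ~19% diabetic)

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, p = {p:.2f}")   # about chi2 = 1.87, p = 0.17
# "Not significant" -- yet a 30% vs 19% imbalance in diabetes could easily
# account for part of the mortality difference. Absence of significance is
# not absence of confounding.
```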

In secondary outcomes and post-hoc analyses of RCTs, Ghaemi and Thommi assert, that assumption fails: the comparison between those who died and those who survived is not a randomised one, so confounding can creep back in. Such opportunistic analyses are common in the medical literature, but it cannot safely be assumed that confounding is absent merely because the original study was an RCT.

The same issue of the journal includes an analysis of statistics in drug advertising by Professor Joel Lexchin of York University in Toronto (the advertisements are full of confusing claims, he finds, and don’t seem to have improved much in decades) and an illustration of the pitfalls of calculating the population attributable fraction (the proportion of cases of a disease attributable to a particular risk factor) by Dr David Williamson of Emory University in Atlanta.
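For reference, the most familiar form of the population attributable fraction is Levin’s formula (a standard textbook expression, not taken from Williamson’s paper):

\[
\mathrm{PAF} = \frac{p\,(RR - 1)}{1 + p\,(RR - 1)}
\]

where \(p\) is the prevalence of the risk factor in the population and \(RR\) is the relative risk of disease among those exposed to it. Since both inputs are estimates, errors in either feed straight through to the headline figure.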

In an editorial, Professor Leslie Citrome of the New York University School of Medicine says that the best defence against being misled is to be aware of common distortions and biases. “This does not require a degree in statistics, but it does require being a bit sceptical and curious about the claims being made”, he says.