Fishing for significance
Fish and the oils it contains are supposed to be good for mothers-to-be and their children.
Studies have linked eating fish with desirable outcomes such as a reduced risk of maternal depression and better cognitive abilities in babies.
To keep us on our toes, there have been other studies that suggest that fish is good, but only if you eat enough of it to counteract the damage done by the mercury contamination it often contains. Confused? You will be.
This month a randomised controlled trial from a team in Australia, published in the Journal of the American Medical Association, showed that fish-oil capsules containing docosahexaenoic acid (DHA) had no effect on either the baby’s intelligence or the mother’s propensity to depression.
It got far less attention than the earlier studies, a function of the fact that journalists prefer to report studies that show positive findings, whatever their methodological flaws. So congratulations to Rebecca Smith of the Daily Telegraph, whose story reporting the Australian finding (below) was the only one I could find in mainstream British newspapers.
The women who participated were randomly allocated to either fish-oil capsules containing DHA, or vegetable oil capsules with no DHA. The trial was big enough to detect, with 80 per cent probability, a 4.2 per cent reduction in maternal depression, half that found in earlier studies. It found nothing.
It was also powered to detect the minimal clinically meaningful difference in the babies’ development: four points in their scores on the Bayley Scales of Infant and Toddler Development. There, too, it found nothing.
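For readers who want to see what “powered to detect” means in practice, here is a minimal sketch of the standard two-proportion sample-size calculation. The trial’s baseline depression rate is not given here, so the 12 per cent figure below is an assumption chosen purely for illustration, not a number from the study.

```python
# Illustrative power calculation for a two-arm trial comparing proportions.
# The baseline depression rate is NOT given in the article; 12% is a
# made-up assumption used only to show how the arithmetic works.
from scipy.stats import norm

alpha = 0.05                      # two-sided significance level
power = 0.80                      # desired probability of detecting the effect
p_control = 0.12                  # hypothetical depression rate on placebo
p_treated = p_control - 0.042     # a 4.2 percentage-point reduction

z_alpha = norm.ppf(1 - alpha / 2)   # critical value, about 1.96
z_beta = norm.ppf(power)            # about 0.84 for 80% power

# Standard approximate formula for the sample size per group
variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
n_per_group = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2

print(f"Roughly {n_per_group:.0f} women needed per arm")
```

With these assumed figures the formula gives several hundred women per arm; the real trial’s inputs will differ, but the principle is the same: “powered to detect” is a concrete calculation, not a figure of speech.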
So why did the earlier studies show an effect, when the RCT did not? It’s a familiar pattern, especially in nutritional studies: time and again, when an RCT is finally carried out, it contradicts the findings of earlier observational or cohort studies.
The most prominent of these earlier studies, reported at the time in almost all UK papers, was published in The Lancet in February 2007. A team led by Joseph Hibbeln of the US National Institutes of Health in Bethesda, Maryland, used data from the Avon Longitudinal Study in the UK to conclude that women who ate less than 340 g of fish a week were more likely to have children in the lowest quartile for verbal intelligence than those who ate more than 340 g a week.
Just to show I’m not having a go at journalists with the luxury of hindsight, here’s the headline on my own story about this study, written when I was Health Editor of The Times. The Daily Telegraph, the Guardian, the Daily Mail, The Sun, the Independent and the Daily Mirror published nearly identical stories.
Earlier work by Emily Oken of Harvard Medical School and colleagues, published in the American Journal of Epidemiology, suggested an even more complex interplay of risk and benefit. They found that women who ate more fish had higher mercury levels, and that higher mercury levels were associated with lower scores in development tests when their babies were three. Yet the more fish the women ate, the brighter their babies.
This confusing outcome was interpreted by the authors as “fish is good for baby, but would be even better without the mercury contamination” – or, that the benefits of fish outweigh the disbenefits of mercury. This study attracted only modest coverage in the UK – in The Herald and the Independent on Sunday. But it played bigger in the US, where mercury contamination has long been something mothers are told to worry about.
How do we reconcile the studies? The RCT is the strongest, because randomisation balances confounding factors, such as the differences in social class or wealth between mothers who eat fish and those who don’t, so no statistical correction is needed. The Hibbeln study was corrected for no fewer than 28 potential confounders, the Oken study for 16.
Another possible source of error is multiple testing. The more tests are done, the more likely it is that some will show statistically significant results simply by chance – just as buying more lottery tickets increases your chance of a win.
In the Hibbeln study, for example, fish intake is compared with outcomes from 23 different tests of cognition, behaviour and development, at different ages and using different measures. When corrected for the 28 potential confounders, nine of these are statistically significant; the rest are not.
That may sound persuasive, but close reading raises some questions. Fine motor skills in babies aged 18 months or 42 months, for example, are apparently affected by the mother’s fish intake, but not in babies aged 6 months or 30 months. Social development is affected at 30 months, but not at 6, 18 or 42 months. Verbal IQ in eight-year-olds is linked to maternal fish consumption, but performance IQ and full-scale IQ are not.
This is more akin to a fishing expedition than a sharply focused trial. One cannot judge the value of the significant findings without also counting the others – the total number of lottery tickets the authors have bought.
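As a rough sketch of that lottery-ticket arithmetic, the snippet below counts what 23 tests buy you. It assumes independent tests at the conventional 5 per cent level and no real underlying effect; real developmental measures are correlated, so this illustrates the principle rather than re-analysing the Hibbeln data.

```python
# Back-of-the-envelope multiple-testing arithmetic, assuming 23 independent
# tests at the 5% level and no true effect (an idealised scenario).
n_tests = 23
alpha = 0.05

expected_false_positives = n_tests * alpha              # about 1.15
prob_at_least_one = 1 - (1 - alpha) ** n_tests          # about 69%
bonferroni_threshold = alpha / n_tests                  # about 0.0022

print(f"Expected 'significant' results by chance alone: {expected_false_positives:.2f}")
print(f"Chance of at least one false positive: {prob_at_least_one:.0%}")
print(f"Bonferroni-corrected per-test threshold: {bonferroni_threshold:.4f}")
```

The point is not that nine significant results must all be flukes, but that without seeing every test that was run – every ticket bought – the reader cannot tell which wins are real.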
So do we conclude that fish oil is a busted flush? Not if we are Dr Oken, who in a commentary in JAMA claimed that the Bayley scale used by the Australian team was a poor predictor of possible cognitive deficits, and asserted that “fish is the primary source of DHA and other omega-3 fatty acids which are critical nutrients in pregnancy”. The health supplements industry disagreed with the findings too, but that’s no surprise.
Can journalists be blamed for reporting studies that are later contradicted? That would be asking a lot. But they might try harder to read them critically and identify potential flaws, rather than giving all studies published in medical journals equal billing.
And when an RCT does finally appear that disproves an earlier story they have written, they ought at least to try to cover it.
Mike O'Neill (not verified) wrote,
Thu, 28/10/2010 - 13:52
There are two obvious changes that would improve both the publication and the reporting of statistical studies.
1. Calculate, and include in the analysis, not only the significance levels for individual tests but also the number of instances of significance expected across all tests. This would show where the 'significance' in a particular test is likely to be simply a statistical artefact.
2. Include in the analysis comparisons of results with ALL (known) similar studies, not only published ones (which tend to be only those showing a statistically significant effect).
Studies where these are not included should be treated with suspicion by reporters.
Of course it would also help if researchers engaged brain before publishing and, as Nigel Hawkes suggests, asked the question "does this seem reasonable?"