A statistical casualty?

Baroness Young has resigned as Chair of the Care Quality Commission (CQC) after reports of conflicts with Health Secretary Andy Burnham concerning procedures for monitoring the NHS.

In particular, there has been strong media attention on the apparent conflict between the ratings provided by the CQC and those released last week by Dr Foster in their Good Hospital Guide: nine of the 12 hospitals identified by Dr Foster as “significantly underperforming” for patient safety had been rated as “good” or “excellent” by CQC for Quality of Services. 

Both sets of ratings rely on the mathematical combination of performance indicators to create an overall rating and so are essentially based on statistical analysis. So Baroness Young can perhaps be seen as a statistical casualty.

I should state an interest: I have contributed to the statistical methods that are extensively used by both the CQC and Dr Foster. So if they are using similar methods, how can they come to such different answers? Sadly, the journalistic coverage of this issue has been woeful and the level of argument has been predictably dire – nobody seems to have made the slightest effort to get below the surface and understand what drives the ratings.

For example, as I reported in The Times last week, it takes only a few clicks to see that St Helens hospital, considered “excellent” by the CQC, was marked down by Dr Foster as “significantly underperforming” because safety incident data had not been reported in time, owing to technical problems. The Daily Mail also nonsensically reported that Basildon’s excess mortality rate was discovered in a surprise inspection, as if the inspectors found bodies littering the grounds, whereas the CQC had in fact been investigating Basildon’s mortality for some time beforehand. So the choice of indicators, their recency, their relevance to safety and the way in which they are combined are all crucial.

Of course the full gory details of how routine data are selected, collected, adjusted and combined to form a "rating" are a nightmare of complexity, difficult to untangle even for an experienced statistician. Dr Foster provides a guide to its methodology but glosses over the precise details of how the individual indicators are weighted, or the basis for the weights. 
In fact the methods they appear to have used were intended to suggest who should be inspected, not to produce global ratings, which many would claim should reflect values and priorities rather than be derived on purely statistical grounds. But the interviewer on the Today programme let them get away with saying that their data and methods were on the website, when in fact none of the raw data, the indicator values, the weighting system or the final score is actually provided – only the final rank between 0 and 100. 
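To see why the weights matter so much, here is a minimal sketch of the kind of arithmetic involved in combining indicators into a composite score. It is emphatically not Dr Foster's or the CQC's actual method (neither publishes its weighting scheme); the indicator names, weights and ranges below are invented purely for illustration.

```python
# Hypothetical illustration: combine performance indicators into a 0-100
# composite score. Each indicator is rescaled so 0 = worst plausible value
# and 1 = best, then a weighted average is taken. All names, weights and
# ranges here are invented; they do not reflect any real rating scheme.

def composite_score(indicators, weights, worst, best):
    """Weighted average of rescaled indicators, expressed on a 0-100 scale."""
    total_weight = sum(weights.values())
    score = 0.0
    for name, value in indicators.items():
        # Rescale: 0 at the 'worst' anchor, 1 at the 'best' anchor.
        # For an indicator where lower is better (e.g. a mortality ratio),
        # worst > best, and the rescaling reverses direction automatically.
        rescaled = (value - worst[name]) / (best[name] - worst[name])
        score += weights[name] * rescaled
    return 100 * score / total_weight

# Invented example: two safety indicators for a hospital.
indicators = {"mortality_ratio": 0.9, "incident_reporting": 0.7}
weights = {"mortality_ratio": 2.0, "incident_reporting": 1.0}
worst = {"mortality_ratio": 1.5, "incident_reporting": 0.0}
best = {"mortality_ratio": 0.5, "incident_reporting": 1.0}

print(round(composite_score(indicators, weights, worst, best), 1))  # prints 63.3
```

Even in this toy version, the final score depends not just on the data but on the chosen weights and on the "worst" and "best" anchors for each indicator, which is precisely why those choices reflect values and priorities rather than purely statistical considerations.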
Unfortunately the CQC rating is, if anything, even more obscure in its derivation: to some extent this is inevitable, but it should be the responsibility of any regulator to provide software that allows anyone to "drill down" and find out what is driving the conclusions.
Now of course, for many people this story simply provides an opportunity to express an opinion about the government, which is what makes reading the comments underneath online articles so depressing. But we should expect better of our media, and this unwillingness to engage with the ideas behind the ratings means that neither of the agencies is properly held to account.
If the "scandal" had been about financial irregularities in a company, then economic correspondents would be let loose upon the story and might even be able to explain to the public what was going on. 
But because this story concerns some slightly subtle analysis of routine statistics, none of the media seem able to drum up anyone who has a clue what is going on and can tell us why the agencies came up with such different answers. As usual, statistics are treated simultaneously as unquestionable and with great scepticism, rather than as constructions that can be questioned and examined. Which is very sad.