The statistics of political polling

So: are the opinion polls having a good election or a bad election? It is impossible to know. For opinion polls are unique amongst statistical models in that it is never possible to know if they are right or wrong. 

Suppose, for example, that the Lib Dems in the event come a poor third on polling day. That might look like a refutation of (for example) the BPIX poll conducted on April 17, which has them in a (statistically insignificant) lead.
 
However if that does happen, BPIX would not find it difficult to come up with a perfectly plausible exculpation. Their poll, they would say, was probably right at the time. However subsequent events – perhaps a poor performance by Nick Clegg in the second and third televised debates, perhaps the natural regression to the mean of votes as a campaign goes on – caused voters to change their minds.
 
All pollsters ride two horses in the circus. On the one hand, they claim that their polls give an accurate report of public opinion at the time they are conducted, and an advance indicator of the result to come. That is what causes newspapers to commission them and electors to read them. On the other hand, they have in reserve the option of disclaiming any forecasting intention. If reality and the polls diverge, they can say: so what? Things changed between the poll and the election, that is all. This is not a scientific proposition, in the sense that it cannot be tested and so can be neither proved nor disproved. That, from the pollsters’ point of view, is a great strength.
 
Even before the leap in Lib Dem standing following the Clegg moment in the first debate, the polls were quite varied in the results they showed. It is true that no poll has yet put Labour ahead. Yet, for example, in a clutch of polls around 12-13 April, results varied from a Tory lead of 3 percentage points (eg Populus) to 10 points (Angus Reid).
 
What is not well understood is that this variation may simply reflect an essential truth: that all polls have a statistical margin of error. Typically this would be around 2 percentage points for the vote share of each party. However, this gives a misleading impression of the uncertainty in the lead. For if (say) a poll yields a Tory share that is two percentage points higher than their actual share, the tendency will be for it also to show a Labour share that is two percentage points lower. Very broadly, and omitting the complication of third and other parties, one large party's share is the mirror image of the other's: whatever one gains, the other loses. The margin of error on the lead is therefore roughly double the margin on either share. So a three-point Tory lead and a seven-point Tory lead are what statisticians would call “the same number.” There is no statistically significant difference between them.
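The arithmetic behind that doubling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a simple random sample, a 95% confidence level, and a hypothetical sample size and pair of vote shares that are not taken from any actual poll. The function names are invented for this sketch.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence interval


def share_margin(p, n):
    """95% margin of error for one party's share p in a simple random sample of n."""
    return Z_95 * math.sqrt(p * (1 - p) / n)


def lead_margin(p1, p2, n):
    """95% margin of error for the lead p1 - p2 measured in the same sample.

    The two shares are negatively correlated (a respondent counted for one
    party cannot also be counted for the other), so the variance of the
    difference is (p1 + p2 - (p1 - p2)**2) / n.
    """
    return Z_95 * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


n = 2000                    # hypothetical sample size
tory, labour = 0.37, 0.32   # hypothetical shares, not real poll figures

print(round(100 * share_margin(tory, n), 1))         # margin on one share, in points
print(round(100 * lead_margin(tory, labour, n), 1))  # margin on the lead, in points
```

With these made-up inputs the margin on a single share comes out at about 2.1 points and the margin on the lead at about 3.6 points. In a strict two-party contest, where the shares sum to one, the margin on the lead is exactly twice the margin on a share.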
 
Even that understates the likely variation in the poll. For this statistical margin of error depends on a very strong assumption: that the techniques each poll uses to draw its sample provide it with a perfect cross-section of the electorate.
 
In fact, they don’t. For example, there are large numbers of refusers – one telephone pollster recently admitted that it took ten calls to get one respondent. No-one has the faintest idea if the refusers have the same political profile as the yes-sayers.
 
It used to be relatively easy to deal with such problems. Pollsters weighted their samples. Voting (thinking back a little way into the last century) depended heavily on social class. If therefore you had a proportion from each class in your sample that matched the national proportion, you had a good chance of getting it right.
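As a rough illustration of what weighting by class involves, here is a minimal Python sketch. It shows one simple form of weighting (each respondent is scaled by the ratio of their group's national proportion to its proportion in the sample) and is not a description of any particular pollster's method. The group labels, national proportions and responses are all invented for the example.

```python
from collections import Counter, defaultdict


def weighted_shares(respondents, population_props):
    """Estimate party shares after weighting each group to its national proportion.

    respondents: list of (group, party) pairs.
    population_props: dict mapping group -> proportion of the electorate.
    """
    n = len(respondents)
    group_counts = Counter(group for group, _ in respondents)
    # Weight = national proportion / sample proportion for the respondent's group.
    weights = {g: population_props[g] / (group_counts[g] / n) for g in group_counts}

    totals = defaultdict(float)
    for group, party in respondents:
        totals[party] += weights[group]

    total_weight = sum(totals.values())
    return {party: w / total_weight for party, w in totals.items()}


# Hypothetical sample that over-represents the middle class (60% against 40% nationally).
sample = (
    [("middle", "Tory")] * 40 + [("middle", "Labour")] * 20
    + [("working", "Tory")] * 10 + [("working", "Labour")] * 30
)
national = {"middle": 0.4, "working": 0.6}  # assumed national proportions

shares = weighted_shares(sample, national)
print(round(shares["Tory"], 3), round(shares["Labour"], 3))
```

In this invented sample the unweighted Tory share is 50%. After weighting it falls to roughly 42%, because the over-sampled middle-class respondents lean Tory. The correction only works to the extent that class actually predicts the vote.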
 
This correlation between class and vote is much weaker than it used to be. Indeed there is no objective measure – age, sex, public sector versus private sector employment and so on – that explains more than a smallish part of the variance in voting intention. Pollsters have been driven to other, more subjective adjustments. In particular, they tend to adjust for how people remember having voted in past elections, which is correlated with how they will vote in this election. But these adjustments are themselves controversial. It does not help that people’s memories of how they voted in the past are demonstrably unreliable, in that they correlate very imperfectly with the actual recorded results of past elections.
 
Polling is not about to go away. After all, even the best racing tipsters advise many losers, and yet they command their show in national newspapers all the same. And of course, the pollsters may yet end up on polling day looking like heroes. However, in the meantime, Straight Statistics' advice to the scrupulous is to take all poll results with a largish dose of salt.