Cybercrime claims: think of a big number and double it

On Sunday the Metropolitan Police issued a press release claiming that its e-Crime Unit had saved more than £140 million in the past six months and is well on its way to exceeding its four-year harm-reduction target of £504m.

Great news, but how on earth does it know? It cites two successful cases, Operation Pagode and Operation Dynamaphone, in which convictions sent internet fraudsters and “phishers” to jail. The Pagode case, involving a group of young men who sold stolen credit card details on the ghostmarket.net website, made up £84 million of this saving, Dynamaphone another £5.5 million. Where the rest of the £140m came from is not disclosed.

The story got coverage in several papers, including The Guardian. Here’s how the Mirror reported it:

[Screenshot: the Mirror’s report]

If we want to be picky, we could point out that those responsible for the ghostmarket operation pleaded guilty at Southwark Crown Court last November. They were originally arrested in January 2010. However, they were not sentenced until March this year, which almost falls within the past six months.

More interesting is how the Met calculated the £84 million saving. We know that the amounts involved in the ghostmarket case were much lower than this, so the estimate must involve some calculation of how much would have been lost if these miscreants had not been apprehended. The Met told the website Security Vibes that it uses a harm-reduction matrix developed in cooperation with academics and PricewaterhouseCoopers, and copyrighted by the Met. It would be nice to know roughly how it works. Citing claims based on your own private methods of calculation is asking to be disbelieved.

But that is nothing new in a field where credulity is taken for granted. In September Norton published a report claiming that cybercrime in 24 countries surveyed cost adults a total of $388 billion a year. Norton makes security software, so it has an interest in showing the problem is huge. Its $388 billion figure is made up of $114 billion in direct cash costs – money stolen or money spent resolving problems caused by viruses and cyber-attacks – and $274 billion for victims’ own valuation of the time they lost.

The Norton report – I don’t recommend trying to read it, as it is laid out in a way that defies sensible analysis, or even printing out – is based on a survey of 20,000 people. That sounds a lot, but in this field it is nowhere near enough.

Norton’s estimate, however, is by no means extreme. The Organisation for Security and Cooperation in Europe put the annual bill at $100 billion, while Edward Amoroso of AT&T told a Senate Committee in 2009 that cybercrime revenues were approximately $1 trillion. The US Federal Trade Commission estimated identity theft at $47 billion in 2004, $15.6 billion in 2006, and $54 billion in 2008. Why the huge drop in 2006? Identity thieves must have taken a year off.

A paper by two researchers from Microsoft, Dinei Florencio and Cormac Herley, puts all these claims into perspective. Cybercrime, they explain, is extraordinarily hard to measure, and estimates based on surveys are peculiarly prone to distortion by a very few people who claim to have lost a very great deal.

Losses are extremely concentrated, so representative sampling of the population does not give representative sampling of the losses. If you survey 1,000 people and one respondent claims to have lost $50,000 while the rest have lost nothing, the mean loss per individual is $50. Multiply that by 200 million to represent the adult population of the US and the total loss estimate is $10 billion. But this is based on a single answer from a single individual.
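The arithmetic of that example can be checked in a few lines (the figures are the article’s illustrative ones, not real survey data):

```python
# Illustrative figures from the example above, not real survey data.
sample_size = 1_000
losses = [50_000] + [0] * (sample_size - 1)  # one respondent claims $50,000

mean_loss = sum(losses) / sample_size         # $50 per respondent
us_adults = 200_000_000                       # rough US adult population
total_estimate = mean_loss * us_adults        # $10 billion

print(f"mean loss per respondent: ${mean_loss:,.0f}")
print(f"extrapolated national loss: ${total_estimate:,.0f}")
```

A single unverified answer thus scales up into a ten-figure national estimate.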

A clue to the reliability of any such survey is the difference between the mean loss and the median loss, which gives some idea of the degree of concentration of the losses in a few individuals. Most surveys simply quote the mean or the total, so give no clue to this. The Norton survey doesn’t even give a mean, just a total.
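The diagnostic value of the median can be shown with two invented sets of responses that share the same total and mean:

```python
import statistics

# Two hypothetical surveys with identical totals and identical means.
concentrated = [50_000] + [0] * 999   # one respondent claims everything
even = [50] * 1_000                   # the same losses spread evenly

for name, losses in (("concentrated", concentrated), ("even", even)):
    print(name,
          "mean:", statistics.mean(losses),
          "median:", statistics.median(losses))
```

Both surveys report a mean loss of $50, but the zero median of the first exposes how concentrated the claims are – information that a total alone cannot give.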

How large would a survey need to be to get around this problem? Unfeasibly large. Florencio and Herley give the example of a 2008 survey of ID theft by the US Department of Justice with a sample size of 56,480. It found that about 5 per cent of the population had suffered such fraud, mostly by traditional means such as a wallet being stolen or a credit card run twice. But just 0.2 per cent had been defrauded by responding to a phishing e-mail or a phone call. Since 0.2 per cent is one twenty-fifth of 5 per cent, getting an estimate of phishing as reliable as that of conventional credit card fraud would require a sample 25 times larger – over a million people. If other forms of cybercrime are even more concentrated, surveys of several million would be needed.
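The scaling factor comes straight from the ratio of the two prevalence rates, as a quick calculation with the survey figures quoted above confirms:

```python
# Figures from the 2008 US Department of Justice survey cited above.
doj_sample = 56_480
id_theft_rate = 0.05     # ~5% had suffered identity fraud, mostly offline
phishing_rate = 0.002    # 0.2% had been defrauded via phishing

# Expected numbers of positive responses at the DoJ sample size:
id_theft_cases = doj_sample * id_theft_rate    # ~2,824 respondents
phishing_cases = doj_sample * phishing_rate    # ~113 respondents

# To observe as many phishing victims as identity-fraud victims,
# the sample must grow by the ratio of the two prevalences:
factor = id_theft_rate / phishing_rate          # ~25x
required_sample = doj_sample * factor           # ~1.4 million respondents
print(round(factor), round(required_sample))
```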

There is potentially a way round this, a multi-layered sampling technique in which the first random sample is followed by surveying a second sample taken from those in whom the feature of interest is concentrated – in this case, those suffering losses from cybercrime. A technique of this sort is used to measure wealth distribution in the US. This would give a better idea of how the losses suffered by the small percentage who do suffer from cybercrime are distributed.
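A minimal simulation sketches the two-stage idea; every figure and distribution here is invented for illustration, and this is not the wealth-survey methodology itself:

```python
import random

random.seed(42)  # reproducible illustration only

# Invented population of 200,000 adults: 0.5% suffer losses, heavily skewed.
victims = [random.lognormvariate(8, 2) for _ in range(1_000)]
population = victims + [0.0] * 199_000

# Stage 1: an ordinary random sample estimates how common victimhood is.
stage1 = random.sample(population, 2_000)
prevalence = sum(1 for x in stage1 if x > 0) / len(stage1)

# Stage 2: a follow-up sample drawn only from the victim subpopulation
# measures how losses are distributed among those who do lose money.
stage2 = random.sample(victims, 500)
mean_loss_if_victim = sum(stage2) / len(stage2)

# Combine the two stages into a population-level estimate.
estimated_total = prevalence * mean_loss_if_victim * len(population)
print(f"prevalence: {prevalence:.3%}, estimated total: ${estimated_total:,.0f}")
```

Stage 1 pins down the rare-event rate cheaply; stage 2 spends its sample budget where the losses actually are.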

Earlier this year, in response to the National Statistician’s consultation on crime statistics, Straight Statistics suggested that there was a lack of reliable measures of cybercrime, and much exaggeration. In her report Jil Matheson said e-crime presents “particular measurement challenges” and that there is concern that existing statistics do not adequately capture the scale of such crimes (paragraph 4.15). She added (par 4.43) that Action Fraud, run by the National Fraud Authority, provided an avenue through which both the public and the police could record fraud offences.

Disappointing, then, to log on to the Action Fraud website and find that the first thing on it is a survey suggesting that 200,000 people in Britain have fallen victim to “romance frauds” in which they have been chatted up on social networking sites and at some point asked for money. This estimate is based on a sample of 2,000 in an online YouGov poll and must be subject to the same problems as other cybercrime surveys. The estimate of 200,000 represents roughly one in 200 of UK adults, so we can work out that in a survey of 2,000 this result is based on just ten positive responses. Action Fraud itself identified just 592 victims of this form of fraud in 2010-11.
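That back-calculation is simple enough to reproduce; the adult-population figure is the rough one implied by “one in 200”:

```python
# Figures quoted in the Action Fraud story above.
claimed_victims = 200_000
uk_adults = 40_000_000          # rough figure implied by "one in 200"
poll_sample = 2_000

prevalence = claimed_victims / uk_adults        # 1 in 200, i.e. 0.5%
implied_positives = poll_sample * prevalence    # about 10 respondents

print(f"prevalence: 1 in {1 / prevalence:.0f}")
print(f"positive responses behind the estimate: {implied_positives:.0f}")
```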

It’s impossible to dissent from Florencio and Herley’s conclusion: “Are we really producing cybercrime estimates where 75 per cent of the estimate comes from the unverified self-reports of one or two people? Unfortunately, it appears so. Can any faith whatever be placed in the surveys we have? No, it appears not.”