Monthly Archives: September 2015


Is inflation really 8%? ONS web scraping trial problems

*** UPDATE: on 26 October 2015, ONS admitted to a significant error in this report. The revised report can be found here. ***

On 1 September 2015, ONS published a report on “Research indices using web scraped data” (original ONS report here and data here). It was an update on their initial analysis published in June on a trial using web scraped data to compile price indices.

The objectives of this trial appear to be to determine if and how the ONS might be able to use web scraping as an alternative method of data collection. Their hope, like many trying to use big data, is to reduce costs (by cutting out manual data collectors) and to improve quality (by increasing the number of prices checked and their frequency).

The results show that the ONS seems a very long way from perfecting the use of big data. Their report also, somewhat confusingly, initially focuses on the differences between calculating price indices using some statistical wizardry called “chain linking” and by comparing the average unit prices of products. These differences have led to a number of media headlines claiming that inflation is being severely underestimated, e.g. Cost of everyday items for sale in supermarkets rockets 8 per cent in last year (Daily Mirror).

These reports are completely misleading, as I’ll discuss below. ONS should bear some responsibility for the confusion. Their report should probably have majored on what you have to wait until section 4.3 to read, i.e. a “comparison with CPI”. The latter shows that when you scrape prices in a limited way and try to replicate what you are doing off-line as best you can, you can end up with fairly similar results. (Though strangely the results published using this method in September differ markedly from what they published in June, with no reference in the report to what they have done differently to make the fit better.)

Figure: June report showing a big difference between CPI and its web scraped version

Figure: September report showing the two to be very similar

But let us return to the hype being reported in the media. It is related to the results of the ONS’s attempts to produce price indices on a more frequent basis (e.g. daily or weekly) from the web scraped data. The problem with scraping as ONS have done it (i.e. letting a computer collect the price of everything with the label “whisky” on it from the Tesco, Sainsbury’s and Waitrose websites) is that you can end up with a load of data that is difficult to analyse and sometimes misleading (e.g. Tesco include rum in with whisky!). There are also the problems that supermarkets frequently change their labels for products, products go out of stock or get delisted, and sometimes no data is collected due to computer problems. All of this means that being able to compare prices consistently over a period of time is almost impossible without some sort of major compromise.

The compromises you make then impact the results you get back. ONS looked at two main ways of doing this. The first is to find just the subset of products that could be tracked at some point in every month and compare those – called the unit price index. This subset is less than a quarter of the original one and is sometimes down to just one line per supermarket (in the case of bananas) – hardly big data then!
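To make the idea concrete, here is a minimal sketch of a unit price index built only from products observed in every month. The product names, months and prices are made up for illustration, and the function `unit_price_index` is my own simplification, not the ONS method, which is more involved.

```python
from collections import defaultdict

# Hypothetical scraped observations: (product_id, month, price).
observations = [
    ("whisky_a", "2014-06", 20.0),
    ("whisky_a", "2014-07", 21.0),
    ("whisky_a", "2014-08", 22.0),
    ("whisky_b", "2014-06", 30.0),
    ("whisky_b", "2014-07", 30.0),
    ("whisky_b", "2014-08", 30.0),
    ("rum_c", "2014-06", 15.0),     # seen in one month only, so dropped
    ("whisky_d", "2014-07", 10.0),  # missing in June, so dropped
    ("whisky_d", "2014-08", 10.0),
]


def unit_price_index(observations, base_month):
    """Index of the average price of products seen in every month (base = 100)."""
    months = sorted({month for _, month, _ in observations})
    prices = defaultdict(lambda: defaultdict(list))
    for product, month, price in observations:
        prices[product][month].append(price)

    # Keep only products with at least one price in every month.
    tracked = [p for p, by_month in prices.items()
               if all(m in by_month for m in months)]
    if not tracked:
        raise ValueError("no product was observed in every month")

    # Average price per month over the tracked products only.
    average = {}
    for m in months:
        product_means = [sum(prices[p][m]) / len(prices[p][m]) for p in tracked]
        average[m] = sum(product_means) / len(product_means)

    base = average[base_month]
    return tracked, {m: 100.0 * average[m] / base for m in months}


tracked, index = unit_price_index(observations, base_month="2014-06")
print(sorted(tracked))
print(index)
```

Note how half of the products disappear before any index is calculated, which is the same shrinkage the ONS subset suffers from.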

The alternative* is to look at each pair of days over the year, match the products priced on both days and link the results to the previous day with some clever stats – called chain linking. The latter is an ingenious idea, but in practice it means that the price indices so created drift ever higher – possibly because supermarkets bring in new lines on promotion at a discount price and then, when they return to full price, many get delisted as people stop buying them.
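The drift can be reproduced with a toy example. The sketch below assumes a chained index where each daily link is the geometric mean of price changes for products present on both days; the ONS report's exact formula may differ, and the products and prices are invented. One product never changes price, while a succession of new lines arrive at a discount, return to full price and are then delisted.

```python
import math

# Hypothetical daily price lists: product -> price.
# "staple" never changes price; promo lines arrive at 0.80, rise to 1.00,
# and are then delisted and replaced by another promo line.
days = [
    {"staple": 1.00, "promo_1": 0.80},
    {"staple": 1.00, "promo_1": 1.00},
    {"staple": 1.00, "promo_2": 0.80},
    {"staple": 1.00, "promo_2": 1.00},
    {"staple": 1.00, "promo_3": 0.80},
]


def chained_index(days):
    """Chain daily links; each link uses only products seen on both days."""
    index = [100.0]
    for previous, current in zip(days, days[1:]):
        matched = previous.keys() & current.keys()
        if matched:
            # Geometric mean of price relatives over the matched products.
            log_mean = sum(math.log(current[p] / previous[p])
                           for p in matched) / len(matched)
            link = math.exp(log_mean)
        else:
            link = 1.0  # nothing to compare, so carry the index forward
        index.append(index[-1] * link)
    return index


def average_price(day):
    return sum(day.values()) / len(day)


index = chained_index(days)
print([round(value, 1) for value in index])
print(average_price(days[0]), average_price(days[-1]))
```

The average shelf price is the same on the first and last day, yet the chained index ends roughly 25% higher. The rise from promotional to full price is always counted, while the replacement line arriving at a discount is never matched to anything, so the fall is never recorded.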

Figure: chain-linked price index from the September report

Neither of the above is a viable solution for creating a price index and so the ONS goal of maybe producing daily price indices seems a long way off.

Instead, what ONS need to take out of this trial is the basic issue with big data: making it work accurately requires a lot of data cleaning/manipulation (called data wrangling). Without that you risk the “rubbish in, rubbish out” scenario that has dogged so many such projects.

Indeed ONS might find that it is actually cheaper and better quality just to send out the interviewers to collect the prices as they do now. Having said that, the ideal solution is probably a mixture of the two, i.e. set up an online interface so an intelligent human data collector can go online and collect the prices from supermarkets that have online shops. They can then deal with the product renamings and, when required, propose substitutions if products genuinely become delisted. You then end up with similar data to now but collected a bit quicker, hopefully. (Note the word “bit”. I suspect the challenging usability of many online shops means that it might actually be quicker to visit the vegetable aisle of Tesco and deal with the substitutions in person than to try to do it online.)

Finally, to reiterate, inflation for food is not running at 8%. The latest CPI figures, published today, estimate it to be -2.4% (RPI -2.0%).

* Note, ONS also report a variant of chain linking called GEKS, which appears to suffer less from the drift towards higher prices, but its results appear inconsistent and so it appears to be no solution either.

Latest UK inflation data – August 2015

The August inflation figures continue to show CPI inflation around zero. Goods inflation is -2%. There are continued downward effects due to reductions in petrol and diesel prices this month (-2.4p and -6.2p respectively). Clothes also went up less in August than last year, though that is partly an artefact of them falling less in the sales earlier this summer.

The reason we did not see negative CPI was mainly the balancing effect of service inflation, which is up 2.3% on the year. In addition, furniture prices were higher this month, as were the costs of miscellaneous goods and services – both possibly a reflection of the housing market pick-up since the election?

The gap between CPI and RPI

Despite the decline in CPI, RPI nudged higher this month to 1.1%. Indeed the gap between CPI and RPI has widened to its largest in over four years. The increase in the gap is mainly down to the different weights used in RPI and CPI. RPI items are weighted by surveys of average people’s expenditure. CPI uses a value model, which puts more emphasis on the expenditure of the rich. Therefore this month, as clothes went up less than normal, this lowered CPI (as the rich spend more on clothes). Conversely, the furniture price rises caused RPI to go up more due to the higher weights in that model.

More broadly though, the big difference between RPI and CPI is mainly driven by the inclusion of housing in RPI – house price inflation is 9% according to the Halifax. In addition, CPI uses a statistical averaging method that also causes the inflation rate to appear lower – see discussion about geometric means here.
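As a quick illustration of the averaging effect, the sketch below compares the arithmetic and geometric means of the same set of price changes. The three price relatives are made up, and this is only the bare mathematical point, not a reconstruction of how ONS compiles either index. Whenever the price changes differ from one another, the geometric mean comes out lower.

```python
from statistics import geometric_mean, mean

# Hypothetical price relatives (this year's price / last year's price)
# for three items in the same category.
price_relatives = [1.10, 0.95, 1.20]

arithmetic = mean(price_relatives)
geometric = geometric_mean(price_relatives)

print(f"arithmetic mean: {100 * (arithmetic - 1):.2f}% inflation")
print(f"geometric mean:  {100 * (geometric - 1):.2f}% inflation")
print(f"gap: {100 * (arithmetic - geometric):.2f} percentage points")
```

The gap grows as the individual price changes become more spread out, and vanishes only when every item changes by the same proportion.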

Outlook for inflation

Next month’s CPI will more than likely be negative – possibly as low as -0.2%. This is due to the continued decline in petrol prices and British Gas’s recent decision to reduce their prices by 5%. Add to that the continued strength of the pound, which has had a marked impact on producer prices. Producer input prices are down nearly 14% and the prices of the products they produce are down 1.8%, and these will continue to feed through to the costs of goods in the coming months.

That said, this deflation is going to be a temporary blip. We are probably going to see price indices rise significantly at the end of the year. This is because the declines in petrol prices and food prices, which peaked in January 2015, will fall out of the equation. Average wage growth of 2.8% may also start to put upward pressure on prices. Therefore it is most likely that CPI will be above 1% in January 2016 and RPI above 2%. They will probably continue to rise towards target by the end of 2016, but this may well be tempered by the continued strength of sterling – especially if rates do finally rise.

The full ONS stats can be found here.