Measuring forecast accuracy

Is the most accurate forecast always the best?
In an era of constant change, retailers rely heavily on demand forecasting, the centerpiece of the planning process. High-quality forecasts help grocery retailers achieve higher on-shelf availability, minimize storage costs, and reduce markdowns and food wastage, all while maintaining store traffic and increasing customer loyalty. More often than not, however, retailers find themselves calculating losses from overstocks and out-of-stocks: the results of poor forecasting.

But what does "poor forecasting" actually mean? How should forecast quality be measured? Which metrics should you choose? And is the most accurate forecast always the best? This article addresses these widely asked questions and explores the pitfalls of measuring forecast quality.

How to measure forecast quality?

Forecast quality is strongly associated with forecast accuracy; in everyday usage, "quality" usually means accuracy. The most commonly used metrics for measuring forecast accuracy are MAPE (mean absolute percentage error) and WAPE (weighted absolute percentage error). Both measure forecast error, so retailers should target 0%; larger numbers indicate larger error. Let's look at MAPE first.

At its core, MAPE expresses forecast error as a percentage of actual sales: for each item, the absolute difference between forecast and actual sales is divided by actual sales, and the results are averaged. As the name suggests, the metric is a mean and does not distinguish between SKUs, giving each item an equal weight. MAPE is easy to calculate and well suited to a homogeneous sample of forecasted items.
However, with a heterogeneous sample - slow-moving and fast-moving goods with very different sales volumes - the forecast error can vary significantly from item to item, and large percentage errors on small items skew the average. Let's look at a hypothetical example: a fast-moving item with high sales volume, such as regular milk, and a typical slow mover with low sales volume, such as coconut milk:
While the forecast for regular milk is quite accurate, with a 5% error, the 100% error for coconut milk shifts the average metric to 52.5%. At the same time, the consequences of a 100% forecast error when you sell one item per day are very different from those when you sell 100 items per day. In this particular example, a 100% error is not really that bad given the sales volume. In other words, the resulting error of 52.5% is biased by the difference in sales volumes.
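Since the original sales table isn't reproduced here, the arithmetic can be sketched with assumed volumes (100 units of regular milk against 1 unit of coconut milk - these numbers are illustrative):

```python
def mape(actual, forecast):
    """Mean absolute percentage error: per-item percentage errors, averaged."""
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

# Assumed daily volumes: regular milk vs. coconut milk
actual   = [100, 1]  # units actually sold
forecast = [95, 2]   # forecasted units: 5% and 100% error respectively

print(round(mape(actual, forecast) * 100, 1))  # 52.5
```

The lone unit of coconut milk contributes as much to the average as the hundred units of regular milk.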

To avoid this, the weighted absolute percentage error - WAPE - can be used. It weights the forecast error by sales volume:
If we calculate WAPE for the same data, the forecast error for the two products drops to 3.96%. This figure takes into account the higher sales volume of regular milk, giving it more weight.
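A minimal sketch of the weighting, assuming volumes of 100 units of regular milk (5% error) and 1 unit of coconut milk (100% error); these illustrative numbers differ from the article's original table, so the result won't match 3.96% exactly:

```python
def wape(actual, forecast):
    """Weighted absolute percentage error: total absolute error over total sales."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)

actual   = [100, 1]  # assumed units sold
forecast = [95, 2]   # 5% and 100% per-item errors

print(round(wape(actual, forecast) * 100, 1))  # 5.9 - far below a 52.5% simple average
```

Because the error is summed in units before dividing by total sales, the fast mover dominates and the slow mover's 100% miss barely registers.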
On the one hand, it makes sense to use WAPE if you want to give more weight to items that drive sales. On the other hand, you risk losing the ability to rapidly detect changes in slow-moving goods' performance. Which metric to use - MAPE or WAPE - therefore depends on the type of goods and their sales volumes, as well as the retailer's business priorities.

Is the most accurate forecast always the best?

Forecasting competitions have become a common practice among retailers choosing between providers of demand forecasting solutions. Contestants are challenged to forecast demand based on a historical dataset provided by the retailer, and the most accurate forecast wins. However, does the most accurate always mean the best? To answer this simple yet tricky question, let's look at an example. The graph below shows actual sales of fresh milk and a moving average as a basic forecast. MAPE equals 34%.
Now let's shift our sales one day forward, so that the forecast for tomorrow simply equals today's milk sales. While this can hardly be called a forecast, MAPE - one of the most commonly used metrics for assessing forecast accuracy - increases only slightly, to 38%.
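The effect can be reproduced on a small made-up series (the numbers below are illustrative, not the article's milk data): a "forecast" that merely copies yesterday's sales scores in the same ballpark as a genuine moving-average forecast.

```python
def mape(actual, forecast):
    """MAPE over the days where a forecast exists."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if f is not None]
    return sum(abs(a - f) / a for a, f in pairs) / len(pairs)

sales = [10, 12, 9, 11, 10, 13, 8]  # illustrative daily sales

# 3-day moving average as a basic forecast (defined from day 4 on)
moving_avg = [None] * 3 + [sum(sales[t - 3:t]) / 3 for t in range(3, len(sales))]

# "Forecast" that is just yesterday's sales, evaluated on the same days
naive = [None] * 3 + [sales[t - 1] for t in range(3, len(sales))]

print(round(mape(sales, moving_avg) * 100, 1))  # 19.4
print(round(mape(sales, naive) * 100, 1))       # 28.4
```

The copy-paste "forecast" is worse, but not dramatically so - MAPE alone cannot tell you that one of the two contains no forward-looking information at all.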
Moreover, classical forecasting metrics are prone to turning forecasts into self-fulfilling prophecies. To illustrate this point, let's look at another example. The graph below shows sales of bread in a hypermarket chain:
where:
  • real sales are actual sales,
  • potential sales are the amount that could have been sold if it weren't for out-of-stocks,
  • forecast is the forecasted amount of sales,
  • error is the mean absolute percentage error (MAPE) against real sales,
  • real error is the mean absolute percentage error against potential sales.
If the amount of safety stock is insufficient and results in an out-of-stock, recorded sales are capped by the available stock, so the measured forecast error is bounded by the safety stock level rather than by potential sales (the amount that could have been sold if it weren't for out-of-stocks). The difference between the two can be significant: in this example, the average error equals 33.4%, while the average real error runs up to 41.4%.
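A small sketch with made-up numbers (not the bread data from the graph) shows how stockouts mask the real error: when orders match the forecast, recorded sales can never exceed it, so the measured MAPE stays low while the error against potential demand is much larger.

```python
def mape(actual, forecast):
    """Mean absolute percentage error across days."""
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

forecast  = [22, 24, 20, 28]  # forecasted daily demand
potential = [20, 30, 25, 35]  # what could have been sold with unlimited stock
stock     = forecast          # orders placed to match the forecast
real      = [min(p, s) for p, s in zip(potential, stock)]  # sales capped by stock

print(round(mape(real, forecast) * 100, 1))       # 2.5  - error vs. recorded sales
print(round(mape(potential, forecast) * 100, 1))  # 17.5 - "real" error vs. demand
```

The forecast looks nearly perfect against recorded sales precisely because it determined how much could be sold.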

In these two examples, statistical metrics give little or distorted information about how forecasts will actually affect the business, leaving retailers with their hands tied. We are not saying accuracy is unimportant: MAPE and WAPE are essential for tracking forecast performance. They are just not the only metrics to look at.

Business metrics above all

Despite being the most commonly used, MAPE and WAPE have one important drawback: they calculate the percentage of error but don't distinguish between overstocks and out-of-stocks. These metrics are symmetric.

However, from the retailer's perspective, overstocks and out-of-stocks affect the business differently. In other words, the cost of a forecast error of 10 wholesale packages of milk is not the same for stockouts as for overstocks. While the former represents potential losses, including lost loyalty and a decrease in store traffic, the latter consists of storage costs, the cost of capital, and write-offs.
Since standard metrics don't account for this, retailers cannot really manage how forecasts affect their business. As MAPE and WAPE are not enough to measure forecast quality, other metrics should be added to the equation.

Seeking an answer, we turned our attention to other industries that work with large amounts of data. Search engines, for example, use dozens of different metrics to measure the quality of search results, personalization, advertising, and so on. All these metrics have one thing in common: they are aligned with business tasks and metrics.

The same should be true for retail: a grocer's business runs on turnover, on-shelf availability, customer loyalty, and store traffic, not on abstract statistical metrics. To be effective, forecasting metrics should be aligned with business ones. Which metrics to choose depends entirely on the retailer's priorities. Facing overstock, a grocer may optimize for the number of write-offs or the cost of markdowns. If routinely running low on items is an issue, a retailer may consider optimizing for lost sales or the number of out-of-stock events. If sustainability is the number-one priority, a grocer can minimize food wastage.
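One way to make this concrete is a cost-of-error function with asymmetric penalties for under- and over-forecasting; the unit costs below are purely hypothetical:

```python
def cost_of_error(actual, forecast, stockout_cost, overstock_cost):
    """Asymmetric cost: missed units priced differently from excess units."""
    total = 0.0
    for a, f in zip(actual, forecast):
        if f < a:                         # under-forecast -> lost sales
            total += (a - f) * stockout_cost
        else:                             # over-forecast -> storage, capital, write-offs
            total += (f - a) * overstock_cost
    return total

# Hypothetical unit costs: lost margin per missed sale vs. holding cost per excess unit
print(cost_of_error([100, 1], [95, 2], stockout_cost=1.5, overstock_cost=0.4))  # 7.9
```

Unlike MAPE or WAPE, this figure changes depending on which side of the actuals the forecast lands on, which is exactly the distinction the business cares about.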

Closing thoughts

Demand forecasting is the backbone of every retailer's business: it is essential for managing the supply chain, planning sales, and shaping customer loyalty. Nobody would say that producing an accurate and timely forecast is easy. It turns out, however, that assessing the accuracy of a forecast can be an equally challenging task. To summarize, here are a few principles to bear in mind when measuring forecast accuracy:

  • MAPE and WAPE are the most commonly used metrics for measuring forecast accuracy. Which one to use depends on the type of goods and their sales volumes, as well as the retailer's business priorities.

  • The most accurate forecast doesn't always mean the best forecast. MAPE and WAPE are symmetric: while calculating the percentage of error, these metrics don't distinguish between overstocks and out-of-stocks.

  • Statistical metrics should be aligned with the business ones. Depending on the retailer's priorities, these metrics may include the number of write-offs, costs of markdowns, the number of out-of-stocks, lost sales, or food wastage.

  • The combination of statistical and business metrics gives an opportunity to make better decisions on the optimal quantity of each product to be ordered. After all, it is not the percentage of error but the cost of error that makes a difference.
Alexey Shaternikov
CEO and Chief data scientist at DSLab
Lada Trimasova
Head of Predictive analytics group at DSLab


Contact us to learn more about ML-based demand forecasting and how it can benefit your business