Missed the previous parts of this series? See Become your organization’s strategic advisor by using Machine Learning and Power BI
Forecasting is an important part of most management positions. Knowing how many products will sell, where, and when informs resource decisions like staffing and inventory. It is also an intuitive way to compare performance between different business units, product categories, etc. For P&L owners it is an essential tool for proactively managing their margin by adapting investment levels.
Building on the previous post, the first forecasting technique we will explore is a time series projection. Like the decomposition we looked at in the previous post, it can be conceptually understood as extrapolating and summing the seasonality and trend vectors. Not unexpectedly, these types of forecasts perform well for businesses with persistent trends and seasonality. Unlike a regression model, which can take planned changes into account, these models consider only the past. In a later post we will look at alternative forecasting models for when a large change is planned for the business.
A popular technique for time series based forecasting is exponential smoothing, and the built-in forecasting function in Power BI uses an implementation of this algorithm. Here we will use it to predict revenue and quantity for our sample data. Below is an example that I created by duplicating the page from my last post and changing the scripts of the R visuals to forecast one year forward using first a multiplicative Holt-Winters model and then an exponential smoothing with additive seasonality.
These charts were all made with a categorical column and the sum of revenue. The time-series forecasts were made with R visuals using the script below.
In the two R visuals with the forecasts, the blue line is the actual values to date, the red line is the mean of the predicted values and the shaded areas represent the 75% and 95% confidence intervals respectively. As you can see, the shaded areas are large in both cases. This indicates limited forecast accuracy, i.e. there is not that much explanatory power in the historic revenue numbers alone. For this dataset, this is because of multiple deliberate historic experiments. In a later post, we will explore alternative forecasting models that improve prediction accuracy by taking such factors into account.
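To see concretely what those shaded bands are, you can inspect the interval columns of a forecast object directly. The sketch below uses a hypothetical monthly series (not the sample data from the report) and assumes the forecast package is installed:

```r
## Sketch: inspecting the 75% and 95% interval bands of a forecast object.
## The series here is hypothetical; it stands in for the report's revenue data.
library(forecast)

set.seed(7)
x <- ts(200 + 2 * (1:48) + rnorm(48, sd = 10), start = 2013, frequency = 12)

## Forecast 12 months ahead with 75% and 95% confidence levels, as in the report
fc <- forecast(ets(x, "ZZN"), 12, level = c(75, 95))

## fc$lower and fc$upper each hold one column per confidence level
width75 <- fc$upper[, 1] - fc$lower[, 1]
width95 <- fc$upper[, 2] - fc$lower[, 2]

## The 95% band is at least as wide as the 75% band at every horizon,
## and both tend to widen further into the future as uncertainty accumulates
print(round(cbind(width75, width95), 1))
```

Wide bands relative to the forecast mean are exactly the signal described above: the history alone does not pin down the future very tightly.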
Using the report
For this report to work in Power BI Desktop you need the forecast package installed. If you have not previously used this package on your computer, install it by going to the R console in RStudio or another R GUI, typing install.packages("forecast") and pressing Enter.
By clicking on and drilling into the different stores and products, we can forecast a single store, a single product, or, using the drill-down arrow, a combination of both, e.g. Product 5 in Store 3: a combination where the two methodologies suggest disparate trends and high uncertainty.
Just as with the decomposition in the previous post, a time series based forecast is non-additive: the sum of the forecasts for the individual products is unlikely to exactly match the forecast for the business all-up, because the law of large numbers cancels out some noise when the products are summed first. For that reason, a dynamic setup like Power BI is a great advantage: you can quickly click through the different business areas, product classes and geographies and automatically get a forecast for each, as well as the more accurate aggregates for any underlying combination. This works because the basic Power BI functionality filters the underlying data model, sending the R script only the data that has been selected. The R script then recalculates on the new subset, updating the forecast. The same happens when you filter on time, which limits how much history the script can analyze. That can be desirable, e.g. to exclude a year with large anomalies, but it can just as well decrease accuracy by leaving too little data.
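The noise-cancellation effect behind this non-additivity can be illustrated with a few lines of plain R on made-up data (no forecast package needed; all numbers are hypothetical):

```r
## Sketch: why aggregate series are "smoother" than their parts (hypothetical data).
## Ten independent product series share the same trend but have independent noise.
set.seed(42)
n_products <- 10
months <- 48
trend <- seq(100, 200, length.out = months)

## Each product: the common trend plus its own random noise
products <- replicate(n_products, trend + rnorm(months, sd = 25))

## Aggregate revenue is the row-wise sum across products
total <- rowSums(products)

## Noise relative to signal: average per-product vs. aggregate
cv_product <- mean(apply(products, 2, sd)) / mean(products)
cv_total <- sd(total) / mean(total)

## The aggregate's relative variability is lower - independent noise partially cancels,
## which is why the all-up forecast is tighter than the sum of per-product forecasts
cat(sprintf("Avg per-product CV: %.3f, aggregate CV: %.3f\n", cv_product, cv_total))
```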
An accurate forecast for each part of the business can inform resource allocation decisions, how much inventory to carry as well as serve as a leading indicator of a growing business opportunity or risk.
Recreating the report
R visuals will automatically work in Power BI when the report is published to the service, where the most common packages have already been installed, but to get them to work in Power BI Desktop you need a local R installation (please see my previous post for installation instructions).
To build this report I duplicated the page “TimeSeriesAnalysis” by right-clicking on the page tab and selecting “Duplicate Page”. I then rearranged the visuals and changed the R script* to:
## Load the library "forecast" to gain access to the exponential smoothing function ets().
## If you have not installed this package you will get an error message saying that the
## library was not found. In that case install it by typing install.packages("forecast")
## directly in your R console.
require(forecast)

## Convert the column "Revenue" in the Power BI input "dataset" into a time series object
## with the right periodicity (12 for months per year).
## as.numeric(substr(dataset[1,'Date'], 1, 4)) returns 2013, the first year in the dataset,
## which helps the user understand the time scale.
myts <- ts(dataset[,'Revenue'], start = as.numeric(substr(dataset[1,'Date'], 1, 4)), frequency = 12)

## Create the ets model with the right parameters; type ?ets in your R console after
## loading the forecast library for details.
myets <- ets(myts, "ZZA")

## The second argument in forecast() defines how many values to forecast. Since this is
## monthly data, 12 is a whole year ahead.
## level determines which confidence intervals to plot; c(75, 95) sets them to 75% and 95%.
myprediction <- forecast(myets, 12, level = c(75, 95))

## Find the last actual value, so the forecast lines connect to the historical line in the plot
lastValue <- tail(myprediction$x, 1)

## Prepend the last actual value to the mean of the future fitted dates
myprediction$mean <- ts(c(lastValue, myprediction$mean), frequency = frequency(myprediction$mean), end = end(myprediction$mean))

## Do the same for the upper values for all future fitted dates...
myprediction$upper <- ts(rbind(c(lastValue, lastValue), myprediction$upper), frequency = frequency(myprediction$upper), end = end(myprediction$upper))

## ...and for the lower range values for all future fitted dates
myprediction$lower <- ts(rbind(c(lastValue, lastValue), myprediction$lower), frequency = frequency(myprediction$lower), end = end(myprediction$lower))

## Plot the results with historical values in blue and future values in red with shaded confidence intervals
plot(myprediction, lwd = 2, col = "blue", fcol = "red", flwd = 2, shaded = TRUE, col.sub = "gray50", cex.sub = 0.75, xlab = "", ylab = "Revenue")
The ets implementation is documented here. The model-type argument passed to ets() can be changed, e.g. to "MAM" for a multiplicative Holt-Winters model, "MNA" for exponential smoothing with additive seasonality or "ZZZ" for a fully automated model (the ets function will test all options for each "Z" it encounters and use the best-fitting one).
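As a quick way to compare those options, you can loop over the model strings on a test series. The snippet below is a sketch on synthetic monthly data (standing in for myts) and assumes the forecast package is installed:

```r
## Sketch: comparing ets() model-type strings. The series is synthetic,
## standing in for the "myts" object built in the main script.
library(forecast)

set.seed(1)
myts <- ts(100 + 1:60 + 10 * sin(2 * pi * (1:60) / 12) + rnorm(60, sd = 3),
           start = 2013, frequency = 12)

## "MAM": multiplicative errors, additive trend, multiplicative seasonality (Holt-Winters)
## "MNA": multiplicative errors, no trend, additive seasonality
## "ZZZ": let ets() pick the best option for each component
for (model in c("MAM", "MNA", "ZZZ")) {
  fit <- ets(myts, model)
  cat(model, "-> selected:", fit$method, " AIC:", round(fit$aic, 1), "\n")
}
```

The "ZZZ" run reports which combination it actually selected (fit$method), which is a handy sanity check before hard-coding a model string into the visual.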
To change one of the R visuals to forecast quantity instead of Revenue:
In step 5 you can also use a relative reference to make the script column-name agnostic, i.e. myts <- ts(dataset[,2], start = as.numeric(substr(dataset[1,'Date'], 1, 4)), frequency = 12) will always use the second column in the input. For a more robust example of how to handle different date options in Power BI, I recommend taking a look at the Forecasting example in the R showcase on PowerBI.com.
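For completeness, here is what the core of the script looks like when pointed at a quantity field. The data frame at the top is a hypothetical stand-in for the Power BI "dataset" input (assuming it carries Date and Quantity columns); in the actual visual, Power BI supplies dataset for you:

```r
## Sketch: the forecasting script pointed at "Quantity" instead of "Revenue".
## Stand-in for the Power BI "dataset" input (hypothetical values) -
## in the R visual itself, delete this block and let Power BI provide "dataset".
dataset <- data.frame(
  Date = format(seq(as.Date("2013-01-01"), by = "month", length.out = 36)),
  Quantity = 500 + 5 * (1:36) + round(50 * sin(2 * pi * (1:36) / 12))
)

require(forecast)

## Same pipeline as the main script, with "Quantity" as the measured column
myts <- ts(dataset[,'Quantity'],
           start = as.numeric(substr(dataset[1,'Date'], 1, 4)),
           frequency = 12)
myets <- ets(myts, "ZZA")
myprediction <- forecast(myets, 12, level = c(75, 95))
plot(myprediction, lwd = 2, col = "blue", fcol = "red", flwd = 2,
     shaded = TRUE, xlab = "", ylab = "Quantity")
```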
Tips and tricks
You may want to zoom in on the last years of the x-axis to make the forecast more prominent. Rather than using the Power BI controls to change this (which would filter the data and could decrease forecast accuracy), you can achieve this by modifying the plot command on the last row to:
plot(myprediction, lwd=2, col="blue", fcol="red", flwd=2, shaded=TRUE, xlim=c(2015, 2018), col.sub = "gray50", cex.sub=0.75, xlab = "", ylab = "Revenue")
However, this may yield an x-axis with decimal labels. To solve that, suppress the default axis and draw it explicitly:
plot(myprediction, lwd=2, col="blue", fcol="red", flwd=2, shaded=TRUE, xlim=c(2015, 2018), xaxt = "n", col.sub = "gray50", cex.sub=0.75, xlab = "", ylab = "Revenue")
axis(1, at = 2015:2018, labels = 2015:2018)
The xaxt = "n" parameter suppresses the automatically generated x-axis; the second line then draws it at whole years using the function axis().
If you are hitting an R script error and struggling to find the root cause, the second arrow in the R script editor opens the currently selected data in your default R IDE, where you can debug the script interactively.
Links and downloads
Forecasting in the R showcase on Power BI
Local Polynomial Regression Fitting
* Copyright (c) Microsoft Corporation. All rights reserved.
Third-party programs. This software enables you to obtain software applications from other sources. Those applications are offered and distributed by third parties under their own license terms. Microsoft is not developing, distributing or licensing those applications to you, but instead, as a convenience, enables you to use this software to obtain those applications directly from the application providers. By using the software, you acknowledge and agree that you are obtaining the applications directly from the third-party providers and under separate license terms, and that it is your responsibility to locate, understand and comply with those license terms. Microsoft grants you no license rights for third-party software or applications that is obtained using this software.