Strategic decisions support through business drive...

Christian_Berg · ‎10-17-2016

Missed the introduction to this series? See Become your organization’s strategic advisor by using Machine Learning and Power BI

Having interviewed and worked with hundreds of business analysts and P&L owners, I am impressed by the capacity of human intuition. People who have been in role for more than a couple of years can often balance multiple complex considerations and quickly make a good decision. This frees up time for the employee. Unfortunately, this spare time is more often used to increase the frequency at which they look at the standard reports, rather than deep analysis to try and challenge their already impressive understanding of the business. A common problem with this is confirmation bias and existing misunderstandings of true drivers can persist or a change in underlying fundamental is overlooked. In the next couple of posts, I will share techniques that I have seen successfully employed to counter common heuristics and to increase the chance of continuous improvement.

Below is an example of a dataset similar to that of a retail chain which a regional store manager shared with me. The initial goal of that project was to change the main business report to update hourly instead of weekly. The store manager had made revenue growth and inventory management the top priorities a couple of years earlier; and believed it had been very successful. Hourly reports on what the stores were ordering was expected to lead to more accurate revenue predictions. The regional manager also wanted to be able to intervene in time if the implicit forecast was too low to meet all-up targets.

An initial look at the business revealed healthy revenue growth and low variations in weeks on hand. A basic analysis however also showed that discounts had increased in size, eroding profits. For descriptive analytics like this, I find it helpful to create a couple of views with one KPI at a time, e.g. marginal profit or revenue and display it for every relevant business hierarchy. For instance, product hierarchy, geography and time. While that is an effective way to show how the business is doing it does little to explain why. For that it tends to be more effective to show correlated KPIs for the same dimension, e.g. revenue vs. profit by store over time.

As a second step, I therefore frequently add a correlation plot. In addition to being simple to understand correlation plots are quick and easy to do. High correlation between two variables means that they move in the same direction, at the same rate. Because it is a relative measure there is rarely a need for normalization. Below is an R correlation plot in Power BI, showing a couple of months’ worth of data from the stores.

The blue to red column to the far right is the legend indicating whether two variables are strongly correlated (dark blue), have no correlation (white), or are inversely correlated (dark red). Large circles in a dark color are thus strongly correlated while small, lightly colored circles indicate weak correlation. The circles appear in the upper-right triangle in the matrix, and their corresponding correlation coefficient appear in the lower-left. For instance, price and discount have a perfect negative correlation.

The visual representation of this can be found on the row “Price” (second row) in the “Discount” column (fourth column). The numeric correlation coefficient is in the “Price” column (2nd) on the “Discount” row (4th).

When we analyze the data using this visual a couple of things pop-out. Revenue and price have low negative correlation, i.e. a decrease in price will typically coincide with an increase in revenue. Revenue and inventory are strongly correlated so days that the stores carry a lot of inventory tend to coincide with high revenue. The key insight for the regional manager however was the -0.51 correlation between price and inventory. Before seeing this correlation plot the regional manager was unaware that the stores had, in response to stronger focus on inventory management, begun managing inventory primarily by changing the price. This was an unintended consequence of the directive which had been issued to improve store managers focus on accurate forecasting. Instead, the store managers had found an inventive way to achieve the same goal using price, thus hurting contribution margins and decreasing customer satisfaction. When the store managers found themselves carrying too much inventory they would simply run a large discount. Conversely when they were running low they would increase prices. By only comparing revenue and inventory this behavior had gone unnoticed by the leadership team.

Using correlation plots for business analytics

You have probably heard that correlation does not equal causation. That is misleading since two things happening at the same time often have a cause and effect relationship but it is important to remember alternative explanations. Let us say that for your business daily revenue for product A and B shows high correlation, e.g. 0.75. So 75% of the changes in revenue for A is captured in the changes for B. That does not necessarily mean that people who buy A has more value from B. Alternative explanations could be:

Buyers of B have more value from A,
Products A and B have similar seasonality (maybe they are both more popular on warm days),
Promotions for A has by chance coincided with promotions for B, etc.

In other words, high correlation might be a coincidence (unlikely with a large enough set of data), might be because one causes the other, or because there is something else which affects both, e.g. product C. Typically the direction of the interaction is known or someone familiar with the business has a strong hypothesis. If not, a common second step is to investigate a difference in timing. If for instance marketing investments happen on average 3 days before you observe an increase in revenue you can be sure that revenue does not drive marketing, but that it is the other way around.

So, while correlation does not in itself prove causation, running a correlation analysis helps you validate your understanding of a business and tells you where to focus next. I typically let the correlations that might be interesting determine which different KPIs I show together for only a few business dimensions to let the data tell the story. In this case inventory by store and product was paired with average price to help the regional manager drive a change in behavior.

How to recreate the report

R visuals will automatically work in Power BI when the report is published to the service but to get them to work in Power BI Desktop you need to have R installed on your computer. R, a third-party program, can be downloaded from CRAN. Once installed, in Power BI Desktop go to File -> Options and settings -> Options -> R Scripting and validate that the R home directory path is the same as what you used to install the base. Please use this link for detailed instructions. You may also want an R dedicated script editor, e.g. RStudio (the Windows installation file is the first link under “Installers for Supported Platforms”.

Once you have R installed you can use R visuals just as you would use any other native visual in Power BI with the addition of an R script that you paste in the editor. An example file is attached to this post.

To create the correlation plot R visual:

Open a pbix file with the relevant data, e.g. my attached example file and click on the + sign in the bottom row to create a new page.
Because an R visual will not display anything until you have added a script, I start by adding a simple standard visual like a table that I add the columns that I want to analyze to. This is achieved by clicking on the desired visual under “Visualizations” and then selecting the columns of interest from the far-right menu.
Correlations are calculated row by row so getting the granularity right is important. In this case I want to understand if a change in inventory for a particular day, product and store is correlated with the average price. To make this level of detail explicit I first added the three columns [Day], [Product] and [StoreId] to the table and for each verified that the aggregation option “Don’t summarize” was used.
Similarly, I selected “Sum” for all the other columns except Price where I used “Average”.
Once I’m happy with the data input (this will be the raw data that R receives) I convert the visual to an R visual by selecting the visual in the canvas and then clicking on the R visualization icon
Add the script using the R script* editor (visible when you select the R visual directly in the canvas). Note, you may have to expand it by clicking on the far-right arrow in the R script editor menu below the canvas.

Once expanded you should see the below, with space for the R script.

Copy/paste the script below into the script editor (any line that starts with # or ## is a comment only and is not executed):

## Use the corrplot library for the cor() function which calculates correlations and has the corrplot() function to visualize the output 
library(corrplot)
## Get rid of the first three columns in dataset since they, in this example, contain categorical values
dataset <- dataset[,-1:-3]
## Calculate correlations between the remaining columns
m <- cor(dataset)
## Plot the result with a light gray background color
corrplot.mixed(m, bg="light gray")

Use the first arrow in the R script editor to manually execute the script (any change/interaction with the report will also cause the script to be rerun.

To recreate this visual for your own data the steps will be almost identical. It is important however to remember to get the granularity right. If you don’t need to do any aggregation you would skip in this case the first three columns [Day], [Product] and [StoreId], set all of the other fields to “Don’t summarize” and delete the line of code that excludes the first three columns from the calculation, namely “dataset <- dataset[,-1:-3]” and its preceding comment, i.e. so that the script looks like:

## Use the corrplot library for the cor() function which calculates correlations and has the corrplot() function to visualize the output 
library(corrplot)
## Calculate correlations between the remaining columns
m <- cor(dataset)
## Plot the result with a light gray background color
corrplot.mixed(m, bg="light gray")

Links and downloads

Understanding the correlation coefficient: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

Example of correlation plot in the R showcase: http://community.powerbi.com/t5/R-Script-Showcase/Correlation-Plot/m-p/58462

Additional R installation instructions for Power BI: http://powerbi.tips/2016/09/using-r-visuals-in-power-bi/

* Third-party programs. This software enables you to obtain software applications from other sources. Those applications are offered and distributed by third parties under their own license terms. Microsoft is not developing, distributing or licensing those applications to you, but instead, as a convenience, enables you to use this software to obtain those applications directly from the application providers.

By using the software, you acknowledge and agree that you are obtaining the applications directly from the third-party providers and under separate license terms, and that it is your responsibility to locate, understand and comply with those license terms. Microsoft grants you no license rights for third-party software or applications that is obtained using this software.

jeffpbi · ‎10-25-2016

Hi Christian,

Thanks for the great series. I'm enjoying trying them out. Last week, I tried this with my own data and initially got an R script error. I re-read the post and saw the recommendation to start with a table. When I changed the visual from “R” to table I saw that there were lots of empty rows for two of the columns. I filtered those out using the Power BI native Page level filters and then converted it back to an “R” visual and this time it worked.

Is there any way to automate the remove of empty rows from data set using R? That would be of great help, as many data sets contain empty cells or rows.

Thanks,

Jeff

Christian_Berg · ‎10-25-2016

Hi Jeff,

I'm glad you found it useful and I'll be adding a post on business performance reporting today.

Regarding your question about alternative ways to get rid of blank rows or otherwise incompatible entries you can use the R function na.omit() instead of the native Power BI filters. For instance, adding the below line to the top of your script will get rid of all rows that have any blank values:
dataset <- na.omit(dataset)

Thank you

Christian

patoduck · ‎11-11-2016

Hi Jeff,

Could you please do an example with Association Rules package (arules).

Thanks for all you efforts, really nice!

Patricio

Christian_Berg · ‎11-18-2016

Hi Patricio

Next week I'll write a post on multivariate analysis which is very similar to the arules package. If you think it would still be valuable to see an arules example after reading that, may I ask what your use case is?

Thank you

Christian

patoduck · ‎11-18-2016

I have a table with Transc. IDs and Product Name. The short solution will be to save that file as CSV and then use arules or arulesViz package using a R Script Visual in Power BI, but
1. If I send the Pbix file to another user, that person will need access to that folder.
2. Arules use the read.transactions that calls for a file, not the : 'dataset' R Visual produce in PBID.
Just think about it, how many users had the same problem? We all get data from stores, with 'simple' format (Transac ID and Product Name) so this is a clear business scenario.

I had a chat with an eng. from Microsoft, it could be the case that an advanced Association Rules Visual will be built, but the way I see it, it is only you and I thinking about it, therefore a good post about Arules and AruleViz will be an “Eye opener” for many analyst, that as soon they realize about this shortcoming, will hopefully pressure MS to expedite a new arules visual.
Thanks agains and hope you give it a try!
BR
patoduck at gmail dot com

jasent · ‎01-29-2017

Amazing article. Having R and make it availabel for the general users is great. Just need to add that there were issues with using "corrplot" library.

What is needed is just to make library folder \MRO-3.3.2\library with full access for every one or your user. In this way PowerBI gets the library by it self.

ashokpershad · ‎10-17-2017

Good Article..

Strategic decisions support through business driver analysis

🏆 Power BI Data Visualization World Championships...

🏆 Power BI DataViz World Championships | Week 1 C...

🏆 Power BI DataViz World Championships | Week 2 C...

How to show slicer selected values in Power BI Rep...

Developing an Azerbaijan Shape Map

New Offer! Become a Certified Fabric Data Engineer