Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Purpose of this blog post is to demonstrate how to use R script to load and prepare data source in Power BI desktop. R is open source powerful statistical analysis and visualisation language. Integrating R in Power BI lets you undertake complex data manipulation tasks. In this post I am going to use two sample csv dataset that contains raw data for import and export amounts for various Australian territories, and using R transform it to right format to be visualised in Power BI desktop. I am only going to cover basics of R to get data into the shape required so that you can use more interesting functions of R in future.
Prerequisites
For this post I am assuming you have basic understanding of R and its concepts such as data frame, packages (though we will not require any special package to be installed for this tutorial) and its syntax. Almost everything that you can think of can be done with functions in R and for this tutorial we are only going to work with base R functions.
Setup
Download and install latest version of R for windows from https://cran.r-project.org/bin/windows/base/.
Once R is installed in powerbi desktop ensure R data source is pointed to the correct location. If you have selected default location during R installation no change is required however if you’ve used custom location then you need to point R data source in powerbi desktop accordingly. To update that in powerbi desktop go to File > Options and Settings > Options > R Scripting > Detected R home directories > Other and browse to the location where R is installed.
Import
To import data into powerbi using R, go to Get Data > Other > R Script > Connect which will give you dialog box to enter in R scripts. Files I've used for this post is attached below.
Copy and paste below code that will import contents of csv file into a table called input and see in preview navigator. Using ‘~//’ will look for csv file in the default folder of R installation which is in Documents folder however you can specify custom path using double backslashes for example C:\\Users\\abc\\Desktop\\AU_Import.csv. It is also good practise to explicitly mention separator and you can use comma or tab delimiter as required.
#use read.csv method to import data input <- read.csv("~//AU_Import.csv", stringsAsFactors=FALSE, sep=",")
Formatting & Sorting
Majority of times not all the data available is useful and you may only need the small subset of data from the actual file. Here we are only interested in columns Frequency, Commodity, Time and Value. Also changing column names to more meaningful ones.
#use read.csv method to import data input <- read.csv("~//AU_Import.csv", stringsAsFactors=FALSE, sep=",")
#excluding unwanted columns and change column names keepcolumns <- c("State.of.Destination","Frequency","Commodity.by.SITC","Time","Value") input <- input[,keepcolumns] colnames(input)[3] <- c("Commodity") colnames(input)[1] <- c("State") colnames(input)[5] <- c("ImportValue")
If you notice Time column has values in 'mmm-yyyy’ format which will be recognized as characters by R and text based field in power bi and therefore won’t allow time based slicing and dicing. To convert values into proper date format we will make use of several R functions. Using Paste, Substr functions to extract month name out of values in Time column, convert it to month number, append ‘01’ to it and paste that to last 5 characters of Time column resulting in Date column in ‘dd-mm-yyyy’ format. Time column is then not required and can be removed.
#use read.csv method to import data input <- read.csv("~//AU_Import.csv", stringsAsFactors=FALSE, sep=",")
#excluding unwanted columns and change column names keepcolumns <- c("State.of.Destination","Frequency","Commodity.by.SITC","Time","Value") input <- input[,keepcolumns] colnames(input)[3] <- c("Commodity") colnames(input)[1] <- c("State") colnames(input)[5] <- c("ImportValue")
#extract monthname input$Date <- paste(substr(input$Time,nchar(input$Time)-3,nchar(input$Time)), paste(match(substr(input$Time, 1,3), month.abb), "01", sep="-"), sep="-")
#Time column no longer required attach(input) input$Time <- NULL
We will then use Date column to sort using Order function. By default sorting is ascending.
#use read.csv method to import data input <- read.csv("~//AU_Import.csv", stringsAsFactors=FALSE, sep=",") #excluding unwanted columns and change column names keepcolumns <- c("State.of.Destination","Frequency","Commodity.by.SITC","Time","Value") input <- input[,keepcolumns] colnames(input)[3] <- c("Commodity") colnames(input)[1] <- c("State") colnames(input)[5] <- c("ImportValue") #extract monthname input$Date <- paste(substr(input$Time,nchar(input$Time)-3,nchar(input$Time)), paste(match(substr(input$Time, 1,3), month.abb), "01", sep="-"), sep="-") #Time column no longer required attach(input) input$Time <- NULL #change data type of Date column to date input$Date <- as.Date(input$Date) #sort by date column input[order(Date),]
Filtering
As we’re only interested in state level data, using subset function we can filter out rows that is not useful. Using ‘!=’ operator and ‘&’ operator to specify multiple conditions we are going to excluded rows where there is no state details, commodity totals and state totals. Using Omit function will remove all rows where value is null.
#use read.csv method to import data input <- read.csv("~//AU_Import.csv", stringsAsFactors=FALSE, sep=",") #excluding unwanted columns and change column names keepcolumns <- c("State.of.Destination","Frequency","Commodity.by.SITC","Time","Value") input <- input[,keepcolumns] colnames(input)[3] <- c("Commodity") colnames(input)[1] <- c("State") colnames(input)[5] <- c("ImportValue") #extract monthname input$Date <- paste(substr(input$Time,nchar(input$Time)-3,nchar(input$Time)), paste(match(substr(input$Time, 1,3), month.abb), "01", sep="-"), sep="-") #Time column no longer required attach(input) input$Time <- NULL #change data type of Date column to date input$Date <- as.Date(input$Date) #sort by date column input[order(Date),] #Use subset function to exclude rows input <- subset(input, Commodity!="TOTAL" & State!="No state details" & State!="Total") #Use omit function to remove null rows input <- na.omit(input)
Similarly we will prepare export data into a data frame called output using above code.
#use read.csv method to import data output <- read.csv("~//AU_Export.csv", stringsAsFactors=FALSE, sep=",") #excluding unwanted columns and change column names keepcolumns <- c("State.of.Origin","Frequency","Commodity.by.SITC","Time","Value") output <- output[,keepcolumns] colnames(output)[3] <- c("Commodity") colnames(output)[1] <- c("State") colnames(output)[5] <- c("ExportValue") #extract monthname output$Date <- paste(substr(output$Time,nchar(output$Time)-3,nchar(output$Time)), paste(match(substr(output$Time, 1,3), month.abb), "01", sep="-"), sep="-") #Time column no longer required attach(output) output$Time <- NULL #change data type of Date column to date output$Date <- as.Date(output$Date) #sort by date column output[order(Date),] #Use subset function to exclude rows output <- subset(output, Commodity!="TOTAL" & State!="No state details" & State!="Total") #Use omit function to remove null rows output <- na.omit(output)
Finally we want to merge these two data frames into single one to be used in power bi. We will use merge function for that and specify columns we want to perform match on.
#use read.csv method to import data input <- read.csv("~//AU_Import.csv", stringsAsFactors=FALSE, sep=",") #excluding unwanted columns and change column names keepcolumns <- c("State.of.Destination","Frequency","Commodity.by.SITC","Time","Value") input <- input[,keepcolumns] colnames(input)[3] <- c("Commodity") colnames(input)[1] <- c("State") colnames(input)[5] <- c("ImportValue") #extract monthname input$Date <- paste(substr(input$Time,nchar(input$Time)-3,nchar(input$Time)), paste(match(substr(input$Time, 1,3), month.abb), "01", sep="-"), sep="-") #Time column no longer required attach(input) input$Time <- NULL #change data type of Date column to date input$Date <- as.Date(input$Date) #sort by date column input[order(Date),] #Use subset function to exclude rows input <- subset(input, Commodity!="TOTAL" & State!="No state details" & State!="Total") #Use omit function to remove null rows input <- na.omit(input) #use read.csv method to import Export data output <- read.csv("~//AU_Export.csv", stringsAsFactors=FALSE, sep=",") #excluding unwanted columns and change column names keepcolumns <- c("State.of.Origin","Frequency","Commodity.by.SITC","Time","Value") output <- output[,keepcolumns] colnames(output)[3] <- c("Commodity") colnames(output)[1] <- c("State") colnames(output)[5] <- c("ExportValue") #extract monthname output$Date <- paste(substr(output$Time,nchar(output$Time)-3,nchar(output$Time)), paste(match(substr(output$Time, 1,3), month.abb), "01", sep="-"), sep="-") #Time column no longer required attach(output) output$Time <- NULL #change data type of Date column to date output$Date <- as.Date(output$Date) #sort by date column output[order(Date),] #Use subset function to exclude rows output <- subset(output, Commodity!="TOTAL" & State!="No state details" & State!="Total") #Use omit function to remove null rows output <- na.omit(output) #Use merge function to merge two dataframes Combined <- merge(input,output,by=c("State","Frequency","Commodity","Date"))
Transforming
To calculate percentage values for each row in Import and Export columns, using basic arithmetic operation we will divide each value by sum of values in that column and then format it to display with percentage sign using Paste function.
#use read.csv method to import data input <- read.csv("~//AU_Import.csv", stringsAsFactors=FALSE, sep=",") #excluding unwanted columns and change column names keepcolumns <- c("State.of.Destination","Frequency","Commodity.by.SITC","Time","Value") input <- input[,keepcolumns] colnames(input)[3] <- c("Commodity") colnames(input)[1] <- c("State") colnames(input)[5] <- c("ImportValue") #extract monthname input$Date <- paste(substr(input$Time,nchar(input$Time)-3,nchar(input$Time)), paste(match(substr(input$Time, 1,3), month.abb), "01", sep="-"), sep="-") #Time column no longer required attach(input) input$Time <- NULL #change data type of Date column to date input$Date <- as.Date(input$Date) #sort by date column input[order(Date),] #Use subset function to exclude rows input <- subset(input, Commodity!="TOTAL" & State!="No state details" & State!="Total") #Use omit function to remove null rows input <- na.omit(input) #use read.csv method to import Export data output <- read.csv("~//AU_Export.csv", stringsAsFactors=FALSE, sep=",") #excluding unwanted columns and change column names keepcolumns <- c("State.of.Origin","Frequency","Commodity.by.SITC","Time","Value") output <- output[,keepcolumns] colnames(output)[3] <- c("Commodity") colnames(output)[1] <- c("State") colnames(output)[5] <- c("ExportValue") #extract monthname output$Date <- paste(substr(output$Time,nchar(output$Time)-3,nchar(output$Time)), paste(match(substr(output$Time, 1,3), month.abb), "01", sep="-"), sep="-") #Time column no longer required attach(output) output$Time <- NULL #change data type of Date column to date output$Date <- as.Date(output$Date) #sort by date column output[order(Date),] #Use subset function to exclude rows output <- subset(output, Commodity!="TOTAL" & State!="No state details" & State!="Total") #Use omit function to remove null rows output <- na.omit(output) #Use merge function to merge two datasets Combined <- merge(input,output,by=c("State","Frequency","Commodity","Date")) #New columns calculating percentage values for each Import and Export Combined$ImportValuePct <- Combined$ImportValue / sum(Combined$ImportValue) Combined$ExportValuePct <- Combined$ExportValue / sum(Combined$ExportValue)
#Format columns to show values in percentage Combined$ImportValuePct <- paste(round(Combined$ImportValuePct*100,digits=1),"%",sep="") Combined$ExportValuePct <- paste(round(Combined$ExportValuePct*100,digits=1),"%",sep="")
As a result of above code three tables will be created in power bi however we will only be interested in table Combined which is the outcome of above transformations.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.