Every respected computer scientist has heard about recursive functions (more details on Wiki: https://en.wikipedia.org/wiki/Recursion). There are many algorithms which are mentioned as example of such functions, i.e. Fibonacci sequence as the simplest example:
Fib(0) = 0
Fib(1) = 1
For all integers n > 1, Fib(n) = Fib(n-1) + Fib(n-2)
How do we rewrite this pseudo code into a function in PowerQuery?
let
fnFibonacci = (value as number) as number =>
if value = 0 then
0
else if value = 1 then
1
else
@fnFibonacci(value - 1) + @fnFibonacci(value - 2),
Source = List.Transform({0..10}, fnFibonacci)
in
Source
The most important part is the use of @ before we call the recursive function. We tell the PowerQuery to reference its own name from inside itself.
Let’s test our function:
Well, that is great but not very useful in real life. I have been searching for long time for a scenario which can use recursive functions in a meaningful way. And I have found one!
In PowerQuery, you can select or rename many columns in one step. Moreover, you can change their data types at once. But repetitive modifications of a table are not so easy anymore.
In my ETL process written in PowerQuery, I wanted to split every datetime column in a fact table into 2 separate columns: date and time. It has at least 2 big advantages. First of all, date and time values become a foreign key for my date and time dimensions; second, the size of my dataset will decrease dramatically. This means that instead of many unique values I’ll get only 365 unique values a year for date dimension and 86 400 unique values for time dimension with the precision of one second.
What are my options? I can create a new custom column of date datatype for every datetime column and another custom column for time. Yes, I can, but I do not like doing a repetitive work. Let’s try to automate it.
Firstly, we need some test data.
// create a table with some datetime columns
Source = Table.FromRecords(
{
[OrderID = 1, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,10,0,0), OrderPaid_dt = #datetime(2019,6,16,10,5,0), OrderShipped_dt = #datetime(2019,6,16,11,0,0), Price = 100.0],
[OrderID = 2, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,12,12,12), OrderPaid_dt = #datetime(2019,6,16,13,13,13), OrderShipped_dt = null, Price = 200.0]
},
type table[OrderID = Int64.Type, CustomerID = Int64.Type, OrderPlaced_dt = DateTime.Type, OrderPaid_dt = DateTime.Type, OrderShipped_dt = DateTime.Type, Price = Decimal.Type]
),
Next, let’s create a function, which has 2 parameters - a source table and a name of a datetime column. This function does the same what you would do in the UI. It creates 2 new columns and removes the original one.
// split a datetime column into 2 columns: date and time
fnSplitDateTimeColumn = (parTbl as table, parColumnName as text) as table =>
let
// add a new column with date
transformDateExpression = Expression.Evaluate("each Date.From([" & parColumnName & "])", #shared),
addDateColumn = Table.AddColumn(parTbl, parColumnName & "_date", transformDateExpression, type date),
// add a new column with time
transformTimeExpression = Expression.Evaluate("each try #time(Time.Hour([" & parColumnName & "]),Time.Minute([" & parColumnName & "]),Time.Second([" & parColumnName & "])) otherwise null", #shared),
addTimeColumn = Table.AddColumn(addDateColumn, parColumnName & "_time", transformTimeExpression, type time),
// remove datetime column
removeDateTimeColumn = Table.RemoveColumns(addTimeColumn, parColumnName)
in
removeDateTimeColumn,
And finally, in the last step we create another function which is recursive. This function has also 2 parameters - a source table and a list of all datetime column names which we haven’t transformed yet. The function takes the first item from the list (a datetime column name), splits this datetime column into 2 columns, and calls itself recursively. The recursive call uses the last step as the new source table and a list of all datetime column names except the first one (which is already processed). Once the list is empty, the recursion terminates. The recursion function returns a modified source table – instead of one datetime columns there are now 2 columns.
// recursive function which splits all datetime columns into date and time columns
// parTbl is a source table to modify, parColumnNameList is a list of columns to split
fnSplitAllDateTimeColumns = (parTbl as table, parColumnNameList as list) as table =>
// if parColumNameList is empty, terminate the recursion
if List.IsEmpty(parColumnNameList) then
parTbl
else
let
// get one column name to process
currentColumnName = List.First(parColumnNameList),
// remove first item from the parColumnNameList
nextColumNameList = List.RemoveFirstN(parColumnNameList, 1),
// split current column
splitOneColumnTable = fnSplitDateTimeColumn(parTbl, currentColumnName),
// call itself recursively with a new created table and a shortend column name list
nextIterationTable = @fnSplitAllDateTimeColumns(splitOneColumnTable, nextColumNameList)
in
nextIterationTable,
And everything together:
let
// create a table with some datetime columns
Source = Table.FromRecords(
{
[OrderID = 1, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,10,0,0), OrderPaid_dt = #datetime(2019,6,16,10,5,0), OrderShipped_dt = #datetime(2019,6,16,11,0,0), Price = 100.0],
[OrderID = 2, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,12,12,12), OrderPaid_dt = #datetime(2019,6,16,13,13,13), OrderShipped_dt = null, Price = 200.0]
},
type table[OrderID = Int64.Type, CustomerID = Int64.Type, OrderPlaced_dt = DateTime.Type, OrderPaid_dt = DateTime.Type, OrderShipped_dt = DateTime.Type, Price = Decimal.Type]
),
// split a datetime column into 2 columns: date and time
fnSplitDateTimeColumn = (parTbl as table, parColumnName as text) as table =>
let
// add a new column with date
transformDateExpression = Expression.Evaluate("each Date.From([" & parColumnName & "])", #shared),
addDateColumn = Table.AddColumn(parTbl, parColumnName & "_date", transformDateExpression, type date),
// add a new column with time
transformTimeExpression = Expression.Evaluate("each try #time(Time.Hour([" & parColumnName & "]),Time.Minute([" & parColumnName & "]),Time.Second([" & parColumnName & "])) otherwise null", #shared),
addTimeColumn = Table.AddColumn(addDateColumn, parColumnName & "_time", transformTimeExpression, type time),
// remove datetime column
removeDateTimeColumn = Table.RemoveColumns(addTimeColumn, parColumnName)
in
removeDateTimeColumn,
// recursive function which splits all datetime columns into date and time columns
// parTbl is a source table to modify, parColumnNameList is a list of columns to split
fnSplitAllDateTimeColumns = (parTbl as table, parColumnNameList as list) as table =>
// if parColumNameList is empty, terminate the recursion
if List.IsEmpty(parColumnNameList) then
parTbl
else
let
// get one column name to process
currentColumnName = List.First(parColumnNameList),
// remove first item from the parColumnNameList
nextColumNameList = List.RemoveFirstN(parColumnNameList, 1),
// split current column
splitOneColumnTable = fnSplitDateTimeColumn(parTbl, currentColumnName),
// call itself recursively with a new created table and a shortend column name list
nextIterationTable = @fnSplitAllDateTimeColumns(splitOneColumnTable, nextColumNameList)
in
nextIterationTable,
// get all columns having the datatype datetime
DateTimeColumnNames = Table.ColumnsOfType(Source, {type datetime}),
// split all datetime columns
SplitAllDateTimeColumns = fnSplitAllDateTimeColumns(Source, DateTimeColumnNames)
in
SplitAllDateTimeColumns
And the result:
Is it a lot of code for such a simple task which you can manage in PowerQuery Editor with shiny UI? Yes and no. If you have many fact tables with a lot of datetime columns and you do an ETL, you’ll be very happy having the opportunity to automate that. But if you have just one table with 2 datetime columns, it is an overkill.
Do you know any other pragmatic use case for recursive functions in PowerQuery? Please, let me know down in the comments 😊