Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Nolock

Solving Real Life Problems with Recursive Functions in PowerQuery

Every respected computer scientist has heard about recursive functions (more details on Wiki: https://en.wikipedia.org/wiki/Recursion). There are many algorithms which are mentioned as example of such functions, i.e. Fibonacci sequence as the simplest example:

Fib(0) = 0
Fib(1) = 1
For all integers n > 1, Fib(n) = Fib(n-1) + Fib(n-2)

 

How do we rewrite this pseudo code into a function in PowerQuery?

let
    fnFibonacci = (value as number) as number =>
        if value = 0 then
            0
        else if value = 1 then
            1
        else
            @fnFibonacci(value - 1) + @fnFibonacci(value - 2),
    Source = List.Transform({0..10}, fnFibonacci)
in
    Source

 

The most important part is the use of @ before we call the recursive function. We tell the PowerQuery to reference its own name from inside itself.

Let’s test our function:

Screenshot1.PNG

 

Well, that is great but not very useful in real life. I have been searching for long time for a scenario which can use recursive functions in a meaningful way. And I have found one!

In PowerQuery, you can select or rename many columns in one step. Moreover, you can change their data types at once. But repetitive modifications of a table are not so easy anymore.

 

In my ETL process written in PowerQuery, I wanted to split every datetime column in a fact table into 2 separate columns: date and time. It has at least 2 big advantages. First of all, date and time values become a foreign key for my date and time dimensions; second, the size of my dataset will decrease dramatically. This means that instead of many unique values I’ll get only 365 unique values a year for date dimension and 86 400 unique values for time dimension with the precision of one second.

What are my options? I can create a new custom column of date datatype for every datetime column and another custom column for time. Yes, I can, but I do not like doing a repetitive work. Let’s try to automate it.

 

Firstly, we need some test data.

// create a table with some datetime columns
    Source = Table.FromRecords(
        {  
            [OrderID = 1, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,10,0,0), OrderPaid_dt = #datetime(2019,6,16,10,5,0), OrderShipped_dt = #datetime(2019,6,16,11,0,0), Price = 100.0],
            [OrderID = 2, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,12,12,12), OrderPaid_dt = #datetime(2019,6,16,13,13,13), OrderShipped_dt = null, Price = 200.0]  
        },
        type table[OrderID = Int64.Type, CustomerID = Int64.Type, OrderPlaced_dt = DateTime.Type, OrderPaid_dt = DateTime.Type, OrderShipped_dt = DateTime.Type, Price = Decimal.Type]
    ),

 

Next, let’s create a function, which has 2 parameters - a source table and a name of a datetime column. This function does the same what you would do in the UI. It creates 2 new columns and removes the original one.

    // split a datetime column into 2 columns: date and time
    fnSplitDateTimeColumn = (parTbl as table, parColumnName as text) as table =>
        let
            // add a new column with date
            transformDateExpression = Expression.Evaluate("each Date.From([" & parColumnName & "])", #shared),
            addDateColumn = Table.AddColumn(parTbl, parColumnName & "_date", transformDateExpression, type date),

            // add a new column with time
            transformTimeExpression = Expression.Evaluate("each try #time(Time.Hour([" & parColumnName & "]),Time.Minute([" & parColumnName & "]),Time.Second([" & parColumnName & "])) otherwise null", #shared),
            addTimeColumn = Table.AddColumn(addDateColumn, parColumnName & "_time", transformTimeExpression, type time),

            // remove datetime column
            removeDateTimeColumn = Table.RemoveColumns(addTimeColumn, parColumnName)
        in
            removeDateTimeColumn,

 

And finally, in the last step we create another function which is recursive. This function has also 2 parameters - a source table and a list of all datetime column names which we haven’t transformed yet. The function takes the first item from the list (a datetime column name), splits this datetime column into 2 columns, and calls itself recursively. The recursive call uses the last step as the new source table and a list of all datetime column names except the first one (which is already processed). Once the list is empty, the recursion terminates. The recursion function returns a modified source table – instead of one datetime columns there are now 2 columns.

    // recursive function which splits all datetime columns into date and time columns
    // parTbl is a source table to modify, parColumnNameList is a list of columns to split
    fnSplitAllDateTimeColumns = (parTbl as table, parColumnNameList as list) as table =>
        // if parColumNameList is empty, terminate the recursion
        if List.IsEmpty(parColumnNameList) then
            parTbl
        else
            let
                // get one column name to process
                currentColumnName = List.First(parColumnNameList),
                // remove first item from the parColumnNameList
                nextColumNameList = List.RemoveFirstN(parColumnNameList, 1),

                // split current column
                splitOneColumnTable = fnSplitDateTimeColumn(parTbl, currentColumnName),
                // call itself recursively with a new created table and a shortend column name list
                nextIterationTable = @fnSplitAllDateTimeColumns(splitOneColumnTable, nextColumNameList)
            in
                nextIterationTable,

And everything together:

let
    // create a table with some datetime columns
    Source = Table.FromRecords(
        {  
            [OrderID = 1, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,10,0,0), OrderPaid_dt = #datetime(2019,6,16,10,5,0), OrderShipped_dt = #datetime(2019,6,16,11,0,0), Price = 100.0],
            [OrderID = 2, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,12,12,12), OrderPaid_dt = #datetime(2019,6,16,13,13,13), OrderShipped_dt = null, Price = 200.0]  
        },
        type table[OrderID = Int64.Type, CustomerID = Int64.Type, OrderPlaced_dt = DateTime.Type, OrderPaid_dt = DateTime.Type, OrderShipped_dt = DateTime.Type, Price = Decimal.Type]
    ),

    // split a datetime column into 2 columns: date and time
    fnSplitDateTimeColumn = (parTbl as table, parColumnName as text) as table =>
        let
            // add a new column with date
            transformDateExpression = Expression.Evaluate("each Date.From([" & parColumnName & "])", #shared),
            addDateColumn = Table.AddColumn(parTbl, parColumnName & "_date", transformDateExpression, type date),

            // add a new column with time
            transformTimeExpression = Expression.Evaluate("each try #time(Time.Hour([" & parColumnName & "]),Time.Minute([" & parColumnName & "]),Time.Second([" & parColumnName & "])) otherwise null", #shared),
            addTimeColumn = Table.AddColumn(addDateColumn, parColumnName & "_time", transformTimeExpression, type time),

            // remove datetime column
            removeDateTimeColumn = Table.RemoveColumns(addTimeColumn, parColumnName)
        in
            removeDateTimeColumn,

    // recursive function which splits all datetime columns into date and time columns
    // parTbl is a source table to modify, parColumnNameList is a list of columns to split
    fnSplitAllDateTimeColumns = (parTbl as table, parColumnNameList as list) as table =>
        // if parColumNameList is empty, terminate the recursion
        if List.IsEmpty(parColumnNameList) then
            parTbl
        else
            let
                // get one column name to process
                currentColumnName = List.First(parColumnNameList),
                // remove first item from the parColumnNameList
                nextColumNameList = List.RemoveFirstN(parColumnNameList, 1),

                // split current column
                splitOneColumnTable = fnSplitDateTimeColumn(parTbl, currentColumnName),
                // call itself recursively with a new created table and a shortend column name list
                nextIterationTable = @fnSplitAllDateTimeColumns(splitOneColumnTable, nextColumNameList)
            in
                nextIterationTable,

    // get all columns having the datatype datetime
    DateTimeColumnNames = Table.ColumnsOfType(Source, {type datetime}),
    // split all datetime columns
    SplitAllDateTimeColumns = fnSplitAllDateTimeColumns(Source, DateTimeColumnNames)
in
    SplitAllDateTimeColumns

 

And the result:

Screenshot2.PNG

 

Is it a lot of code for such a simple task which you can manage in PowerQuery Editor with shiny UI? Yes and no. If you have many fact tables with a lot of datetime columns and you do an ETL, you’ll be very happy having the opportunity to automate that. But if you have just one table with 2 datetime columns, it is an overkill.

 

Do you know any other pragmatic use case for recursive functions in PowerQuery? Please, let me know down in the comments 😊