
Nolock

Lookup Table in Power Query

I have prepared an article about transitive closure in Power Query (coming in two weeks). I wanted to publish it right away, but I decided to wait a bit and first write about a performance boost I used in the code: a lookup table in Power Query.

 

Imagine the following scenario: you have a recursive algorithm that performs maybe thousands of joins. You have just one table, which you must join to itself in every step of the recursion, and the result of the join is always at most one value. Performing a full join in every step can't be fast.

Therefore, I was searching for another solution: a lookup table. But how do you build a lookup table in Power Query? It is simpler than you might expect: with a record. Even a record with a few thousand fields is very fast compared to a table. Maybe it is implemented internally as an associative array; I'm just guessing.
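To illustrate the idea, here is a minimal sketch with made-up values: a record maps field names to values, so it can serve as a lookup table keyed by text.

Lookup = [#"0" = "A", #"1" = "B"],
// Record.Field fails if the name is missing;
// Record.FieldOrDefault returns a default value instead.
Value1 = Record.Field(Lookup, "1"),               // "B"
Value2 = Record.FieldOrDefault(Lookup, "9", null) // null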

 

The record data type has a very nice feature: you can merge two records into a new one with the & operator. Fields of the first record are overridden if they also appear in the second record. Example:

 

[a = 1] & [b = 2] => [a = 1, b = 2]
[a = 1] & [a = 2] => [a = 2]

 

The code below shows how we can build a lookup table as a record with the help of List.Accumulate.

 

SourceList = List.Transform({0..CountOfUniqueNodes - 1}, Text.From),
AsRecord = List.Accumulate(
    SourceList,
    [],
    (state, current) =>
        state & 
        Expression.Evaluate(
            "[" & current & "= current]", 
            [ current = current]
        )
)

 

Thanks to Expression.Evaluate we can create a record field with an arbitrary name. In our case the result is a record with the items [0 = "0", 1 = "1", 2 = "2", 3 = "3", …], where all values are strings.
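As a side note, if I'm not mistaken, the same record could also be built without Expression.Evaluate, because Record.FromList accepts a list of field names as its second argument. A sketch, not benchmarked against the version above:

AsRecordAlt = Record.FromList(SourceList, SourceList)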

An equivalent table is much simpler to create:

 

AsTable = Table.Buffer(
    Table.FromList(
        List.Transform(SourceList, each [From = _, To = _]),
        Record.FieldValues,
        {"From", "To"}
    )
)

 

Performance test

One of the best features of Power Query is that it is lazy. I love the laziness of Power Query. But now I have to push the engine to evaluate the steps in the order I want.

 

StartTimestamp = DateTime.LocalNow(),
Run = 
    if StartTimestamp = null then 
        null 
    else
        List.Sum( 
            List.Generate(
                () => 0, 
                each _ < CountOfIterations, 
                each _ + 1, 
                AppliedMethod[Fnc]
            )
        ),
EndTimestamp = 
    if Run = null then
        null
    else
        DateTime.LocalNow(),

Result = [
    Count Of Unique Nodes = CountOfUniqueNodes,
    Count Of Iterations = CountOfIterations,
    Used Method = AppliedMethod[Name],
    Run = Run,
    Runtime = EndTimestamp - StartTimestamp
]

This code guarantees that the steps are executed in this exact order: StartTimestamp, Run, EndTimestamp. Each step is written so that it depends on the result of the previous one, so the lazy engine cannot reorder or skip them.

 

Let’s melt the processor

This is the whole code I use for melting down the processor.

 

let
    CountOfUniqueNodes = 1000,
    CountOfIterations = 1000000,
    AppliedMethod = [
        Name = "Record",
        Fnc = (item) => Value.FromText(
            Record.FieldOrDefault(AsRecord, Number.ToText(Number.Mod(item, CountOfUniqueNodes)))
        )
    ],
    /*AppliedMethod = [
        Name = "Table",
        Fnc = (item) => Value.FromText(
            Table.SelectRows(
                AsTable,
                (row) => row[From] = Number.ToText(Number.Mod(item, CountOfUniqueNodes))
            ){0}[To]
        )
    ],*/
    
    SourceList = List.Transform({0..CountOfUniqueNodes - 1}, Text.From),
    AsRecord = List.Accumulate(
        SourceList,
        [],
        (state, current) =>
            state & 
            Expression.Evaluate(
                "[" & current & "= current]", 
                [ current = current]
            )
    ),
    AsTable = Table.Buffer(
        Table.FromList(
            List.Transform(SourceList, each [From = _, To = _]),
            Record.FieldValues,
            {"From", "To"}
        )
    ),

    StartTimestamp = DateTime.LocalNow(),
    Run = 
        if StartTimestamp = null then 
            null 
        else
            List.Sum( 
                List.Generate(
                    () => 0, 
                    each _ < CountOfIterations, 
                    each _ + 1, 
                    AppliedMethod[Fnc]
                )
            ),
    EndTimestamp = 
        if Run = null then
            null
        else
            DateTime.LocalNow(),
    
    Result = [
        Count Of Unique Nodes = CountOfUniqueNodes,
        Count Of Iterations = CountOfIterations,
        Used Method = AppliedMethod[Name],
        Run = Run,
        Runtime = EndTimestamp - StartTimestamp
    ]
in
    Result

 

I have a table with 1 000 nodes and the same number of directed edges. Each node points to itself: 0 -> 0, 1 -> 1, and so on. That is nonsense in real life, but it is good enough for a performance test.

 

And I invoke the record lookup 1 000 000 times.

 

[Screenshot: result of the record lookup test]

 

Wow, it took just 1.57 s on my machine. That's 1.57 µs per lookup on average. Not bad!

Now I want to do the same with a lookup via a table.

[Screenshot: CPU usage during the table lookup test]

… still waiting … the processor is melting …

 

[Screenshot: result of the table lookup test]

 

It finished after 6 minutes and 34 seconds. Hmm, I was expecting a bad result, but not that bad.

Another test: 10 000 nodes (10 times more than before) and 1 000 000 iterations via the record.

 

[Screenshot: result of the record lookup test with 10 000 nodes]

 

You can see it slows down a little. But hey, it is still a breathtaking result, and a graph with 10 000 nodes (the number of edges doesn't matter for this test) is not a small one.

Next time, you will see how I apply this lookup table in a transitive closure.
