
Nolock

Lookup Table in Power Query

I have prepared an article about transitive closure in Power Query (coming in two weeks). I wanted to publish it right away, but I decided to wait a little and first write about a performance boost I've used in its code: a lookup table in Power Query.

 

Imagine the following scenario: You have a recursive algorithm that performs maybe thousands of joins. There is just one table, which you must join to itself in every step of the recursion, and each join returns at most one value. Using a full join in every step can't be performant.

Therefore, I was searching for another solution: a lookup table. But how do you build a lookup table in Power Query? Well, it is simpler than expected: with a record. Even a record with a few thousand fields is very fast compared to a table. Maybe it is implemented as an associative array, but that's just a guess.

 

The record data type has a very nice feature: you can concatenate two records into a new one. Fields of the first record are overridden if they also appear in the second record. Example:

 

[a = 1] & [b = 2] => [a = 1, b = 2]
[a = 1] & [a = 2] => [a = 2]
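 

Looking a value up in such a record is then just field access. A minimal sketch (the record and the keys are made up for illustration):

 

let
    Lookup = [a = 1, b = 2, c = 3],
    // Record.Field raises an error for a missing field;
    // Record.FieldOrDefault returns null (or a given default) instead.
    Hit = Record.Field(Lookup, "b"),
    Miss = Record.FieldOrDefault(Lookup, "x")
in
    [Hit = Hit, Miss = Miss]  // [Hit = 2, Miss = null]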

 

The code below shows how we can create a lookup table as a record with the help of List.Accumulate.

 

// one key per node: "0" .. Text.From(CountOfUniqueNodes - 1)
SourceList = List.Transform({0..CountOfUniqueNodes - 1}, Text.From),
AsRecord = List.Accumulate(
    SourceList,
    [],
    (state, current) =>
        state &
        Expression.Evaluate(
            "[" & current & " = current]",
            [current = current]
        )
)

 

Thanks to Expression.Evaluate we can create a record field with an arbitrary name. In my case we create a record with the fields [0 = "0", 1 = "1", 2 = "2", 3 = "3", …], where all values are strings.
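
 

As a side note: when all keys and values are already available as lists, the same record can probably be built in a single call with Record.FromList, avoiding both Expression.Evaluate and the repeated record concatenation. A sketch, reusing SourceList from above:

 

// field names come from the second argument, values from the first:
// [0 = "0", 1 = "1", 2 = "2", ...]
AsRecordAlternative = Record.FromList(SourceList, SourceList)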

An equivalent table is much simpler to create:

 

AsTable = Table.Buffer(
    Table.FromList(
        List.Transform(SourceList, each [From = _, To = _]),
        Record.FieldValues,
        {"From", "To"}
    )
)
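
 

Before the performance test, here is the difference between the two access patterns in isolation: the record lookup is a direct field access, while the table lookup must filter rows. A sketch with an arbitrary example key "42":

 

RecordLookup = Record.FieldOrDefault(AsRecord, "42"),
// the table variant scans the rows to find the matching key:
TableLookup = Table.SelectRows(AsTable, each [From] = "42"){0}[To]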

 

Performance test

One of the best features of Power Query is that it is lazy. I love the laziness of Power Query. But now I have to push the Power Query engine to evaluate the steps in the order I want.

 

StartTimestamp = DateTime.LocalNow(),
Run =
    // StartTimestamp is never null; the check only forces it
    // to be evaluated before the loop starts
    if StartTimestamp = null then
        null
    else
        List.Sum(
            List.Generate(
                () => 0,
                each _ < CountOfIterations,
                each _ + 1,
                AppliedMethod[Fnc]
            )
        ),
EndTimestamp =
    // likewise, the check on Run forces EndTimestamp to be taken last
    if Run = null then
        null
    else
        DateTime.LocalNow(),

Result = [
    Count Of Unique Nodes = CountOfUniqueNodes,
    Count Of Iterations = CountOfIterations,
    Used Method = AppliedMethod[Name],
    Run = Run,
    Runtime = EndTimestamp - StartTimestamp
]

This code guarantees that the steps are executed in exactly this order: StartTimestamp, Run, EndTimestamp.
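
 

The same trick works for any pair of steps: let the later step reference the earlier one through a check that can never fire, and the engine must evaluate them in order. A minimal, self-contained sketch (the step bodies are trivial stand-ins):

 

let
    First = List.Sum({1..1000000}),
    // First is never null, but the check forces it to be evaluated
    // before Second, which pins down the evaluation order
    Second = if First = null then null else DateTime.LocalNow()
in
    [First = First, Second = Second]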

 

Let’s melt the processor

This is the whole code I use for melting down the processor.

 

let
    CountOfUniqueNodes = 1000,
    CountOfIterations = 1000000,
    AppliedMethod = [Name = "Record", Fnc = (_) => Value.FromText(Record.FieldOrDefault(AsRecord, Number.ToText(Number.Mod(_, CountOfUniqueNodes))))],
    /*AppliedMethod = [Name = "Table", Fnc = (item) => 
        Value.FromText(
            Table.SelectRows(
                AsTable, 
                (row) => row[From] = Number.ToText(Number.Mod(item, CountOfUniqueNodes))
            ){0}[To]
        )
    ],*/
    
    // one key per node: "0" .. Text.From(CountOfUniqueNodes - 1)
    SourceList = List.Transform({0..CountOfUniqueNodes - 1}, Text.From),
    AsRecord = List.Accumulate(
        SourceList,
        [],
        (state, current) =>
            state &
            Expression.Evaluate(
                "[" & current & " = current]",
                [current = current]
            )
    ),
    AsTable = Table.Buffer(
        Table.FromList(
            List.Transform(SourceList, each [From = _, To = _]),
            Record.FieldValues,
            {"From", "To"}
        )
    ),

    StartTimestamp = DateTime.LocalNow(),
    Run = 
        if StartTimestamp = null then 
            null 
        else
            List.Sum( 
                List.Generate(
                    () => 0, 
                    each _ < CountOfIterations, 
                    each _ + 1, 
                    AppliedMethod[Fnc]
                )
            ),
    EndTimestamp = 
        if Run = null then
            null
        else
            DateTime.LocalNow(),
    
    Result = [
        Count Of Unique Nodes = CountOfUniqueNodes,
        Count Of Iterations = CountOfIterations,
        Used Method = AppliedMethod[Name],
        Run = Run,
        Runtime = EndTimestamp - StartTimestamp
    ]
in
    Result

 

I have a table with 1 000 nodes and as many directed edges. Each node points to itself: 0 -> 0, 1 -> 1, and so on. That is nonsense in real life, but it is good enough for a performance test.

 

And I invoke the lookup via a record 1 000 000 times.

 

[Screenshot: runtime of the record-based lookup]

 

Wow, it took just 1.57 s on my machine. That's 1.57 µs per lookup on average. Not bad!

Now I want to do the same with a lookup via a table.

[Screenshot: CPU load during the table-based lookup]

… still waiting … the processor is melting …

 

[Screenshot: runtime of the table-based lookup]

 

It finished after 6 minutes and 34 seconds. Hmm, I was expecting a bad result, but not that bad.

Another test with 10 000 nodes (10 times more than before) and 1 000 000 iterations via a record.

 

[Screenshot: runtime of the record-based lookup with 10 000 nodes]

 

You see, it slows down a little bit. But hey, it is still a breathtaking result. And a graph with 10 000 nodes (edges aren't important for this test) is not a small one.

Next time, you will see how I apply this lookup table in a transitive closure.
