I have prepared an article about transitive closure in Power Query (coming in two weeks). I wanted to publish it right away, but I decided to wait a little and first write about a performance boost I use in that code: a lookup table in Power Query.
Imagine the following scenario: you have a recursive algorithm that performs maybe thousands of joins. There is just one table, which you must join to itself in every step of the recursion, and the result of the join is always at most one value. Doing a full join in every step cannot be performant.
Therefore, I was searching for another solution, and the solution is a lookup table. But how do you build a lookup table in Power Query? It is simpler than expected: with a record. Even a record with a few thousand fields is very fast compared to a table. Maybe it is implemented as an associative array internally, but that is just a guess.
The record data type has a very nice feature: you can combine two records into a new one with the & operator. Fields of the first record are overridden if they also appear in the second record. Example:
[a = 1] & [b = 2] => [a = 1, b = 2]
[a = 1] & [a = 2] => [a = 2]
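This combine behavior is exactly what makes a record usable as a lookup table. A minimal sketch (the name Lookup and the keys are mine, purely for illustration):

let
    Lookup = [a = 1] & [b = 2] & [a = 10],              // evaluates to [a = 10, b = 2]
    ValueOfA = Record.Field(Lookup, "a"),               // 10
    Missing = Record.FieldOrDefault(Lookup, "z", null)  // null, because field "z" does not exist
in
    {ValueOfA, Missing}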
The code below shows how we can create a lookup table as a record with the help of List.Accumulate.
SourceList = List.Transform({0..CountOfUniqueNodes}, Text.From),
AsRecord = List.Accumulate(
    SourceList,
    [],
    (state, current) =>
        state & Expression.Evaluate(
            "[" & current & "= current]",
            [current = current]
        )
)
Thanks to Expression.Evaluate we can create a record field with an arbitrary name. In my case this builds a record with the items [0 = 0, 1 = 1, 2 = 2, 3 = 3, …], where all values are strings.
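As a side note, a record like this could also be built without Expression.Evaluate by using Record.FromList, which takes a list of values and a list of field names. This is just an alternative sketch reusing the SourceList step, not the code I benchmarked:

SourceList = List.Transform({0..CountOfUniqueNodes}, Text.From),
AsRecordAlternative = Record.FromList(SourceList, SourceList)   // field names and values are the same strings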
An equivalent table is much simpler to create:
AsTable = Table.Buffer(
    Table.FromList(
        List.Transform(SourceList, each [From = _, To = _]),
        Record.FieldValues,
        {"From", "To"}
    )
)
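To make the comparison concrete, a single lookup against each structure looks like this. The key "42" is only an example; the two expressions mirror the Fnc functions used in the benchmark below:

ViaRecord = Record.FieldOrDefault(AsRecord, "42"),                      // direct field access on the record
ViaTable = Table.SelectRows(AsTable, (row) => row[From] = "42"){0}[To]  // filter the table, take the first row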
Performance test
One of the best features of Power Query is that it is lazy. I love the laziness of Power Query. But now I have to push the Power Query engine to execute the steps in the order I want.
StartTimestamp = DateTime.LocalNow(),
Run =
    if StartTimestamp = null then
        null
    else
        List.Sum(
            List.Generate(
                () => 0,
                each _ < CountOfIterations,
                each _ + 1,
                AppliedMethod[Fnc]
            )
        ),
EndTimestamp = if Run = null then null else DateTime.LocalNow(),
Result =
    [
        Count Of Unique Nodes = CountOfUniqueNodes,
        Count Of Iterations = CountOfIterations,
        Used Method = AppliedMethod[Name],
        Run = Run,
        Runtime = EndTimestamp - StartTimestamp
    ]
This code guarantees that the steps are executed in exactly this order: StartTimestamp, Run, EndTimestamp. Each step refers to the result of the previous one, so the engine cannot skip or reorder them.
Let’s melt the processor
This is the whole code I use for melting down the processor.
let
    CountOfUniqueNodes = 1000,
    CountOfIterations = 1000000,
    AppliedMethod =
        [
            Name = "Record",
            Fnc = (_) =>
                Value.FromText(
                    Record.FieldOrDefault(
                        AsRecord,
                        Number.ToText(Number.Mod(_, CountOfUniqueNodes))
                    )
                )
        ],
    /*AppliedMethod =
        [
            Name = "Table",
            Fnc = (item) =>
                Value.FromText(
                    Table.SelectRows(
                        AsTable,
                        (row) => row[From] = Number.ToText(Number.Mod(item, 1000))
                    ){0}[To]
                )
        ],*/
    SourceList = List.Transform({0..CountOfUniqueNodes}, Text.From),
    AsRecord = List.Accumulate(
        SourceList,
        [],
        (state, current) =>
            state & Expression.Evaluate(
                "[" & current & "= current]",
                [current = current]
            )
    ),
    AsTable = Table.Buffer(
        Table.FromList(
            List.Transform(SourceList, each [From = _, To = _]),
            Record.FieldValues,
            {"From", "To"}
        )
    ),
    StartTimestamp = DateTime.LocalNow(),
    Run =
        if StartTimestamp = null then
            null
        else
            List.Sum(
                List.Generate(
                    () => 0,
                    each _ < CountOfIterations,
                    each _ + 1,
                    AppliedMethod[Fnc]
                )
            ),
    EndTimestamp = if Run = null then null else DateTime.LocalNow(),
    Result =
        [
            Count Of Unique Nodes = CountOfUniqueNodes,
            Count Of Iterations = CountOfIterations,
            Used Method = AppliedMethod[Name],
            Run = Run,
            Runtime = EndTimestamp - StartTimestamp
        ]
in
    Result
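To benchmark the table variant, comment out the first AppliedMethod step and uncomment the second one; the rest of the query stays the same.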
I have a table with 1 000 nodes and as many directed edges. Each element points to itself: 0 -> 0, 1 -> 1, and so on. It is nonsense in real life, but it is good enough for a performance test.
And I invoke the lookup via a record 1 000 000 times.
Wow, it took just 1.57 s on my machine. That's 1.57 µs per lookup on average. Not bad!
Now I want to do the same with a lookup via a table.
… still waiting … the processor is melting …
It finished after 6 minutes and 34 seconds. Hmm, I was expecting a bad result, but not that bad.
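In other words: 6 minutes and 34 seconds is 394 seconds, i.e. roughly 394 µs per lookup, which makes the table variant about 250 times slower than the record variant in this test.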
Another test with 10 000 nodes (10 times more than before) and 1 000 000 iterations via a record.
You see, it is slowing down a little bit. But, hey, it is still a breathtaking result. And a graph with 10 000 nodes (edges aren’t important for this test) is not a small one.
Next time, you will see how I apply this lookup table in a transitive closure.