The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Dear Team,
In Power Query M Code I am trying to figure out how the internal algorithm works for the comparer functions:
Help says that these functions generate an array of ones and zeros and negative ones to help determine ordering, sorting, grouping, max and min calculations and other calculations in functions like List.Sort, Table.Group, List.Min, Lit.Max and many more.
But help is very sparse on exactly what x and y represent and how the resultant array of ones, zeroes and negative ones actually determine the final result.
As for as I can tell, for a single column (comparing sequential rows in a column) comparison:
x = current row in a column
y = next row in column
and
x > y then 1 or x < y then -1 or x = y then 0.
My first question about the algorithm is, if we have these formulas:
MinAlgorithmn = List.Min({-1,-5,0,6,6,3},null,(x,y) => if x > y then 1 else if x < y then -1 else 0) = -5
DefaultMin = List.Min({-1,-5,0,6,6,3}) = -5
MaxAlgorithmn = List.Max({-1,-5,0,6,6,3},null,(x,y) => if x > y then 1 else if x < y then -1 else 0) = 6
DefaultMax = List.Max({-1,-5,0,6,6,3})
Then the original data set would be:
{-1,-5,0,6,6,3}
The algorithm would be:
-1 > -5 algorithm yields:= 1
-5 < 0 algorithm yields:= -1
0 < 6 algorithm yields:= -1
6 = 6 algorithm yields:= 0
6 > 3 algorithm yields:= 1
The resultant array would be: {1,-1,-1,0,1,??}
How in the world does this array {1,-1,-1,0,1,??} pick out from the array, {-1,-5,0,6,6,3}, the -5 for the minimum value and 6 for the maximum value? What pattern am I missing? Or is this not at all how the List.Min and List.Max functions work?
My second question about the algorithm is, if we have this starting table:
Supplier | Costs |
Cœur Work Landscape | 2100 |
Coeur Work Landscape | 1522.36 |
Aerial Surveyor | 3582.44 |
Ærial Surveyor | 550.75 |
Æsthetic Landscape | 1992.25 |
Cœur Work Landscape | 481.26 |
Ærial Surveyor | 1005.75 |
Coeur Work Landscape | 250.99 |
If we use this formula:
=Table.Group(AddDataTypes, {"Supplier"}, {{"TotalCosts", each List.Sum([Costs]), type nullable number}})
We get:
Supplier | TotalCosts |
Cœur Work Landscape | 2581.26 |
Coeur Work Landscape | 1773.35 |
Aerial Surveyor | 3582.44 |
Ærial Surveyor | 1556.5 |
Æsthetic Landscape | 1992.25 |
If we use this formula:
= Table.Group(AddDataTypes, {"Supplier"}, {{"TotalCosts", each List.Sum([Costs]), type nullable number}},null,Comparer.FromCulture("en-US",true))
We get:
Supplier | TotalCosts |
Cœur Work Landscape | 4354.61 |
Aerial Surveyor | 5138.94 |
Æsthetic Landscape | 1992.25 |
I have broken apart how it looks like the algorithm works here so that we can see the resultant array:
Supplier | Costs | x | y | Comparer.FromCulture |
Cœur Work Landscape | 2100 | Cœur Work Landscape | Coeur Work Landscape | 0 |
Coeur Work Landscape | 1522.36 | Coeur Work Landscape | Aerial Surveyor | 1 |
Aerial Surveyor | 3582.44 | Aerial Surveyor | Ærial Surveyor | 0 |
Ærial Surveyor | 550.75 | Ærial Surveyor | Æsthetic Landscape | -1 |
Æsthetic Landscape | 1992.25 | Æsthetic Landscape | Cœur Work Landscape | -1 |
Cœur Work Landscape | 481.26 | Cœur Work Landscape | Ærial Surveyor | 1 |
Ærial Surveyor | 1005.75 | Ærial Surveyor | Coeur Work Landscape | -1 |
Coeur Work Landscape | 250.99 | Coeur Work Landscape |
How in the world does this array {0,1,0,-1,-1,1,-1,??} make the final grouping group based in “en-US” equivalent letters? What pattern am I missing? Or is it working some other way?
Thanks, Team!
@mgirvin and @Rickmaurinus , what an interesting topic, and great points on the sparse documentation. These do need much love and attention. @Rickmaurinus , great article, thank you! How can these questions, thoughts and other noodlings turn into one or more videos in the future? Would the two of you consider coop'ting on one or more?
Hey Mike,
That is an interesting topic. This week I finished an article on using Table.Group with Custom Comparer functions, which can you read here: https://gorilla.bi/power-query/table-group-custom-comparers/
For a Comparer function to be valid, it needs to be symmetric. So defining a comparer function, should result in 1, 0 or -1 as a result. What I mean with symmetric is that if row 1 is evaluated first and row 2 after then the resulting order should be identical to when row 2 is evaluated first and row 1 second.
Now, I don't think the result of evaluating the comparer logic is used independently for each row in the dataset. How I see it, is that the function is applied repeatedly to all the rows, to form groups. So the logic tries row 1 and 2, and says row 2 comes first. It then applies the logic to row 3, and compares it to row 1 and 2, and positions row 3. Then it performs the logic for row 4, and looks where in the order row 4 belongs.
Also, whereas using GroupKind.Global requires a symmetric comparer function, you can 'get away' with a simplified comparer function when using GroupKind.Local.
Have a look at my article and I'd be happy to hear your thoughts. You can also see another example illustrated here.
Regards,
Rick
Hi @mgirvin
It's an interesting topic, however there is no public documentation about how these Power Query native functions are working in the back end.
My guess is that after every comparison, a temporary max or min value will be stored and used to compare with the next value. For example if we hope to get List.Max from {-1,-5,0,6,6,3}, the first time we get -1 as the temporary max value and give it to x. When comparing x with -5, x will still be -1 as it's larger than -5. When comparing x with 0, x will be replaced by 0. Then use the new x to compare with 6 and get a new max value 6 for x. Repeating this comparison for the rest values in the list until the last one.
However this is just my simple guess as we don't know how the native function is written. In addition, from a programming point of view, this is also not the most efficient approach.
Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!
Thank you very much for your reply : )
What gets me, however, is why does Micrsoft help always metion the if x > y then 1 else if x < y then -1 else 0. Why even mention the numbers? What is the point of telling people that the comparer functions assign 1, -1 and 0, without any further explanation...
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.