Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Stealth02
Helper I
Helper I

Power Query - Not all "Tables" created equal?

Context: I have gaps in my master data - where I have cost centers in my fact table that are not present in my master data for particular fiscal year - but those cost centers are present in other years. I have developed a scenario to allow to complete the master data - by appending the GetMissingDim to the Dim Table.

 

When I apply transformation steps (add_dim1FilteredFinal) to the dim.1 table (i.e. the table generated from the merge query with the Dim Query) - the process is extremely slow - it appears to read in data on a row by row (record by record basis) - without or with buffering.  Using the code below (GetMissingDim - Alternative 1) - a count of 10K or above will demonstrate the slowness of this approach. 

 

When I apply the same transformation steps (add_dim1FilteredFinal) to the dim.1 table - except that I first expand the dim.1 table and then regroup using all rows, the data does not appear to be read row by row - and as such, the results are extremely fast (GetMissingDim - Alternative 2) - even with a count of 50K. .

 

My questions:

Why is there such a difference between a table that has been generated from a merger versus a table that has been generated from the allrows in the group by function- as in both cases, I am doing transformation to an inner table? 

How can I change Alternative 1 to have performance results like Alternative 2 - so I don't have to expand and re-group?

 

Below you will find the sample queries that would demonstrate the issues described above: 

 

Count: A seperate query to enter a numeric value to change how big are the tables. Issues start to be apparent with a count of 10K+

 

DIM: Dimension table with a missing year of data. 

 

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"dimCC","dimFY"})
in
    Source

 

 

GetMissingDim- Alternative 1 - Very slow as it processes row by row

 

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    MergeLeftAnti = Table.NestedJoin(Source, {"CC", "FY"}, Dim, {"dimCC", "dimFY"}, "Dim", JoinKind.LeftAnti),
    Merged = Table.NestedJoin(MergeLeftAnti, {"CC"}, Dim, {"dimCC"}, "dim.1", JoinKind.LeftOuter),
    Add_dim1FilteredFinal = Table.AddColumn(Merged, "dim.1_Final", each let 
                FYFilter=[FY],
                dimFYFiltered=List.Select([dim.1][dimFY], each _>FYFilter),
                dimFY_Max=List.Min(dimFYFiltered),
                dim1FilteredFinal=Table.SelectRows([dim.1],each [dimFY]=dimFY_Max)
                in dim1FilteredFinal),
    #"Expanded dim.1_Final" = Table.ExpandTableColumn(Add_dim1FilteredFinal, "dim.1_Final", {"dimCC", "dimFY"}, {"dimCC", "dimFY"})
in
    #"Expanded dim.1_Final"

 

 

GetMissingDim - Alternative 2- Very quick - Same as Alternative 1 except for a expand Merged step and a Group with all rows:

 

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    MergeLeftAnti = Table.NestedJoin(Source, {"CC", "FY"}, Dim, {"dimCC", "dimFY"}, "Dim", JoinKind.LeftAnti),
    Merged = Table.NestedJoin(MergeLeftAnti, {"CC"}, Dim, {"dimCC"}, "dim.1", JoinKind.LeftOuter),
    #"Expanded dim.1" = Table.ExpandTableColumn(Merged, "dim.1", {"dimCC", "dimFY"}, {"dimCC", "dimFY"}),
    #"Grouped Rows" = Table.Group(#"Expanded dim.1", {"CC", "FY"}, {{"dim.1", each _, type table [CC=number, FY=text, Dim=table, dim.1=table]}}),
    Add_dim1FilteredFinal = Table.AddColumn(#"Grouped Rows", "dim.1_Final", each let 
                FYFilter=[FY],
                dimFYFiltered=List.Select([dim.1][dimFY], each _>FYFilter),
                dimFY_Max=List.Min(dimFYFiltered),
                dim1FilteredFinal=Table.SelectRows([dim.1],each [dimFY]=dimFY_Max)
                in dim1FilteredFinal),
    #"Expanded dim.1_Final" = Table.ExpandTableColumn(Add_dim1FilteredFinal, "dim.1_Final", {"dimCC", "dimFY"}, {"dimCC", "dimFY"})
in
    #"Expanded dim.1_Final"

 

 

 

 

 

1 ACCEPTED SOLUTION

You don't need to do any appending. You can merge and fill up instead.

 

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    #"Merged Queries" = Table.NestedJoin(Source, {"CC", "FY"}, Dim_ForSort, {"CC", "FY"}, "Dim_ForSort", JoinKind.LeftOuter),
    #"Expanded Dim_ForSort" = Table.ExpandTableColumn(#"Merged Queries", "Dim_ForSort", {"Dimension Details"}, {"Dimension Details"}),
    #"Sorted Rows" = Table.Buffer(Table.Sort(#"Expanded Dim_ForSort",{{"CC", Order.Ascending}, {"FY", Order.Ascending}})),
    #"Filled Up" = Table.FillUp(#"Sorted Rows",{"Dimension Details"})
in
    #"Filled Up"

View solution in original post

6 REPLIES 6
AlexisOlson
Super User
Super User

This is an interesting question and I'm curious what the reason is too. My hunch is that it's related to how the internal optimizer builds the query where the slower version doesn't get vectorized optimally (or maybe isn't memoized optimally).

 

I'd guess there's an approach that's better than either version stated, but I don't fully understand the reasoning behind the transformations in GetMissingDim, so it's hard to suggest an alternative.

Hi Alexis,

 

In essence - I am doing a vlookup true and find the closest match greater than what is in the dimension table. The intent of the getmissingdim - is that I would append that table to the dimension table to get a complete dim table with all gaps filled. 

 

I understand that there are other ways to do vlookup true (i.e. append then sort then fill (up or down depending on the desired results). Side note, this has similar performance than the above alternative 2 (I tested it this morning (below - I've pasted the code) - but likely a better way to do vlookup true. 

 

But I am still curious to find out why the perfomance difference between alternative 1 and 2 - as I have other scenarios where an append would not be the right approach... interesting your hunch re vectorized vs memorized aspects - I will wait to see if there are other opinions - but that may be the best explanation! 

 

 

Dim_ForSort

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x),List.Repeat({"Dim detailsY1"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x),List.Repeat({"Dim detailsY3"},x)}),{"CC","FY","Dimension Details"}),
    Add_Flag = Table.AddColumn(Source, "flag", each "x")
in
    Add_Flag

 

MissingDim_Append

let
//[
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    Append_DimForSort = Table.Combine({Source, Dim_ForSort}),
    BuffTable = Table.Buffer(Append_DimForSort),
    SortedRows = Table.Sort(BuffTable,{{"CC", Order.Ascending}, {"FY", Order.Ascending}, {"Dimension Details", Order.Ascending}}),
    FilledUp = Table.FillUp(SortedRows,{"Dimension Details"}),
    FilterOutFlagX = Table.SelectRows(FilledUp, each ([flag] = null))
in
    FilterOutFlagX

 

You don't need to do any appending. You can merge and fill up instead.

 

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    #"Merged Queries" = Table.NestedJoin(Source, {"CC", "FY"}, Dim_ForSort, {"CC", "FY"}, "Dim_ForSort", JoinKind.LeftOuter),
    #"Expanded Dim_ForSort" = Table.ExpandTableColumn(#"Merged Queries", "Dim_ForSort", {"Dimension Details"}, {"Dimension Details"}),
    #"Sorted Rows" = Table.Buffer(Table.Sort(#"Expanded Dim_ForSort",{{"CC", Order.Ascending}, {"FY", Order.Ascending}})),
    #"Filled Up" = Table.FillUp(#"Sorted Rows",{"Dimension Details"})
in
    #"Filled Up"

Thanks - that works  - I was working based on a different scenario where there was no matching values between the tables (hence the append) - but given that in this scenario, there are matching values - a merge is preferable.  

 

Quick question on your use of buffer - my understanding is that buffer should be used before sorting a table to bring it into memory and ensure a proper sort - but you've included after completing the sort. Is there a reason for doing so? Is my understanding of buffer incorrect?

I want it in memory in the sorted order, so I buffer after the sort.

Super thank you. 

 

I will accept your original explanation as the solution - as it is likely the most probable answer. Thanks for all the feedback.  

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors