Solved: Power Query - Not all "Tables" created equal?

Stealth02 · ‎01-30-2023

Context: I have gaps in my master data - where I have cost centers in my fact table that are not present in my master data for particular fiscal year - but those cost centers are present in other years. I have developed a scenario to allow to complete the master data - by appending the GetMissingDim to the Dim Table.

When I apply transformation steps (add_dim1FilteredFinal) to the dim.1 table (i.e. the table generated from the merge query with the Dim Query) - the process is extremely slow - it appears to read in data on a row by row (record by record basis) - without or with buffering. Using the code below (GetMissingDim - Alternative 1) - a count of 10K or above will demonstrate the slowness of this approach.

When I apply the same transformation steps (add_dim1FilteredFinal) to the dim.1 table - except that I first expand the dim.1 table and then regroup using all rows, the data does not appear to be read row by row - and as such, the results are extremely fast (GetMissingDim - Alternative 2) - even with a count of 50K. .

My questions:

Why is there such a difference between a table that has been generated from a merger versus a table that has been generated from the allrows in the group by function- as in both cases, I am doing transformation to an inner table?

How can I change Alternative 1 to have performance results like Alternative 2 - so I don't have to expand and re-group?

Below you will find the sample queries that would demonstrate the issues described above:

Count: A seperate query to enter a numeric value to change how big are the tables. Issues start to be apparent with a count of 10K+

DIM: Dimension table with a missing year of data.

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"dimCC","dimFY"})
in
    Source

GetMissingDim- Alternative 1 - Very slow as it processes row by row

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    MergeLeftAnti = Table.NestedJoin(Source, {"CC", "FY"}, Dim, {"dimCC", "dimFY"}, "Dim", JoinKind.LeftAnti),
    Merged = Table.NestedJoin(MergeLeftAnti, {"CC"}, Dim, {"dimCC"}, "dim.1", JoinKind.LeftOuter),
    Add_dim1FilteredFinal = Table.AddColumn(Merged, "dim.1_Final", each let 
                FYFilter=[FY],
                dimFYFiltered=List.Select([dim.1][dimFY], each _>FYFilter),
                dimFY_Max=List.Min(dimFYFiltered),
                dim1FilteredFinal=Table.SelectRows([dim.1],each [dimFY]=dimFY_Max)
                in dim1FilteredFinal),
    #"Expanded dim.1_Final" = Table.ExpandTableColumn(Add_dim1FilteredFinal, "dim.1_Final", {"dimCC", "dimFY"}, {"dimCC", "dimFY"})
in
    #"Expanded dim.1_Final"

GetMissingDim - Alternative 2- Very quick - Same as Alternative 1 except for a expand Merged step and a Group with all rows:

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    MergeLeftAnti = Table.NestedJoin(Source, {"CC", "FY"}, Dim, {"dimCC", "dimFY"}, "Dim", JoinKind.LeftAnti),
    Merged = Table.NestedJoin(MergeLeftAnti, {"CC"}, Dim, {"dimCC"}, "dim.1", JoinKind.LeftOuter),
    #"Expanded dim.1" = Table.ExpandTableColumn(Merged, "dim.1", {"dimCC", "dimFY"}, {"dimCC", "dimFY"}),
    #"Grouped Rows" = Table.Group(#"Expanded dim.1", {"CC", "FY"}, {{"dim.1", each _, type table [CC=number, FY=text, Dim=table, dim.1=table]}}),
    Add_dim1FilteredFinal = Table.AddColumn(#"Grouped Rows", "dim.1_Final", each let 
                FYFilter=[FY],
                dimFYFiltered=List.Select([dim.1][dimFY], each _>FYFilter),
                dimFY_Max=List.Min(dimFYFiltered),
                dim1FilteredFinal=Table.SelectRows([dim.1],each [dimFY]=dimFY_Max)
                in dim1FilteredFinal),
    #"Expanded dim.1_Final" = Table.ExpandTableColumn(Add_dim1FilteredFinal, "dim.1_Final", {"dimCC", "dimFY"}, {"dimCC", "dimFY"})
in
    #"Expanded dim.1_Final"

AlexisOlson · ‎01-31-2023

You don't need to do any appending. You can merge and fill up instead.

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    #"Merged Queries" = Table.NestedJoin(Source, {"CC", "FY"}, Dim_ForSort, {"CC", "FY"}, "Dim_ForSort", JoinKind.LeftOuter),
    #"Expanded Dim_ForSort" = Table.ExpandTableColumn(#"Merged Queries", "Dim_ForSort", {"Dimension Details"}, {"Dimension Details"}),
    #"Sorted Rows" = Table.Buffer(Table.Sort(#"Expanded Dim_ForSort",{{"CC", Order.Ascending}, {"FY", Order.Ascending}})),
    #"Filled Up" = Table.FillUp(#"Sorted Rows",{"Dimension Details"})
in
    #"Filled Up"

View solution in original post

AlexisOlson · ‎01-30-2023

This is an interesting question and I'm curious what the reason is too. My hunch is that it's related to how the internal optimizer builds the query where the slower version doesn't get vectorized optimally (or maybe isn't memoized optimally).

I'd guess there's an approach that's better than either version stated, but I don't fully understand the reasoning behind the transformations in GetMissingDim, so it's hard to suggest an alternative.

Stealth02 · ‎01-31-2023

Hi Alexis,

In essence - I am doing a vlookup true and find the closest match greater than what is in the dimension table. The intent of the getmissingdim - is that I would append that table to the dimension table to get a complete dim table with all gaps filled.

I understand that there are other ways to do vlookup true (i.e. append then sort then fill (up or down depending on the desired results). Side note, this has similar performance than the above alternative 2 (I tested it this morning (below - I've pasted the code) - but likely a better way to do vlookup true.

But I am still curious to find out why the perfomance difference between alternative 1 and 2 - as I have other scenarios where an append would not be the right approach... interesting your hunch re vectorized vs memorized aspects - I will wait to see if there are other opinions - but that may be the best explanation!

Dim_ForSort

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x),List.Repeat({"Dim detailsY1"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x),List.Repeat({"Dim detailsY3"},x)}),{"CC","FY","Dimension Details"}),
    Add_Flag = Table.AddColumn(Source, "flag", each "x")
in
    Add_Flag

MissingDim_Append

let
//[
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    Append_DimForSort = Table.Combine({Source, Dim_ForSort}),
    BuffTable = Table.Buffer(Append_DimForSort),
    SortedRows = Table.Sort(BuffTable,{{"CC", Order.Ascending}, {"FY", Order.Ascending}, {"Dimension Details", Order.Ascending}}),
    FilledUp = Table.FillUp(SortedRows,{"Dimension Details"}),
    FilterOutFlagX = Table.SelectRows(FilledUp, each ([flag] = null))
in
    FilterOutFlagX

AlexisOlson · ‎01-31-2023

You don't need to do any appending. You can merge and fill up instead.

let
    x=Count,
    Source = Table.FromRows(List.Zip({{1..x},List.Repeat({"2018-19"},x)})&List.Zip({{1..x},List.Repeat({"2019-20"},x)})&List.Zip({{1..x},List.Repeat({"2020-21"},x)}),{"CC","FY"}),
    #"Merged Queries" = Table.NestedJoin(Source, {"CC", "FY"}, Dim_ForSort, {"CC", "FY"}, "Dim_ForSort", JoinKind.LeftOuter),
    #"Expanded Dim_ForSort" = Table.ExpandTableColumn(#"Merged Queries", "Dim_ForSort", {"Dimension Details"}, {"Dimension Details"}),
    #"Sorted Rows" = Table.Buffer(Table.Sort(#"Expanded Dim_ForSort",{{"CC", Order.Ascending}, {"FY", Order.Ascending}})),
    #"Filled Up" = Table.FillUp(#"Sorted Rows",{"Dimension Details"})
in
    #"Filled Up"

Stealth02 · ‎02-06-2023

Thanks - that works - I was working based on a different scenario where there was no matching values between the tables (hence the append) - but given that in this scenario, there are matching values - a merge is preferable.

Quick question on your use of buffer - my understanding is that buffer should be used before sorting a table to bring it into memory and ensure a proper sort - but you've included after completing the sort. Is there a reason for doing so? Is my understanding of buffer incorrect?

AlexisOlson · ‎02-06-2023

I want it in memory in the sorted order, so I buffer after the sort.

Stealth02 · ‎02-07-2023

Super thank you.

I will accept your original explanation as the solution - as it is likely the most probable answer. Thanks for all the feedback.

Power Query - Not all "Tables" created equal?

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026

FabCon is coming to Atlanta

Power Query - Not all "Tables" created equal?

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026