Solved: Re: Get all rows from a table based on multiple va...

H_George · ‎03-23-2023

Hi everyone,
I have 2 large dataset where there are no unique values.
What I would like to do is import every value from my second table to the first table where the Material IDs match, and in one of the columns of the second table the date is earlier than the date in one of the columns in the firsttable.
The second table does not have a value for every material ID from the first, but every material ID can have multiple rows in both tables. I need the row with the max date matching the criteria.
I need all the rows from the first table for later calculation.

I can do this now by left joining the 2 tables, counting the values in every nested table where the date is earlier than the column in my firsttable(the dates are sorted descending beforehand), after that I select the values one by one from the nested table based on a counter-1.
This works fully, my problem is that the first table has 50-100k rows and the second 750-2000 rows, so running this process takes a lot of time. I would really appreciate if someone could help me with this.
Br, George

AlienSx · ‎03-23-2023

Hi, try this. Should work faster.

let
    // table 01 data
    tbl1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDSNzLQNwIxQzJzU+MNlWJ14HLGeORM8OszBcrFlKDKmpiYAAUM9Q2BWsE6S0zB4qampkCOEUI8zUQpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_01 = _t, #"Capacity needed_OLD" = _t]),
    type_01 = Table.TransformColumnTypes(tbl1,{{"dates_01", type date}}),
    // group table 01 by material_id and sort each table by date
    g1 = Table.Group(type_01, {"material_id"}, {{"tbl1", each Table.Sort(_, {"dates_01", Order.Descending})}}),
    // table 02 data
    tbl2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDWNzTVNwIxQzJzU+ONlGJ14HImqHLGYDkjIyOggKG+ob6RAViuBCRnbm4OljUxMcEia2lpCZY1NTWFyEKNBUuCBGNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_02 = _t, #"Improved Time" = _t]),
    type_02 = Table.TransformColumnTypes(tbl2,{{"dates_02", type date}}),
    // group table by material_id and make a record (dict) with material_id as field names
    g2 = Table.Group(type_02, {"material_id"}, {{"Value", each List.Zip({[dates_02], [Improved Time]})}}),
    dict = Record.FromTable(Table.RenameColumns(g2,{{"material_id", "Name"}})),
    // this is join replacement. Should work faster! 
    tbl = Table.AddColumn(g1, "tbl2", each Record.FieldOrDefault(dict, [material_id], null)),
    // this function iterates table "old" (resides in tbl1 column) and grabs date from tbl2
    fx_new_capacity = (old as table, new as list)  =>
        let 
            old_count = Table.RowCount(old),
            dates = List.Buffer( List.Sort( old[dates_01], Order.Descending ) ),
            updates = List.Buffer( List.Sort( new, {each _{0}, Order.Descending} ) ),
            new_column = 
                List.Generate(
                    () => [ i = 0, lst = updates, new_col = {}],
                    (x) => x[i] <= old_count and not List.IsEmpty( x[lst] ),
                    (x) => 
                        let 
                            a = List.Skip ( x[lst], each _{0} >= dates{x[i]} ),
                            b = List.First( a ){1}
                        in [ i = x[i] + 1, lst = a, new_col = x[new_col] & {b} ]

                ),
            out = Table.FromColumns( Table.ToColumns(old) & {List.Last(new_column)[new_col]}, Table.ColumnNames(old) & {"Capacity_neededNEW"} )
        in out,
    // here we iterate table tbl and call our function, then combine our results into final table
    new_cap = Table.Combine(Table.TransformRows( tbl, each if [tbl2] = null then [tbl1] else fx_new_capacity( [tbl1], [tbl2] )))
in
    new_cap

View solution in original post

H_George · ‎03-23-2023

Hi @AlienSx , thank you for the reply.
In table2 there are 2 arguments that I need for calculation.
In table1 are all my production plans for every material and the time capacity they need,. There is a possibility for planned improvement where the needed capacity changes for a material throughout a year (maybe more than one time), table 2 contains this information.
What I want is, that if the time was changed before a planned production date, than the capacity needed should change according to the new time from table2 (after getting a values I check if the values exist, if not I use the old time, if yes I use the new ones).
Sorry I can not get more into detail, since it is a company task.

AlienSx · ‎03-23-2023

@H_George , lets simplify this. Suppose you have 2 tables with same material_id: from tbl01 and tbl02 respectively.

tbl01

material_id	dates_01
333	02/20/23
333	03/20/23
333	04/20/23

tbl02 selection with the same material_id

material_id	dates_02
333	01/15/23
333	03/01/23

what would you want to select from tbl02?

As far as I can understand, you are iterating tbl_01 row by row, then select records from tbl02 with date less than current value in tbl01[dates_01], then select a single record using List.Max(tbl01[dates_02]). Correct? Could you please show the final selection from the tbl02 using table above? You may change the data if you find them irrelevant.

H_George · ‎03-23-2023

Hi,
table1

material_ID	dates_01	Capacity needed_OLD	Capacity needed_NEW
333	02/20/23	Time_1
333	03/20/23	Time_1	Time_2
333	03/25/23	Time_1	Time_2
333	04/20/23	Time_1	Time_3

table2

material_ID	dates_02	Improved Time
333	03/15/23	Time_2
333	04/15/23	Time_3

After the join I use a formula like this:
let val = [dates_01] in List.Count(List.Select(Table.Column([table2], "dates_02"), each _ <=val))
I store this value in a column as index.

Then I get the time with this:
Table.Column([table], "Improved Time"){[Index]-1}
I do this for every column I need(the time value is based on multiple columns, but I wouldn't get into it much more than this)

There has to be a better solution.
Also the best would be to replace the original times(this is historic data), right now I use an IF statement to check if the new time is empty or not, and that how I calculate with it.
Thank you for your time.

AlienSx · ‎03-23-2023

Hi, try this. Should work faster.

let
    // table 01 data
    tbl1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDSNzLQNwIxQzJzU+MNlWJ14HLGeORM8OszBcrFlKDKmpiYAAUM9Q2BWsE6S0zB4qampkCOEUI8zUQpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_01 = _t, #"Capacity needed_OLD" = _t]),
    type_01 = Table.TransformColumnTypes(tbl1,{{"dates_01", type date}}),
    // group table 01 by material_id and sort each table by date
    g1 = Table.Group(type_01, {"material_id"}, {{"tbl1", each Table.Sort(_, {"dates_01", Order.Descending})}}),
    // table 02 data
    tbl2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDWNzTVNwIxQzJzU+ONlGJ14HImqHLGYDkjIyOggKG+ob6RAViuBCRnbm4OljUxMcEia2lpCZY1NTWFyEKNBUuCBGNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_02 = _t, #"Improved Time" = _t]),
    type_02 = Table.TransformColumnTypes(tbl2,{{"dates_02", type date}}),
    // group table by material_id and make a record (dict) with material_id as field names
    g2 = Table.Group(type_02, {"material_id"}, {{"Value", each List.Zip({[dates_02], [Improved Time]})}}),
    dict = Record.FromTable(Table.RenameColumns(g2,{{"material_id", "Name"}})),
    // this is join replacement. Should work faster! 
    tbl = Table.AddColumn(g1, "tbl2", each Record.FieldOrDefault(dict, [material_id], null)),
    // this function iterates table "old" (resides in tbl1 column) and grabs date from tbl2
    fx_new_capacity = (old as table, new as list)  =>
        let 
            old_count = Table.RowCount(old),
            dates = List.Buffer( List.Sort( old[dates_01], Order.Descending ) ),
            updates = List.Buffer( List.Sort( new, {each _{0}, Order.Descending} ) ),
            new_column = 
                List.Generate(
                    () => [ i = 0, lst = updates, new_col = {}],
                    (x) => x[i] <= old_count and not List.IsEmpty( x[lst] ),
                    (x) => 
                        let 
                            a = List.Skip ( x[lst], each _{0} >= dates{x[i]} ),
                            b = List.First( a ){1}
                        in [ i = x[i] + 1, lst = a, new_col = x[new_col] & {b} ]

                ),
            out = Table.FromColumns( Table.ToColumns(old) & {List.Last(new_column)[new_col]}, Table.ColumnNames(old) & {"Capacity_neededNEW"} )
        in out,
    // here we iterate table tbl and call our function, then combine our results into final table
    new_cap = Table.Combine(Table.TransformRows( tbl, each if [tbl2] = null then [tbl1] else fx_new_capacity( [tbl1], [tbl2] )))
in
    new_cap

H_George · ‎03-24-2023

Thank you very much, this works perfectly.

AlienSx · ‎03-23-2023

Hello, @H_George If table join is your bottleneck then I would consider Record.FieldOrDefault approach instead - read these articles. I don't quite get what you are doing after tables join - some table2 rows selection based on 2 lists of dates? Maybe there is some space to improve performace at this point using list functions. Just couple of thoughts.

Get all rows from a table based on multiple values on my current table

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025

Join the #PBI10 DataViz contest

Get all rows from a table based on multiple values on my current table

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025