Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
H_George
Frequent Visitor

Get all rows from a table based on multiple values on my current table

Hi everyone,
I have 2 large dataset where there are no unique values.
What I would like to do is import every value from my second table to the first table where the Material IDs match, and in one of the columns of the second table the date is earlier than the date in one of the columns in the firsttable.
The second table does not have a value for every material ID from the first, but every material ID can have multiple rows in both tables. I need the row with the max date matching the criteria.
I need all the rows from the first table for later calculation. 


I can do this now by left joining the 2 tables, counting the values in every nested table where the date is earlier than the column in my firsttable(the dates are sorted descending beforehand), after that I select the values one by one from the nested table based on a counter-1. 
This works fully, my problem is that the first table has 50-100k rows and the second 750-2000 rows, so running this process takes a lot of time. I would really appreciate if someone could help me with this.
Br, George

1 ACCEPTED SOLUTION

Hi, try this. Should work faster.

let
    // table 01 data
    tbl1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDSNzLQNwIxQzJzU+MNlWJ14HLGeORM8OszBcrFlKDKmpiYAAUM9Q2BWsE6S0zB4qampkCOEUI8zUQpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_01 = _t, #"Capacity needed_OLD" = _t]),
    type_01 = Table.TransformColumnTypes(tbl1,{{"dates_01", type date}}),
    // group table 01 by material_id and sort each table by date
    g1 = Table.Group(type_01, {"material_id"}, {{"tbl1", each Table.Sort(_, {"dates_01", Order.Descending})}}),
    // table 02 data
    tbl2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDWNzTVNwIxQzJzU+ONlGJ14HImqHLGYDkjIyOggKG+ob6RAViuBCRnbm4OljUxMcEia2lpCZY1NTWFyEKNBUuCBGNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_02 = _t, #"Improved Time" = _t]),
    type_02 = Table.TransformColumnTypes(tbl2,{{"dates_02", type date}}),
    // group table by material_id and make a record (dict) with material_id as field names
    g2 = Table.Group(type_02, {"material_id"}, {{"Value", each List.Zip({[dates_02], [Improved Time]})}}),
    dict = Record.FromTable(Table.RenameColumns(g2,{{"material_id", "Name"}})),
    // this is join replacement. Should work faster! 
    tbl = Table.AddColumn(g1, "tbl2", each Record.FieldOrDefault(dict, [material_id], null)),
    // this function iterates table "old" (resides in tbl1 column) and grabs date from tbl2
    fx_new_capacity = (old as table, new as list)  =>
        let 
            old_count = Table.RowCount(old),
            dates = List.Buffer( List.Sort( old[dates_01], Order.Descending ) ),
            updates = List.Buffer( List.Sort( new, {each _{0}, Order.Descending} ) ),
            new_column = 
                List.Generate(
                    () => [ i = 0, lst = updates, new_col = {}],
                    (x) => x[i] <= old_count and not List.IsEmpty( x[lst] ),
                    (x) => 
                        let 
                            a = List.Skip ( x[lst], each _{0} >= dates{x[i]} ),
                            b = List.First( a ){1}
                        in [ i = x[i] + 1, lst = a, new_col = x[new_col] & {b} ]

                ),
            out = Table.FromColumns( Table.ToColumns(old) & {List.Last(new_column)[new_col]}, Table.ColumnNames(old) & {"Capacity_neededNEW"} )
        in out,
    // here we iterate table tbl and call our function, then combine our results into final table
    new_cap = Table.Combine(Table.TransformRows( tbl, each if [tbl2] = null then [tbl1] else fx_new_capacity( [tbl1], [tbl2] )))
in
    new_cap

View solution in original post

6 REPLIES 6
H_George
Frequent Visitor

Hi @AlienSx , thank you for the reply.
In table2 there are 2 arguments that I need for calculation.
In table1 are all my production plans for every material and the time capacity they need,. There is a possibility for planned improvement where the needed capacity changes for a material throughout a year (maybe more than one time), table 2 contains this information.
What I want is, that if the time was changed before a planned production date, than the capacity needed should change according to the new time from table2 (after getting a values I check if the values exist, if not I use the old time, if yes I use the new ones).
Sorry I can not get more into detail, since it is a company task.

@H_George , lets simplify this. Suppose you have 2 tables with same material_id: from tbl01 and tbl02 respectively. 

tbl01

material_iddates_01
33302/20/23
33303/20/23
33304/20/23

tbl02 selection with the same material_id

material_iddates_02
33301/15/23
33303/01/23

what would you want to select from tbl02?

As far as I can understand, you are iterating tbl_01 row by row, then select records from tbl02 with date less than current value in tbl01[dates_01], then select a single record using List.Max(tbl01[dates_02]). Correct? Could you please show the final selection from the tbl02 using table above? You may change the data if you find them irrelevant. 

 

 

Hi,
table1

material_IDdates_01Capacity needed_OLDCapacity needed_NEW
33302/20/23Time_1 
33303/20/23Time_1Time_2
33303/25/23Time_1Time_2
33304/20/23Time_1Time_3

 

table2

material_IDdates_02Improved Time
33303/15/23Time_2
33304/15/23Time_3


After the join I use a formula like this:
let val = [dates_01] in List.Count(List.Select(Table.Column([table2], "dates_02"), each _ <=val))
I store this value in a column as index.

Then I get the time with this:
Table.Column([table], "Improved Time"){[Index]-1}
I do this for every column I need(the time value is based on multiple columns, but I wouldn't get into it much more than this)

There has to be a better solution.
Also the best would be to replace the original times(this is historic data), right now I use an IF statement to check if the new time is empty or not, and that how I calculate with it.
Thank you for your time.

Hi, try this. Should work faster.

let
    // table 01 data
    tbl1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDSNzLQNwIxQzJzU+MNlWJ14HLGeORM8OszBcrFlKDKmpiYAAUM9Q2BWsE6S0zB4qampkCOEUI8zUQpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_01 = _t, #"Capacity needed_OLD" = _t]),
    type_01 = Table.TransformColumnTypes(tbl1,{{"dates_01", type date}}),
    // group table 01 by material_id and sort each table by date
    g1 = Table.Group(type_01, {"material_id"}, {{"tbl1", each Table.Sort(_, {"dates_01", Order.Descending})}}),
    // table 02 data
    tbl2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDWNzTVNwIxQzJzU+ONlGJ14HImqHLGYDkjIyOggKG+ob6RAViuBCRnbm4OljUxMcEia2lpCZY1NTWFyEKNBUuCBGNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_02 = _t, #"Improved Time" = _t]),
    type_02 = Table.TransformColumnTypes(tbl2,{{"dates_02", type date}}),
    // group table by material_id and make a record (dict) with material_id as field names
    g2 = Table.Group(type_02, {"material_id"}, {{"Value", each List.Zip({[dates_02], [Improved Time]})}}),
    dict = Record.FromTable(Table.RenameColumns(g2,{{"material_id", "Name"}})),
    // this is join replacement. Should work faster! 
    tbl = Table.AddColumn(g1, "tbl2", each Record.FieldOrDefault(dict, [material_id], null)),
    // this function iterates table "old" (resides in tbl1 column) and grabs date from tbl2
    fx_new_capacity = (old as table, new as list)  =>
        let 
            old_count = Table.RowCount(old),
            dates = List.Buffer( List.Sort( old[dates_01], Order.Descending ) ),
            updates = List.Buffer( List.Sort( new, {each _{0}, Order.Descending} ) ),
            new_column = 
                List.Generate(
                    () => [ i = 0, lst = updates, new_col = {}],
                    (x) => x[i] <= old_count and not List.IsEmpty( x[lst] ),
                    (x) => 
                        let 
                            a = List.Skip ( x[lst], each _{0} >= dates{x[i]} ),
                            b = List.First( a ){1}
                        in [ i = x[i] + 1, lst = a, new_col = x[new_col] & {b} ]

                ),
            out = Table.FromColumns( Table.ToColumns(old) & {List.Last(new_column)[new_col]}, Table.ColumnNames(old) & {"Capacity_neededNEW"} )
        in out,
    // here we iterate table tbl and call our function, then combine our results into final table
    new_cap = Table.Combine(Table.TransformRows( tbl, each if [tbl2] = null then [tbl1] else fx_new_capacity( [tbl1], [tbl2] )))
in
    new_cap

Thank you very much, this works perfectly. 

AlienSx
Super User
Super User

Hello, @H_George  If table join is your bottleneck then I would consider Record.FieldOrDefault approach instead - read these articles. I don't quite get what you are doing after tables join - some table2 rows selection based on 2 lists of dates? Maybe there is some space to improve performace at this point using list functions. Just couple of thoughts. 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors