The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi everyone,
I have 2 large dataset where there are no unique values.
What I would like to do is import every value from my second table to the first table where the Material IDs match, and in one of the columns of the second table the date is earlier than the date in one of the columns in the firsttable.
The second table does not have a value for every material ID from the first, but every material ID can have multiple rows in both tables. I need the row with the max date matching the criteria.
I need all the rows from the first table for later calculation.
I can do this now by left joining the 2 tables, counting the values in every nested table where the date is earlier than the column in my firsttable(the dates are sorted descending beforehand), after that I select the values one by one from the nested table based on a counter-1.
This works fully, my problem is that the first table has 50-100k rows and the second 750-2000 rows, so running this process takes a lot of time. I would really appreciate if someone could help me with this.
Br, George
Solved! Go to Solution.
Hi, try this. Should work faster.
let
// table 01 data
tbl1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDSNzLQNwIxQzJzU+MNlWJ14HLGeORM8OszBcrFlKDKmpiYAAUM9Q2BWsE6S0zB4qampkCOEUI8zUQpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_01 = _t, #"Capacity needed_OLD" = _t]),
type_01 = Table.TransformColumnTypes(tbl1,{{"dates_01", type date}}),
// group table 01 by material_id and sort each table by date
g1 = Table.Group(type_01, {"material_id"}, {{"tbl1", each Table.Sort(_, {"dates_01", Order.Descending})}}),
// table 02 data
tbl2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDWNzTVNwIxQzJzU+ONlGJ14HImqHLGYDkjIyOggKG+ob6RAViuBCRnbm4OljUxMcEia2lpCZY1NTWFyEKNBUuCBGNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_02 = _t, #"Improved Time" = _t]),
type_02 = Table.TransformColumnTypes(tbl2,{{"dates_02", type date}}),
// group table by material_id and make a record (dict) with material_id as field names
g2 = Table.Group(type_02, {"material_id"}, {{"Value", each List.Zip({[dates_02], [Improved Time]})}}),
dict = Record.FromTable(Table.RenameColumns(g2,{{"material_id", "Name"}})),
// this is join replacement. Should work faster!
tbl = Table.AddColumn(g1, "tbl2", each Record.FieldOrDefault(dict, [material_id], null)),
// this function iterates table "old" (resides in tbl1 column) and grabs date from tbl2
fx_new_capacity = (old as table, new as list) =>
let
old_count = Table.RowCount(old),
dates = List.Buffer( List.Sort( old[dates_01], Order.Descending ) ),
updates = List.Buffer( List.Sort( new, {each _{0}, Order.Descending} ) ),
new_column =
List.Generate(
() => [ i = 0, lst = updates, new_col = {}],
(x) => x[i] <= old_count and not List.IsEmpty( x[lst] ),
(x) =>
let
a = List.Skip ( x[lst], each _{0} >= dates{x[i]} ),
b = List.First( a ){1}
in [ i = x[i] + 1, lst = a, new_col = x[new_col] & {b} ]
),
out = Table.FromColumns( Table.ToColumns(old) & {List.Last(new_column)[new_col]}, Table.ColumnNames(old) & {"Capacity_neededNEW"} )
in out,
// here we iterate table tbl and call our function, then combine our results into final table
new_cap = Table.Combine(Table.TransformRows( tbl, each if [tbl2] = null then [tbl1] else fx_new_capacity( [tbl1], [tbl2] )))
in
new_cap
Hi @AlienSx , thank you for the reply.
In table2 there are 2 arguments that I need for calculation.
In table1 are all my production plans for every material and the time capacity they need,. There is a possibility for planned improvement where the needed capacity changes for a material throughout a year (maybe more than one time), table 2 contains this information.
What I want is, that if the time was changed before a planned production date, than the capacity needed should change according to the new time from table2 (after getting a values I check if the values exist, if not I use the old time, if yes I use the new ones).
Sorry I can not get more into detail, since it is a company task.
@H_George , lets simplify this. Suppose you have 2 tables with same material_id: from tbl01 and tbl02 respectively.
tbl01
material_id | dates_01 |
333 | 02/20/23 |
333 | 03/20/23 |
333 | 04/20/23 |
tbl02 selection with the same material_id
material_id | dates_02 |
333 | 01/15/23 |
333 | 03/01/23 |
what would you want to select from tbl02?
As far as I can understand, you are iterating tbl_01 row by row, then select records from tbl02 with date less than current value in tbl01[dates_01], then select a single record using List.Max(tbl01[dates_02]). Correct? Could you please show the final selection from the tbl02 using table above? You may change the data if you find them irrelevant.
Hi,
table1
material_ID | dates_01 | Capacity needed_OLD | Capacity needed_NEW |
333 | 02/20/23 | Time_1 | |
333 | 03/20/23 | Time_1 | Time_2 |
333 | 03/25/23 | Time_1 | Time_2 |
333 | 04/20/23 | Time_1 | Time_3 |
table2
material_ID | dates_02 | Improved Time |
333 | 03/15/23 | Time_2 |
333 | 04/15/23 | Time_3 |
After the join I use a formula like this:
let val = [dates_01] in List.Count(List.Select(Table.Column([table2], "dates_02"), each _ <=val))
I store this value in a column as index.
Then I get the time with this:
Table.Column([table], "Improved Time"){[Index]-1}
I do this for every column I need(the time value is based on multiple columns, but I wouldn't get into it much more than this)
There has to be a better solution.
Also the best would be to replace the original times(this is historic data), right now I use an IF statement to check if the new time is empty or not, and that how I calculate with it.
Thank you for your time.
Hi, try this. Should work faster.
let
// table 01 data
tbl1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDSNzLQNwIxQzJzU+MNlWJ14HLGeORM8OszBcrFlKDKmpiYAAUM9Q2BWsE6S0zB4qampkCOEUI8zUQpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_01 = _t, #"Capacity needed_OLD" = _t]),
type_01 = Table.TransformColumnTypes(tbl1,{{"dates_01", type date}}),
// group table 01 by material_id and sort each table by date
g1 = Table.Group(type_01, {"material_id"}, {{"tbl1", each Table.Sort(_, {"dates_01", Order.Descending})}}),
// table 02 data
tbl2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjY2VtJRMjDWNzTVNwIxQzJzU+ONlGJ14HImqHLGYDkjIyOggKG+ob6RAViuBCRnbm4OljUxMcEia2lpCZY1NTWFyEKNBUuCBGNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [material_id = _t, dates_02 = _t, #"Improved Time" = _t]),
type_02 = Table.TransformColumnTypes(tbl2,{{"dates_02", type date}}),
// group table by material_id and make a record (dict) with material_id as field names
g2 = Table.Group(type_02, {"material_id"}, {{"Value", each List.Zip({[dates_02], [Improved Time]})}}),
dict = Record.FromTable(Table.RenameColumns(g2,{{"material_id", "Name"}})),
// this is join replacement. Should work faster!
tbl = Table.AddColumn(g1, "tbl2", each Record.FieldOrDefault(dict, [material_id], null)),
// this function iterates table "old" (resides in tbl1 column) and grabs date from tbl2
fx_new_capacity = (old as table, new as list) =>
let
old_count = Table.RowCount(old),
dates = List.Buffer( List.Sort( old[dates_01], Order.Descending ) ),
updates = List.Buffer( List.Sort( new, {each _{0}, Order.Descending} ) ),
new_column =
List.Generate(
() => [ i = 0, lst = updates, new_col = {}],
(x) => x[i] <= old_count and not List.IsEmpty( x[lst] ),
(x) =>
let
a = List.Skip ( x[lst], each _{0} >= dates{x[i]} ),
b = List.First( a ){1}
in [ i = x[i] + 1, lst = a, new_col = x[new_col] & {b} ]
),
out = Table.FromColumns( Table.ToColumns(old) & {List.Last(new_column)[new_col]}, Table.ColumnNames(old) & {"Capacity_neededNEW"} )
in out,
// here we iterate table tbl and call our function, then combine our results into final table
new_cap = Table.Combine(Table.TransformRows( tbl, each if [tbl2] = null then [tbl1] else fx_new_capacity( [tbl1], [tbl2] )))
in
new_cap
Thank you very much, this works perfectly.
Hello, @H_George If table join is your bottleneck then I would consider Record.FieldOrDefault approach instead - read these articles. I don't quite get what you are doing after tables join - some table2 rows selection based on 2 lists of dates? Maybe there is some space to improve performace at this point using list functions. Just couple of thoughts.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.