March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi,
I'm looking for a more efficient method to categorize dates based on a reference table that contains keywords.
In the query below, I search for keywords (Booking and Partner) twice and then merge to retrieve specific columns from the reference table. However, any modification is quite slow, even simple tasks like changing the format. The left pane indicates a high amount of MB loading. I've applied some filters to reduce the number of lines and also buffers.
Would anyone know of a smarter approach to achieve the same results?
Thank you very much,
Nad
Steps:
Added Booking keyword: This column is created by searching for matches between the "Booking keyword" column in a reference table named REF_KEY and the "Buchungstext" column in the Source table. If a match is found, the value from the REF_KEY table is added to the new column.
Added Partner keyword: Similar to the previous step,
…
Merged Queries: Performs a left outer join between the Source table and the REF_KEY table based on three columns: "Kostenart", "Match Booking keyword", and "Match Partner keyword".
…
Query
let
// Rename the TRANSFORM table as Source
Source = TRANSFORM,
// Add custom column for Booking keyword match
#"added Booking keyword" = Table.AddColumn(bufferedSource, "Match Booking keyword", each
let
// Filter REF_KEY table and buffer it
filtered_REF_KEY = Table.Buffer(Table.SelectColumns(REF_KEY, {"Kostenart", "Booking keyword"})),
condition1 = [Kostenart],
filteredCondition = Table.SelectRows(filtered_REF_KEY, each [Kostenart] = condition1),
whatSearch = List.RemoveNulls(filteredCondition[Booking keyword]),
whereSearch = [Buchungstext], // Assuming [Buchungstext] is a column in Source
matches = List.Select(whatSearch, each Text.Contains(Text.Lower(whereSearch), Text.Lower(_)))
in
if List.IsEmpty(matches) then null else matches{0}
),
// Add custom column for Partner keyword match
#"added Partner keyword" = Table.AddColumn(#"added Booking keyword", "Match Partner keyword", each
let
// Filter REF_KEY table and buffer it
filtered_REF_KEY = Table.Buffer(Table.SelectColumns(REF_KEY, {"Kostenart", "Partner keyword"})),
condition1 = [Kostenart],
filteredCondition = Table.SelectRows(filtered_REF_KEY, each [Kostenart] = condition1),
whatSearch = List.RemoveNulls(filteredCondition[Partner keyword]),
whereSearch = [Partner],
matches = List.Select(whatSearch, each Text.Contains(Text.Lower(whereSearch), Text.Lower(_)))
in
if List.IsEmpty(matches) then null else matches{0}
),
#"Inserted Year" = Table.AddColumn(#"added Partner keyword", "Year", each Date.Year([Belegdatum]), Int64.Type),
#"Merged Queries" = Table.NestedJoin(#"Inserted Year", {"Kostenart", "Match Booking keyword", "Match Partner keyword"}, REF_KEY, {"Kostenart", "Booking keyword", "Partner keyword"}, "REF_KEY", JoinKind.LeftOuter),
#"Expanded REF_KEY" = Table.ExpandTableColumn(#"Merged Queries", "REF_KEY", {"Assumption", "Em. fact. key"}, {"Assumption", "Em. fact. key"})
in
#"Expanded REF_KEY"
Solved! Go to Solution.
Hi @Nad_PBI_21 ,
I can give you some suggestions:
1. Minimize the use of 'Table.Buffer'. While buffering can improve performance by reducing the number of times a source is accessed, overusing it, especially with large tables, can lead to high memory consumption. Consider buffering only the smallest necessary tables or the ones that are used multiple times in subsequent transformations.
2. Instead of performing the keyword search within an `each` function for every row, you could try to transform your reference table into a list of keywords and their corresponding categories once, and then use this list to categorize your main table. This way, you perform the heavy lifting of creating the keyword-category list just once.
3. Use 'Table.Join' instead of 'Table.NestedJoin' when possible. If you don't need nested tables and only require a few columns from the reference table, a regular join might be more efficient.
4. Ensure that the columns used for joins and searches have the appropriate data types. Text operations can be slow on numeric or date fields if they are not properly typed.
Best Regards,
Dino Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @Nad_PBI_21 ,
I can give you some suggestions:
1. Minimize the use of 'Table.Buffer'. While buffering can improve performance by reducing the number of times a source is accessed, overusing it, especially with large tables, can lead to high memory consumption. Consider buffering only the smallest necessary tables or the ones that are used multiple times in subsequent transformations.
2. Instead of performing the keyword search within an `each` function for every row, you could try to transform your reference table into a list of keywords and their corresponding categories once, and then use this list to categorize your main table. This way, you perform the heavy lifting of creating the keyword-category list just once.
3. Use 'Table.Join' instead of 'Table.NestedJoin' when possible. If you don't need nested tables and only require a few columns from the reference table, a regular join might be more efficient.
4. Ensure that the columns used for joins and searches have the appropriate data types. Text operations can be slow on numeric or date fields if they are not properly typed.
Best Regards,
Dino Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
thanks a lot for your help, I will try to follow your recommendations @v-junyant-msft
Hi @Nad_PBI_21,
We'd like to help you, but provide sample data as table and also expected result please.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
21 | |
16 | |
13 | |
12 | |
9 |
User | Count |
---|---|
34 | |
31 | |
20 | |
19 | |
17 |