Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Shape the future of the Fabric Community! Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions. Take survey.

Reply
ChemEnger
Advocate IV
Advocate IV

VERY Slow Group By

I have a query (from a Power BI Dataflow) with approximately 14000 rows.  The Primary key ([Parent], Int64.Type) can have multiple records ([Batch Number], type text) associated with it.  I need to count the number of [Batch Number]s associated with each [Parent] so that I can exclude [Parent] records with more than 20 [Batch Number] records.

 

No matter what I try, any aggregation step is VERY slow.  I have also tried buffering the data but this step is itself very slow.

 

My current steps (including re-expanding the original records are:

#"Parents Only" = Table.SelectColumns(#"Previous Step", {"Parent"}),
//Group the records, with a count **THIS IS THE SLOW STEP**
#"Grouped Rows" = Table.Group(#"Parents Only", {"Parent"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
//Add a key **Not sure I need this but thought it would speed things up??**
#"Added [Parent] Key" = Table.AddKey(#"Grouped Rows", {"Parent"}, true),
// Filtered to Parents with NGT 20 batches listed
#"Filtered Rows" = Table.SelectRows(#"Added [Parent] Key", each [Count] <= 20),
// Remove count, just leave Parent ID
#"Removed Count" = Table.RemoveColumns(#"Filtered Rows",{"Count"}),
// Re-join the list of Parent ID numbers to the filtered table, now only with Parents with NGT 20 batches
#"Re-join" = Table.NestedJoin(#"Removed Count", {"Parent"}, #"Added [Parent] Key", {"Parent"}, "Original Record"),
#"Expanded Original Record" = Table.ExpandTableColumn(#"Re-join", "Original Record", {"Batch Number"}, {"Batch Number"}),

I have tried the Table.AddKey step before the Table.Group but this but this doesn't speed up the process and is also fairly slow itself.

I can't create a separate query of just the [Parent] records, as some contain e.g. multiple [Batch Number]s separated by commas that need de-aggregating (Table.ExpandListColumn...) to multiple records within the same query.

1 ACCEPTED SOLUTION
ChemEnger
Advocate IV
Advocate IV

SOLVED.

 

The speed (or lack thereof) was due to a previous step in the query - I was filtering the the first part of the [Batch Number] field against a list of known [Batch Prefix]s.  What was happening (obvious now!) was that the call was being made to this list multiple times.  A List.Buffer wrapper therefore solved it in one go!

 

//Buffer the list of Batch Prefixes to prevent multiple calls
#"Batch Prefixes" = List.Buffer(#"Prefixes"[Batch Prefix]),
//Select only rows with batch prefixes that match the [Batch Number] Prefix
#"Filter for Known Batches" = Table.SelectRows(#"Table with Batch Prefixes", each List.Contains(#"Batch Prefixes", [Batch Prefix])),

 

One point to note is that I did try to create a list query of the Batch Prefixes up front and reference that but got a Formula.Firewall error, hence creating the list from a table within this query (#"Prefixes"[Batch Prefix])

View solution in original post

1 REPLY 1
ChemEnger
Advocate IV
Advocate IV

SOLVED.

 

The speed (or lack thereof) was due to a previous step in the query - I was filtering the the first part of the [Batch Number] field against a list of known [Batch Prefix]s.  What was happening (obvious now!) was that the call was being made to this list multiple times.  A List.Buffer wrapper therefore solved it in one go!

 

//Buffer the list of Batch Prefixes to prevent multiple calls
#"Batch Prefixes" = List.Buffer(#"Prefixes"[Batch Prefix]),
//Select only rows with batch prefixes that match the [Batch Number] Prefix
#"Filter for Known Batches" = Table.SelectRows(#"Table with Batch Prefixes", each List.Contains(#"Batch Prefixes", [Batch Prefix])),

 

One point to note is that I did try to create a list query of the Batch Prefixes up front and reference that but got a Formula.Firewall error, hence creating the list from a table within this query (#"Prefixes"[Batch Prefix])

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.