Hi, I have c.20 million rows of data with about 15 columns per row/transaction, but I want to limit the output into the Desktop by using the Group By function in Power Query. This means I am waiting a long time for it to do its calculations.
Is there anything I need to be aware of, i.e. should I not be grouping the data using an already calculated column, etc.?
Do you apply the simple grouping, or the advanced one (using the fourth and fifth arguments)?
Just a simple grouping, with 5 grouping columns:
"Month Cohort", "Week Ending", "Confidence Group", "category", "type"
The first 2 are created using the Date option when adding a new column, based off the existing transaction date column:
= Table.AddColumn(#"Week ending", "Month Name", each Date.MonthName([operation_date]), type text)
"Confidence Group" is simple conditional column, where:
= Table.AddColumn(#"Removed Duplicates", "Confidence Group", each if [confidence] >= 0.5 then "High" else "Low")
The last 2 groupings are based on existing database table columns.
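For reference, a simple grouping over those 5 columns is a three-argument Table.Group call, roughly as sketched below; the preceding step name and the row-count aggregation are assumptions, since the actual aggregations aren't shown in the thread. The advanced form would add a fourth argument (a GroupKind) and a fifth (a comparer), which aren't needed here.

= Table.Group(
    #"Added Confidence Group",   // hypothetical name of the step before the grouping
    {"Month Cohort", "Week Ending", "Confidence Group", "category", "type"},
    {{"Transaction Count", each Table.RowCount(_), Int64.Type}}   // assumed aggregation, for illustration only
)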
To improve the performance of Group By in Power Query with large datasets, filter the data first and avoid grouping on complex calculated columns. Limit the number of grouping columns and use simpler aggregation functions. Ensure data types are correct, and consider loading the data into the data model for better performance. Lastly, test with smaller samples before applying changes to the full dataset.
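As a rough illustration of the filter-first and trim-columns advice, the opening steps of the query could look something like the sketch below. The server, database, table, column list, and date cut-off are all assumptions made for the example; against a SQL source these steps normally fold back into the generated SQL, so fewer rows ever reach the Group By step.

let
    Source = Sql.Database("your-server", "your-database"),   // assumed connection details
    Transactions = Source{[Schema = "dbo", Item = "transactions"]}[Data],   // assumed table name
    // keep only the columns the grouping and aggregations actually need
    KeptColumns = Table.SelectColumns(Transactions, {"operation_date", "confidence", "category", "type"}),
    // filter early so fewer rows reach the grouping step (assumes operation_date is typed as date)
    Filtered = Table.SelectRows(KeptColumns, each [operation_date] >= #date(2024, 1, 1))
in
    Filtered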
Hi,
Thanks for the solutions pbiuseruk and BabyYoda offered; I want to add some more information for users to refer to.
Hello @MattMcMillin_71, based on your description, it's best not to group on calculated columns, as this affects performance. You can refer to the following suggestions:
1. As @pbiuseruk and @BabyYoda mentioned, consider doing the grouping at the data source.
2. Simplify your data types and only load the important columns into Power BI; you can refer to the following link:
Optimization guide for Power BI - Power BI | Microsoft Learn
3. If your data source supports it, consider using incremental refresh so that only new or changed data is processed instead of the whole dataset; this can improve performance. You can refer to the following link about incremental refresh (a sketch of the required RangeStart/RangeEnd filter step follows this list):
Incremental refresh for semantic models in Power BI - Power BI | Microsoft Learn
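For item 3, incremental refresh hinges on filtering the source rows by two datetime parameters named RangeStart and RangeEnd, which Power BI supplies per partition at refresh time. A minimal sketch of that filter step, assuming operation_date is a datetime column and "Transactions" stands in for whatever the preceding step is called:

= Table.SelectRows(Transactions, each [operation_date] >= RangeStart and [operation_date] < RangeEnd)   // Transactions is a placeholder for the preceding step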
Best Regards!
Yolo Zhu
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Agree with @BabyYoda - if you're using a DB, can you have a view created and then query that instead?
It's a SQL database table, which is refreshed daily as part of a pipeline.
You have 2 options then, I reckon. If it's a Fabric pipeline, a Spark-based pipeline, or an Azure Data Factory pipeline, you can add transformation steps to do the aggregations there, and then the Power Query engine won't need to spend time on that work.
Alternatively, and the better option in my opinion, would be to create a view based on that table within the SQL database that you're querying. You can do things like grouping and roll-up within SQL statements. If you're not the best at SQL, then maybe use ChatGPT to try and convert your M code into SQL, which you can use to create the view.
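Once such a view exists, the Power Query side becomes a simple navigation step, roughly as below; the server, database, and view name (dbo.vw_TransactionSummary) are assumptions for the sketch, and the GROUP BY logic would live inside the view itself.

let
    Source = Sql.Database("your-server", "your-database"),   // assumed connection details
    // the view performs the grouping/roll-up in SQL, so only aggregated rows are transferred
    Summary = Source{[Schema = "dbo", Item = "vw_TransactionSummary"]}[Data]   // hypothetical view name
in
    Summary

Because the aggregation happens in the database, only the pre-grouped rows cross the wire rather than the c.20 million source transactions.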
Can you do the grouping at the source so that you aren't bringing in so many rows? What type of data source is it?