Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
MattMcMillin_71
Frequent Visitor

Troubles with grouping in Power Query

Hi, I have c.20 million rows of data with aobut 15 columns per row/transaction  but want to limit  the output  into the Desktop by using the group by function in Power Query. This is meaning I am waiting for a long time to do its calculations.
Is there anything I need to be aware of, ie shoudl I not be grouping  the data using an alrerady calculated column etc?

1 ACCEPTED SOLUTION

You have 2 options then I reckon. If it's a fabric pipeline, Spark based pipeline or an Azure Data Factory pipeline, you can add transformation steps to do the aggregations and then the Power Query engine won't need to spend time doing that effort.

Alternatively, the better option in my opinion, would be to make a view based on that table, within the SQL Database that you're querying. You can do things like grouping and roll up within SQL statements. If you're not the best at SQL then maybe use ChatGPT to try and convert your m code into SQL code, which you can use to make the view.

View solution in original post

8 REPLIES 8
Omid_Motamedise
Super User
Super User

Do you apply the simple grouping or advance (using the fourth and fifth argument) one? 

If my answer helped solve your issue, please consider marking it as the accepted solution. It helps others in the community find answers faster—and keeps the community growing stronger!
You can also check out my YouTube channel for tutorials, tips, and real-world solutions in Power Query with the following link
https://youtube.com/@omidbi?si=96Bo-ZsSwOx0Z36h

Just a simple grouping with  5 groupings
"Month Cohort", "Week Ending", "Confidence Group", "category", "type"
The 1st 2 are using the Date  option when creating a new column, based off the existing transaction date column:
= Table.AddColumn(#"Week ending", "Month Name", each Date.MonthName([operation_date]), type text)

"Confidence Group" is simple conditional column, where:
= Table.AddColumn(#"Removed Duplicates", "Confidence Group", each if [confidence] >= 0.5 then "High" else "Low")

Last 2 groupings are based on existing database table columns

Power_BI9
New Member

To improve the performance of Group By in Power Query with large datasets, filter data first and avoid using complex calculated columns for grouping. Limit the number of grouping columns and use simpler aggregation functions. Ensure data types are correct and consider loading the data into the Data Model for better performance. Lastly, test with smaller samples before applying changes to the full dataset. For more tips and advanced techniques, you can explore this Power BI course that offers in-depth knowledge on optimizing Power BI performance.

Anonymous
Not applicable

Hi,

Thanks for the solution pbiuseruk  and BabyYoda offered, and i want to offer some more information for user to refer to.

hello @MattMcMillin_71 , bas on your description, you'd better not use the calculated column to group, it will affect the performance.You can refer to the following solutions:

1.as @pbiuseruk  and @BabyYoda mentioned, you can consider to group on data source.

2.Simplify your data types and only load important columns into Power BI, you can refer to the following link.

Optimization guide for Power BI - Power BI | Microsoft Learn

3.If your datasource supports incremental refresh, you can consider to use incremental refresh to process the data instead of the whole data, it can improve the performance, you can refer to the following link about incremental refresh.

Incremental refresh for semantic models in Power BI - Power BI | Microsoft Learn

Configure incremental refresh and real-time data for Power BI semantic models - Power BI | Microsoft...

 

 

Best Regards!

Yolo Zhu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

pbiuseruk
Resolver II
Resolver II

Agree with @BabyYoda - if you're using a DB, can you have view created and then query that instead?

Its an SQL database table, which is daily refreshed as a pipeline

You have 2 options then I reckon. If it's a fabric pipeline, Spark based pipeline or an Azure Data Factory pipeline, you can add transformation steps to do the aggregations and then the Power Query engine won't need to spend time doing that effort.

Alternatively, the better option in my opinion, would be to make a view based on that table, within the SQL Database that you're querying. You can do things like grouping and roll up within SQL statements. If you're not the best at SQL then maybe use ChatGPT to try and convert your m code into SQL code, which you can use to make the view.

BabyYoda
Frequent Visitor

Can you do the grouping at the source so that you aren't bringing in so many rows?  What type of data source is it?

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors