Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
Ismail94
Frequent Visitor

Group By in power query taking too long time.

I have about 10 million rows and trying to groupby my data in Power BI by several columns. I've tried to do this with Power Query editor with "Group By" button but the performance is taking about 15-20 minuts. Then to boost the performance I've tried Python Pandas groupby function but the result is showing only VALUES. 

 So my question is, is it possible to boost the performance of power query? Or how can i workout with python? Thanks in advance.
My script is: data = data.groupby(['date', 'client ', 'product'] ).sum()

Input Data

 

date       client   product qty   total amount  
07-05-2022 Client-1 Phone   5     5000          
07-05-2022 Client-2 PC      8     8500          
07-05-2022 Client-2 PC      8     8500

 

 

Output Data

 

qty    total amount 
5      5000         
16     17000

 

 

 

 

 

 

 

3 REPLIES 3
Ismail94
Frequent Visitor

Thanks for your response @Vijay_A_Verma  @v-jingzhang . I appreciate it.  The source was from pc memory, csv.file. As you said i've tried to pre-summarize(Group.By) my data with Python script before loading it to Power BI. It is working fine and it reduced time from 25 minuts to 3-4. 

v-jingzhang
Community Support
Community Support

Hi @Ismail94 

 

What is the data source? If the data source is a relational database or other type of database, doing the pre-summarizing operation at the data source side will be more effective. This will greatly reduce the data size that is loaded into the report and reduce the performance load on your computer. 

 

When you load all detailed data into Power BI and do the group-by with Power Query, all detailed data needs to be loaded into your computer's memory. The group-by process is executed by the Power Query M engine on your computer and will use the CPU and memory resources on your computer. 

 

If you pre-summarize the data in the database, it will use the database engine to do this group-by job. It will use the resources on the computer or server where the database is installed. When you query the summarized data with Power Query, the data size is smaller and the query speed is faster. You can use the pre-summarized data directly in your report or do further transformations based on that. 

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

Vijay_A_Verma
Super User
Super User

Best best would be load the data in SQL Server (if you don't have access to SQL Server instance - SQL Server Express is free and can be installed on Desktop). Then fetch the data from SQL Server into PQ and while fetching it from SQL Server, issue Group by SQL statement.

Group by in SQL Server is quite faster than native PQ group by. 

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.

Top Kudoed Authors