Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
I have about 10 million rows and trying to groupby my data in Power BI by several columns. I've tried to do this with Power Query editor with "Group By" button but the performance is taking about 15-20 minuts. Then to boost the performance I've tried Python Pandas groupby function but the result is showing only VALUES.
So my question is, is it possible to boost the performance of power query? Or how can i workout with python? Thanks in advance.
My script is: data = data.groupby(['date', 'client ', 'product'] ).sum()
Input Data
date client product qty total amount
07-05-2022 Client-1 Phone 5 5000
07-05-2022 Client-2 PC 8 8500
07-05-2022 Client-2 PC 8 8500
Output Data
qty total amount
5 5000
16 17000
Thanks for your response @Vijay_A_Verma @v-jingzhang . I appreciate it. The source was from pc memory, csv.file. As you said i've tried to pre-summarize(Group.By) my data with Python script before loading it to Power BI. It is working fine and it reduced time from 25 minuts to 3-4.
Hi @Ismail94
What is the data source? If the data source is a relational database or other type of database, doing the pre-summarizing operation at the data source side will be more effective. This will greatly reduce the data size that is loaded into the report and reduce the performance load on your computer.
When you load all detailed data into Power BI and do the group-by with Power Query, all detailed data needs to be loaded into your computer's memory. The group-by process is executed by the Power Query M engine on your computer and will use the CPU and memory resources on your computer.
If you pre-summarize the data in the database, it will use the database engine to do this group-by job. It will use the resources on the computer or server where the database is installed. When you query the summarized data with Power Query, the data size is smaller and the query speed is faster. You can use the pre-summarized data directly in your report or do further transformations based on that.
Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.
Best best would be load the data in SQL Server (if you don't have access to SQL Server instance - SQL Server Express is free and can be installed on Desktop). Then fetch the data from SQL Server into PQ and while fetching it from SQL Server, issue Group by SQL statement.
Group by in SQL Server is quite faster than native PQ group by.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Prices go up Feb. 11th.
Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.
User | Count |
---|---|
14 | |
13 | |
12 | |
12 | |
12 |