March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hello all,
So it's not so much a problem, but rather a question regarding 2 different ways to do something.
Context :
I have a fairly big table (around 3 million rows), that i'm currently working on. As usual, it's the sort of table where you have EVERYHTING (you guys know what i'm talking about, right ?). I have removed as much unecessary things as possible and segment my table into multiple ones (Facts and Dims).
It works fine, i can clearly see the impact it has on my report.
Question :
There are some places where i use the "Group By" option from Power Query to get some aggregations, but in other places I use it simply to remove other columns and remove duplicates.
The thing is, I don't know if the optimal solution. I could also just "Remove unnecessary columns" and then use "Remove duplicates" to obtain the same results.
So my question is, which is the best option performance wise ? I tried using the diagnostics tool, but I have a hard time interpreting the results. Both solutions seem fine, but is one actually better than the other ? If you had a huge dataset, which would be more appropriate ?
Thank you in advance.
Solved! Go to Solution.
In Diagnostics, pick up the report generated which has aggregated word.
Select Exclusive duration column, Transform tab - Duration - Total seconds - Now Statistics - Sum
Note down the timing of both approaches. Whatever is lesser is the better.
Now the first principle in optimizing query is to perform vertical filtering i.e. remove all unnecessary columns.
Hence, if you perform remove columns followed by Group by, you will have very good amount of performance increase compared to do plain vanilla Group by.
Horizontal filtering (i.e. remove duplicates) is also one of the steps which we can resort to after Vertical filtering. But horizontal filtering will consume some finite time which will be added up to total time but same can be compensated by next steps as they will work on smaller dataset.
It is always better to work on a small size rectangle.
But you have to see how much time is taken in Query diagnostics on your data set for different approaches and choose the best one.
In Diagnostics, pick up the report generated which has aggregated word.
Select Exclusive duration column, Transform tab - Duration - Total seconds - Now Statistics - Sum
Note down the timing of both approaches. Whatever is lesser is the better.
Now the first principle in optimizing query is to perform vertical filtering i.e. remove all unnecessary columns.
Hence, if you perform remove columns followed by Group by, you will have very good amount of performance increase compared to do plain vanilla Group by.
Horizontal filtering (i.e. remove duplicates) is also one of the steps which we can resort to after Vertical filtering. But horizontal filtering will consume some finite time which will be added up to total time but same can be compensated by next steps as they will work on smaller dataset.
It is always better to work on a small size rectangle.
But you have to see how much time is taken in Query diagnostics on your data set for different approaches and choose the best one.
Hello,
Thank you for your full answer. In the end, both solutions are almost the same in terms of performance. I did however try to reduce the data even more BEFORE applying my solutions. It is indeed the best way to reduce the impact on performance.
Thanks again !
When I am comparing two approaches, I just duplicate the query, modify the new one to use the other approach, and load them in parallel and see which wins. You can speed up the testing by keeping fewer rows temporarily.
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Thank you for the advice. I also thought about it, but couldn't find the time to do so. Pretty easy way to find the best solution.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
21 | |
16 | |
13 | |
13 | |
8 |
User | Count |
---|---|
36 | |
32 | |
20 | |
19 | |
17 |