I tried searching for this topic but couldn't find it anywhere. Maybe I didn't frame the question correctly, so I am posting it myself.
For the import option, while using SQL to pull data, is it better to do any categorical data massaging in the SQL code itself, or to do it later in Power Query? Which will be faster and use less space?
For example: is it better to use a CASE statement in the SQL code before importing, or a Switch statement in Power Query after importing?
Thanks for the help!
In general, it is best to push the data massaging back in the data pipeline as far as possible. So, if you have the SQL skills, do it there. In fact, write a SQL View and then just connect your query to that. The data wrangling and other features in Power BI are for the folks that do not have the technical ability or access to be able to do direct data manipulation within the source system(s).
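As a rough illustration of the "write a SQL View" suggestion, here is a minimal sketch that puts the CASE logic in a view and then queries the view. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for your real SQL source; the table `orders`, the view `v_orders_banded`, and the band thresholds are all hypothetical names and values, not something from the original question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real SQL source (assumption).
conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- Hypothetical source table.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO orders (order_id, amount)
    VALUES (1, 25.0), (2, 250.0), (3, 2500.0);

    -- The categorical logic lives in the view, so every report that
    -- connects to the view gets the same order_band column.
    CREATE VIEW v_orders_banded AS
    SELECT
        order_id,
        amount,
        CASE
            WHEN amount < 100  THEN 'Small'
            WHEN amount < 1000 THEN 'Medium'
            ELSE 'Large'
        END AS order_band
    FROM orders;
""")

# The report query then only needs to select from the view.
rows = conn.execute(
    "SELECT order_id, order_band FROM v_orders_banded ORDER BY order_id"
).fetchall()
print(rows)  # [(1, 'Small'), (2, 'Medium'), (3, 'Large')]
```

In Power BI you would simply point the import query at the view instead of the base table, so no conditional logic is needed on the Power Query side.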
@Greg_Deckler Understood, but is there a performance difference between the two approaches?
Generally, you will get better performance the further upstream you push the work. Think of it this way: if you implement the data cleanup in SQL, the work runs on the SQL server and uses its resources, whereas with Power Query a lot of the work is done by the computer on which you are running Power BI. A server is likely to have better CPU and RAM than a desktop machine, so the code will generally execute faster there. In addition, the query and cleanup happen local to the data, so disk I/O is faster. Plus, you are only sending the relevant data across the network, which could very likely be much faster as well.
Again, these are generalities and your specific situation may be different.
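To make the "only the relevant data crosses the wire" point concrete, here is a small sketch comparing two ways of getting counts per category: pulling every row and categorizing on the client, versus letting the database categorize and summarize first. Again this uses an in-memory SQLite database as a stand-in, so there is no real network involved and the number of rows fetched is only a proxy for data transferred. The `sales` table, the `band` helper, and the thresholds are hypothetical.

```python
import sqlite3
from collections import Counter

# In-memory SQLite database as a stand-in for the real SQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (amount) VALUES (?)", [(n,) for n in range(1, 1001)]
)


def band(amount):
    """Client-side categorization, mirroring the CASE expression below."""
    if amount < 100:
        return "Small"
    if amount < 500:
        return "Medium"
    return "Large"


# Option A: fetch every row, then categorize and count on the client.
raw_rows = conn.execute("SELECT amount FROM sales").fetchall()
client_counts = Counter(band(amount) for (amount,) in raw_rows)

# Option B: categorize and count at the source, fetch only the summary.
pushed_rows = conn.execute("""
    SELECT
        CASE
            WHEN amount < 100 THEN 'Small'
            WHEN amount < 500 THEN 'Medium'
            ELSE 'Large'
        END AS amount_band,
        COUNT(*) AS sale_count
    FROM sales
    GROUP BY amount_band
""").fetchall()

print(len(raw_rows), "rows fetched when categorizing on the client")
print(len(pushed_rows), "rows fetched when the source does the work")
```

Both options produce the same counts; the difference is how many rows have to leave the database. How much that matters in practice depends on your data volume and network, which is why the answer above is stated as a generality.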
@srivi The way to look at this is to ask where the business logic should reside. If you are going to use the same table in another report and need the same switch/case condition there, it is better to do it in SQL so that every report using that table shares one layer of business logic. Hope this helps.