Power Query Performance

Ian_Stuart_Rupe · ‎12-23-2021

In general I find Power Query performance to be very poor when building a medium complexity script. For example it seems really inefficient on joins (merges) and if I have a fact table that I want to join to multiple dimensions (to bring back the surrogate keys) it can take a very long time to refresh the data view when moving from step to step. Also getting row counts is very slow and this is something that is good practice to do before and after merges. It is frustrating when one knows how fast this would be in SQL.

What do other people think? Is there a live idea on https://ideas.powerbi.com/ideas/ ? I couldn't see one. Has anyone suggested using SQL as the language rather than M? Could this be done?

Thanks,

Ian

Ian_Stuart_Rupe · ‎01-14-2022

Hi all, I have voted for an existing idea at https://ideas.powerbi.com/ideas/idea/?ideaid=03e29a3d-c483-42a3-9860-1e31b028d715 If you agree could you vote and sppread the word? It needs votes.

Ian_Stuart_Rupe · ‎01-10-2022

Thanks everyone for the posts, very helpful and I have wached the video @lbendlin and I am reading the other articles to see if there is anything that I can do to improve performance. At the moment, as query folding is not an option with file sources, I think any gains will be minimal, but we shall see.

Anonymous · ‎12-31-2021

Hi @Ian_Stuart_Rupe ,

Could you tell me if your problem has been solved?
If it is, kindly Accept it as the solution. More people will benefit from it.
Or you are still confused about it, please provide me with more details about your problem.

Best Regards,
Stephen Tao

Ian_Stuart_Rupe · ‎01-11-2022

Not solved. It seems that performance is a known issue.

Ian_Stuart_Rupe · ‎12-23-2021

As and when I can go server side I do but when bringing data from files and wanting to create a dimensional model that is not an option. I had not thought of using DAX to create the surrogate key columns on the fact table. That could well be an option, thank you. I will raise a new idea to improve performance/ change the language too unless someone else knows of one that has a;lready been raised.

AlexisOlson · ‎12-23-2021

DAX might be problematic for creating key columns as you might run into circular dependencies if you then use these to create relationships on.

Power Query can be a bit hit and miss with performance depending on a number of factors including things like how well (or if) it's folding queries back to native SQL. The Power Query experts around here like @lbendlin are well aware of the performance limitations and have been pushing for improvements, so they may have specific ideas to add your voice to.

smpa01 · ‎12-23-2021

@AlexisOlson this seems like a good opportunity since you brought it up and this has been on my mind for a very long time.

If I am bringing all my tables from a SQL server and I am doing all the server side transformation through native SQL queries, is there a reason why I should care about query folding?

Please correct me if I am wrong, but the way I undertand query folding is if you are bringing tables from a server (take TSQL for example) and you don't write native SQL queries and rather transform through PQ syntax, as long as those syntaxs are comparable to a SQL equivalent transformation, the query folding will happen (e.g. Table.SelectColumns= SELECT a,b,c).

But if my whole query is a native SQL why query folding is important to me (query folding will not happen in the first place if you write a native SQL query altogether)?

========================
Did I answer your question? Mark my post as a solution!
Proud to be a Super User
My Custom Visualization Projects
• Plotting Live Sound: Live Sound
• Beautiful News: Women in Parliament, Energy Mix, Shrinking Armies
• Visual Capitalist: Working Hrs
• Others: Easing Graph, Animated Calendar
MayViz Submissions
• Week 1: View
• Week 2: View
• Week 3: View
• Week 4: View
========================

lbendlin · ‎12-23-2021

@smpa01 If you hand craft your queries then Query folding is automatically disabled.

But there is another aspect to it - the aspect of cost. Why should I encourage query folding when my source system may be weak and unprepared (no indexes for example) while the PC or service has ample of memory and oomph to do the processing? So if your source system is good at spooling but weak at custom queries then you should not encourage query folding, and rather emphasise the use of Table.Buffer().

smpa01 · ‎12-23-2021

@lbendlin thanks for this insight and I am trying to relate it to my case.

90% of my source data comes from SQL server. they are well indexed and I bring them using custom SQL queries. So in those cases do I understand, neither query folding has any relevance nor should I care about it?

I also did not understand what you meant by So if your source system is good at spooling but weak at custom queries - when can this happen in case of a SQL server,- lack of indexing, poor DB designing?

========================
Did I answer your question? Mark my post as a solution!
Proud to be a Super User
My Custom Visualization Projects
• Plotting Live Sound: Live Sound
• Beautiful News: Women in Parliament, Energy Mix, Shrinking Armies
• Visual Capitalist: Working Hrs
• Others: Easing Graph, Animated Calendar
MayViz Submissions
• Week 1: View
• Week 2: View
• Week 3: View
• Week 4: View
========================

AlexisOlson · ‎12-23-2021

Most of my data come from SQL server too. I never use custom SQL queries within the Power BI queries but rather connect to tables or views that are already shaped like I want them. The idea is to keep SQL code stored and managed on the server and keep transformations in M fairly minimal for those tables. There's more M involved for sources like Excel or SharePoint where I can't really push the processing upstream.

I often do exploration and rapid prototyping with the Query Editor since it has such a nice GUI / coding balance and allows mashing together different sources so easily. But when moving to production, it's more of just a connector to data sources rather than a transformational layer.

Note: I'm not making any value claims as to best practice here. Just sharing my personal experience.

lbendlin · ‎12-23-2021

Yes. Just because you are a good and diligent SQL server admin doesn't mean everybody else is. Oftentimes you have to access someone else's SQL server tables and views, and you have no visibility to their setup. In such cases spooling (and throwing stuff away later) may be your only viable choice.

smpa01 · ‎12-23-2021

@lbendlin many thanks the picture is getting clearer. I can see the other perspective now.

So if I am querying a large tbl (let's say a 1M+ db tbl) exists in an inefficient SQL server , are you saying PQ will rather do the transformation faster (due to query folding aspect) than through native SQL query (poor db design) ? I am just scared that given PQ's poor performance aspect, I doubt whether it will do it faster (with query folding) than the non-indexd SQL server (inefficient server). I can take that and experiment against a non-indexd db large db table.

Even if that is true, must not one try to detrmine first how efficient/ineffcient the server is before deciding on native SQL or PQ (to Query folding to take place).

========================
Did I answer your question? Mark my post as a solution!
Proud to be a Super User
My Custom Visualization Projects
• Plotting Live Sound: Live Sound
• Beautiful News: Women in Parliament, Energy Mix, Shrinking Armies
• Visual Capitalist: Working Hrs
• Others: Easing Graph, Animated Calendar
MayViz Submissions
• Week 1: View
• Week 2: View
• Week 3: View
• Week 4: View
========================

lbendlin · ‎12-23-2021

That's not exactly what I am saying. I am saying that spooling can be a better option in such a scenario. Query folding would only make it worse, so you would need to suppress that, with Table.Buffer() for example.

lbendlin · ‎12-23-2021

Thanks for the flowers @AlexisOlson but I'm no Power Query expert by any stretch. I am currently re-reading the utterly fantastic primer on Power Query by Ben Gribaudo

Power Query M Primer (part 1): Introduction, Simple Expressions & let | Ben Gribaudo

@Ian_Stuart_Rupe I highly recommend you read the articles as well - all of them. You will be able to appreciate how Power Query actually works, what it is good at (hint: Counting rows is definitely not on the list, quite the opposite) and what differentiates it from other languages like SQL. The primer has an extensive section on keys and indexes and their influence on joins.

Here's a great video on the topic: How Power Query Thinks: Taking the Mystery Out of Streaming and Query Folding (Video) | Ben Gribaudo

Ian_Stuart_Rupe · ‎01-11-2022

Hi @lbendlin I read (or at least skip read) all the articles. Some were new to me and very helpful. I did not see the artucle that addressed counting rows though?

lbendlin · ‎01-11-2022

I made the same mistake initially (skip reading the articles). Re-reading them slowly was time well spent. At the minimum you may want to rewatch the streaming video.

Ian_Stuart_Rupe · ‎01-11-2022

Understood but I have to cherry pick what I spend my time on. A lot of those articles are too deep and others I already know about. There are some gems in there too but I simply do not have time to search for hidden treasure at the moment. Suffice to say we all agree it is slow methinks 🙂

smpa01 · ‎01-11-2022

@Ian_Stuart_Rupe are you on premium workspace ?

========================
Did I answer your question? Mark my post as a solution!
Proud to be a Super User
My Custom Visualization Projects
• Plotting Live Sound: Live Sound
• Beautiful News: Women in Parliament, Energy Mix, Shrinking Armies
• Visual Capitalist: Working Hrs
• Others: Easing Graph, Animated Calendar
MayViz Submissions
• Week 1: View
• Week 2: View
• Week 3: View
• Week 4: View
========================

Ian_Stuart_Rupe · ‎01-11-2022

No

smpa01 · ‎12-23-2021

@Ian_Stuart_Rupe I can understand the feeling. PQ has performance issues and the performance is nowhere near to SQL's performance.

PQ is largely used for data transformation. If you are bringing the tables from a SQL server and if you have an option, do all the transformation server-side and only use PQ to transport those tables.

If you have tables from multiple sources (e.g. SQL and sharepoint) it is recommended that you build a data model and use DAX to genrate the output.

DAX is blazing fast (if not as fast as SQL) and it will not disappoint you.

There are also known tweaks described here that makes significant difference to PQ's performance. But for large dataset I did not find PQ's performance satisfactory.

========================
Did I answer your question? Mark my post as a solution!
Proud to be a Super User
My Custom Visualization Projects
• Plotting Live Sound: Live Sound
• Beautiful News: Women in Parliament, Energy Mix, Shrinking Armies
• Visual Capitalist: Working Hrs
• Others: Easing Graph, Animated Calendar
MayViz Submissions
• Week 1: View
• Week 2: View
• Week 3: View
• Week 4: View
========================