Query Folding are fundamental for an efficient ETL and while working with Power Query Editor you must push the steps that breaks Query Folding to the end. Or do you… If curious, please continue reading...
How to work with Query Folding is explained in this Microsoft documentation, please read it, it contains a lot of good information and as always, wanting to know more about Power Query, please go to one of Chris Webb's BI Blog or Matthew Roche Blog - BI Polar. If you (like me) are verifying that you are not breaking Query Folding by right clicking the Query Steps one-by-one, you shall be aware of the this doesn't show you the whole truth. In fact, Query Folding takes place even though it seems to be broken in a previous step. In the query steps below, right-clicking on the "[Date]" query step (that adds a Date column) seems to break the Query Folding... But it doesn't.
Figure1. Query Folding seems to be broken
But right clicking on the last query-steps as in Figure2 below, we see that some Query Folding are still taking place. If it would have broken the Query Folding, the next-comming Merge and Expand (Join in T-SQL code) would have been executed by the Power Query Engine as M-code and not as T-SQL code in Data Source.
Figure2. Query Folding has not been broken
To confirm this, I'm tracing the SQL server and capture this query below. Please pay attention on both the "select top 10" at the top as well as the "left outer join [SalesLT].[Customer] " at the bottom.
select top 10
[$Outer].[SalesOrderID] as [SalesOrderID],
[$Outer].[OrderDate] as [OrderDate],
[$Outer].[AccountNumber] as [AccountNumber],
[$Outer].[SalesOrderNumber] as [SalesOrderNumber],
[$Outer].[PurchaseOrderNumber] as [PurchaseOrderNumber],
[$Outer].[CustomerID2] as [CustomerID],
[$Outer].[SubTotal] as [SubTotal],
[$Inner].[FirstName] as [SalesLT Customer.FirstName],
[$Inner].[LastName] as [SalesLT Customer.LastName],
[$Inner].[CompanyName] as [SalesLT Customer.CompanyName]
select [_].[SalesOrderID] as [SalesOrderID],
[_].[OrderDate] as [OrderDate],
[_].[SalesOrderNumber] as [SalesOrderNumber],
[_].[PurchaseOrderNumber] as [PurchaseOrderNumber],
[_].[AccountNumber] as [AccountNumber],
[_].[CustomerID] as [CustomerID2],
[_].[SubTotal] as [SubTotal]
from [SalesLT].[SalesOrderHeader] as [$Table]
) as [_]
where [_].[SubTotal] > 1000
) as [$Outer]
left outer join [SalesLT].[Customer] as [$Inner] on ([$Outer].[CustomerID2] = [$Inner].[CustomerID])
Instead of using the legacy Power Query in Power BI Desktop, we can also use the Power Query Editor found in the Power BI Service when using Power BI Dataflows. Using the same query as before, but this time inside a Power BI Dataflow (that uses a more modern UI) we will easily see what's happening.
Figure3. Same query but using Power Query Online
Here we see indications of query folding at the right in the Query Steps window. Green indicates that the query will be evaluated within the data source and Red indicates that the query will be evaluated outside of the data source.
Going back to the captured SQL query listed earlier, we will not find the Date column in the select statement. This means that the Date column must be executed outside of the data source, i.e. by the Power BI Engine. And remember... The message that could be seen in the modern UI used in Power BI Dataflow in Figure 3 says exactly that "This step will be executed outside of the data source".
This is good news and means that query folding are taking place, even thogh you might think you have broken it using Power BI Desktops Power Query Editor.
What we see here is a fantastic work from the power query optimizer that will try to fold as much as possible and then add additional steps itself.
This is very much like me. Whenever possible (and especially if it's a heavy, boring, or time-consuming task), I will try to get someone else or a machine to do it for me 🙂
Using the legacy Power Query editor in Power BI Desktop, it will take a lot of unproductive time to move query step back and forth in the Power Query Editor to try to optimize your query and you need to check for query folding in more than one query step.
My recommendation is to always use the modern Power Query editor (so far only available in Power BI Dataflows) when optimizing your queries for Query Folding efficiency and it's very simple to do this:
Just mark the query you are working with in Power BI Desktop using Ctrl-c
Create a Power BI Dataflow with a blank query
Then past in the query in the Power BI Dataflow editor using Ctrl-v
Eventually we will have the modern editor also in Power BI but I have not checked the time plans for this while writing this blog.
Note1: While tracing the SQL queries when refreshing the query from the Power Query Editor in Power BI Desktop, I could from time to time see the query being folded in the GUI and then after a while it wasn't. And I could from time to time (and this didn't need to follow what I see in the Power Query Editor) see in the Query Trace that the query was folded and then it wasn't. And sometime the query didn't fold in Power Query Editor, but when refreshing the Dataset it was. So there might be a lot of inconsistency in the legacy Power Query Editor in Power BI Desktop. Using Power BI Dataflow is your best frind here...
Note2: Tracing SQL queries can be done in many ways. What I did was to use "Azure Data Studio" connected to the SQL database and then I launched the Profiler. Just search for Profiler in the command pallet. For alternative methods, Chris Webb is your best friend. The reason for me to trace directly on the SQL server was that I wanted a consistent method of seeing what’s being executed on the SQL server side that worked for both Power BI Desktop Power Query as well as for Power BI Service Dataflows