Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
KilleenJeffrey
Frequent Visitor

Referencing Power Query queries by other Power queries

https://docs.microsoft.com/en-us/power-bi/guidance/power-query-referenced-queries

 

This is a question in relationship to the article above.  Understand fully from the article how this works (good article).  This question is about a workaround.  The context of this question is Query 1 from the article is pulling data from a SQL server.  Also the context assumes this workaround would need to be done in the advanced editor.

 

In the article above Query 1 is called by Query 2, Query 3, and Query 4.  Also for context assume there is a Query 5 (not in the article) that merges Query 2, QUery 3, and Query 4 together into Query 5.  In this case as explained in the article Query would pull the data from the SQL server three times.

 

Question: If you took the M Code from Query 2, Query 3, and Query 4 and mashed it into the top of Query 5 would that avoid Query 1 being run three times?  Query 1 would still be referenced three times but all three references would be from within the single new super query called Query 2345 and no longer referenced from three distinct separate queries.  Also what would the effect being of putting this M Statement at the top of QUERY2345...

 

Source = Table.Buffer(Query1),

 

Would the 3 calls to Query 1 within QUERY 2345 use the data in the Table Buffer and only go out to the SQL Server once and not three times? 

 

Also what would the impact be of creating custom functions for Query 2, Query  3, and Query 4?  The reason for the custom functions is it would avoid the problem of duplicate references being used e.g. #"Filter1" = foo in all three original queries.  This would allow a simple cut and paste from the existing functions...

 

1 ACCEPTED SOLUTION

Yes, just understand the size of the tables does have an impact. Table.Buffer() is an "it depends" function when it comes to performance. See this post by Ehren - he is on the MS PQ Product Team. When to use Table.Buffer (microsoft.com)

Please mark one or more of these answers as the solution so this thread can be shown to be resolved.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

4 REPLIES 4
edhans
Super User
Super User

I'll try to answer these @KilleenJeffrey - 

First of all, Power Query works bottom up, not top down. So if the 5 query is loaded, it starts there and works backwards to get what it needs. That is why Query 1 gets called several times. 5 calls 2, 3, and 4, and those each call 1.

Putting all of that code in one superquery will not change how it works. PQ looks at the path of the query and a query refrences isn't viewed as another query as I understand it, so having 3 queries refernces each other in a chain - Q1 -> Q2 -> Q3 will act as if it were 1 query with all of the 3 queries just being steps. The advantage of refrences is so Q1 can be split into Q2, Q3, and Q4, each with their own logic and transformations, then rejoined at the end, or merged, or whatever, without having to rewrite the Q1 logic 4 times. Change Q1's transformations later and it changes all downstream query results.

The other reason I use references is in Excel. Say I have 1 table with 1 set of transformations, but I need to load it into both an Excel table and the Excel data model for Power Pivot. I don't load Q1 at all. I do Q1, then create 2 references - Q2 and Q3. Q2 gets loaded to an Excel table, and Q3 to the data model. There is no performance impact. However, if I later decide to tweak Q2 or Q3 independently, I can, or I can choose to unload one without affecting the other. If you unload a single query in Excel that is loaded to both a table and the data model, it will unload it from both places and then reload it back to the one you want automatically. That trashes everything though because when the table is temporarily removed from either a sheet or the data model, any dependent formulas, relationships, and/or measures are wrecked.

Table.Buffer() will not help you here. It will not take Q1 and store it in RAM, then prevent further calls back to the source. Table.Buffer() will load the entire table in memory vs streaming the contents one record at a time, which is what it normally does. This can help performance in some cases, but can hurt it drastically in others. What it is intended to do is preserve the state of the final results of that query and prevent it from changing - sort order being the most common issue it solves. But be careful. If that query has 3M records and is several GB in size, your machine will slow to a crawl as it brings it in to RAM. 

You should take a look at this video - it will explain a lot and I think you'll find it really useful. How Power Query Thinks: Taking the Mystery Out of Streaming and Query Folding (Video) | Ben Gribaudo



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

That video was incremental helpful - thanks!  Anyone doing hardcore hand coded M code should watch that video.

 

Combining the 5 queries into one super query along with the use of Table.Buffer worked!  It cut the run time on my small test from 285 seconds to 105 seconds.  I can also tell by the way I tested it the first 65 seconds of that 105 seconds was loading the Table.Buffer.  That performance gain is going to become much more significant as build out the real queries with significantly more logic.

 

I used the approach of Custom Functions to create the super query.  It worked perfectly because I was able to directly cut and paste the M code from each of the 5 queries completely unmodified.  This means I can maintain the original query and build the super query.

 

One factor in my environment  is my workstation and the SQL server are in two physically separate locations so each time it pulls data from the server it needs to use an external network - a network that has a lot of traffic during business hours.  That Table.Buffer saved pulling the data down 3 times.  Even though I have 128GB workstation I read that Power BI caps memory usage so I suspect that there may be efficiencies caused by using Table.Buffer everyone warns  about but they are more than offset by avoiding the network transfers.

 

It is going to take awhile to document  this in a teaching mode - I will publish an article in this thread in the next 7-10 days.  But the bottom line is there is a way to have Table.Buffer work with multiple queries to speed things up!

Yes, just understand the size of the tables does have an impact. Table.Buffer() is an "it depends" function when it comes to performance. See this post by Ehren - he is on the MS PQ Product Team. When to use Table.Buffer (microsoft.com)

Please mark one or more of these answers as the solution so this thread can be shown to be resolved.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

I am studying your reply and the video but the meantime the cuurent chains in my case are...

 

Q1 -> Q2 \

Q1 -> Q3 --> Q5

Q1 -> Q4 /

 

FYI I have a Xeon multi-processor workstation with 128GB of RAM

A fair amount of material (reply+video) and will reply later

 

 

 

 

 

 

 

 

 

 

zto understand 

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.