Don't miss your chance to take the Fabric Data Engineer (DP-600) exam for FREE! Find out how by attending the DP-600 session on April 23rd (pacific time), live or on-demand.
Learn moreNext up in the FabCon + SQLCon recap series: The roadmap for Microsoft SQL and Maximizing Developer experiences in Fabric. All sessions are available on-demand after the live show. Register now
Hello,
I'm using Excel to run Power Query. I followed Ken's design approach to use a Staging query that builds multiple re-shaped tables. I have one staging sql query (connection only) feeding 4 re-shape queries (load to table).
When I refesh all, I notice that every sub query is contacting the database. Is that normal? Is there a way to point my sub queries to the staging query's result instead?
Or am I not understanding something? I thought the stagging query would only be executed once, then other queries would execute from those results.
My queries are something like this:
SQL_Staging_Qry
let
src=SQL.Database("srvnm", "catalog", [Query="SELECT * FROM foo"])
in
src
Sub_Qry
let
sql_src=SQL,
// shape data with Table.SelectRows(sql_src, ...), Table.AddColumn(sql_src, ...), etc
result = reshaped_data
in
result
Solved! Go to Solution.
Hi, I think I understand. You're point comes from the perspective of what M language does and how the compiler interprets duplicate vs reference.
My perspective comes from SQL DB optimization and reducing the total number of queries to the database. As over the wire latency maters much more than processor time.
What I found is... that Power Query does in fact send multiple SQL queries per sub query when the stagging query is set to 'Connection Only'.
Proof / Repro
I have detected exactly the same number of SQL Select statements hit the wire as there are sub queries.
Potential Workaround Found
Only one SQL query will hit the wire and all other sub queries will now wait for completion.
Further Discusion
Ideally, resultant set should be used, but M Language cannot tell where the data comes from. Is it being cached, is it dynamically created, or is it coming from a remote system? Without that, M Language cannot effectively provide step optimization techniques as claimed with 'duplicate' vs 'reference'. The combination of missunderstanding of the capabilities with M Language optimziation and bad architectual advice lead me to this state. Stanging Query concept only works if you 'cache' the resultant set somewhere. Either in the DataModel or in a table.
Hi @jasbro,
I think your Staging Query will be run as many times as your sub-query, if the staging query is used as a duplicate on top of your sub-query.
To resolve trying using your Staging Query as a reference before building your sub-query on top of it
Hi @Thejeswar,
What do you mean, "try using your Stagging Query as a reference"?
I found another Ken article about how to Reference other Power Queries, but when I followed along, all it did was create a new query with syntax similar to:
let
Source = SQL_Staging_Qry
in
Source
This is what I already have.
However, I did come across another article, difference between Reference & Duplicate. This makes me think that maybe the UI isn't showing me what is really going on.
When I hit Ctrl+Alt+F5, I get the following
You see, each sub query is displaying a "Waiting for "{DB server name redacted} So my question is, am I hitting the database 3 times or am I hitting it once with my 'reference query'?
Hi @jasbro,
Yes as the post that you showed says, the Duplicate will re-run the code while reference will just re-use the result set
FYR, I am just giving some sample M Code for Duplicate and Reference
M-Code of my Source:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dZBLCsIwEECvErIuNjOTpFmLay9QulC0KtIO1N4fU5oPjLhICO89wiR9r0+X9a4bfeYDj+rK/H7Nj08EKq5/7piOalx4UhPP61OtrG5bPjS9hhZaNBBiimjTZT9w2/cas7DOl1pAH0yqKQuyrtQCgkGXclsNlVxAIB9S7rLxrk4uIBrElPtsgGouIEJ8wZ532YSuvlRAMpRvD3VMLLmARF38x+EL", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Input = _t, #" " = _t, Output = _t, #"(blank)" = _t, #"(blank).1" = _t, #"(blank).2" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Input", type text}, {" ", type text}, {"Output", type text}, {"(blank)", type text}, {"(blank).1", type text}, {"(blank).2", type text}}),
in
#"Changed Type"M-Code for Duplicated Query:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dZBLCsIwEECvErIuNjOTpFmLay9QulC0KtIO1N4fU5oPjLhICO89wiR9r0+X9a4bfeYDj+rK/H7Nj08EKq5/7piOalx4UhPP61OtrG5bPjS9hhZaNBBiimjTZT9w2/cas7DOl1pAH0yqKQuyrtQCgkGXclsNlVxAIB9S7rLxrk4uIBrElPtsgGouIEJ8wZ532YSuvlRAMpRvD3VMLLmARF38x+EL", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Input = _t, #" " = _t, Output = _t, #"(blank)" = _t, #"(blank).1" = _t, #"(blank).2" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Input", type text}, {" ", type text}, {"Output", type text}, {"(blank)", type text}, {"(blank).1", type text}, {"(blank).2", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1)
in
#"Added Index"M-Code for Referenced Query:
let
Source = Table1,
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1)
in
#"Added Index"As you can see, the duplicate Query runs through the source a second time apart from the actual Query. But in case of reference, it is just going to do the actions on top of the Source. So it is more likely that your duplicate query will run a separate query against your DB while you reference query will use the output of your source directly.
But when you do a manual Report Refresh, all your queries will be getting loading
The Other important thing that you can see when using a reference query is, when you make changes to your source query, Only your source query will be loaded again on giving apply changes, all the referenced queries will automatically reflect the changes. But here if you are using a duplicate query, you will have to make the changes again in all queries manually
Hi, I think I understand. You're point comes from the perspective of what M language does and how the compiler interprets duplicate vs reference.
My perspective comes from SQL DB optimization and reducing the total number of queries to the database. As over the wire latency maters much more than processor time.
What I found is... that Power Query does in fact send multiple SQL queries per sub query when the stagging query is set to 'Connection Only'.
Proof / Repro
I have detected exactly the same number of SQL Select statements hit the wire as there are sub queries.
Potential Workaround Found
Only one SQL query will hit the wire and all other sub queries will now wait for completion.
Further Discusion
Ideally, resultant set should be used, but M Language cannot tell where the data comes from. Is it being cached, is it dynamically created, or is it coming from a remote system? Without that, M Language cannot effectively provide step optimization techniques as claimed with 'duplicate' vs 'reference'. The combination of missunderstanding of the capabilities with M Language optimziation and bad architectual advice lead me to this state. Stanging Query concept only works if you 'cache' the resultant set somewhere. Either in the DataModel or in a table.
Nice Observation and Explanation ![]()
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.
Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.
| User | Count |
|---|---|
| 48 | |
| 45 | |
| 41 | |
| 20 | |
| 18 |
| User | Count |
|---|---|
| 69 | |
| 64 | |
| 32 | |
| 31 | |
| 27 |