Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now
I'm used to writing SQL queries, but new to Power Query.
I'm struggling a bit to break out of my habbits.
I have a selection of one table (T1), that results in a few hundred lines.
I need to join with a different table (T2), which contains several million lines.
Using SQL I can write a LEFT JOIN and only select the lines I want from T2 based on my selection of T1.
In Power Query, I have to first import the whole of T2 and first then can I join (merge) with T1
But then I can't remove T2 and I'm stuck with importing the whole big table, even though I only need a few lines.
What could I do differently? Unfortunately I don't have the option of creating views on the DB itself.
Solved! Go to Solution.
Hi @AlexMB ,
You're thinking too much in the moment - think end-product:
Once you've done your T1/T2 merge, you right-click on the T2 query and UNCHECK 'Enable Load'. This prevents this query having to be run when you actually refresh your report in the service, as it doesn't need to be materialised for the data model. This just leaves your merged T1 query to be run which, as you can see, has all the details it needs within its folded native query to produce your merged T1 in the model.
Pete
Proud to be a Datanaut!
@AlexMB if the tables already exists on SQL, do the server-side transformayion and bring the transformed table to PQ. In my experience, no query folding has ever matched the server side transformation performance.
I completely agree with this. However, in most cases I'm not responsible for the DB's I'm querying and getting anything done server side is usually an up hill battle, hence wanting to solve as much as possible with queries.
I know this is not the perfect situation, but we work with what we have, right? 🙂
@AlexMB just being curious. If you are fetching the tables from SQL that means you have READ access which also enables you to write SQL queries, that does not affect the original tables structures in SQL and it also does not require involving a DBA of the SQL. I am probably missing something but I am trying to understand what prevents you from writing the SQL queries at this point.
select a.colA,b.colC from t1 a
left join t2 b on a.colA=b.colA
left join .........
Ah, sorry, I was somewhere else in my head and misread your suggestion. I thought you meant set up views server side, which I've seen suggested before.
I don't mind at all to build the SQL queries, but as initially stated I've gotten the impression that this approach is frowned upon.
I kind of see why, as well, since the people I tend to collaborate with do not typically know SQL as well as they know PQ.
@smpa01 - I think based on the original question and subsequent reply (here), @AlexMB was following 'best practice' to let PQ build the query to ensure folding rather than a native query.
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |
If you found this post helpful, please give Kudos. It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen. If you find my signature vaguely amusing, please give Kudos. | Proud to be a Super User! |
@AlexMB - Unfortunately, there's really no "one size fits all" approach.
I agree, that 'best practice' would suggest to let Power BI handle the query/query folding as it knows the most efficient way to do this. This goes against everything I believe, I come from a SQL background also.
I like the option that @BA_Pete mentioned about forcing the query folding, however, I tested this just now, for a scenario I'm working with, and it seems that option errors as soon as your native query has joins to other tables.
Check out this post on stack overflow, specifically, the comments from smpa01 about the Table Variable. It may be of use. (I think it is the same @smpa01 that is here in the PBI community)
My advice for what it's worth...
The typical pattern I follow for my queries, if experience tells me that 'best practice' would be slow, is to keep it to as few steps as possible.
e.g.
let
RunSQL = Sql.Database("MyServer", "MyDatabase", [Query=MyQuery])
in
RunSQL
Where 'MyQuery' is my SQL query pasted into the advanced editor of another query (keeps things tidy and easier to edit).
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |
If you found this post helpful, please give Kudos. It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen. If you find my signature vaguely amusing, please give Kudos. | Proud to be a Super User! |
@KNP ,
I like this idea of having a dedicated query just for the SQL native query. One of my gripes around native queries is that the SQL work is 'hidden' in the M query and difficult to reference or reuse.
One question:
- Presuming the native SQL query (T1) is not loaded to the model, and that the query that references it as [Query=MyQuery] (T2) fully folds, does this solve the problem of merges on T2 breaking folding?
Pete
Proud to be a Datanaut!
@BA_Pete - I'm not sure I follow what you mean. Can you elaborate?
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |
If you found this post helpful, please give Kudos. It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen. If you find my signature vaguely amusing, please give Kudos. | Proud to be a Super User! |
First
Wait, so you're saying write the query as SQL? From all I can read this is considered bad form, hence me avoiding it.
I'd much rather write an SQL query than use Power Query, but I'm doing my best to learn the correct way of doing things in PBI.
Second
Yes, I know I can filter down, but in this case that still leaves me with a lot more lines than I need. I was hoping there was a more efficient approach, similar to writing a join. Especially since my scenario involves several big tables, not just two.
Toss away those misleading things telling you to avoid sql in PQ! Sql and PQ aren't natual enemies.
Frankingly speaking, nothing is better than sql in terms of performance when it comes to tranditional joins (inner/outer (left/right)/anti-joins); not to mention with huge tables. The advantage of PQ lies in the transformation processing of dataset, for instance, to replace part of a record, aggregation based on specific conditions, etc.
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
Hi @AlexMB ,
@dgurgel 's answer is somewhat outdated, insofar as you can maintain query folding with a native SQL query by using the [EnableFolding=true] argument in the Value.NativeQuery function.
However, I agree that they are generally bad form, but usually due to transformations being 'hidden' from other devs, and the fact that not all PBI devs can read/write SQL to a high standard.
If you are using a source that is foldable, then you just bring in your T2, filter it down to only the rows/columns that you want for the merge, then right-click on the final step of the query and make sure 'View Native Query' is not greyed-out. This query is being folded back to the source.
Do the same on the T1 query step before you perform your merge to ensure folding is also complete on this query then, when you perform the merge, the most efficient SQL will be sent to the source to materialise the result.
Have a go at performing transformations/filters etc. then select 'View Native Query' at each step and you'll see what I mean.
Pete
Proud to be a Datanaut!
Thanks for your reply.
This is the part that confuses me:
"the most efficient SQL will be sent to the source to materialise the result"
If I look at the native queries, as you suggest, it's clear that PBI is sending both a query for the "pure" T2 as well as for the joined T1+T2 table.
It seems to me that T2 is actually being queried twice, if I use a merge.
Maybe I'm too stuck on wanting to minimize load, but I really like to keep my queries lean and efficient.
I keep reading that PBI converts Power Query to the most effective SQL, but I don't really see that. Not with the approach I've taken, at least.
Hi @AlexMB ,
You're thinking too much in the moment - think end-product:
Once you've done your T1/T2 merge, you right-click on the T2 query and UNCHECK 'Enable Load'. This prevents this query having to be run when you actually refresh your report in the service, as it doesn't need to be materialised for the data model. This just leaves your merged T1 query to be run which, as you can see, has all the details it needs within its folded native query to produce your merged T1 in the model.
Pete
Proud to be a Datanaut!
@BA_Pete wrote:Once you've done your T1/T2 merge, you right-click on the T2 query and UNCHECK 'Enable Load'. This prevents this query having to be run when you actually refresh your report in the service, as it doesn't need to be materialised for the data model.
There it is! That's exactly what I was missing.
Thanks so much. After doing this, my refresh is much, much lighter.
Okay. So you either use the second method or use DirectQuery instead of importation.
Don't worry PowerQuery only loads the first 1000 rows to work with, once you're done with your transformations, the whole table will be imported into the file.
Learn about filters WITH parameters, totally different than a simple filter. It will reduce the number of rows refreshed in the file.
Or just use DirectQuery and call it a day. DirectQuery you don't have to worry about the number of tables or rows you have. The only problem is that your visualizations will take longer to refresh because the PBI file will query the database every single time you do something.
I think of two possible solutions:
First - You import your table with SQL instructions. This makes all the transformations run before importing it to PowerQuery, however, you either do all transformations this way or none, because you will lose the power of the native query. Before loading your table, open the "advanced options", where you can add your code.
Second - Filter with parameters. Parameters are great when you need to filter large data, so add a filter in your second table with only those rows you need. Instead of loading, let's say, 10 million rows to just select 10 thousand, with this method you will only load those 10 thousand.
Check out the October 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
65 | |
61 | |
23 | |
18 | |
12 |