Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
AlexMB
Regular Visitor

Confused about joins

I'm used to writing SQL queries, but new to Power Query.

 

I'm struggling a bit to break out of my habbits.

 

I have a selection of one table (T1), that results in a few hundred lines.

 

I need to join with a different table (T2), which contains several million lines.

 

Using SQL I can write a LEFT JOIN and only select the lines I want from T2 based on my selection of T1.

 

In Power Query, I have to first import the whole of T2 and first then can I join (merge) with T1

But then I can't remove T2 and I'm stuck with importing the whole big table, even though I only need a few lines.

 

What could I do differently? Unfortunately I don't have the option of creating views on the DB itself.

1 ACCEPTED SOLUTION

Hi @AlexMB ,

 

You're thinking too much in the moment - think end-product:

 

Once you've done your T1/T2 merge, you right-click on the T2 query and UNCHECK 'Enable Load'. This prevents this query having to be run when you actually refresh your report in the service, as it doesn't need to be materialised for the data model. This just leaves your merged T1 query to be run which, as you can see, has all the details it needs within its folded native query to produce your merged T1 in the model.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

16 REPLIES 16
smpa01
Super User
Super User

@AlexMB if the tables already exists on SQL, do the server-side transformayion and bring the transformed table to PQ. In my experience, no query folding has ever matched the server side transformation performance.

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

I completely agree with this. However, in most cases I'm not responsible for the DB's I'm querying and getting anything done server side is usually an up hill battle, hence wanting to solve as much as possible with queries.

 

I know this is not the perfect situation, but we work with what we have, right? 🙂

@AlexMB just being curious. If you are fetching the tables from SQL that means you have READ access which also enables you to write SQL queries, that does not affect the original tables structures in SQL and it also does not require involving a DBA of the SQL.  I am probably missing something but I am trying to understand what prevents you from writing the SQL queries at this point.  

select a.colA,b.colC from t1 a
left join t2 b on a.colA=b.colA
left join .........

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Ah, sorry, I was somewhere else in my head and misread your suggestion. I thought you meant set up views server side, which I've seen suggested before.

 

I don't mind at all to build the SQL queries, but as initially stated I've gotten the impression that this approach is frowned upon.

 

I kind of see why, as well, since the people I tend to collaborate with do not typically know SQL as well as they know PQ.

@smpa01 - I think based on the original question and subsequent reply (here), @AlexMB was following 'best practice' to let PQ build the query to ensure folding rather than a native query. 

 

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x
KNP
Super User
Super User

@AlexMB - Unfortunately, there's really no "one size fits all" approach.

 

I agree, that 'best practice' would suggest to let Power BI handle the query/query folding as it knows the most efficient way to do this. This goes against everything I believe, I come from a SQL background also.

 

I like the option that @BA_Pete mentioned about forcing the query folding, however, I tested this just now, for a scenario I'm working with, and it seems that option errors as soon as your native query has joins to other tables. 

 

Check out this post on stack overflow, specifically, the comments from smpa01 about the Table Variable. It may be of use. (I think it is the same @smpa01 that is here in the PBI community)

 

My advice for what it's worth...

  • Do everything as close to the source as you can.
  • ALWAYS use import if your query is acceptable in terms of performance, the issues you can face with Direct Query in both PQ and DAX are numerous and annoying.
  • If you have to use Direct Query, use aggregation tables where possible and test they are being hit (easy to do in DAX Studio).

 

The typical pattern I follow for my queries, if experience tells me that 'best practice' would be slow, is to keep it to as few steps as possible.

e.g.

let
    RunSQL = Sql.Database("MyServer", "MyDatabase", [Query=MyQuery])
in
    RunSQL

Where 'MyQuery' is my SQL query pasted into the advanced editor of another query (keeps things tidy and easier to edit).

 

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x

@KNP ,

 

I like this idea of having a dedicated query just for the SQL native query. One of my gripes around native queries is that the SQL work is 'hidden' in the M query and difficult to reference or reuse.

 

One question:

- Presuming the native SQL query (T1) is not loaded to the model, and that the query that references it as [Query=MyQuery] (T2) fully folds, does this solve the problem of merges on T2 breaking folding?

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




@BA_Pete - I'm not sure I follow what you mean. Can you elaborate? 

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x
AlexMB
Regular Visitor

First

Wait, so you're saying write the query as SQL? From all I can read this is considered bad form, hence me avoiding it.

 

I'd much rather write an SQL query than use Power Query, but I'm doing my best to learn the correct way of doing things in PBI.

 

Second

Yes, I know I can filter down, but in this case that still leaves me with a lot more lines than I need. I was hoping there was a more efficient approach, similar to writing a join. Especially since my scenario involves several big tables, not just two.

CNENFRNL
Community Champion
Community Champion

Toss away those misleading things telling you to avoid sql in PQ!  Sql and PQ aren't natual enemies.

 

Frankingly speaking, nothing is better than sql in terms of performance when it comes to tranditional joins (inner/outer (left/right)/anti-joins); not to mention with huge tables. The advantage of PQ lies in the transformation processing of dataset, for instance, to replace part of a record, aggregation based on specific conditions, etc.


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!

Hi @AlexMB ,

 

@dgurgel 's answer is somewhat outdated, insofar as you can maintain query folding with a native SQL query by using the [EnableFolding=true] argument in the Value.NativeQuery function.

 

However, I agree that they are generally bad form, but usually due to transformations being 'hidden' from other devs, and the fact that not all PBI devs can read/write SQL to a high standard.

 

If you are using a source that is foldable, then you just bring in your T2, filter it down to only the rows/columns that you want for the merge, then right-click on the final step of the query and make sure 'View Native Query' is not greyed-out. This query is being folded back to the source.

BA_Pete_0-1644496742643.png

 

Do the same on the T1 query step before you perform your merge to ensure folding is also complete on this query then, when you perform the merge, the most efficient SQL will be sent to the source to materialise the result.

 

Have a go at performing transformations/filters etc. then select 'View Native Query' at each step and you'll see what I mean.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Thanks for your reply.

 

This is the part that confuses me:

"the most efficient SQL will be sent to the source to materialise the result"

 

If I look at the native queries, as you suggest, it's clear that PBI is sending both a query for the "pure" T2 as well as for the joined T1+T2 table.

 

It seems to me that T2 is actually being queried twice, if I use a merge.

 

Maybe I'm too stuck on wanting to minimize load, but I really like to keep my queries lean and efficient.

 

I keep reading that PBI converts Power Query to the most effective SQL, but I don't really see that. Not with the approach I've taken, at least.

Hi @AlexMB ,

 

You're thinking too much in the moment - think end-product:

 

Once you've done your T1/T2 merge, you right-click on the T2 query and UNCHECK 'Enable Load'. This prevents this query having to be run when you actually refresh your report in the service, as it doesn't need to be materialised for the data model. This just leaves your merged T1 query to be run which, as you can see, has all the details it needs within its folded native query to produce your merged T1 in the model.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!





@BA_Pete wrote:

Once you've done your T1/T2 merge, you right-click on the T2 query and UNCHECK 'Enable Load'. This prevents this query having to be run when you actually refresh your report in the service, as it doesn't need to be materialised for the data model.


There it is! That's exactly what I was missing.

 

Thanks so much. After doing this, my refresh is much, much lighter.

Okay. So you either use the second method or use DirectQuery instead of importation.

 

Don't worry PowerQuery only loads the first 1000 rows to work with, once you're done with your transformations, the whole table will be imported into the file.


Learn about filters WITH parameters, totally different than a simple filter. It will reduce the number of rows refreshed in the file.

 

Or just use DirectQuery and call it a day. DirectQuery you don't have to worry about the number of tables or rows you have. The only problem is that your visualizations will take longer to refresh because the PBI file will query the database every single time you do something.

dgurgel
Regular Visitor

I think of two possible solutions:

 

First - You import your table with SQL instructions. This makes all the transformations run before importing it to PowerQuery, however, you either do all transformations this way or none, because you will lose the power of the native query. Before loading your table, open the "advanced options", where you can add your code.

 

Second - Filter with parameters. Parameters are great when you need to filter large data, so add a filter in your second table with only those rows you need. Instead of loading, let's say, 10 million rows to just select 10 thousand, with this method you will only load those 10 thousand.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors