rnola16
Advocate II

Basic INNER/LEFT joins in Data Model

Original SQL:

 

SELECT
    A.MEMBER_NBR,
    SUM(CASE WHEN A.AID_ID IN ('1', '2') THEN 1 ELSE 0 END),
    COUNT(DISTINCT C.SSN),
    SUM(B.AMOUNT)
FROM A
INNER JOIN B ON (A.REC_ID = B.REC_ID)
INNER JOIN C ON (C.SSN = A.SSN)
WHERE A.MEMBER_NBR IN ('parameter1')
  AND B.SVC_DT >= 'parameter2'
GROUP BY 1

 

The above is the query I have in Business Objects; on execution, the calculation is applied at the data source.

 

I tried doing this in PBI using DirectQuery, bringing in only selected columns from the three tables (A, B, C). Each of these tables has billions of records. How do I set up the data model and establish inner joins between the tables?

  Tried using 'Create new table' with NATURALINNERJOIN() - but that failed with a message about exceeding a million rows.

  Tried using 'Manage Relationships' to create the joins - but the result did not match what I get from the original SQL.

  Tried using 'Merge queries' - unexpected results, and it takes too long to merge the data for a basic SQL query.

 

I'd like the join conditions and calculations to be applied at the data source, not after the results arrive in PBI. What is the ideal method when working with huge tables? The process of creating queries in BusObj/Tableau/SQL is a straightforward approach; in PBI, it is challenging to work with big tables and create basic joins.

 

Thanks, and I appreciate the follow-up.

2 ACCEPTED SOLUTIONS
Akash_Varuna
Community Champion

Hi @rnola16, try pushing the joins and aggregations down to the database. Here are some tips; please try them out.

Push Joins and Aggregations to the Database

  • Use a custom SQL query in DirectQuery mode:

 

SELECT A.MEMBER_NBR,
       SUM(CASE WHEN A.AID_ID IN ('1', '2') THEN 1 ELSE 0 END) AS Aid_Count,
       COUNT(DISTINCT C.SSN) AS Unique_SSN,
       SUM(B.AMOUNT) AS Total_Amount
FROM A
INNER JOIN B ON A.REC_ID = B.REC_ID
INNER JOIN C ON A.SSN = C.SSN
WHERE A.MEMBER_NBR IN ('parameter1') AND B.SVC_DT >= 'parameter2'
GROUP BY A.MEMBER_NBR

 

  • This minimizes data movement and uses the database engine for performance.

Avoid Power Query Joins for Large Tables

  • Let the database handle joins instead of using Power BI's "Merge Queries" feature, which struggles with billions of rows.

Use Pre-Aggregated Data

  • Preprocess and import only aggregated results to reduce data size.

DirectQuery Best Practices

  • Set up relationships and avoid row-by-row calculations.
  • Use parameters to filter data at the source.
    If this post helped, please give kudos and accept this as a solution.
    Thanks in advance.
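As an illustration of the first tip, the hand-written SQL can be submitted from a blank query in Power Query using Value.NativeQuery. This is only a sketch: the server and database names below are placeholders, not details from the thread, and the EnableFolding option asks Power Query to keep folding further steps on top of the native query where the connector supports it.

```
let
    // Placeholder connection; substitute your own server and database
    Source = Sql.Database("your-server", "your-database"),
    // Send the hand-written SQL to the source engine so the joins
    // and aggregations run on the database, not in Power BI
    Result = Value.NativeQuery(
        Source,
        "SELECT A.MEMBER_NBR,
                SUM(CASE WHEN A.AID_ID IN ('1','2') THEN 1 ELSE 0 END) AS Aid_Count,
                COUNT(DISTINCT C.SSN) AS Unique_SSN,
                SUM(B.AMOUNT) AS Total_Amount
         FROM A
         INNER JOIN B ON A.REC_ID = B.REC_ID
         INNER JOIN C ON A.SSN = C.SSN
         WHERE A.MEMBER_NBR IN ('parameter1') AND B.SVC_DT >= 'parameter2'
         GROUP BY A.MEMBER_NBR",
        null,
        [EnableFolding = true]
    )
in
    Result
```

With this shape, Power BI only ever sees the small aggregated result set; the billions of detail rows never leave the database.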


Great points, @Akash_Varuna. I completely agree that pushing joins and aggregations to the database is the best approach.

 

@rnola16 Additionally, I'd suggest ensuring query folding is happening in Power Query to maximize performance. You can check this by right-clicking a step in Power Query → 'View Native Query.' If it's grayed out, Power BI is handling the computation instead of the database, which can cause performance issues.
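To make the folding check concrete, here is a minimal sketch; the server, database, and schema names are placeholders, and the column name follows the thread's table A:

```
let
    Source = Sql.Database("your-server", "your-database"),
    TableA = Source{[Schema = "dbo", Item = "A"]}[Data],
    // This filter should fold into a WHERE clause at the source;
    // right-click this step and confirm 'View Native Query' is enabled
    Filtered = Table.SelectRows(TableA, each [MEMBER_NBR] = "parameter1")
in
    Filtered
```

If 'View Native Query' is grayed out on the Filtered step, the filter is running locally in Power BI rather than on the database.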

 

Another optimization is using partitioned tables in the database if working with billions of rows, as this will help reduce query execution time.

 

- Optimize DirectQuery Models

https://learn.microsoft.com/en-us/power-bi/guidance/directquery-model-guidance 

- Query Folding in Power Query

https://learn.microsoft.com/en-us/power-query/query-folding-basics

If this solution was helpful, please accept it as a solution or give kudos to help other community members.


11 REPLIES
v-ssriganesh
Community Support

Hi @rnola16,
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.

v-ssriganesh
Community Support

Hi @rnola16,
Could you please confirm whether your query has been resolved by the solution provided? If so, please mark it as the solution. This will help other community members with similar problems solve them faster.

Thank you.

rnola16
Advocate II

Thank you for the responses, @ArwaAldoud and @Akash_Varuna.

We have been using BusObj for a while, and users had the ability to query in WebI using the semantic-layer 'universe' designed on top of the database (containing 1000 tables). With a shift to PBI, can we build a semantic layer and let users create and run queries over their choice of tables and fields? From what has been explained above, it looks like the developer needs to build custom SQL for each requirement (eliminating the approach of creating joins in PBI via merge queries and managed relationships) and let users choose the date/measure parameters or filters from the query.

@rnola16  You're absolutely right about the differences between BusinessObjects (BO) and Power BI when it comes to handling a semantic layer.
Power BI operates differently than BO, but by designing a strong semantic layer with shared datasets, dataflows, or Analysis Services, users can query data flexibly without writing SQL.

I’d love to hear which approach aligns with your needs and how you solve your issue.

Yes, it has been a challenge to give users a way to query the database on their own with their desired tables and time period. I haven't explored Azure Analysis Services or XMLA yet; is there a doc/blog that explains how to set up a semantic layer for PBI? Thanks again for all your input. I'll be sure to add here how this is resolved.

Hi @rnola16,

Thank you for your detailed follow-up and for engaging with the community. Thanks to @ArwaAldoud and @Akash_Varuna for their insightful suggestions. To help you set up a semantic layer in Power BI and explore Azure Analysis Services (AAS) and XMLA endpoints, here are three key resources that should guide you:

If this information is helpful, please “Accept as solution” and give a "kudos" to assist other community members in resolving similar issues more efficiently.
Thank you.

Hi @rnola16,
I hope this information is helpful. Please let me know if you have any further questions or if you'd like to discuss this further. If this answers your question, please accept it as a solution and give it a 'Kudos' so other community members with similar problems can find a solution faster.
Thank you.

Hi @rnola16, I think that could be achieved through some of the Azure services, like Azure Analysis Services, or the Power BI Premium XMLA endpoint; for that, you would have to give Build access to users.

andrewsommer
Memorable Member

We push transformations as far upstream as possible and generally avoid doing any form of transformation in Power Query. Instead of the Power Query approach, start from a blank query with a native SQL script. As you said, the process of creating queries in SQL is a straightforward approach, so just do it in SQL.

