Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
Original SQL:
SELECT
A.MEMBER_NBR,
SUM( CASE WHEN A.AID_ID IN ('1','2') THEN 1 ELSE 0 END),
COUNT(DISTINCT(C.SSN),
SUM(B.AMOUNT)
FROM
A inner join B ON ( A.REC_ID = B.REC_ID )
inner join C ON ( C.SSN=A.SSN)
WHERE
A.MEMBER_NBR IN 'parameter1' and B.SVC_DT >= 'parameter2' group by 1.
The above is my query I have in Business Objects, where upon execution the calculation is applied on the data source.
I tried doing this in PBI, using direct query bringing in selected columns only with the 3 tables ( A,B,C). Each of these tables have billions of records. How do I setup the data model and establish inner joins between tables ?
tried using - Create new table and NATURALINNERJOIN() - but that blew up w message exceeding million rows.
tried using - 'Manage Relationships' creating joins - but the result was not as I get from the Original SQL,
tried using - 'Merge queries' - not expected results, takes too long to merge data for a basic sql.
I'd like the joins conditions and calculations to be applied on the data source and not after getting results in PBI. What should be the ideal method when working with huge tables. ? The process of creating queries in BusObj/tableau/SQL is a straight forward approach. When in PBI, its challenging to work with big tables and creating basic joins.
Thanks and appreaciate the followup.
Solved! Go to Solution.
Hi @rnola16 Try pushing the aggregations and everything to database here are some of the tips please try these out
Push Joins and Aggregations to the Database
SELECT A.MEMBER_NBR,
SUM(CASE WHEN A.AID_ID IN ('1', '2') THEN 1 ELSE 0 END) AS Aid_Count,
COUNT(DISTINCT C.SSN) AS Unique_SSN,
SUM(B.AMOUNT) AS Total_Amount
FROM A
INNER JOIN B ON A.REC_ID = B.REC_ID
INNER JOIN C ON A.SSN = C.SSN
WHERE A.MEMBER_NBR IN ('parameter1') AND B.SVC_DT >= 'parameter2'
GROUP BY A.MEMBER_NBR
Avoid Power Query Joins for Large Tables
Use Pre-Aggregated Data
DirectQuery Best Practices
Great points, @Akash_Varuna I completely agree that pushing joins and aggregations to the database is the best approach.
@rnola16 Additionally, I'd suggest ensuring query folding is happening in Power Query to maximize performance. You can check this by right-clicking a step in Power Query → 'View Native Query.' If it's grayed out, Power BI is handling the computation instead of the database, which can cause performance issues.
Another optimization is using partitioned tables in the database if working with billions of rows, as this will help reduce query execution time.
- Optimize DirectQuery Models
https://learn.microsoft.com/en-us/power-bi/guidance/directquery-model-guidance
- Query Folding in Power Query
https://learn.microsoft.com/en-us/power-query/query-folding-basics
If this solution was helpful, please accept it as a solution or give kudos to help other community members.
Hi @rnola16,
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
Hi @rnola16,
Could you please confirm if your query have been resolved by the solution provided by? If so, please mark it as the solution. This will be helpful for other community members who have similar problems to solve it faster.
Thank you.
Thank you for the responses @ArwaAldoud @Akash_Varuna .
We have been using BusObj for a while and users had the ability to query on WebI using the semantic layer 'universe' which is designed on top of the database ( contains 1000 tables). With a shift to PBI, can we make a semantic layer and let users create query and run of their choice of tables and fields ? From what has been explained above it looks like the devloper need to build custom SQL for each requirement ( eliminating the approach of creating joins in PBI, merge queries and managing relationships) and let users choose on the date/measure parameters or filters any from the query.
@rnola16 You're absolutely right about the differences between BusinessObjects (BO) and Power BI when it comes to handling a semantic layer.
Power BI operates differently than BO, but by designing a strong semantic layer with shared datasets, dataflows, or Analysis Services, users can query data flexibly without writing SQL.
I’d love to hear which approache align with your needs and how you solve your issue
Yes, it has been a challenge to provide a solution to the users to query the db on their own with their desired tables n timeperiod. I haven't explored the Azure analysis Services or XMLA yet, is there a doc/blog which explains how to set up a semantic layer for pbi ? Thanks again for all your input. But yes, I'll sure add here how this is resolved.
Hi @rnola16,
Thank you for your detailed follow-up and for engaging with the community. Thanks to @ArwaAldoud and @Akash_Varuna for their insightful suggestions. To help you set up a semantic layer in Power BI and explore Azure Analysis Services (AAS) and XMLA endpoints, here are three key resources that should guide you:
If this information is helpful, please “Accept as solution” and give a "kudos" to assist other community members in resolving similar issues more efficiently.
Thank you.
Hi @rnola16,
I hope this information is helpful. Please let me know if you have any further questions or if you'd like to discuss this further. If this answers your question, please accept it as a solution and give it a 'Kudos' so other community members with similar problems can find a solution faster.
Thank you.
Hi @rnola16 I think that could be achived through some of the azure services like Azure Analysis Services or Power BI premuim XMLA endpoint for that you would ahve to give build access to users
We push transformations as far upstream as possible and generally avoid doing any form of transformations in Power Query. Instead of the Power Query approach from a blank query with a native SQL script. As you said the process of creating queries in SQL is a straight forward approach so just do it in SQL.
Hi @rnola16 Try pushing the aggregations and everything to database here are some of the tips please try these out
Push Joins and Aggregations to the Database
SELECT A.MEMBER_NBR,
SUM(CASE WHEN A.AID_ID IN ('1', '2') THEN 1 ELSE 0 END) AS Aid_Count,
COUNT(DISTINCT C.SSN) AS Unique_SSN,
SUM(B.AMOUNT) AS Total_Amount
FROM A
INNER JOIN B ON A.REC_ID = B.REC_ID
INNER JOIN C ON A.SSN = C.SSN
WHERE A.MEMBER_NBR IN ('parameter1') AND B.SVC_DT >= 'parameter2'
GROUP BY A.MEMBER_NBR
Avoid Power Query Joins for Large Tables
Use Pre-Aggregated Data
DirectQuery Best Practices
Great points, @Akash_Varuna I completely agree that pushing joins and aggregations to the database is the best approach.
@rnola16 Additionally, I'd suggest ensuring query folding is happening in Power Query to maximize performance. You can check this by right-clicking a step in Power Query → 'View Native Query.' If it's grayed out, Power BI is handling the computation instead of the database, which can cause performance issues.
Another optimization is using partitioned tables in the database if working with billions of rows, as this will help reduce query execution time.
- Optimize DirectQuery Models
https://learn.microsoft.com/en-us/power-bi/guidance/directquery-model-guidance
- Query Folding in Power Query
https://learn.microsoft.com/en-us/power-query/query-folding-basics
If this solution was helpful, please accept it as a solution or give kudos to help other community members.
User | Count |
---|---|
85 | |
79 | |
64 | |
52 | |
46 |
User | Count |
---|---|
101 | |
44 | |
41 | |
39 | |
36 |