lojith
Frequent Visitor

Help with optimal design

Hi experts,

 

I have a requirement to compare measure columns in two tables and report how many records match. The two tables can vary in record count, so there are three cases:

1. Record counts match between the two tables.

2. The first table has more records.

3. The second table has more records.

 

A sample is given below for better understanding.

In the first case, the record counts match but the first record doesn't match because Col2 is different. In the second case, the record counts don't match and only one record matches; the third case is similar.

Case 1
  Table 1             Table 2
  Id   Col1  Col2     Id   Col1  Col2
  100  25    35       100  25    40
  101  10    20       101  10    20

Case 2
  Table 1             Table 2
  Id   Col1  Col2     Id   Col1  Col2
  100  25    35       100  25    40
  101  10    20       101  10    20
  102  20    20

Case 3
  Table 1             Table 2
  Id   Col1  Col2     Id   Col1  Col2
  100  25    35       100  25    40
  101  10    20       101  10    20
                      102  20    20

 

To handle all three cases, this is the method I have implemented:

 

1. Do a left outer join between the two tables, so that all records from the first table are retrieved (Merge).

2. Do a right outer join between the two tables, so that all records from the second table are retrieved (Merge).

3. Append the two merged datasets and remove duplicates.

4. Add conditional columns to compare each of the columns between Table 1 and Table 2.

5. Create visuals.
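For illustration only, the five steps can be sketched outside Power Query — here in pandas, using the Case 2 sample data above. This is a sketch of the logic, not the actual M steps; the table names `t1`/`t2` and column suffixes are assumptions.

```python
import pandas as pd

# Case 2 sample data from the tables above.
t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# Steps 1-2: left and right outer joins on Id (the two Merge steps).
left  = t1.merge(t2, on="Id", how="left",  suffixes=("_t1", "_t2"))
right = t1.merge(t2, on="Id", how="right", suffixes=("_t1", "_t2"))

# Step 3: append the two merged datasets and remove duplicates.
combined = pd.concat([left, right]).drop_duplicates().reset_index(drop=True)

# Step 4: conditional columns comparing each measure between the tables.
combined["Col1Match"] = combined["Col1_t1"] == combined["Col1_t2"]
combined["Col2Match"] = combined["Col2_t1"] == combined["Col2_t2"]

# Step 5's visual would report this count: rows matching on every column.
matching = int((combined["Col1Match"] & combined["Col2Match"]).sum())
```

For the Case 2 data this yields 3 combined rows and 1 fully matching record (Id 101).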

 

This worked fine for tables with row counts in the 500k range.

 

Now there is a set of tables with around 5 million rows. As a result of the two outer joins and the append operation, performance is suffering.

I wanted to check whether there is a better way to implement this logic. Would a single full outer join be better than the two outer joins?

 

Thanks.

1 ACCEPTED SOLUTION
rajendraongole1
Super User

Hi @lojith - as @muhammad_786_1 mentioned, you can try approaches such as a full outer join. I am adding a few points on the optimization side:

Use DAX measures to calculate matching, non-matching, or extra records between the tables without merging, and filter out matching records early in the Power Query Editor itself to avoid large append operations.

These strategies should help improve performance, especially when dealing with tables in the range of 5 million rows.
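The counting part of that suggestion can be sketched as follows (pandas is used here purely as an illustration of the logic; in the report itself these would be DAX measures): a single join with an origin indicator classifies every Id as present in both tables or only in one, so the matching / non-matching / extra counts fall out without any append step.

```python
import pandas as pd

# Illustrative sample data (Case 2 from the question).
t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# One full outer join; indicator=True labels where each Id came from.
m = t1.merge(t2, on="Id", how="outer", suffixes=("_t1", "_t2"), indicator=True)

counts   = m["_merge"].value_counts()
both_ids = int(counts.get("both", 0))        # Ids present in both tables
only_t1  = int(counts.get("left_only", 0))   # extra records in table 1
only_t2  = int(counts.get("right_only", 0))  # extra records in table 2

# Of the shared Ids, how many match on every measure column.
shared = m[m["_merge"] == "both"]
full_match = int(((shared["Col1_t1"] == shared["Col1_t2"])
                  & (shared["Col2_t1"] == shared["Col2_t2"])).sum())
```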

 

Related solution threads attached for your reference:

Solved: Help needed/Tricks to handle huge data in power bi - Microsoft Fabric Community

Solved: What would be the best way to load a massive amoun... - Microsoft Fabric Community

Solved: Speed up refresh of 2 million rows when using an I... - Microsoft Fabric Community

 

Hope this helps.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!






8 Replies
DoriFussmann
New Member

Did you check www.craftedbi.com? It has easy-to-use templates for Power BI and makes the entire design almost plug-and-play. Check it out.


Hi @rajendraongole1 , @muhammad_786_1 

Just wondering if there is a way to import the two tables using Power Query and then do the full outer join using DAX? If that is an option, would it be faster?

Thanks.

Hi @lojith - a full outer join is not supported in DAX, as DAX does not offer direct support for traditional join types the way SQL or Power Query does. However, you can import the two tables with Power Query, load them into Power BI, and then use DAX to compare and count matches.

This approach can sometimes be faster than performing the full outer join directly in Power Query, especially when the tables are large and you want to avoid complex merging.
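As a rough analogy for comparing in memory without materializing a join (illustrated in pandas; the real implementation would be DAX measures over the imported tables), each row of the first table can be checked against a lookup built from the second:

```python
import pandas as pd

# Illustrative sample data (Case 2 from the question).
t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# Build a set of whole rows from table 2, then count table-1 rows found in it.
lookup  = set(map(tuple, t2.itertuples(index=False)))
matches = sum(tuple(row) in lookup for row in t1.itertuples(index=False))

# Extra records on either side, by Id membership alone.
extra_in_t1 = int((~t1["Id"].isin(t2["Id"])).sum())
extra_in_t2 = int((~t2["Id"].isin(t1["Id"])).sum())
```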

 

Hope this helps.











Thanks @rajendraongole1 for your inputs. Will try out.

muhammad_786_1
Super User

Hi @lojith 


You should use a Full Outer Join to retrieve all records from both tables in one step instead of performing two joins (left outer and right outer) and then appending them. This approach avoids the need for appending and duplicate removal, which can be costly for large datasets.
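A quick sketch of that equivalence (pandas stands in for the Power Query merges, using the Case 2 sample data): one full outer join returns the same rows as the left-join + right-join + append + de-duplicate pipeline, in a single pass.

```python
import pandas as pd

t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# Original approach: two joins, append, remove duplicates.
left  = t1.merge(t2, on="Id", how="left",  suffixes=("_t1", "_t2"))
right = t1.merge(t2, on="Id", how="right", suffixes=("_t1", "_t2"))
two_step = (pd.concat([left, right]).drop_duplicates()
            .sort_values("Id").reset_index(drop=True))

# Proposed approach: a single full outer join.
one_step = (t1.merge(t2, on="Id", how="outer", suffixes=("_t1", "_t2"))
            .sort_values("Id").reset_index(drop=True))

same_rows = one_step.equals(two_step)  # identical result, one pass not three
```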

 

Additionally, you should ensure that the common column used for the join (such as Id) is indexed. Indexing helps speed up the join, improving overall performance.
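A loose analogy for that point, sketched in pandas (assumption: your source supports keying the join column): moving the key into the index lets the join use the index structure rather than matching on a raw column, which is the same idea as indexing the join column at the source.

```python
import pandas as pd

t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# Promote the join key to an index on both sides.
a = t1.set_index("Id")
b = t2.set_index("Id").add_suffix("_t2")

# Index-to-index full outer join on Id.
joined = a.join(b, how="outer")
```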

 

You can also consider using DirectQuery to avoid importing the full dataset into Power BI, depending on your requirements. This allows Power BI to query data directly from the source without loading large datasets into memory, optimizing performance for very large tables.

Best Regards,
Muhammad Yousaf

LinkedIn

Thanks @muhammad_786_1. Will try out the options mentioned and let you know.
