lojith
Frequent Visitor

Help with optimal design

Hi experts,

 

I have a requirement to compare measure columns in two tables and report how many records match. The two tables can vary in record count, so there are three cases:

1. Record counts match between the two tables.

2. The first table has more records.

3. The second table has more records.

 

A sample is given below for better understanding.

In the first case, the record counts match but the first record doesn't match because Col2 is different. In the second case, the record counts don't match and only one record matches; the third case is similar.

Case 1
  Table 1             Table 2
  Id   Col1  Col2     Id   Col1  Col2
  100  25    35       100  25    40
  101  10    20       101  10    20

Case 2
  Table 1             Table 2
  Id   Col1  Col2     Id   Col1  Col2
  100  25    35       100  25    40
  101  10    20       101  10    20
  102  20    20

Case 3
  Table 1             Table 2
  Id   Col1  Col2     Id   Col1  Col2
  100  25    35       100  25    40
  101  10    20       101  10    20
                      102  20    20

 

To handle all three cases, this is the method I have implemented:

 

1. Do a left outer join between the two tables, so that all records from the first table are retrieved (Merge).

2. Do a right outer join between the two tables, so that all records from the second table are retrieved (Merge).

3. Append the two merged datasets and remove duplicates.

4. Add conditional columns to compare each of the columns between Table 1 and Table 2.

5. Create visuals.
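For illustration only, the five steps can be sketched outside Power Query — here in pandas, using the Case 2 sample data above. This is a sketch of the logic, not the actual M steps; the table names `t1`/`t2` and column suffixes are assumptions.

```python
import pandas as pd

# Case 2 sample data from the tables above.
t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# Steps 1-2: left and right outer joins on Id (the two Merge steps).
left  = t1.merge(t2, on="Id", how="left",  suffixes=("_t1", "_t2"))
right = t1.merge(t2, on="Id", how="right", suffixes=("_t1", "_t2"))

# Step 3: append the two merged datasets and remove duplicates.
combined = pd.concat([left, right]).drop_duplicates().reset_index(drop=True)

# Step 4: conditional columns comparing each measure between the tables.
combined["Col1Match"] = combined["Col1_t1"] == combined["Col1_t2"]
combined["Col2Match"] = combined["Col2_t1"] == combined["Col2_t2"]

# Step 5's visual would report this count: rows matching on every column.
matching = int((combined["Col1Match"] & combined["Col2Match"]).sum())
```

For the Case 2 data this yields 3 combined rows and 1 fully matching record (Id 101).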

 

This worked fine for tables with row counts in the 500k range.

 

Now there is a set of tables with around 5 million rows. As a result of the two outer joins and the append operation, performance is suffering.

I wanted to check whether there is a better way to implement this logic. Would a single full outer join be better than the two outer joins?

 

Thanks.

1 ACCEPTED SOLUTION
rajendraongole1
Super User

Hi @lojith - as @muhammad_786_1 mentioned, you can try approaches such as a full outer join. I am adding a few points on the optimization side:

Use DAX measures to calculate matching, non-matching, or extra records between the tables without merging, and filter out matching records early in the Power Query Editor itself to avoid large append operations.

These strategies should help improve performance, especially when dealing with tables in the range of 5 million rows.
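The counting part of that suggestion can be sketched as follows (pandas is used here purely as an illustration of the logic; in the report itself these would be DAX measures): a single join with an origin indicator classifies every Id as present in both tables or only in one, so the matching / non-matching / extra counts fall out without any append step.

```python
import pandas as pd

# Illustrative sample data (Case 2 from the question).
t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# One full outer join; indicator=True labels where each Id came from.
m = t1.merge(t2, on="Id", how="outer", suffixes=("_t1", "_t2"), indicator=True)

counts   = m["_merge"].value_counts()
both_ids = int(counts.get("both", 0))        # Ids present in both tables
only_t1  = int(counts.get("left_only", 0))   # extra records in table 1
only_t2  = int(counts.get("right_only", 0))  # extra records in table 2

# Of the shared Ids, how many match on every measure column.
shared = m[m["_merge"] == "both"]
full_match = int(((shared["Col1_t1"] == shared["Col1_t2"])
                  & (shared["Col2_t1"] == shared["Col2_t2"])).sum())
```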

 

Related solution threads attached for your reference:

Solved: Help needed/Tricks to handle huge data in power bi - Microsoft Fabric Community

Solved: What would be the best way to load a massive amoun... - Microsoft Fabric Community

Solved: Speed up refresh of 2 million rows when using an I... - Microsoft Fabric Community

 

Hope this helps.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!






8 Replies
DoriFussmann
New Member

Did you check www.craftedbi.com? It has easy-to-use templates for Power BI and makes the entire design almost plug-and-play. Check it out.


Hi @rajendraongole1 , @muhammad_786_1 

Just wondering if there is a way to import the two tables using Power Query and then do the full outer join using DAX? If that is an option, would it be faster?

Thanks.

Hi @lojith - a full outer join is not supported in DAX, as DAX does not offer direct support for traditional join types the way SQL or Power Query does. However, you can import the two tables with Power Query, load them into Power BI, and then use DAX to compare and count matches.

This approach can sometimes be faster than performing the full outer join directly in Power Query, especially when the tables are large and you want to avoid complex merging.
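As a rough analogy for comparing in memory without materializing a join (illustrated in pandas; the real implementation would be DAX measures over the imported tables), each row of the first table can be checked against a lookup built from the second:

```python
import pandas as pd

# Illustrative sample data (Case 2 from the question).
t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# Build a set of whole rows from table 2, then count table-1 rows found in it.
lookup  = set(map(tuple, t2.itertuples(index=False)))
matches = sum(tuple(row) in lookup for row in t1.itertuples(index=False))

# Extra records on either side, by Id membership alone.
extra_in_t1 = int((~t1["Id"].isin(t2["Id"])).sum())
extra_in_t2 = int((~t2["Id"].isin(t1["Id"])).sum())
```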

 

Hope this helps.











Thanks @rajendraongole1 for your inputs. Will try out.

muhammad_786_1
Super User

Hi @lojith 


You should use a Full Outer Join to retrieve all records from both tables in one step instead of performing two joins (left outer and right outer) and then appending them. This approach avoids the need for appending and duplicate removal, which can be costly for large datasets.
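A quick sketch of that equivalence (pandas stands in for the Power Query merges, using the Case 2 sample data): one full outer join returns the same rows as the left-join + right-join + append + de-duplicate pipeline, in a single pass.

```python
import pandas as pd

t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# Original approach: two joins, append, remove duplicates.
left  = t1.merge(t2, on="Id", how="left",  suffixes=("_t1", "_t2"))
right = t1.merge(t2, on="Id", how="right", suffixes=("_t1", "_t2"))
two_step = (pd.concat([left, right]).drop_duplicates()
            .sort_values("Id").reset_index(drop=True))

# Proposed approach: a single full outer join.
one_step = (t1.merge(t2, on="Id", how="outer", suffixes=("_t1", "_t2"))
            .sort_values("Id").reset_index(drop=True))

same_rows = one_step.equals(two_step)  # identical result, one pass not three
```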

 

Additionally, you should ensure that the common column used for the join (such as Id) is indexed. Indexing helps speed up the join, improving overall performance.
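A loose analogy for that point, sketched in pandas (assumption: your source supports keying the join column): moving the key into the index lets the join use the index structure rather than matching on a raw column, which is the same idea as indexing the join column at the source.

```python
import pandas as pd

t1 = pd.DataFrame({"Id": [100, 101, 102], "Col1": [25, 10, 20], "Col2": [35, 20, 20]})
t2 = pd.DataFrame({"Id": [100, 101], "Col1": [25, 10], "Col2": [40, 20]})

# Promote the join key to an index on both sides.
a = t1.set_index("Id")
b = t2.set_index("Id").add_suffix("_t2")

# Index-to-index full outer join on Id.
joined = a.join(b, how="outer")
```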

 

You can also consider using DirectQuery to avoid importing the full dataset into Power BI, depending on your requirements. This allows Power BI to query data directly from the source without loading large datasets into memory, optimizing performance for very large tables.

Best Regards,
Muhammad Yousaf

LinkedIn

Thanks @muhammad_786_1. Will try out the options mentioned and let you know.
