Gabry
Super User

Optimizing merge operation for large table

Hello everyone,

I'm currently working in Power Query with a fact table that contains around 200 million rows. My goal is to perform a merge operation to bring in IDs from a dimension table. Given the size of the data, I'm concerned about performance and efficiency. (I tried it, and the refresh took many hours, even with the enhanced compute engine turned on and using PPU.)

I need to perform several merges with dimension tables to bring in the required IDs.

 

I am wondering whether selecting only the needed columns before merging could improve performance. So instead of doing just

 

#"Merged queries 1" = Table.NestedJoin(Source, {"BCode"}, #"dim TB", {"TBCode"}, "TB", JoinKind.LeftOuter)

 

Do this instead:

 

TB = Table.SelectColumns(#"dim TB", {"TBCode", "TBID"}),
#"Merged queries 1" = Table.NestedJoin(Source, {"BCode"}, TB, {"TBCode"}, "TB", JoinKind.LeftOuter)

 

Could this improve performance?
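
For reference, here is a minimal sketch of the second variant written out as a complete query, including the expand step that would normally follow the merge. The inline sample tables are hypothetical stand-ins for the real Source and "dim TB" queries, and the values and the extra Amount and Description columns are made up for illustration.

```powerquery
let
    // Hypothetical stand-in for the 200-million-row fact query
    Source = #table(
        type table [BCode = text, Amount = number],
        {{"A01", 10}, {"B02", 20}, {"Z99", 30}}
    ),
    // Hypothetical stand-in for the dimension query, with one column we do not need
    #"dim TB" = #table(
        type table [TBCode = text, TBID = Int64.Type, Description = text],
        {{"A01", 1, "First"}, {"B02", 2, "Second"}}
    ),
    // Keep only the join key and the ID before merging
    TB = Table.SelectColumns(#"dim TB", {"TBCode", "TBID"}),
    #"Merged queries 1" = Table.NestedJoin(Source, {"BCode"}, TB, {"TBCode"}, "TB", JoinKind.LeftOuter),
    // Expand only the ID out of the nested table column
    #"Expanded TB" = Table.ExpandTableColumn(#"Merged queries 1", "TB", {"TBID"}, {"TBID"})
in
    #"Expanded TB"
```

Rows in the fact table with no matching code (such as "Z99" here) keep a null TBID because the join is a left outer join.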

 

Does selecting only the necessary columns before merging generally provide a significant benefit? And are there any other strategies or best practices you would recommend for handling large-scale merges more efficiently?

 

Thanks in advance for your help!

2 REPLIES
Anonymous
Not applicable

Yes, selecting only the columns you need in both queries will usually help. Making both join columns integers will also help. What will help you the most is having both join columns sorted the same way: then, instead of Table.NestedJoin, you can use Table.Join and pass JoinAlgorithm.SortMerge as its join algorithm argument (the one after the join kind). This should speed up your query significantly, but SortMerge assumes both tables really are sorted on the join keys, and it can return incorrect results if they are not. If both tables use the same name for the join column, you will need to rename one of them so you don't end up with duplicate column names after the join.
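
A minimal sketch of the Table.Join approach described above, assuming small hypothetical sample tables in place of the real queries. The explicit Table.Sort steps are only there to make the example self-contained; on a table of this size, the sort would ideally already be done at the source, since sorting 200 million rows in Power Query is itself expensive.

```powerquery
let
    // Hypothetical sample data; the real queries would replace these two steps
    Fact = #table(
        type table [BCode = text, Amount = number],
        {{"B02", 20}, {"A01", 10}, {"A01", 5}}
    ),
    Dim = #table(
        type table [TBCode = text, TBID = Int64.Type],
        {{"B02", 2}, {"A01", 1}}
    ),
    // SortMerge assumes both inputs are sorted on the join keys in the same order
    SortedFact = Table.Sort(Fact, {{"BCode", Order.Ascending}}),
    SortedDim = Table.Sort(Dim, {{"TBCode", Order.Ascending}}),
    // Table.Join returns a flat table, so no expand step is needed afterwards
    Joined = Table.Join(
        SortedFact, {"BCode"},
        SortedDim, {"TBCode"},
        JoinKind.LeftOuter,
        JoinAlgorithm.SortMerge
    )
in
    Joined
```

Here the key columns are already named differently (BCode and TBCode), so no rename is needed; if both were called BCode, a Table.RenameColumns step on one side would have to come before the join.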

 

--Nate

OK, I will try selecting the columns as I wrote above and check the performance. But from what I read on Chris Webb's blog, it shouldn't change much.

I can't use the sort-merge algorithm because I need to join the fact table with many different dimension tables. I also can't use integers, because I use text codes to bring in the IDs. It's a shame that there aren't any other options to improve performance.

Could marking the code columns as keys make any difference?
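
For anyone who wants to test this, here is a minimal sketch of declaring a key on the dimension table with Table.AddKey before the merge. The sample tables are hypothetical, and whether the key makes a measurable difference at this scale is not confirmed anywhere in this thread, so it would need to be timed against the real data. Note also that Table.AddKey only declares the key; it does not check that the values are actually unique.

```powerquery
let
    // Hypothetical stand-ins for the real fact and dimension queries
    Source = #table(
        type table [BCode = text, Amount = number],
        {{"A01", 10}, {"B02", 20}}
    ),
    #"dim TB" = #table(
        type table [TBCode = text, TBID = Int64.Type],
        {{"A01", 1}, {"B02", 2}}
    ),
    // Declare TBCode as the primary key of the dimension table (third argument = isPrimary)
    KeyedTB = Table.AddKey(#"dim TB", {"TBCode"}, true),
    Merged = Table.NestedJoin(Source, {"BCode"}, KeyedTB, {"TBCode"}, "TB", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "TB", {"TBID"}, {"TBID"})
in
    Expanded
```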
