Gabry
Super User

Optimizing merge operation for large table

Hello everyone,

I'm currently working in Power Query with a fact table that contains around 200 million rows. My goal is to perform a merge operation to bring in IDs from a dimension table. Given the size of the data, I'm concerned about performance and efficiency. (I tried it, and the refresh took many hours, even with the enhanced compute engine turned on under a Premium Per User (PPU) license.)

I need to perform multiple merge operations against dimension tables to bring in the required IDs.

I am wondering whether selecting only the needed columns before merging could improve performance. So instead of doing just this:

#"Merged queries 1" = Table.NestedJoin(Source, {"BCode"}, #"dim TB", {"TBCode"}, "TB", JoinKind.LeftOuter)

I would do this:

TB = Table.SelectColumns(#"dim TB", {"TBCode", "TBID"}),
#"Merged queries 1" = Table.NestedJoin(Source, {"BCode"}, TB, {"TBCode"}, "TB", JoinKind.LeftOuter)

 

Could this improve performance?

 

Are there any other strategies or best practices that could help optimize the merge operation?

I’d appreciate any insights or suggestions on how to handle large-scale merges more efficiently. Does selecting only necessary columns prior to merging generally provide significant benefits? Are there other optimization techniques you would recommend?

 

Thanks in advance for your help!

2 REPLIES
Anonymous
Not applicable

Yes, selecting only the columns you need in both queries will usually help. Making both join columns integers will also help. What will help the most is having both join columns sorted the same way (ascending): then, instead of Table.NestedJoin, you can use Table.Join and pass JoinAlgorithm.SortMerge as its joinAlgorithm argument, right after the join kind. I assure you that this will speed up your query significantly. Table.Join returns a flat table, so if the two join columns share a name you will need to rename one of them first to avoid duplicate column names after the join.
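
A minimal sketch of that suggestion, using the column names from the question (BCode, TBCode, TBID). The two small inline tables and the Amount column are hypothetical stand-ins for the real fact and dimension queries. The explicit Table.Sort steps are there only to make the example self-contained; on 200 million rows you would want the data to arrive already sorted from the source, since sorting in Power Query can be expensive in its own right. As far as I know, JoinAlgorithm.SortMerge assumes both inputs are sorted ascending on the join keys and can return wrong results if they are not, so treat the sort order as a hard requirement.

```powerquery
let
    // Hypothetical sample data standing in for the real fact and dimension tables
    Fact = #table(
        type table [BCode = text, Amount = number],
        {{"B", 20}, {"A", 10}, {"C", 7}, {"B", 5}}
    ),
    DimTB = #table(
        type table [TBCode = text, TBID = Int64.Type],
        {{"C", 3}, {"A", 1}, {"B", 2}}
    ),

    // Keep only the join key and the ID that has to be brought in
    TB = Table.SelectColumns(DimTB, {"TBCode", "TBID"}),

    // SortMerge expects both inputs sorted ascending on their join keys
    FactSorted = Table.Sort(Fact, {{"BCode", Order.Ascending}}),
    TBSorted = Table.Sort(TB, {{"TBCode", Order.Ascending}}),

    // Flat join: the sixth argument selects the join algorithm
    Joined = Table.Join(
        FactSorted, {"BCode"},
        TBSorted, {"TBCode"},
        JoinKind.LeftOuter,
        JoinAlgorithm.SortMerge
    ),

    // The dimension key is redundant once TBID is in the fact table
    Result = Table.RemoveColumns(Joined, {"TBCode"})
in
    Result
```

Because the join columns here are named differently (BCode and TBCode), no rename is needed before the join. The same pattern can be repeated for each dimension table, but every join needs the fact table sorted on that particular key.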

 

--Nate

OK, I will try selecting the columns as I wrote above and check the performance. But from what I read on Chris Webb's blog, it shouldn't change much.

I can't use the sort-merge algorithm because I need to join the fact table with many different dimension tables. I also can't use integers, because I use text codes to bring in the IDs. It's sad that there aren't any other options to improve performance.

Could marking the code columns as keys make any difference?
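
For anyone who wants to test that last idea, here is a hedged sketch of what marking the dimension's code column as a primary key looks like with Table.AddKey. The inline tables and the Amount column are hypothetical sample data. Whether the key actually speeds up the merge is not confirmed anywhere in this thread and likely depends on the data source and on whether the query folds, so it is something to measure rather than assume.

```powerquery
let
    // Hypothetical sample data standing in for the real fact and dimension tables
    Fact = #table(
        type table [BCode = text, Amount = number],
        {{"A", 10}, {"B", 20}, {"B", 5}, {"C", 7}}
    ),
    DimTB = #table(
        type table [TBCode = text, TBID = Int64.Type],
        {{"A", 1}, {"B", 2}, {"C", 3}}
    ),

    // Keep only the join key and the ID
    TB = Table.SelectColumns(DimTB, {"TBCode", "TBID"}),

    // Declare TBCode as a primary key; this only adds metadata and assumes
    // the codes really are unique in the dimension table
    TBKeyed = Table.AddKey(TB, {"TBCode"}, true),

    // Same merge as in the question, now against the keyed table
    Merged = Table.NestedJoin(Fact, {"BCode"}, TBKeyed, {"TBCode"}, "TB", JoinKind.LeftOuter),

    // Expand only the ID column from the nested table
    Expanded = Table.ExpandTableColumn(Merged, "TB", {"TBID"})
in
    Expanded
```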
