Gabry
Super User

Optimizing merge operation for large table

Hello everyone,

I'm working in Power Query with a fact table of around 200 million rows. My goal is to merge it with a dimension table to bring in that table's IDs. Given the data size, I'm concerned about performance: I tried it and the refresh took many hours, even on PPU with the enhanced compute engine turned on.

I need to perform several such merges, one per dimension table, to bring in the required IDs.

 

I am wondering whether selecting the columns before merging could improve performance. So instead of doing just

 

#"Merged queries 1" = Table.NestedJoin(Source, {"BCode"}, #"dim TB", {"TBCode"}, "TB", JoinKind.LeftOuter)

 

Do this

 

TB = Table.SelectColumns(#"dim TB", {"TBCode", "TBID"}),
#"Merged queries 1" = Table.NestedJoin(Source, {"BCode"}, TB, {"TBCode"}, "TB", JoinKind.LeftOuter)

 

Could this improve performance?

 

Are there any other strategies or best practices that could help optimize the merge operation?

I’d appreciate any insights or suggestions on how to handle large-scale merges more efficiently. Does selecting only necessary columns prior to merging generally provide significant benefits? Are there other optimization techniques you would recommend?

 

Thanks in advance for your help!

2 REPLIES
Anonymous
Not applicable

Yes, selecting only the columns you need in both queries will usually help. Making both join columns integers will also help. What will help you the most: if both join columns are sorted the same way, then instead of Table.NestedJoin you can use Table.Join and pass JoinAlgorithm.SortMerge as the optional joinAlgorithm argument (the one right after the join kind). I assure you that this will speed up your query significantly. Table.Join returns a flat table, so if both join columns have the same name you will need to rename one of them to avoid duplicate column names after the join.
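
To make the suggestion above concrete, here is a minimal sketch of the Table.Join variant. The two small #table values are hypothetical stand-ins for the real fact and dimension queries, and the step names (SortedFact, SortedDim and so on) are made up for illustration. The Table.Sort steps are only there so the sample is self-contained: JoinAlgorithm.SortMerge assumes both inputs are already ordered on the join key, and sorting 200 million rows inside Power Query may well cost more than the join saves, so ideally the data arrives sorted from the source.

```powerquery
let
    // Hypothetical sample data standing in for the real fact and dimension tables
    Source = #table(type table [BCode = text, Amount = number],
        {{"B", 20}, {"A", 10}, {"C", 30}}),
    DimTB = #table(type table [TBCode = text, TBID = Int64.Type],
        {{"B", 2}, {"A", 1}}),

    // Keep only the columns the join actually needs
    TB = Table.SelectColumns(DimTB, {"TBCode", "TBID"}),

    // SortMerge assumes both inputs are ordered the same way on the join key
    SortedFact = Table.Sort(Source, {{"BCode", Order.Ascending}}),
    SortedDim = Table.Sort(TB, {{"TBCode", Order.Ascending}}),

    // Table.Join returns a flat table; BCode and TBCode already have
    // different names, so no rename is needed in this sample
    Joined = Table.Join(SortedFact, "BCode", SortedDim, "TBCode",
        JoinKind.LeftOuter, JoinAlgorithm.SortMerge),

    // Drop the dimension's code column, keeping only the ID that was brought in
    Result = Table.RemoveColumns(Joined, {"TBCode"})
in
    Result
```

Because Table.Join is flat, there is no nested "TB" column to expand afterwards, unlike with Table.NestedJoin.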

 

--Nate

OK, I will try selecting the columns as I wrote above and check the performance. But from what I read on Chris Webb's blog, it shouldn't change much.

I can't use the sort-merge algorithm because I need to join the fact table with many different dim tables. I also can't use integers because I use text codes to bring in the IDs. It's sad that there aren't any other options to improve performance.

Would marking the code columns as keys make any difference?
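
For reference, this is roughly what marking the code column as a key would look like in M, using Table.AddKey on the dimension side before the merge. It is only a sketch: the sample tables and the step name KeyedTB are hypothetical, it assumes TBCode is unique in the dimension table, and whether the key changes anything at this scale is not established in this thread, so it would need to be measured.

```powerquery
let
    // Hypothetical sample data standing in for the real fact and dimension tables
    Source = #table(type table [BCode = text], {{"A"}, {"B"}, {"C"}}),
    DimTB = #table(type table [TBCode = text, TBID = Int64.Type],
        {{"A", 1}, {"B", 2}}),

    // Keep only the columns the merge needs
    TB = Table.SelectColumns(DimTB, {"TBCode", "TBID"}),

    // Declare TBCode as the primary key (assumes its values are unique)
    KeyedTB = Table.AddKey(TB, {"TBCode"}, true),

    // Same merge as in the question, now against the keyed dimension table
    Merged = Table.NestedJoin(Source, {"BCode"}, KeyedTB, {"TBCode"}, "TB", JoinKind.LeftOuter),

    // Pull only the ID out of the nested table column
    Expanded = Table.ExpandTableColumn(Merged, "TB", {"TBID"})
in
    Expanded
```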
