John_Doe3
Helper II

Stuck applying changes

Greetings dear data folks,

 

In my country there is problem.

 

Background:

I work in the plant business and have to gather and process a lot of data. One of the things I'm trying to achieve with the program is to combine the weekly production numbers of all our trial locations into one big fat file. Ergo, lots of appending. These files contain ~150 rows each (with the exception of one, which is ~3.5k). Now, in order to assign segments to certain varieties, I merge a production file containing this information after all the appending has been completed. The number of rows jumps from ~250k to 15 million after this merge, and it takes a bit of time to complete. A little concerning, but so far, so good.

 

The issue:

Yours truly wants to now merge a file with weekly pricing information per segment into the combined file. I have patiently watched the number of rows climb, and climb... and climb. When it hit 250 million rows after half an hour, I decided something might be amiss. I'm not a programmer of any sort, but I suspect I'm putting a bit of strain on the system here.

 

Are there any tips or tricks any of you know that might make this less of a PC-melting experience? Did I engage in bad practices?

 

 


11 REPLIES
v-sgandrathi
Community Support

Hi @John_Doe3,

 

We wanted to follow up since we haven't heard back from you regarding our last response. We hope your issue has been resolved.

Thank you for being a valued member of the Microsoft Fabric Community Forum!

I'm going to translate the jargon and try a few things. Good answers were given, but I need a bit of time to process this.

Hi @John_Doe3,

 

We wanted to follow up since we haven't heard back from you regarding our last response. We hope your issue has been resolved.

If you need any further assistance, feel free to reach out.

 

Thank you for being a valued member of the Microsoft Fabric Community Forum!

Hi @John_Doe3,

 

As we have not received a response from you yet, I would like to confirm whether you have successfully resolved the issue or if you require further assistance.
Thank you.

HarishKM
Memorable Member

@John_Doe3 Hey, 
I prefer to follow the steps below to improve the data model in my report.

Note: I always implement any ETL work in the SQL query itself, such as adding conditional columns, flags, date columns, and month names. This will improve my data model. If that is too much to ask, then follow the steps below.

  1. Ensure that your joining keys are indexed and reduce redundant data before performing the merge. Check for duplications that might multiply rows unexpectedly.

  2. Apply filters to datasets to remove unnecessary rows before merging. Only include relevant data by pre-processing each file separately.

  3. Split the merge operation into smaller batches and process them separately, then recombine the results.

  4. Instead of row-by-row merging, aggregate data to reduce granularity. Summarize or group data before merging (see the sketch after this list).

  5. Utilize data processing tools or databases optimized for handling large datasets, such as SQL databases, Python with Pandas, or dedicated data management platforms.
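To make points 4 and 5 a bit more concrete, here is a minimal pandas sketch of pre-aggregating the weekly pricing before merging. The file names and column names (Segment, Week, Price) are placeholders for illustration only; adjust them to match your own files.

    import pandas as pd

    # Hypothetical inputs -- replace with your own files and column names.
    production = pd.read_csv("combined_production.csv")   # the appended weekly production data
    pricing = pd.read_csv("weekly_pricing.csv")           # may hold several price rows per Segment/Week

    # Point 4: collapse pricing to exactly ONE row per (Segment, Week) before merging,
    # e.g. by averaging; otherwise each duplicate key multiplies the production rows.
    pricing_agg = pricing.groupby(["Segment", "Week"], as_index=False)["Price"].mean()

    # Left join so every production row picks up exactly one matching price.
    merged = production.merge(pricing_agg, on=["Segment", "Week"], how="left")

    # Sanity check: a lookup-style merge must not add rows.
    assert len(merged) == len(production)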

 


Thanks

Harish KM

If the above steps solved your problem, kudos would be highly appreciated. Please accept this as a solution as well.

Hi HarishKM,

 

Thank you for your reply; it helps clarify what is going on in my model and confirms my suspicions. I'm going to see what all the jargon actually means (I'm not an actual data scientist, I'm a plant guy) and see what I can do to improve.

Hi HarishKM,

 

I have delved a little more into the suggestions you made, but it is still a little unclear to me what you mean in some instances, and how to apply this to my dataset.

 

  1. Ensure that your joining keys are indexed and reduce redundant data before performing the merge. Check for duplications that might multiply rows unexpectedly

What exactly do you mean by "indexing joining keys"? There are many duplications, as the tables I'm appending are all untransposed. I fear that deleting duplicates will mess up my model, but maybe I'm missing something (I'm no data scientist).

 

2. Apply filters to datasets to remove unnecessary rows before merging. Only include relevant data by pre-processing each file separately

- I'm unsure what to filter. As far as I know, these tables are condensed as far as they can be. What do you mean by pre-processing each file separately?

 

3. Split the merge operation into smaller batches and process them separately, then recombine the results.

- I have no idea how to split this merge operation into smaller batches. All I'm trying to do is add a column "segment" to my combined master file, where it assigns a segment to my varieties. However, for some inexplicable reason, this process adds hundreds of thousands of rows to my file. I just needed the one column (maybe this clarifies my problem).

 

4. Instead of row-by-row merging, aggregate data to reduce granularity. Summarize or group data before merging.

- I have just spent some time trying to figure out how to aggregate the data, but I have no idea how to do this to my set of data. I've looked into summarizing and grouping, but I still don't understand how I'm going to do this without sacrificing data. For the record (pun unintended), this dataset ought to be a lot smaller than it turns out to be. I still don't get how I jump from 220k rows to 15 million just by adding a simple column. Maybe it's because I'm not a data scientist, but in my novice understanding of what I'm trying to do, I shouldn't even be adding a single row... I'm trying to add a column.

 

5. Utilize data processing tools or databases optimized for handling large datasets, such as SQL databases, Python with Pandas, or dedicated data management platforms.

- As I've explained above, this dataset shouldn't be this large. It's becoming large for some reason when I try to add a column by merging, which somehow, some way, adds millions of rows. None of which I need, btw.

 

To help better understand what is happening, I quickly drew the following image:

 

[Image: Append - merge - millions of rows.png]

Hi @John_Doe3,

 

Thank you for your follow-up and for clearly outlining your current challenge. Your efforts to resolve this are evident, and you are correctly identifying potential causes.

From your description, the significant increase in row count, from approximately 220K to several million, suggests a many-to-many join during the merge process. This typically occurs in Power Query when both datasets contain multiple rows with the same key, resulting in a Cartesian product.

To address this, please consider these steps:

 

De-duplicate the key column in your reference table (Segment file):
Verify that the table you are merging (with the "segment" column) contains only one row per unique key (e.g., Variety). Use the "Remove Duplicates" feature in Power Query to ensure this and prevent unintended row multiplication.

 

Check the merge configuration:
When merging, select the Left Outer Join to keep all rows from your primary dataset and bring in matching segment data. Ensure the join key is consistent in both datasets, with matching data types and no extra spaces.

 

Preview the merge results:
Before applying changes, review the row count in the merge preview. If there is a substantial increase, pause and examine the key columns for duplicates or mismatches.

 

Use the merge as a lookup:
If your goal is to add a single "segment" column, reduce the segment reference table to just the join key and segment column. Make sure it is a one-to-one mapping before merging to avoid row expansion.

Your observation about simply wanting to add a column is valid; a one-to-one join should not increase the row count. This indicates the issue is likely due to duplicate or unmatched keys.
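If it helps to see the same logic outside Power Query, the steps above translate roughly to this small pandas sketch. The file names and the Variety/Segment column names are placeholders, not your actual schema:

    import pandas as pd

    master = pd.read_csv("combined_master.csv")   # appended production data
    segments = pd.read_csv("segment_file.csv")    # reference file mapping Variety -> Segment

    # Steps 1 and 4: keep only the join key and the segment column,
    # trim the key, and enforce one row per Variety.
    lookup = segments[["Variety", "Segment"]].copy()
    lookup["Variety"] = lookup["Variety"].str.strip()
    lookup = lookup.drop_duplicates(subset="Variety")

    # Step 2: left outer join on the consistent key.
    master["Variety"] = master["Variety"].str.strip()
    merged = master.merge(lookup, on="Variety", how="left")

    # Step 3: the row count should be unchanged; any increase means duplicate keys remain.
    print(len(master), len(merged))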

You are making excellent progress with Power BI, and these challenges are common even for advanced users.

 

Best regards,
Sahasra
Microsoft Fabric Community Support Team.

lbendlin
Super User

Merges in Power Query are often bad, but they can equally often be avoided by loading the queries as is, and then using the Power BI data model to do the joins.

pankajnamekar25
Super User

Hello @John_Doe3 

You're experiencing a massive row explosion because you're merging two tables that both have multiple rows per key (like segment or week), leading to a many-to-many relationship. This creates a Cartesian product, which multiplies rows instead of just matching them one-to-one. To fix this, you need to clean and aggregate your data before merging—ensure only one row per key in one of the tables, remove duplicates, and only keep necessary columns. Merge in stages and monitor the row count after each step. Pre-aggregating pricing data and turning off intermediate query loads can also help reduce memory usage and speed up performance.
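A tiny, self-contained pandas illustration (made-up numbers, not your data) shows why the count explodes and why keeping one row per key on one side fixes it:

    import pandas as pd

    # Both tables repeat the key "A", so a join multiplies the matches.
    left = pd.DataFrame({"Segment": ["A", "A", "B"], "Yield": [10, 12, 7]})
    right = pd.DataFrame({"Segment": ["A", "A", "A", "B"], "Price": [1.0, 1.1, 1.2, 0.9]})

    exploded = left.merge(right, on="Segment", how="left")
    print(len(exploded))   # 7 rows: the two "A" rows each match three prices (2*3), plus one "B"

    # Keep one row per key on the pricing side (here: average), and the count is preserved.
    right_unique = right.groupby("Segment", as_index=False)["Price"].mean()
    fixed = left.merge(right_unique, on="Segment", how="left")
    print(len(fixed))      # 3 rows, same as the original left table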

Thanks

 Pankaj Namekar | LinkedIn

If this solution helps, please accept it and give a kudos (Like), it would be greatly appreciated.

 

Hi Pankaj,

 

Thank you for clarifying. This confirms my suspicions.

 

I will now move on to translating the jargon you used (I'm not as versed in the terminology as most users here) and see if I can clear this up.
