Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
icedavies
Frequent Visitor

Removing Duplicates in Power Query based on X

Hi, I'm having a bit of trouble and hoping someone can help 🙂

I have a table with duplicates reference numbers in a column, lets call it "Ref Numbers". I also have another column which I will call "Decider".  This Decider column has only 2 values in it, either "RS or "RNS".
I want to remove all duplicates in the "Ref Numbers" column, however if a duplicated reference has both "RS" or "RNS" in the Decider column, on different rows, then I want Power Query to remove the "RS" rows and keep the "RNS" rows.
I have been advised to sort the columns into Ascending for Ref Number column and Descending for "Decider" column, but when I test this, it doesn't work for all entries.  Can you suggest a way of doing this please?

Out of interest how does Power Query decide which duplicate to remove?  From what I've read the consensus seems to be that it keeps the first row with a duplicated value but I find that this is not always the case.

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @icedavies 

 

In Power Query, select all the columns right-click and select the Remove Duplicates

vzhengdxumsft_0-1720598265657.png

Then try this code:

 

= Table.Group(#"Removed Duplicates", {"Ref Numbers"}, {{"Count", each if Table.RowCount(_) >1 then Table.SelectRows(_, each [Decider] = "RNS") else _ , type table }})

 

Select the Decider in the vzhengdxumsft_2-1720598422001.png :

vzhengdxumsft_1-1720598414195.png

The result is as follow:

vzhengdxumsft_3-1720598467050.png

 

 

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

10 REPLIES 10
icedavies
Frequent Visitor

Thanks Zhengdong Xu,

I have done what you have said and as I write this an hour after I have tried loading the data, Power BI is still loading the data.  I'm not sure how or why it is loading over 5.3GB of data. The file I'm loading from is not even 3MB.  Is there a reason why it's loading so much data?

 

icedavies_0-1720605279993.png

 

Anonymous
Not applicable

Hi @icedavies 

 

No, this is not normal, and executing this code does not increase memory.

Maybe you can close it and reload it again.

 

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks Zhengdong Xu,  I think it was me causing the issue but it's resolved now.
Thank you so much for your help, I was stuck on this for days!

rajendraongole1
Super User
Super User

Hi @icedavies - Can you please provide with some sample data by attaching power bi file. it helps to analyse and provide help.

 

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





Thanks Rajendraongole1.  Please see an edited version of the sample table below.  There's a mix of reference numbers which appear once or more.  And a mix of RS or RNS.  I want to remove all duplicate reference numbers within the "Transform Data" section of Power BI and if a duplicated reference number has both RS and RNS then I would want the reference number with RS removed and the one with RNS kept.  Hope that makes sense

 

Ref NumbersDecider
008930502RNS
008930502RS
008930502RS
008930502RS
000000028RS
000770045RNS
000770045RNS
000770045RNS
999737400RS
001063583RS
001063583RS
001063583RNS
Anonymous
Not applicable

Hi @icedavies 

 

Please try this:
The sample data which you provided:

vzhengdxumsft_0-1720591377717.png

Here I create a calculated table with the following dax:

Table2 = 
    FILTER(
		SUMMARIZE(
			'Table',
			'Table'[Decider],
			'Table'[Ref Numbers]
		),
		IF(
			CALCULATE(
				DISTINCTCOUNT('Table'[Decider]),
				FILTER(
					ALLSELECTED('Table'),
					'Table'[Ref Numbers] = EARLIER('Table'[Ref Numbers])
				)
			) > 1,
			'Table'[Decider] = "RNS"
		) || CALCULATE(
			DISTINCTCOUNT('Table'[Decider]),
			FILTER(
				ALLSELECTED('Table'),
				'Table'[Ref Numbers] = EARLIER('Table'[Ref Numbers])
			)
		) = 1
	)

 The result is as follow:

vzhengdxumsft_1-1720591404297.png

 

 

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi Zhengdong Xu,

Thank you so much for your reply but it's not quite what I was after.  I should have specified that there is a lot more data in my original table. I've only provided 2 columns to show what columns are needed to remove the duplicates based on the particular criteria.

Also because of the relationships that I want the table to have with other tables in the model and how the users would interact with the dashboard, I would want the duplicates removed at Power Query level, rather than using DAX.

Anonymous
Not applicable

Hi @icedavies 

 

In Power Query, select all the columns right-click and select the Remove Duplicates

vzhengdxumsft_0-1720598265657.png

Then try this code:

 

= Table.Group(#"Removed Duplicates", {"Ref Numbers"}, {{"Count", each if Table.RowCount(_) >1 then Table.SelectRows(_, each [Decider] = "RNS") else _ , type table }})

 

Select the Decider in the vzhengdxumsft_2-1720598422001.png :

vzhengdxumsft_1-1720598414195.png

The result is as follow:

vzhengdxumsft_3-1720598467050.png

 

 

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks Zhengdong Xu,

 

Apologies I'm reasonably new to Power BI.  When you say to enter that code, is that to replace the code where I've just removed the Duplicates?
Thanks

Anonymous
Not applicable

Hi @icedavies 

 

No, you can add a custom column and paste the code in it:

For example, add a custom column:

vzhengdxumsft_0-1720600373283.pngvzhengdxumsft_1-1720600384400.png

Then paste the code in this:

vzhengdxumsft_2-1720600429870.png

 

vzhengdxumsft_3-1720600477381.png

 

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors