Solved: Removing Duplicates in Power Query based on X

icedavies · ‎07-08-2024

Hi, I'm having a bit of trouble and hoping someone can help 🙂

I have a table with duplicates reference numbers in a column, lets call it "Ref Numbers". I also have another column which I will call "Decider". This Decider column has only 2 values in it, either "RS or "RNS".
I want to remove all duplicates in the "Ref Numbers" column, however if a duplicated reference has both "RS" or "RNS" in the Decider column, on different rows, then I want Power Query to remove the "RS" rows and keep the "RNS" rows.
I have been advised to sort the columns into Ascending for Ref Number column and Descending for "Decider" column, but when I test this, it doesn't work for all entries. Can you suggest a way of doing this please?

Out of interest how does Power Query decide which duplicate to remove? From what I've read the consensus seems to be that it keeps the first row with a duplicated value but I find that this is not always the case.

Anonymous · ‎07-10-2024

Hi @icedavies

In Power Query, select all the columns right-click and select the Remove Duplicates

Then try this code:

= Table.Group(#"Removed Duplicates", {"Ref Numbers"}, {{"Count", each if Table.RowCount(_) >1 then Table.SelectRows(_, each [Decider] = "RNS") else _ , type table }})

Select the Decider in the :

The result is as follow:

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

icedavies · ‎07-10-2024

Thanks Zhengdong Xu,

I have done what you have said and as I write this an hour after I have tried loading the data, Power BI is still loading the data. I'm not sure how or why it is loading over 5.3GB of data. The file I'm loading from is not even 3MB. Is there a reason why it's loading so much data?

Anonymous · ‎07-10-2024

Hi @icedavies

No, this is not normal, and executing this code does not increase memory.

Maybe you can close it and reload it again.

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

icedavies · ‎07-11-2024

Thanks Zhengdong Xu, I think it was me causing the issue but it's resolved now.
Thank you so much for your help, I was stuck on this for days!

rajendraongole1 · ‎07-08-2024

Hi @icedavies - Can you please provide with some sample data by attaching power bi file. it helps to analyse and provide help.

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

icedavies · ‎07-09-2024

Thanks Rajendraongole1. Please see an edited version of the sample table below. There's a mix of reference numbers which appear once or more. And a mix of RS or RNS. I want to remove all duplicate reference numbers within the "Transform Data" section of Power BI and if a duplicated reference number has both RS and RNS then I would want the reference number with RS removed and the one with RNS kept. Hope that makes sense

Ref Numbers	Decider
008930502	RNS
008930502	RS
008930502	RS
008930502	RS
000000028	RS
000770045	RNS
000770045	RNS
000770045	RNS
999737400	RS
001063583	RS
001063583	RS
001063583	RNS

Anonymous · ‎07-09-2024

Hi @icedavies

Please try this:
The sample data which you provided:

Here I create a calculated table with the following dax:

Table2 = 
    FILTER(
		SUMMARIZE(
			'Table',
			'Table'[Decider],
			'Table'[Ref Numbers]
		),
		IF(
			CALCULATE(
				DISTINCTCOUNT('Table'[Decider]),
				FILTER(
					ALLSELECTED('Table'),
					'Table'[Ref Numbers] = EARLIER('Table'[Ref Numbers])
				)
			) > 1,
			'Table'[Decider] = "RNS"
		) || CALCULATE(
			DISTINCTCOUNT('Table'[Decider]),
			FILTER(
				ALLSELECTED('Table'),
				'Table'[Ref Numbers] = EARLIER('Table'[Ref Numbers])
			)
		) = 1
	)

The result is as follow:

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

icedavies · ‎07-10-2024

Hi Zhengdong Xu,

Thank you so much for your reply but it's not quite what I was after. I should have specified that there is a lot more data in my original table. I've only provided 2 columns to show what columns are needed to remove the duplicates based on the particular criteria.

Also because of the relationships that I want the table to have with other tables in the model and how the users would interact with the dashboard, I would want the duplicates removed at Power Query level, rather than using DAX.

Anonymous · ‎07-10-2024

Hi @icedavies

In Power Query, select all the columns right-click and select the Remove Duplicates

Then try this code:

= Table.Group(#"Removed Duplicates", {"Ref Numbers"}, {{"Count", each if Table.RowCount(_) >1 then Table.SelectRows(_, each [Decider] = "RNS") else _ , type table }})

Select the Decider in the :

The result is as follow:

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

icedavies · ‎07-10-2024

Thanks Zhengdong Xu,

Apologies I'm reasonably new to Power BI. When you say to enter that code, is that to replace the code where I've just removed the Duplicates?
Thanks

Anonymous · ‎07-10-2024

Hi @icedavies

No, you can add a custom column and paste the code in it:

For example, add a custom column:

Then paste the code in this:

Best Regards

Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Removing Duplicates in Power Query based on X

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026

How to Get Your Question Answered Quickly

FabCon is coming to Atlanta

Removing Duplicates in Power Query based on X

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026

How to Get Your Question Answered Quickly