Advance your Data & AI career with 50 days of live learning, dataviz contests, hands-on challenges, study groups & certifications and more!
Get registeredGet Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now
Hi, I'm having a bit of trouble and hoping someone can help 🙂
I have a table with duplicates reference numbers in a column, lets call it "Ref Numbers". I also have another column which I will call "Decider". This Decider column has only 2 values in it, either "RS or "RNS".
I want to remove all duplicates in the "Ref Numbers" column, however if a duplicated reference has both "RS" or "RNS" in the Decider column, on different rows, then I want Power Query to remove the "RS" rows and keep the "RNS" rows.
I have been advised to sort the columns into Ascending for Ref Number column and Descending for "Decider" column, but when I test this, it doesn't work for all entries. Can you suggest a way of doing this please?
Out of interest how does Power Query decide which duplicate to remove? From what I've read the consensus seems to be that it keeps the first row with a duplicated value but I find that this is not always the case.
Solved! Go to Solution.
Hi @icedavies
In Power Query, select all the columns right-click and select the Remove Duplicates
Then try this code:
= Table.Group(#"Removed Duplicates", {"Ref Numbers"}, {{"Count", each if Table.RowCount(_) >1 then Table.SelectRows(_, each [Decider] = "RNS") else _ , type table }})
Select the Decider in the :
The result is as follow:
Best Regards
Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thanks Zhengdong Xu,
I have done what you have said and as I write this an hour after I have tried loading the data, Power BI is still loading the data. I'm not sure how or why it is loading over 5.3GB of data. The file I'm loading from is not even 3MB. Is there a reason why it's loading so much data?
Hi @icedavies
No, this is not normal, and executing this code does not increase memory.
Maybe you can close it and reload it again.
Best Regards
Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thanks Zhengdong Xu, I think it was me causing the issue but it's resolved now.
Thank you so much for your help, I was stuck on this for days!
Hi @icedavies - Can you please provide with some sample data by attaching power bi file. it helps to analyse and provide help.
Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!
Proud to be a Super User! | |
Thanks Rajendraongole1. Please see an edited version of the sample table below. There's a mix of reference numbers which appear once or more. And a mix of RS or RNS. I want to remove all duplicate reference numbers within the "Transform Data" section of Power BI and if a duplicated reference number has both RS and RNS then I would want the reference number with RS removed and the one with RNS kept. Hope that makes sense
|
Hi @icedavies
Please try this:
The sample data which you provided:
Here I create a calculated table with the following dax:
Table2 =
FILTER(
SUMMARIZE(
'Table',
'Table'[Decider],
'Table'[Ref Numbers]
),
IF(
CALCULATE(
DISTINCTCOUNT('Table'[Decider]),
FILTER(
ALLSELECTED('Table'),
'Table'[Ref Numbers] = EARLIER('Table'[Ref Numbers])
)
) > 1,
'Table'[Decider] = "RNS"
) || CALCULATE(
DISTINCTCOUNT('Table'[Decider]),
FILTER(
ALLSELECTED('Table'),
'Table'[Ref Numbers] = EARLIER('Table'[Ref Numbers])
)
) = 1
)
The result is as follow:
Best Regards
Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi Zhengdong Xu,
Thank you so much for your reply but it's not quite what I was after. I should have specified that there is a lot more data in my original table. I've only provided 2 columns to show what columns are needed to remove the duplicates based on the particular criteria.
Also because of the relationships that I want the table to have with other tables in the model and how the users would interact with the dashboard, I would want the duplicates removed at Power Query level, rather than using DAX.
Hi @icedavies
In Power Query, select all the columns right-click and select the Remove Duplicates
Then try this code:
= Table.Group(#"Removed Duplicates", {"Ref Numbers"}, {{"Count", each if Table.RowCount(_) >1 then Table.SelectRows(_, each [Decider] = "RNS") else _ , type table }})
Select the Decider in the :
The result is as follow:
Best Regards
Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thanks Zhengdong Xu,
Apologies I'm reasonably new to Power BI. When you say to enter that code, is that to replace the code where I've just removed the Duplicates?
Thanks
Hi @icedavies
No, you can add a custom column and paste the code in it:
For example, add a custom column:
Then paste the code in this:
Best Regards
Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!
Check out the October 2025 Power BI update to learn about new features.