Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
Anonymous
Not applicable

In Power Query Editor, how to remove the duplicates while also removing the original

duplicated.png

 

Hi all,

 

I am now at Power Bi's Power Query Editor, I like to delete all the rows that I marked red, they are duplicates, if I do right click and remove duplicated, one of the duplicates still remain.

 

I really need to both the original and the duplicates, so all the rows in my red brackets should be removed. Im new to power bi and M language. THanks.

1 ACCEPTED SOLUTION
PC2790
Community Champion
Community Champion

For that, you need to do an extra thing.

In step 1 of grouping, go to Advance section and add an aggregation to get all rows. Something like this:

PC2790_0-1611289396835.png

2) Then filter out the rows having Duplicates count as 1 as below:

PC2790_1-1611289460814.png

 

3) Then expand other columns as below:

PC2790_3-1611289716882.png

 

The end output will give you all your required columns.

View solution in original post

14 REPLIES 14
Icey
Community Support
Community Support

Hi @Anonymous ,

 

Please check the latest reply of @PC2790 . It should show you what you want:

 


@PC2790 wrote:

For that, you need to do an extra thing.

In step 1 of grouping, go to Advance section and add an aggregation to get all rows. Something like this:

PC2790_0-1611289396835.png

2) Then filter out the rows having Duplicates count as 1 as below:

PC2790_1-1611289460814.png

 

3) Then expand other columns as below:

PC2790_3-1611289716882.png

 

The end output will give you all your required columns.


If this doesn't work, please let us know.

 

 

Best Regards,

Icey

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Anonymous
Not applicable

if you make a copyable table available, I can try to do what you ask

PC2790
Community Champion
Community Champion

You can do it in Power Query.

Here are the steps:

1) To identify the duplicate columns. Do a grouping based on the identifier. Something like this:

PC2790_0-1611232497724.png

 

PC2790_1-1611232525451.png

 

In your case, 'Identifier' will be there instead of 'Passenger Name'

Corresponding M Query:

= Table.Group(#"Changed Type", {"Identifier"}, {{"Duplicates", each Table.RowCount(_), Int64.Type}})

PC2790_2-1611232744522.png

 

2) Delete the rows that are duplicates along with the original records.Corresponding M query:

= Table.SelectRows(#"Grouped Rows",each _[Duplicates] =1)

The end result will be the only rows containing unique values in identifier section.

PC2790_3-1611232768628.png

See if this works for you

Anonymous
Not applicable

@StefanoGrimaldi @PC2790 

 

Thank you for your replys! but after I do the grouping and filtering, Only 2 columns left, other columns all disappear,i had more than 20 columns in this dataset. How to fix?

 

 

PC2790
Community Champion
Community Champion

For that, you need to do an extra thing.

In step 1 of grouping, go to Advance section and add an aggregation to get all rows. Something like this:

PC2790_0-1611289396835.png

2) Then filter out the rows having Duplicates count as 1 as below:

PC2790_1-1611289460814.png

 

3) Then expand other columns as below:

PC2790_3-1611289716882.png

 

The end output will give you all your required columns.

you need to add each column in the grouping function, see the example are in basic, select advance and you can add more columns, add them to your grouping function. 





Did I answer your question? Mark my post as a solution! / Did it help? Give some Kudos!

Proud to be a Super User!




Anonymous
Not applicable

@StefanoGrimaldi 

If I did what you said, I add all other columns as my "filter by". but in this case, I only want to filter by "identidier" column, I just want to use "identifier" column to filter all the rows, and keep all other columns 

you add all the columns, and created the new column as before, in advance you select in the first section all the columns, in the second section you add a new costum column using the reference column with the countrows. this will give you the amount its duplicated and them filter those bigger tham 1.

 

StefanoGrimaldi_0-1611259114467.png

 

StefanoGrimaldi_2-1611259316343.png

 

StefanoGrimaldi_3-1611259331494.png

 





Did I answer your question? Mark my post as a solution! / Did it help? Give some Kudos!

Proud to be a Super User!




StefanoGrimaldi
Resident Rockstar
Resident Rockstar

create a new query referenced from the original one, make a group by function for that only column with a column that count the amount of time it apperas pretty easy, them use a new column on the original table to get teh amount of count for each reference on the new table and use a filter all value bigger tham 1. 

StefanoGrimaldi_0-1611230826868.png

second option do the grouping in the original one, add that count column equally and same filter as above described. 

 





Did I answer your question? Mark my post as a solution! / Did it help? Give some Kudos!

Proud to be a Super User!




Anonymous
Not applicable

Someone else knows? please help:)

StefanoGrimaldi
Resident Rockstar
Resident Rockstar

you have to ways

1) select the column, right click on the column that have the refered duplicate and look for the remove duplicate option, have done it and works if isnt working for you would be very strange

2) if all row of each duplicate its the same, go to the main tabs and choose under the rows deletion options the delete duplicate rows. 

for a better insight share the M code of how you trying the remove duplicate step to see where its failing. 





Did I answer your question? Mark my post as a solution! / Did it help? Give some Kudos!

Proud to be a Super User!




Anonymous
Not applicable

@StefanoGrimaldi , thank you for your reply. 

but your way can only help to remove the duplicated one, I also need to remove the original one.

@Anonymous If the only instances you want to delete are the ones visible in the screenshot, you might just filter out the rows that you do not want, using the 'identifier' column.

P.S. The last red brackets in the picture aren't actually duplicates 🙂

Anonymous
Not applicable

thank you for your reply!

 

Unfortunately, I really need to filter those rows out programatically. Those red lines are the rows that our shippers ship by mistake.  

 

Do you know other solutions ?

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.