heimk008
Frequent Visitor

Remove duplicate rows based on max value of a different column

Hello!

 

I have a set of data that I am trying to remove duplicates from, based on the number of filled cells per row. The data is form submissions: each row has a person's email address and a number of other fields that may or may not be filled, ranging from 1 to 16 filled cells. I was able to make a column that tells me how many cells are filled in each row, which basically gives me a max per email address. I have already removed the duplicates that have the same email address and the same number of filled cells. Now I need to remove any duplicates (based on the email address) that have fewer than the max number of filled cells for that email address. I found a solution in the forums that does this with DAX, but I need to do it within the Query Editor so that, once the duplicates are removed, I can append this data to other queries.

 

Here is a snapshot of what the query currently looks like. I have marked the rows that I want removed:

query.jpg

 

Thanks! 

1 ACCEPTED SOLUTION

Yes:

 

Specializing in Power Query Formula Language (M)


6 REPLIES
jthomson
Solution Sage

I do something similar in one of my reports, and it should work here (given the lack of incremental load). I merge two queries: a bunch of existing data, which I give the number 1 in a custom field, and new data, which I give the number 2 in the same custom field. Sorting on that column in descending order and then removing duplicates gets rid of any old data that has been updated. The same principle should work for your problem: sort descending by your #filled value, then remove duplicates on your email column?

Thanks for the response; however, that doesn't seem to work. Even when I sort descending and then remove duplicate emails, it still removes some of the rows that have the highest number of filled cells.

 

For example:

before-

 before.jpg

after -

after.jpg

 

You can see that I have now lost the entry with 16 filled cells.

A known issue with sorting and removing duplicates is that you need to buffer the table in between.

 

You might wrap your sort step like: Table.Buffer(your sortcode).

Specializing in Power Query Formula Language (M)

@MarcelBeug,

I'm still relatively new to Power BI. What does "wrap your sort step like: Table.Buffer(your sortcode)" actually mean? Are you able to show me, by chance?

Yes:
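The code attached to this reply is not preserved in the thread. As a minimal sketch of what wrapping the sort step in Table.Buffer could look like (not necessarily the exact code that was posted), assuming the columns are named "Email" and "Filled" and using a small made-up sample table in place of the real query:

```powerquery
let
    // Hypothetical sample data; "Email" and "Filled" are assumed column names.
    Source = #table(
        type table [Email = text, Filled = Int64.Type],
        {
            {"a@example.com", 4},
            {"a@example.com", 16},
            {"b@example.com", 7},
            {"b@example.com", 2}
        }
    ),
    // The sort step, wrapped in Table.Buffer so the sorted rows are loaded
    // into memory in that order before the next step runs.
    SortedAndBuffered = Table.Buffer(
        Table.Sort(Source, {{"Email", Order.Ascending}, {"Filled", Order.Descending}})
    ),
    // Remove duplicates on the email column only; with the buffered sort in
    // place, the row kept for each email should be the one with the most
    // filled cells.
    RemovedDuplicates = Table.Distinct(SortedAndBuffered, {"Email"})
in
    RemovedDuplicates
```

In the Advanced Editor, this amounts to finding the existing Table.Sort(...) step and putting Table.Buffer( ... ) around it, leaving the Remove Duplicates step that follows unchanged.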

 

Specializing in Power Query Formula Language (M)

Thanks for this 🙌
