Anonymous
Not applicable

Remove Duplicates Per Group (keep first) measure help

Hi All

 

Is it possible to create a measure in PBI that removes duplicate values per group and keeps the first occurrence of each value? Perhaps not a measure but a new table; I'm unsure how to work around this. For context, the database I'm working with treats some data in a funky way that serves a function on the back end but doesn't make sense when trying to visualise user behaviour.

 

Currently, the data looks like:

USER | ENTRY ID | ENTRY | STATUS | TIME SUBMITTED (HH:mm:ss)
userA | 1 | unicorn | fail | 10:23:10
userA | 2 | forest | pass | 10:30:49
userA | 1 | unicorn | fail | 10:30:49
userB | 1 | unicorn | fail | 13:40:22
userB | 1 | fairy | pass | 13:43:59

 

I want to clean it so it looks like:

USER | ENTRY ID | ENTRY | STATUS | TIME SUBMITTED (HH:mm:ss)
userA | 1 | unicorn | fail | 10:23:10
userA | 2 | forest | pass | 10:30:49
userB | 1 | unicorn | fail | 13:40:22
userB | 1 | fairy | pass | 13:43:59
... | ... | ... | ... | ...

 

Note the row I want to remove has

  1. duplicated ENTRY from the first instance for userA and
  2. duplicated TIME SUBMITTED from the first instance for userA

Also, the ENTRY ID cannot be used.

Any pointers would be greatly appreciated 🙂 

2 ACCEPTED SOLUTIONS
Pragati11
Super User

Hi @Anonymous ,

 

Check if this existing thread helps:

https://community.powerbi.com/t5/Power-Query/Remove-duplicates-keeping-the-most-recent-row/m-p/757837

 

Thanks,

Pragati

Best Regards,

Pragati Jain



Anonymous
Not applicable

Here's the M code that does what you want:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKi1OLXJU0lEyBOLSvMzk/KI8ICstMTMHJGhgZWRsZWigFKuDUGkEks8vSi0uATIKEouLIQqNDaxMLFEU4jISVaUTTpXGViZA+40wVALliyqR7AaqM7YyBZoYCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, EntryID = _t, Entry = _t, Status = _t, Time = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"EntryID", Int64.Type}, {"Entry", type text}, {"Status", type text}, {"Time", type time}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "FirstTimeEntry", 
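        // True only when this row holds the earliest Time for its User/Entry pair
        each List.Min(
            // every Time recorded for the same User and Entry as the current row
            Table.SelectRows(
                #"Changed Type",
                (r) => r[User] = [User] and r[Entry] = [Entry]
            )[Time]
        ) = [Time]),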
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([FirstTimeEntry] = true)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"FirstTimeEntry"})
in
    #"Removed Columns"

 

Best

D


5 REPLIES
Anonymous
Not applicable

Thanks @Anonymous, had to have a little play around and add in r[entry id] = [entry id] but it's working well now 👍 cheers again
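
The tweak described above presumably extends the row-matching predicate in the accepted solution. Here is a minimal sketch of what that might look like, assuming the columns are named User, EntryID, Entry, Status and Time as in the accepted M code. The inline #table stands in for the real source, and the step names AddedFlag, KeptRows and Result are made up for illustration.

```powerquery
let
    // Inline copy of the sample rows from the question (assumed names and types)
    Source = #table(
        type table [User = text, EntryID = Int64.Type, Entry = text, Status = text, Time = time],
        {
            {"userA", 1, "unicorn", "fail", #time(10, 23, 10)},
            {"userA", 2, "forest",  "pass", #time(10, 30, 49)},
            {"userA", 1, "unicorn", "fail", #time(10, 30, 49)},
            {"userB", 1, "unicorn", "fail", #time(13, 40, 22)},
            {"userB", 1, "fairy",   "pass", #time(13, 43, 59)}
        }
    ),
    // Flag rows whose Time is the earliest among rows sharing User, Entry and EntryID
    AddedFlag = Table.AddColumn(
        Source,
        "FirstTimeEntry",
        each List.Min(
            Table.SelectRows(
                Source,
                (r) => r[User] = [User] and r[Entry] = [Entry] and r[EntryID] = [EntryID]
            )[Time]
        ) = [Time],
        type logical
    ),
    // Keep the flagged rows and drop the helper column
    KeptRows = Table.SelectRows(AddedFlag, each [FirstTimeEntry]),
    Result = Table.RemoveColumns(KeptRows, {"FirstTimeEntry"})
in
    Result
```

On the sample rows this should leave the four rows shown in the desired output of the question. Because the inner Table.SelectRows rescans the whole table for every row, this pattern may become slow on large tables.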

Anonymous
Not applicable

Yeah... That's a perfect job for Power Query. You can do it in DAX as well as a calculated table but it really should be performed in PQ as this is the data-munging tool. I can create some sample M code for you so that you can see how such cleaning is done...

Best
D
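
As one illustration of how this kind of cleaning can be done in Power Query, here is a sketch that groups on User and Entry and keeps the earliest row of each group. It is an alternative to the flag-and-filter approach, not the code the poster had in mind. The column names and types are assumptions taken from the question's sample, and the step names Grouped and Result are hypothetical.

```powerquery
let
    // Inline copy of the sample rows from the question (assumed names and types)
    Source = #table(
        type table [User = text, EntryID = Int64.Type, Entry = text, Status = text, Time = time],
        {
            {"userA", 1, "unicorn", "fail", #time(10, 23, 10)},
            {"userA", 2, "forest",  "pass", #time(10, 30, 49)},
            {"userA", 1, "unicorn", "fail", #time(10, 30, 49)},
            {"userB", 1, "unicorn", "fail", #time(13, 40, 22)},
            {"userB", 1, "fairy",   "pass", #time(13, 43, 59)}
        }
    ),
    // One group per User/Entry pair; Table.Min returns the row (as a record) with the smallest Time
    Grouped = Table.Group(
        Source,
        {"User", "Entry"},
        {{"FirstRow", each Table.Min(_, "Time"), type record}}
    ),
    // Rebuild a table from the kept records, reusing the source's column types
    Result = Table.FromRecords(Grouped[FirstRow], Value.Type(Source))
in
    Result
```

Grouping touches each row once instead of rescanning the table per row, which is the main reason to consider it on larger data sets.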
Anonymous
Not applicable

Thanks @Pragati11 , the buffer got me halfway there! Now it's just removing the wrong duplicate, hopefully the other reply will resolve this 🙂 
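
The linked thread is about keeping the most recent row, so one possible cause of the wrong duplicate being removed is the sort direction. Here is a sketch of the sort, buffer and de-duplicate pattern with an ascending sort on Time, so that the earliest row per User/Entry pair comes first. The column names and types are assumed from the question's sample, and the step names are hypothetical.

```powerquery
let
    // Inline copy of the sample rows from the question (assumed names and types)
    Source = #table(
        type table [User = text, EntryID = Int64.Type, Entry = text, Status = text, Time = time],
        {
            {"userA", 1, "unicorn", "fail", #time(10, 23, 10)},
            {"userA", 2, "forest",  "pass", #time(10, 30, 49)},
            {"userA", 1, "unicorn", "fail", #time(10, 30, 49)},
            {"userB", 1, "unicorn", "fail", #time(13, 40, 22)},
            {"userB", 1, "fairy",   "pass", #time(13, 43, 59)}
        }
    ),
    // Ascending on Time so the first occurrence within each User sorts to the top
    Sorted = Table.Sort(Source, {{"User", Order.Ascending}, {"Time", Order.Ascending}}),
    // Buffer to pin the sorted order in memory before removing duplicates
    Buffered = Table.Buffer(Sorted),
    // Compare rows on User and Entry only; the first row seen for each pair is kept
    Deduplicated = Table.Distinct(Buffered, {"User", "Entry"})
in
    Deduplicated
```

In practice Table.Distinct keeps the first row it encounters for each key, but that ordering is observed behaviour rather than something to rely on blindly, so it is worth checking the result against a few known cases.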
