Anonymous

Remove Duplicates Per Group (keep first) measure help

Hi All

Is it possible to create a measure in Power BI that removes duplicate values per group and keeps the first occurrence of each value? If a measure isn't the right tool, maybe a new table would work; I'm unsure how to approach this. For context, the database I'm working with treats some of this data in an unusual way: it serves a functional purpose on the back end but doesn't make sense when I try to visualise user behaviour.

Currently, the data looks like:

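USER  | ENTRY ID | ENTRY   | STATUS | TIME SUBMITTED (HH:mm:ss)
------|----------|---------|--------|--------------------------
userA | 1        | unicorn | fail   | 10:23:10
userA | 2        | forest  | pass   | 10:30:49
userA | 1        | unicorn | fail   | 10:30:49
userB | 1        | unicorn | fail   | 13:40:22
userB | 1        | fairy   | pass   | 13:43:59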

I want to clean it so it looks like:

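USER  | ENTRY ID | ENTRY   | STATUS | TIME SUBMITTED (HH:mm:ss)
------|----------|---------|--------|--------------------------
userA | 1        | unicorn | fail   | 10:23:10
userA | 2        | forest  | pass   | 10:30:49
userB | 1        | unicorn | fail   | 13:40:22
userB | 1        | fairy   | pass   | 13:43:59
...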

Note that the row I want to remove has

  1. the same ENTRY as the first instance for userA (unicorn, submitted at 10:23:10), and
  2. the same TIME SUBMITTED as the first instance of that time for userA (the forest entry at 10:30:49).

Also, the ENTRY ID cannot be used.

Any pointers would be greatly appreciated 🙂 

2 ACCEPTED SOLUTIONS
Pragati11
Super User

Hi @Anonymous,

Check if this existing thread helps:

https://community.powerbi.com/t5/Power-Query/Remove-duplicates-keeping-the-most-recent-row/m-p/757837

Thanks,

Pragati

Best Regards,

Pragati Jain

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

Anonymous

Here's the M code that does what you want:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKi1OLXJU0lEyBOLSvMzk/KI8ICstMTMHJGhgZWRsZWigFKuDUGkEks8vSi0uATIKEouLIQqNDaxMLFEU4jISVaUTTpXGViZA+40wVALliyqR7AaqM7YyBZoYCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, EntryID = _t, Entry = _t, Status = _t, Time = _t]),
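    // Set column types; Time must be a time value so List.Min compares times correctly
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"EntryID", Int64.Type}, {"Entry", type text}, {"Status", type text}, {"Time", type time}}),
    // Flag each row: true when its Time equals the earliest Time among all rows
    // with the same User and Entry, i.e. it is the first occurrence in that group.
    // Note: this re-filters the whole table for every row, so it can be slow on large tables.
    #"Added Custom" = Table.AddColumn(#"Changed Type", "FirstTimeEntry", 
        each List.Min( 
            Table.SelectRows(
                #"Changed Type",
                (r) => r[User] = [User] and r[Entry] = [Entry]
            )[Time] 
        ) = [Time]),
    // Keep only the first occurrence per group, then drop the helper column
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([FirstTimeEntry] = true)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"FirstTimeEntry"})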
in
    #"Removed Columns"
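For anyone who wants to check the logic outside Power BI, the same rule can be sketched in Python. This is an illustration only, not M and not part of the original reply; it assumes the sample rows from the question's table. It mirrors the FirstTimeEntry flag: find the earliest time per (user, entry) group, then keep only rows whose time matches it. Unlike the M query, which re-filters the whole table for every row, it computes each group's earliest time once. The function name keep_first_per_group and its key parameter are hypothetical.

```python
from datetime import time

# Sample rows copied from the question: (user, entry_id, entry, status, time_submitted)
ROWS = [
    ("userA", 1, "unicorn", "fail", time(10, 23, 10)),
    ("userA", 2, "forest", "pass", time(10, 30, 49)),
    ("userA", 1, "unicorn", "fail", time(10, 30, 49)),
    ("userB", 1, "unicorn", "fail", time(13, 40, 22)),
    ("userB", 1, "fairy", "pass", time(13, 43, 59)),
]


def keep_first_per_group(rows, key=lambda r: (r[0], r[2])):
    """Keep rows whose time equals the earliest time in their group.

    Hypothetical helper mirroring the FirstTimeEntry flag in the M query:
    the default group is (user, entry), and if several rows tie on the
    earliest time, all of them are kept (as List.Min(...) = [Time] would).
    """
    # First pass: earliest time per group
    earliest = {}
    for r in rows:
        k = key(r)
        if k not in earliest or r[4] < earliest[k]:
            earliest[k] = r[4]
    # Second pass: keep matching rows in their original order
    return [r for r in rows if r[4] == earliest[key(r)]]


if __name__ == "__main__":
    for row in keep_first_per_group(ROWS):
        print(row)
```

Passing key=lambda r: (r[0], r[1], r[2]) adds the entry ID to the grouping, which corresponds to the r[entry id] = [entry id] change described in a later reply; on this sample it returns the same four rows.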


Best

D

5 REPLIES
Anonymous

Thanks @Anonymous, I had to play around a little and add r[entry id] = [entry id] to the row filter, but it's working well now 👍 Cheers again

Anonymous

Yes, this is a perfect job for Power Query. You could also do it in DAX with a calculated table, but it really belongs in Power Query (PQ), since that is the data-shaping tool. I can put together some sample M code so you can see how this kind of cleaning is done.

Best
D
Anonymous

Thanks @Pragati11, the buffer got me halfway there! Now it's just removing the wrong duplicate; hopefully the other reply will resolve this 🙂 
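The linked thread is about keeping the most recent row, which may explain why the wrong duplicate was being removed here. A common Power Query pattern is to sort, wrap the result in Table.Buffer so the sort order is respected, and then remove duplicates, which keeps the first row of each group in sorted order; sorting newest first therefore keeps the latest submission rather than the earliest. The Python sketch below is illustrative only (dedupe_after_sort is a hypothetical helper, and the rows are copied from the question) and shows how flipping the sort direction flips which duplicate survives.

```python
from datetime import time

# Sample rows copied from the question: (user, entry_id, entry, status, time_submitted)
ROWS = [
    ("userA", 1, "unicorn", "fail", time(10, 23, 10)),
    ("userA", 2, "forest", "pass", time(10, 30, 49)),
    ("userA", 1, "unicorn", "fail", time(10, 30, 49)),
    ("userB", 1, "unicorn", "fail", time(13, 40, 22)),
    ("userB", 1, "fairy", "pass", time(13, 43, 59)),
]


def dedupe_after_sort(rows, key, descending=False):
    """Sort by time submitted, then keep the first row seen for each key.

    Hypothetical helper: ascending keeps the earliest row per group,
    descending keeps the most recent one.
    """
    # sorted() is stable (also with reverse=True), so rows with equal
    # times keep their original relative order
    ordered = sorted(rows, key=lambda r: r[4], reverse=descending)
    seen = set()
    kept = []
    for r in ordered:
        k = key(r)
        if k not in seen:  # first row for this group in sorted order
            seen.add(k)
            kept.append(r)
    return kept


if __name__ == "__main__":
    by_user_entry = lambda r: (r[0], r[2])  # group on (user, entry)
    print("earliest kept:", dedupe_after_sort(ROWS, by_user_entry))
    print("latest kept:  ", dedupe_after_sort(ROWS, by_user_entry, descending=True))
```

With descending=False the 10:23:10 unicorn row for userA is kept and the 10:30:49 one is dropped, matching the desired output; with descending=True the opposite happens.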
