Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
IssyB
Frequent Visitor

Remove Duplicates Per Group (keep first) measure help

Hi All

 

Is it possible to create a measure in PBI that remove duplicate values per group, and keeps the first occurrence of the value? Perhaps not a measure but maybe a new table, I'm unsure of how to work around this. For context, the database I'm working with has a funky treatment of some data that has a function back-end but doesn't make sense when trying to visualise user behaviour.

 

Currently, the data looks like:

USERENTRY IDENTRYSTATUSTIME SUBMITTED (HH:mm:ss)
userA1unicornfail10:23:10
userA2forestpass10:30:49
userA1unicornfail10:30:49
userB1unicornfail13:40:22
userB1fairypass13:43:59

 

I want to clean it so it looks like:

USERENTRY IDENTRYSTATUSTIME SUBMITTED (HH:mm:ss)
userA1unicornfail10:23:10
userA2forestpass10:30:49
userB1unicornfail13:40:22
userB1fairypass13:43:59
...............

 

Note the row I want to remove has

  1. duplicated ENTRY from the first instance for userA and
  2. duplicated TIME SUBMITTED from the first instance for userA

Also, the ENTRY ID cannot be used.

Any pointers would be greatly appreciated 🙂 

2 ACCEPTED SOLUTIONS
Pragati11
Super User
Super User

Hi @IssyB ,

 

Check if this existing thread helps:

https://community.powerbi.com/t5/Power-Query/Remove-duplicates-keeping-the-most-recent-row/m-p/75783...

 

Thanks,

Pragati

Best Regards,

Pragati Jain


MVP logo


LinkedIn | Twitter | Blog YouTube 

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

View solution in original post

Anonymous
Not applicable

Here's the M code that does what you want:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKi1OLXJU0lEyBOLSvMzk/KI8ICstMTMHJGhgZWRsZWigFKuDUGkEks8vSi0uATIKEouLIQqNDaxMLFEU4jISVaUTTpXGViZA+40wVALliyqR7AaqM7YyBZoYCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, EntryID = _t, Entry = _t, Status = _t, Time = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"EntryID", Int64.Type}, {"Entry", type text}, {"Status", type text}, {"Time", type time}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "FirstTimeEntry", 
        each List.Min( 
            Table.SelectRows(
                #"Changed Type",
                (r) => r[User] = [User] and r[Entry] = [Entry]
            )[Time] 
        ) = [Time]),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([FirstTimeEntry] = true)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"FirstTimeEntry"})
in
    #"Removed Columns"

 

Best

D

View solution in original post

5 REPLIES 5
Anonymous
Not applicable

Here's the M code that does what you want:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKi1OLXJU0lEyBOLSvMzk/KI8ICstMTMHJGhgZWRsZWigFKuDUGkEks8vSi0uATIKEouLIQqNDaxMLFEU4jISVaUTTpXGViZA+40wVALliyqR7AaqM7YyBZoYCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, EntryID = _t, Entry = _t, Status = _t, Time = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"EntryID", Int64.Type}, {"Entry", type text}, {"Status", type text}, {"Time", type time}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "FirstTimeEntry", 
        each List.Min( 
            Table.SelectRows(
                #"Changed Type",
                (r) => r[User] = [User] and r[Entry] = [Entry]
            )[Time] 
        ) = [Time]),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([FirstTimeEntry] = true)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"FirstTimeEntry"})
in
    #"Removed Columns"

 

Best

D

Thanks @Anonymous, had to have a little play around and add in r[entry id] = [entry id] but it's working well now 👍 cheers again

Anonymous
Not applicable

Yeah... That's a perfect job for Power Query. You can do it in DAX as well as a calculated table but it really should be performed in PQ as this is the data-munging tool. I can create some sample M code for you so that you can see how such cleaning is done...

Best
D
Pragati11
Super User
Super User

Hi @IssyB ,

 

Check if this existing thread helps:

https://community.powerbi.com/t5/Power-Query/Remove-duplicates-keeping-the-most-recent-row/m-p/75783...

 

Thanks,

Pragati

Best Regards,

Pragati Jain


MVP logo


LinkedIn | Twitter | Blog YouTube 

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

Thanks @Pragati11 , the buffer got me halfway there! Now it's just removing the wrong duplicate, hopefully the other reply will resolve this 🙂 

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Power BI Carousel June 2024

Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.