Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
user-1
Regular Visitor

Remove duplicates with multiple criteria

I'm hoping that someone can help me, I've gone through a number of different solutions already posted and none of them are working for me. 

I have a data base that creates a row for employees for each interaction but I really just want to keep the current 'Completion Status'

 

When I try sorting by Completion Status and then highlighting User ID & Completion Status and 'remove duplicates' it either removes them all or still leaves the incompletes. 

The idea is that if an employee has completed the course then I want one row that indicates that but if the employee has only incomplete recorded for that course then I want one record that indicates that.

This is part of my dashboard so I am using Power Query for this, the file contains a few hundred thousand rows.

Target YearUser IDFirst NameLast NameGroupDivisionCourse TypeCompletion StatusCourse TitleAssignment DateCountry/Region
2020E0303116DADeCOSMA International111 vILTCompleteAntitrust EssentialsMonday, April 5, 2021US
2020E0303116DADeCOSMA International111 vILTCompleteAntitrust EssentialsWednesday, November 24, 2021US
2020E0303116DADeCOSMA International111 vILTCompleteAntitrust EssentialsThursday, February 17, 2022US
2020E0303116DADeCOSMA International111 vILTIncompleteAntitrust EssentialsWednesday, November 24, 2021US
2020E0303154ScSchCOSMA International111 vILTCompleteAntitrust EssentialsMonday, April 5, 2021US
2020E0303154ScSchCOSMA International111 vILTCompleteAntitrust EssentialsMonday, January 30, 2023US
2020E0303154ScSchCOSMA International111 vILTIncompleteAntitrust EssentialsSaturday, September 25, 2021US
2020E0303154ScSchCOSMA International111 vILTIncompleteAntitrust EssentialsFriday, October 22, 2021US
2020E0220961TBoeMagna Steyr (871)vILTIncompleteAntitrust EssentialsThursday, October 28, 2021DE
1 ACCEPTED SOLUTION
MarkLaf
Super User
Super User

I think you basically want to group by User ID and Course Title (or whatever course identifier is) and then grab the top row in your group where you've sorted the table beforehand to ensure that the top row is the one you want to represent the User/Course pair.

 

 

let
    // dummy data provided
    Source = Table.FromRows(
        Json.Document(Binary.Decompress(Binary.FromText("xdNPa8MgFADwryKeNshBTf/tmK0pZCzrwY4dSg42fayBVMPTFPLtp250p0EHaXd56lP0h0+3WyqYYDShOUtZyvnMd5dZCODD01qWGSm0A9TKNUar1mc558Q3p+JlE9aYY9eCC8sz7RqHvXUktxb8QLXWp0uj92pISNZh05JpQvyR3OffJK2SWwDeYa/BRsOrOcFxB0jE5PaOzaHHL8YKdtgrHAifR4YYj1Ho+hoXMp34rqxjOPzL07iu4FnpWJCURUQ6IuKSikjleowQCZ37rsj493EJZYVNhKxrZyJD/MoQgj3MwkTY+9GETUv1oRWRDgb0I3K3mPP7v53/80vOgsVZsMxpVX0C", BinaryEncoding.Base64), Compression.Deflate)), 
        let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Target Year" = _t, #"User ID" = _t, #"First Name" = _t, #"Last Name" = _t, Group = _t, Division = _t, #"Course Type" = _t, #"Completion Status" = _t, #"Course Title" = _t, #"Assignment Date" = _t, #"Country/Region" = _t]
    ),

    // need this step if we actually want date sort later on to be correct
    Types = Table.TransformColumnTypes(Source,{{"Assignment Date", type date}}),

    // we will use GroupKind.Local later, so sort everything first
    SortRows = Table.Sort(
        Types,
        {
            // first sort by your group key (User ID and Course Title)
            {"User ID", Order.Ascending}, {"Course Title", Order.Ascending},
            
            // then sort so representative row is first
            // {Complete before Incomplete}    and    {newer before older }
            {"Completion Status", Order.Ascending}, {"Assignment Date", Order.Descending}
        }
    ),

    // specify group columns and then dynamically get the rest of your columns
    GroupByCols = {"User ID", "Course Title"}, ExpandCols = List.Difference( Table.ColumnNames( Types ), GroupByCols ),

    // now do the local grouping where you grab your top row
    Group = Table.Group(
        SortRows, 
        GroupByCols, 
        {{"rows", each Table.First( _ ) , Type.TableRow( Value.Type( SortRows ) ) }},
        GroupKind.Local
    ),

    // expand to restore all your columns
    // Note: this should be manual if you want a subset of cols and you can remove ExpandCols step
    ExpandRows = Table.ExpandRecordColumn(Group, "rows", ExpandCols )
in
    ExpandRows

 

Result:

MarkLaf_0-1697663138512.png

 

 

View solution in original post

2 REPLIES 2
user-1
Regular Visitor

@MarkLaf 

OMG I can not thank you enough that worked !!

MarkLaf
Super User
Super User

I think you basically want to group by User ID and Course Title (or whatever course identifier is) and then grab the top row in your group where you've sorted the table beforehand to ensure that the top row is the one you want to represent the User/Course pair.

 

 

let
    // dummy data provided
    Source = Table.FromRows(
        Json.Document(Binary.Decompress(Binary.FromText("xdNPa8MgFADwryKeNshBTf/tmK0pZCzrwY4dSg42fayBVMPTFPLtp250p0EHaXd56lP0h0+3WyqYYDShOUtZyvnMd5dZCODD01qWGSm0A9TKNUar1mc558Q3p+JlE9aYY9eCC8sz7RqHvXUktxb8QLXWp0uj92pISNZh05JpQvyR3OffJK2SWwDeYa/BRsOrOcFxB0jE5PaOzaHHL8YKdtgrHAifR4YYj1Ho+hoXMp34rqxjOPzL07iu4FnpWJCURUQ6IuKSikjleowQCZ37rsj493EJZYVNhKxrZyJD/MoQgj3MwkTY+9GETUv1oRWRDgb0I3K3mPP7v53/80vOgsVZsMxpVX0C", BinaryEncoding.Base64), Compression.Deflate)), 
        let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Target Year" = _t, #"User ID" = _t, #"First Name" = _t, #"Last Name" = _t, Group = _t, Division = _t, #"Course Type" = _t, #"Completion Status" = _t, #"Course Title" = _t, #"Assignment Date" = _t, #"Country/Region" = _t]
    ),

    // need this step if we actually want date sort later on to be correct
    Types = Table.TransformColumnTypes(Source,{{"Assignment Date", type date}}),

    // we will use GroupKind.Local later, so sort everything first
    SortRows = Table.Sort(
        Types,
        {
            // first sort by your group key (User ID and Course Title)
            {"User ID", Order.Ascending}, {"Course Title", Order.Ascending},
            
            // then sort so representative row is first
            // {Complete before Incomplete}    and    {newer before older }
            {"Completion Status", Order.Ascending}, {"Assignment Date", Order.Descending}
        }
    ),

    // specify group columns and then dynamically get the rest of your columns
    GroupByCols = {"User ID", "Course Title"}, ExpandCols = List.Difference( Table.ColumnNames( Types ), GroupByCols ),

    // now do the local grouping where you grab your top row
    Group = Table.Group(
        SortRows, 
        GroupByCols, 
        {{"rows", each Table.First( _ ) , Type.TableRow( Value.Type( SortRows ) ) }},
        GroupKind.Local
    ),

    // expand to restore all your columns
    // Note: this should be manual if you want a subset of cols and you can remove ExpandCols step
    ExpandRows = Table.ExpandRecordColumn(Group, "rows", ExpandCols )
in
    ExpandRows

 

Result:

MarkLaf_0-1697663138512.png

 

 

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors