Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
neilh
Regular Visitor

Remove Duplicates Based on String

Hello Everyone,

 

I have a large excel sheet of users who were assigned training and I want to generate completion statistics for this in Power BI. Two courses were assigned, with a row for each course showing whether it was completed or not. For example:

BobCourse AComplete
BobCourse BIncomplete
TomCourse AComplete
TomCourse BComplete

How can I remove duplicates in Power Query so that if a user only completed one course, it keeps the one that wasn't completed? Currently when I remove duplicates based on the username, it keeps the rows marked Complete no matter how it's sorted. When doing this in Excel, I would put all the incomplete rows at the top, and removing duplicates would always keep the first entry. This doesn't seem to be the case in Power BI. I am new to this so forgive me if this is trivial. Any help would be great, thank you!

2 ACCEPTED SOLUTIONS

Hi @neilh 

 

You can get the expected result with two steps easily. First group by person name column and perform All Rows operation on the new column named as "All Data". 

vjingzhang_0-1653382139838.png

 

Then add fx to create a custom step with the following code. 

= Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})

vjingzhang_1-1653382663044.png

 

My sample data is as below. "Status" is the column which marks "Complete" or "Incomplete" for every row. 

vjingzhang_2-1653382822882.png

 

Here is the full code. You can paste it into a blank query's Advanced Editor to check every step. 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSlWJ1MFU4gXieecnIakLycwmYgq7CCUOFc0ZRZjGmKWg2YarCcE8sAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text]}}),
    #"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
    #"Aggregated All Data"

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

View solution in original post

Hi @neilh 

 

Just need to modify the first step, switch to Advanced option and add a second new column Department with Max operation on Department. You can add multiple new columns with appropriate aggregations. 

vjingzhang_0-1653443248476.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSgWxHpVgdTFVOIJ5nXjK6upD8XLymOWFV5YRVlXNGUWYxpmkotjrjUInpPqDKWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t, Department = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text, Department=nullable text]}, {"Department", each List.Max([Department]), type nullable text}}),
    #"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
    #"Aggregated All Data"

 

Jing

View solution in original post

8 REPLIES 8
Vijay_A_Verma
Super User
Super User

First apply filter on last colum to keep only records which were not completed.

I think just applying the filter will solve your issue as there are only 2 records per user. If needed, select first column and remove duplicates. If duplicates are in 2nd column also for a user, then you will need to select first 2 columns and then remove duplictes.

Thank you for your reply! That definitely works for seeing only the users who didn't complete it. The thing is, I want to see users who did complete the training as well. If I apply a filter showing only records which were completed, it will include users who did not complete one of the two trainings.

I need to see the expected output for the given table before I can suggest the solution. 

I basically want to take a bunch of data that looks like this:

Bob Training A

 Complete

Bob Training B Incomplete
Tom Training A Complete
Tom Training B Complete
Chris Training A Incomplete
Chris Training B Incomplete

And turn it into this:

Bob Incomplete
Tom Complete
Chris Incomplete

Thank you!

Hi @neilh 

 

You can get the expected result with two steps easily. First group by person name column and perform All Rows operation on the new column named as "All Data". 

vjingzhang_0-1653382139838.png

 

Then add fx to create a custom step with the following code. 

= Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})

vjingzhang_1-1653382663044.png

 

My sample data is as below. "Status" is the column which marks "Complete" or "Incomplete" for every row. 

vjingzhang_2-1653382822882.png

 

Here is the full code. You can paste it into a blank query's Advanced Editor to check every step. 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSlWJ1MFU4gXieecnIakLycwmYgq7CCUOFc0ZRZjGmKWg2YarCcE8sAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text]}}),
    #"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
    #"Aggregated All Data"

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

Awesome that worked, thank you! Something else I should have mentioned, what if I had additional columns I wanted to include, for example the user's department? 

Hi @neilh 

 

Just need to modify the first step, switch to Advanced option and add a second new column Department with Max operation on Department. You can add multiple new columns with appropriate aggregations. 

vjingzhang_0-1653443248476.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSgWxHpVgdTFVOIJ5nXjK6upD8XLymOWFV5YRVlXNGUWYxpmkotjrjUInpPqDKWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t, Department = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text, Department=nullable text]}, {"Department", each List.Max([Department]), type nullable text}}),
    #"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
    #"Aggregated All Data"

 

Jing

This is perfect, thank you so much!

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.