The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hello Everyone,
I have a large excel sheet of users who were assigned training and I want to generate completion statistics for this in Power BI. Two courses were assigned, with a row for each course showing whether it was completed or not. For example:
Bob | Course A | Complete |
Bob | Course B | Incomplete |
Tom | Course A | Complete |
Tom | Course B | Complete |
How can I remove duplicates in Power Query so that if a user only completed one course, it keeps the one that wasn't completed? Currently when I remove duplicates based on the username, it keeps the rows marked Complete no matter how it's sorted. When doing this in Excel, I would put all the incomplete rows at the top, and removing duplicates would always keep the first entry. This doesn't seem to be the case in Power BI. I am new to this so forgive me if this is trivial. Any help would be great, thank you!
Solved! Go to Solution.
Hi @neilh
You can get the expected result with two steps easily. First group by person name column and perform All Rows operation on the new column named as "All Data".
Then add fx to create a custom step with the following code.
= Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
My sample data is as below. "Status" is the column which marks "Complete" or "Incomplete" for every row.
Here is the full code. You can paste it into a blank query's Advanced Editor to check every step.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSlWJ1MFU4gXieecnIakLycwmYgq7CCUOFc0ZRZjGmKWg2YarCcE8sAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text]}}),
#"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
#"Aggregated All Data"
Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.
Hi @neilh
Just need to modify the first step, switch to Advanced option and add a second new column Department with Max operation on Department. You can add multiple new columns with appropriate aggregations.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSgWxHpVgdTFVOIJ5nXjK6upD8XLymOWFV5YRVlXNGUWYxpmkotjrjUInpPqDKWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t, Department = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text, Department=nullable text]}, {"Department", each List.Max([Department]), type nullable text}}),
#"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
#"Aggregated All Data"
Jing
First apply filter on last colum to keep only records which were not completed.
I think just applying the filter will solve your issue as there are only 2 records per user. If needed, select first column and remove duplicates. If duplicates are in 2nd column also for a user, then you will need to select first 2 columns and then remove duplictes.
Thank you for your reply! That definitely works for seeing only the users who didn't complete it. The thing is, I want to see users who did complete the training as well. If I apply a filter showing only records which were completed, it will include users who did not complete one of the two trainings.
I need to see the expected output for the given table before I can suggest the solution.
I basically want to take a bunch of data that looks like this:
Bob | Training A | Complete |
Bob | Training B | Incomplete |
Tom | Training A | Complete |
Tom | Training B | Complete |
Chris | Training A | Incomplete |
Chris | Training B | Incomplete |
And turn it into this:
Bob | Incomplete |
Tom | Complete |
Chris | Incomplete |
Thank you!
Hi @neilh
You can get the expected result with two steps easily. First group by person name column and perform All Rows operation on the new column named as "All Data".
Then add fx to create a custom step with the following code.
= Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
My sample data is as below. "Status" is the column which marks "Complete" or "Incomplete" for every row.
Here is the full code. You can paste it into a blank query's Advanced Editor to check every step.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSlWJ1MFU4gXieecnIakLycwmYgq7CCUOFc0ZRZjGmKWg2YarCcE8sAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text]}}),
#"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
#"Aggregated All Data"
Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.
Awesome that worked, thank you! Something else I should have mentioned, what if I had additional columns I wanted to include, for example the user's department?
Hi @neilh
Just need to modify the first step, switch to Advanced option and add a second new column Department with Max operation on Department. You can add multiple new columns with appropriate aggregations.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSgWxHpVgdTFVOIJ5nXjK6upD8XLymOWFV5YRVlXNGUWYxpmkotjrjUInpPqDKWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t, Department = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text, Department=nullable text]}, {"Department", each List.Max([Department]), type nullable text}}),
#"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
#"Aggregated All Data"
Jing
This is perfect, thank you so much!
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.