Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
SuperSayan
Resolver I
Resolver I

Manage duplicates with different values

Hello everyone,

 

I'm struggling with a new issue.
I thought it would be simple to deal with but can't figure it out.

 

I have a type of intervention that is a string.

I have a hundred of reference but there is one specificity I have 3 references for very similar intervention and a lot of duplicates.

Below is a sample of the dataset where this reference is in the "service" column.

Basically I have:

- TM04

- TM4A

- TM4E

 

What I want to do, for every couple of "Fin réelle" and "Code implantation" I want to keep a single line of all the above value.

Even more challenging:

- if there is a TM04 and a TM4A I want to keep one row TM4A

- same if I have TM04 and TM4E, I want to keep one row TM4E.

- I can't have TM4A and TM4E at the same time.

- The only instance where I want to keep TM04 is when I have only this one.

 

For info, I'm in the process of removing the possibility to have TM04 in the future so I can just perform "regular" duplicates removal, unfortunately I can't clean the past :(.

 

Intervention data.JPG

1 ACCEPTED SOLUTION
SuperSayan
Resolver I
Resolver I

I actually managed to perform the above using @MarcelBeug proposed code on another topic! (See here)

 

The logic behind that is that I remove all duplicates based on "Service", "Fin réelle" and "Code Implantation" therefore I can only have 0, 1 or 2 Services TM04/TM4A/TM4E by "Code Implantation"/"Fin réelle"

Then I group by "Code Implantation"/"Fin réelle" and count the 3 different types of services and finally keep only the ones I want with a filter.

 

I guess I asked too quickly but maybe this can help someone else (or someone can give me a better/quicker solution to this one).

 

let
    Source = Excel.Workbook(File.Contents("C:\Work\Réorganisation STMC\PowerBI\Gestion doublons.xlsx"), null, true),
    Feuil1_Sheet = Source{[Item="Feuil1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Feuil1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Code intervention", type text}, {"Code implantation", Int64.Type}, {"Service", type text}, {"Clients", type text}, {"Code postal", Int64.Type}, {"Ville", type text}, {"Catégorie (Matériel)", type text}, {"Statut", type text}, {"Raison de statut", type text}, {"Code canton", Int64.Type}, {"Echéance max.", type datetime}, {"Référence matériel", type text}, {"Ressources", type text}, {"Fin réelle", type date}, {"Durée réelle", Int64.Type}, {"Description", type text}, {"Date création intervention", type datetime}, {"Division", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Service", "Code Implantation", "Fin réelle"}),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Code Implantation", "Fin réelle"},
                        {{"New Column", each List.Count(List.Select([Service], each Text.StartsWith(_,"TM4") or Text.StartsWith(_,"TM04") )), Int64.Type},
                        {"AllData", each _, Value.Type(#"Removed Duplicates")}}),
     #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Service"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded AllData", each [New Column] <> 2 or [Service] <> "TM04")
in
    #"Filtered Rows"

 

 

 

 

 

View solution in original post

1 REPLY 1
SuperSayan
Resolver I
Resolver I

I actually managed to perform the above using @MarcelBeug proposed code on another topic! (See here)

 

The logic behind that is that I remove all duplicates based on "Service", "Fin réelle" and "Code Implantation" therefore I can only have 0, 1 or 2 Services TM04/TM4A/TM4E by "Code Implantation"/"Fin réelle"

Then I group by "Code Implantation"/"Fin réelle" and count the 3 different types of services and finally keep only the ones I want with a filter.

 

I guess I asked too quickly but maybe this can help someone else (or someone can give me a better/quicker solution to this one).

 

let
    Source = Excel.Workbook(File.Contents("C:\Work\Réorganisation STMC\PowerBI\Gestion doublons.xlsx"), null, true),
    Feuil1_Sheet = Source{[Item="Feuil1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Feuil1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Code intervention", type text}, {"Code implantation", Int64.Type}, {"Service", type text}, {"Clients", type text}, {"Code postal", Int64.Type}, {"Ville", type text}, {"Catégorie (Matériel)", type text}, {"Statut", type text}, {"Raison de statut", type text}, {"Code canton", Int64.Type}, {"Echéance max.", type datetime}, {"Référence matériel", type text}, {"Ressources", type text}, {"Fin réelle", type date}, {"Durée réelle", Int64.Type}, {"Description", type text}, {"Date création intervention", type datetime}, {"Division", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Service", "Code Implantation", "Fin réelle"}),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Code Implantation", "Fin réelle"},
                        {{"New Column", each List.Count(List.Select([Service], each Text.StartsWith(_,"TM4") or Text.StartsWith(_,"TM04") )), Int64.Type},
                        {"AllData", each _, Value.Type(#"Removed Duplicates")}}),
     #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Service"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded AllData", each [New Column] <> 2 or [Service] <> "TM04")
in
    #"Filtered Rows"

 

 

 

 

 

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.