Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI DataViz World Championships are on! With four chances to enter, you could win a spot in the LIVE Grand Finale in Las Vegas. Show off your skills.

Reply
SuperSayan
Resolver I
Resolver I

Manage duplicates with different values

Hello everyone,

 

I'm struggling with a new issue.
I thought it would be simple to deal with but can't figure it out.

 

I have a type of intervention that is a string.

I have a hundred of reference but there is one specificity I have 3 references for very similar intervention and a lot of duplicates.

Below is a sample of the dataset where this reference is in the "service" column.

Basically I have:

- TM04

- TM4A

- TM4E

 

What I want to do, for every couple of "Fin réelle" and "Code implantation" I want to keep a single line of all the above value.

Even more challenging:

- if there is a TM04 and a TM4A I want to keep one row TM4A

- same if I have TM04 and TM4E, I want to keep one row TM4E.

- I can't have TM4A and TM4E at the same time.

- The only instance where I want to keep TM04 is when I have only this one.

 

For info, I'm in the process of removing the possibility to have TM04 in the future so I can just perform "regular" duplicates removal, unfortunately I can't clean the past :(.

 

Intervention data.JPG

1 ACCEPTED SOLUTION
SuperSayan
Resolver I
Resolver I

I actually managed to perform the above using @MarcelBeug proposed code on another topic! (See here)

 

The logic behind that is that I remove all duplicates based on "Service", "Fin réelle" and "Code Implantation" therefore I can only have 0, 1 or 2 Services TM04/TM4A/TM4E by "Code Implantation"/"Fin réelle"

Then I group by "Code Implantation"/"Fin réelle" and count the 3 different types of services and finally keep only the ones I want with a filter.

 

I guess I asked too quickly but maybe this can help someone else (or someone can give me a better/quicker solution to this one).

 

let
    Source = Excel.Workbook(File.Contents("C:\Work\Réorganisation STMC\PowerBI\Gestion doublons.xlsx"), null, true),
    Feuil1_Sheet = Source{[Item="Feuil1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Feuil1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Code intervention", type text}, {"Code implantation", Int64.Type}, {"Service", type text}, {"Clients", type text}, {"Code postal", Int64.Type}, {"Ville", type text}, {"Catégorie (Matériel)", type text}, {"Statut", type text}, {"Raison de statut", type text}, {"Code canton", Int64.Type}, {"Echéance max.", type datetime}, {"Référence matériel", type text}, {"Ressources", type text}, {"Fin réelle", type date}, {"Durée réelle", Int64.Type}, {"Description", type text}, {"Date création intervention", type datetime}, {"Division", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Service", "Code Implantation", "Fin réelle"}),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Code Implantation", "Fin réelle"},
                        {{"New Column", each List.Count(List.Select([Service], each Text.StartsWith(_,"TM4") or Text.StartsWith(_,"TM04") )), Int64.Type},
                        {"AllData", each _, Value.Type(#"Removed Duplicates")}}),
     #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Service"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded AllData", each [New Column] <> 2 or [Service] <> "TM04")
in
    #"Filtered Rows"

 

 

 

 

 

View solution in original post

1 REPLY 1
SuperSayan
Resolver I
Resolver I

I actually managed to perform the above using @MarcelBeug proposed code on another topic! (See here)

 

The logic behind that is that I remove all duplicates based on "Service", "Fin réelle" and "Code Implantation" therefore I can only have 0, 1 or 2 Services TM04/TM4A/TM4E by "Code Implantation"/"Fin réelle"

Then I group by "Code Implantation"/"Fin réelle" and count the 3 different types of services and finally keep only the ones I want with a filter.

 

I guess I asked too quickly but maybe this can help someone else (or someone can give me a better/quicker solution to this one).

 

let
    Source = Excel.Workbook(File.Contents("C:\Work\Réorganisation STMC\PowerBI\Gestion doublons.xlsx"), null, true),
    Feuil1_Sheet = Source{[Item="Feuil1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Feuil1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Code intervention", type text}, {"Code implantation", Int64.Type}, {"Service", type text}, {"Clients", type text}, {"Code postal", Int64.Type}, {"Ville", type text}, {"Catégorie (Matériel)", type text}, {"Statut", type text}, {"Raison de statut", type text}, {"Code canton", Int64.Type}, {"Echéance max.", type datetime}, {"Référence matériel", type text}, {"Ressources", type text}, {"Fin réelle", type date}, {"Durée réelle", Int64.Type}, {"Description", type text}, {"Date création intervention", type datetime}, {"Division", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Service", "Code Implantation", "Fin réelle"}),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Code Implantation", "Fin réelle"},
                        {{"New Column", each List.Count(List.Select([Service], each Text.StartsWith(_,"TM4") or Text.StartsWith(_,"TM04") )), Int64.Type},
                        {"AllData", each _, Value.Type(#"Removed Duplicates")}}),
     #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Service"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded AllData", each [New Column] <> 2 or [Service] <> "TM04")
in
    #"Filtered Rows"

 

 

 

 

 

Helpful resources

Announcements
Feb2025 Sticker Challenge

Join our Community Sticker Challenge 2025

If you love stickers, then you will definitely want to check out our Community Sticker Challenge!

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.