Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
SuperSayan
Resolver I
Resolver I

Manage duplicates with different values

Hello everyone,

 

I'm struggling with a new issue.
I thought it would be simple to deal with but can't figure it out.

 

I have a type of intervention that is a string.

I have a hundred of reference but there is one specificity I have 3 references for very similar intervention and a lot of duplicates.

Below is a sample of the dataset where this reference is in the "service" column.

Basically I have:

- TM04

- TM4A

- TM4E

 

What I want to do, for every couple of "Fin réelle" and "Code implantation" I want to keep a single line of all the above value.

Even more challenging:

- if there is a TM04 and a TM4A I want to keep one row TM4A

- same if I have TM04 and TM4E, I want to keep one row TM4E.

- I can't have TM4A and TM4E at the same time.

- The only instance where I want to keep TM04 is when I have only this one.

 

For info, I'm in the process of removing the possibility to have TM04 in the future so I can just perform "regular" duplicates removal, unfortunately I can't clean the past :(.

 

Intervention data.JPG

1 ACCEPTED SOLUTION
SuperSayan
Resolver I
Resolver I

I actually managed to perform the above using @MarcelBeug proposed code on another topic! (See here)

 

The logic behind that is that I remove all duplicates based on "Service", "Fin réelle" and "Code Implantation" therefore I can only have 0, 1 or 2 Services TM04/TM4A/TM4E by "Code Implantation"/"Fin réelle"

Then I group by "Code Implantation"/"Fin réelle" and count the 3 different types of services and finally keep only the ones I want with a filter.

 

I guess I asked too quickly but maybe this can help someone else (or someone can give me a better/quicker solution to this one).

 

let
    Source = Excel.Workbook(File.Contents("C:\Work\Réorganisation STMC\PowerBI\Gestion doublons.xlsx"), null, true),
    Feuil1_Sheet = Source{[Item="Feuil1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Feuil1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Code intervention", type text}, {"Code implantation", Int64.Type}, {"Service", type text}, {"Clients", type text}, {"Code postal", Int64.Type}, {"Ville", type text}, {"Catégorie (Matériel)", type text}, {"Statut", type text}, {"Raison de statut", type text}, {"Code canton", Int64.Type}, {"Echéance max.", type datetime}, {"Référence matériel", type text}, {"Ressources", type text}, {"Fin réelle", type date}, {"Durée réelle", Int64.Type}, {"Description", type text}, {"Date création intervention", type datetime}, {"Division", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Service", "Code Implantation", "Fin réelle"}),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Code Implantation", "Fin réelle"},
                        {{"New Column", each List.Count(List.Select([Service], each Text.StartsWith(_,"TM4") or Text.StartsWith(_,"TM04") )), Int64.Type},
                        {"AllData", each _, Value.Type(#"Removed Duplicates")}}),
     #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Service"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded AllData", each [New Column] <> 2 or [Service] <> "TM04")
in
    #"Filtered Rows"

 

 

 

 

 

View solution in original post

1 REPLY 1
SuperSayan
Resolver I
Resolver I

I actually managed to perform the above using @MarcelBeug proposed code on another topic! (See here)

 

The logic behind that is that I remove all duplicates based on "Service", "Fin réelle" and "Code Implantation" therefore I can only have 0, 1 or 2 Services TM04/TM4A/TM4E by "Code Implantation"/"Fin réelle"

Then I group by "Code Implantation"/"Fin réelle" and count the 3 different types of services and finally keep only the ones I want with a filter.

 

I guess I asked too quickly but maybe this can help someone else (or someone can give me a better/quicker solution to this one).

 

let
    Source = Excel.Workbook(File.Contents("C:\Work\Réorganisation STMC\PowerBI\Gestion doublons.xlsx"), null, true),
    Feuil1_Sheet = Source{[Item="Feuil1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Feuil1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Code intervention", type text}, {"Code implantation", Int64.Type}, {"Service", type text}, {"Clients", type text}, {"Code postal", Int64.Type}, {"Ville", type text}, {"Catégorie (Matériel)", type text}, {"Statut", type text}, {"Raison de statut", type text}, {"Code canton", Int64.Type}, {"Echéance max.", type datetime}, {"Référence matériel", type text}, {"Ressources", type text}, {"Fin réelle", type date}, {"Durée réelle", Int64.Type}, {"Description", type text}, {"Date création intervention", type datetime}, {"Division", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Service", "Code Implantation", "Fin réelle"}),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Code Implantation", "Fin réelle"},
                        {{"New Column", each List.Count(List.Select([Service], each Text.StartsWith(_,"TM4") or Text.StartsWith(_,"TM04") )), Int64.Type},
                        {"AllData", each _, Value.Type(#"Removed Duplicates")}}),
     #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Service"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded AllData", each [New Column] <> 2 or [Service] <> "TM04")
in
    #"Filtered Rows"

 

 

 

 

 

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

AugPowerBI_Carousel

Power BI Monthly Update - August 2024

Check out the August 2024 Power BI update to learn about new features.

August Carousel

Fabric Community Update - August 2024

Find out what's new and trending in the Fabric Community.