Solved: Re: Remove duplicates that contains null in one pa...

Analyst_in_prog · ‎07-29-2024

I am trying to remove duplicates on the basis of "report id ", but some records "priority level" is not null i want to keep such records in which priority level is not null , but if both duplicate records contains null then keep any one of them.
for example the report id 894 is having 2 duplicated but only one with priority level is required .
how can i do this ..please help me out

Anonymous · ‎07-29-2024

Hi @Analyst_in_prog ,

Here is my sample data:

Here is the whole M code in the Advanced Editor:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUfLJLweSFhYGQLJAKVYHIoom5ISk0BBFFE3IGUmhEYoompALTMgYRQimFyHqClNoAiRL8Yt5ZKZnoIu7IZlqhiqMLuYOFbM0ISwGswpZ3AOm1hS/mCdMzAxVDG4mmjjU+XDhWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [comments = _t, #"priority level" = _t, reportID = _t, eventType = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"comments", type text}, {"priority level", type text}, {"reportID", Int64.Type}, {"eventType", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type","",null,Replacer.ReplaceValue,{"priority level"}),
    #"Grouped Rows" = Table.Group(#"Replaced Value", {"reportID"}, {{"AllNulls", each List.NonNullCount([priority level]) = 0, type logical}}),
    #"Merged" = Table.NestedJoin(#"Replaced Value", {"reportID"}, #"Grouped Rows", {"reportID"}, "GroupedData", JoinKind.LeftOuter),
    #"Expanded GroupedData" = Table.ExpandTableColumn(Merged, "GroupedData", {"AllNulls"}, {"AllNulls"}),
    #"Removed Duplicates" = Table.Distinct(#"Expanded GroupedData", {"reportID", "priority level"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Duplicates", each not ([priority level] = null and [AllNulls] = false))
in
    #"Filtered Rows"

And the final output is as below:

My overall idea is to first add a column that marks the reportIDs where the priority level are both null as TRUE, then delete duplicate rows for both the reportID and priority level columns, and finally delete the rows where the priority level=null and marked as FALSE.

Best Regards,
Dino Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Analyst_in_prog · ‎07-29-2024

hii @Anonymous ,

thank you so much for replying , to be honest i didn't worked with Advance editor that much, and some part of your M code i didn't understand very well but i tried it and it worked .!!! 😄🤝
I Request you , if you have done this using inbuilt functions can you please tell me step by step ? or if you directly done it using M code can you please guide me where i can learn how to work with M code ?

One more thing i would be happy to connect with you on linkedin. please share your ID , otherwise sent a request to mine .
https://www.linkedin.com/in/lalit--soni/

Thanks and Regards,

Lalit soni

Anonymous · ‎07-29-2024

Hi @Analyst_in_prog ,

Here is my sample data:

Here is the whole M code in the Advanced Editor:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUfLJLweSFhYGQLJAKVYHIoom5ISk0BBFFE3IGUmhEYoompALTMgYRQimFyHqClNoAiRL8Yt5ZKZnoIu7IZlqhiqMLuYOFbM0ISwGswpZ3AOm1hS/mCdMzAxVDG4mmjjU+XDhWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [comments = _t, #"priority level" = _t, reportID = _t, eventType = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"comments", type text}, {"priority level", type text}, {"reportID", Int64.Type}, {"eventType", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type","",null,Replacer.ReplaceValue,{"priority level"}),
    #"Grouped Rows" = Table.Group(#"Replaced Value", {"reportID"}, {{"AllNulls", each List.NonNullCount([priority level]) = 0, type logical}}),
    #"Merged" = Table.NestedJoin(#"Replaced Value", {"reportID"}, #"Grouped Rows", {"reportID"}, "GroupedData", JoinKind.LeftOuter),
    #"Expanded GroupedData" = Table.ExpandTableColumn(Merged, "GroupedData", {"AllNulls"}, {"AllNulls"}),
    #"Removed Duplicates" = Table.Distinct(#"Expanded GroupedData", {"reportID", "priority level"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Duplicates", each not ([priority level] = null and [AllNulls] = false))
in
    #"Filtered Rows"

And the final output is as below:

My overall idea is to first add a column that marks the reportIDs where the priority level are both null as TRUE, then delete duplicate rows for both the reportID and priority level columns, and finally delete the rows where the priority level=null and marked as FALSE.

Best Regards,
Dino Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.