cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
marcelmunk
Frequent Visitor

Find duplicated/contained in item and add new column which identifies the duplicated column

Hi, guys!

 

Please help me. I couldn't find out a solution for this.

 

I have a table for the product TAG and different properties:

marcelmunk_0-1653574254879.png

 

I need to find product TAGs that are contained in the another, and then identify which property (column) is differente from the previous one, adding a new column.

 

The result should look like this:

marcelmunk_1-1653574446806.png

 

In this example it's comparing the row with the last one, but ideally it would lookfor duplicates in the whole column to find TAGs which have duplicated parts.

 

Thanks in advance!

 

1 ACCEPTED SOLUTION
Icey
Community Support
Community Support

Hi @marcelmunk ,

 

You can try the following method. It's a bit complicated, and maybe someone else might have a better solution.

Icey_1-1653905142015.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRS0lFKyilNBVLFGYlFBUC6JCMzDySaX5SSWqQUqwNS5YypLMTD09kbVZ2ziytQoDI1Jye/HKQyNz+/JAPdxJzU4mKoamTFwR6OQQHY1cYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TAG = _t, Prop1 = _t, Prop2 = _t, Prop3 = _t, Prop4 = _t]),
    #"Reversed Rows" = Table.ReverseRows(Source),
    Custom1 = Table.AddFuzzyClusterColumn(#"Reversed Rows","TAG","New TAG",[IgnoreCase = true, IgnoreSpace = true]),
    #"Reversed Rows1" = Table.ReverseRows(Custom1),
    #"Removed Columns" = Table.RemoveColumns(#"Reversed Rows1",{"TAG"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"New TAG"}, {{"ALL", each _, type table [Prop1=nullable text, Prop2=nullable text, Prop3=nullable text, Prop4=nullable text, New TAG=text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([ALL],"Index",0)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"Prop1", "Prop2", "Prop3", "Prop4", "New TAG", "Index"}, {"Prop1", "Prop2", "Prop3", "Prop4", "New TAG", "Index"}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Custom", {"New TAG", "Index"}, "Attribute", "Value"),
    #"Grouped Rows1" = Table.Group(#"Unpivoted Other Columns", {"New TAG", "Attribute"}, {{"ALL", each _, type table [New TAG=text, Index=number, Attribute=text, Value=text, Custom=any]}}),
    #"Added Custom1" = Table.AddColumn(#"Grouped Rows1", "Custom", each let 
AllDataTable = [ALL],
PreRowValue = 
    Table.AddColumn(
        AllDataTable, "PreValue", 
        each try AllDataTable [Value] { [Index] - 1} otherwise [Value]
    )
in
PreRowValue),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Custom"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Custom", {"New TAG", "Index", "Attribute", "Value", "PreValue"}, {"New TAG", "Index", "Attribute", "Value", "PreValue"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded Custom1", each ([Index] = 1)),
    #"Added Conditional Column" = Table.AddColumn(#"Filtered Rows1", "Duplicate", each if [Value] <> [PreValue] then [Attribute]&" - "&[PreValue] else null),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Conditional Column",{"Index", "PreValue"}),
    #"Grouped Rows2" = Table.Group(#"Removed Columns1", {"New TAG"}, {{"ALL", each _, type table [New TAG=text, Attribute=text, Value=text, Duplicate=nullable text]}}),
    #"Added Custom2" = Table.AddColumn(#"Grouped Rows2", "Custom", each let 
AllDataTable = [ALL],
ModifiedDuplicate = 
    Table.AddColumn(
        AllDataTable, "ModifiedDuplicate", 
        each List.RemoveNulls(AllDataTable [Duplicate])
    )
in
ModifiedDuplicate),
    #"Removed Other Columns2" = Table.SelectColumns(#"Added Custom2",{"Custom"}),
    #"Expanded Custom2" = Table.ExpandTableColumn(#"Removed Other Columns2", "Custom", {"New TAG", "Attribute", "Value", "Duplicate", "ModifiedDuplicate"}, {"New TAG", "Attribute", "Value", "Duplicate", "ModifiedDuplicate"}),
    #"Extracted Values" = Table.TransformColumns(#"Expanded Custom2", {"ModifiedDuplicate", each Text.Combine(List.Transform(_, Text.From)), type text}),
    #"Removed Columns2" = Table.RemoveColumns(#"Extracted Values",{"Duplicate"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns2", List.Distinct(#"Removed Columns2"[Attribute]), "Attribute", "Value"),
    #"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"New TAG", "Prop1", "Prop2", "Prop3", "Prop4", "ModifiedDuplicate"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns",{{"New TAG", type text}, {"Prop1", type text}, {"Prop2", type text}, {"Prop3", type text}, {"Prop4", type text}, {"ModifiedDuplicate", type text}})
in
    #"Changed Type"

 

 

Best Regards,

Icey

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

6 REPLIES 6
Icey
Community Support
Community Support

Hi @marcelmunk ,

 

You can try the following method. It's a bit complicated, and maybe someone else might have a better solution.

Icey_1-1653905142015.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRS0lFKyilNBVLFGYlFBUC6JCMzDySaX5SSWqQUqwNS5YypLMTD09kbVZ2ziytQoDI1Jye/HKQyNz+/JAPdxJzU4mKoamTFwR6OQQHY1cYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TAG = _t, Prop1 = _t, Prop2 = _t, Prop3 = _t, Prop4 = _t]),
    #"Reversed Rows" = Table.ReverseRows(Source),
    Custom1 = Table.AddFuzzyClusterColumn(#"Reversed Rows","TAG","New TAG",[IgnoreCase = true, IgnoreSpace = true]),
    #"Reversed Rows1" = Table.ReverseRows(Custom1),
    #"Removed Columns" = Table.RemoveColumns(#"Reversed Rows1",{"TAG"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"New TAG"}, {{"ALL", each _, type table [Prop1=nullable text, Prop2=nullable text, Prop3=nullable text, Prop4=nullable text, New TAG=text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([ALL],"Index",0)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"Prop1", "Prop2", "Prop3", "Prop4", "New TAG", "Index"}, {"Prop1", "Prop2", "Prop3", "Prop4", "New TAG", "Index"}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Custom", {"New TAG", "Index"}, "Attribute", "Value"),
    #"Grouped Rows1" = Table.Group(#"Unpivoted Other Columns", {"New TAG", "Attribute"}, {{"ALL", each _, type table [New TAG=text, Index=number, Attribute=text, Value=text, Custom=any]}}),
    #"Added Custom1" = Table.AddColumn(#"Grouped Rows1", "Custom", each let 
AllDataTable = [ALL],
PreRowValue = 
    Table.AddColumn(
        AllDataTable, "PreValue", 
        each try AllDataTable [Value] { [Index] - 1} otherwise [Value]
    )
in
PreRowValue),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Custom"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Custom", {"New TAG", "Index", "Attribute", "Value", "PreValue"}, {"New TAG", "Index", "Attribute", "Value", "PreValue"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded Custom1", each ([Index] = 1)),
    #"Added Conditional Column" = Table.AddColumn(#"Filtered Rows1", "Duplicate", each if [Value] <> [PreValue] then [Attribute]&" - "&[PreValue] else null),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Conditional Column",{"Index", "PreValue"}),
    #"Grouped Rows2" = Table.Group(#"Removed Columns1", {"New TAG"}, {{"ALL", each _, type table [New TAG=text, Attribute=text, Value=text, Duplicate=nullable text]}}),
    #"Added Custom2" = Table.AddColumn(#"Grouped Rows2", "Custom", each let 
AllDataTable = [ALL],
ModifiedDuplicate = 
    Table.AddColumn(
        AllDataTable, "ModifiedDuplicate", 
        each List.RemoveNulls(AllDataTable [Duplicate])
    )
in
ModifiedDuplicate),
    #"Removed Other Columns2" = Table.SelectColumns(#"Added Custom2",{"Custom"}),
    #"Expanded Custom2" = Table.ExpandTableColumn(#"Removed Other Columns2", "Custom", {"New TAG", "Attribute", "Value", "Duplicate", "ModifiedDuplicate"}, {"New TAG", "Attribute", "Value", "Duplicate", "ModifiedDuplicate"}),
    #"Extracted Values" = Table.TransformColumns(#"Expanded Custom2", {"ModifiedDuplicate", each Text.Combine(List.Transform(_, Text.From)), type text}),
    #"Removed Columns2" = Table.RemoveColumns(#"Extracted Values",{"Duplicate"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns2", List.Distinct(#"Removed Columns2"[Attribute]), "Attribute", "Value"),
    #"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"New TAG", "Prop1", "Prop2", "Prop3", "Prop4", "ModifiedDuplicate"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns",{{"New TAG", type text}, {"Prop1", type text}, {"Prop2", type text}, {"Prop3", type text}, {"Prop4", type text}, {"ModifiedDuplicate", type text}})
in
    #"Changed Type"

 

 

Best Regards,

Icey

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks for your answer, @Icey 

 

As you said, a bit complex and it took me some time to 'digest' it.

 

Though, I plugged in some real data and got some good and some weird results:

marcelmunk_0-1654025126675.png

marcelmunk_1-1654028986479.png

 

These are prints of the 'New TAG' reorderd after the AddFuzzyClusterColumn.

 

Many times it works in finding the duplicates, but other times it gives weird values, as in the second print.

 

Any hint of what may be wrong?

 

TIA!

Hi @marcelmunk ,

 

There's a paramter in Table.AddFuzzyClusterColumn function named "Threshold". It's a number between 0.00 and 1.00 that specifies the similarity score at which two values will be grouped. For example, "Grapes" and "Graes" (missing "p") are grouped together only if this option is set to less than 0.90. A threshold of 1.00 is the same as specifying an exact match criteria while grouping. The default value is 0.80.

 

Please try to adjust it manually and check if it works.

 

 

Best Regards,

Icey

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi, @Icey 

 

Late response, but as I was getting the wrong results I supposed it was not a solution.

 

But it was my fault, as I had to adapt the column names and some parameters to my real data. And I made a mistake in the 'New TAG' column, so I was not getting the desired results.

 

With a little more time now, I 'readapted' the solution and it's working just fine. I marked your answer as 'solution'.

 

Thanks!

marcelmunk
Frequent Visitor

Just to make it more clear, 'AB' is contained in 'ABC', and 'CD' is contained in 'CDE'. It can be both ways.

marcelmunk
Frequent Visitor

Also I have more than 4 properties, but I can extend the expression to accomodate it.

Helpful resources

Announcements
PBI Sept Update Carousel

Power BI September 2023 Update

Take a look at the September 2023 Power BI update to learn more.

Learn Live

Learn Live: Event Series

Join Microsoft Reactor and learn from developers.

Dashboard in a day with date

Exclusive opportunity for Women!

Join us for a free, hands-on Microsoft workshop led by women trainers for women where you will learn how to build a Dashboard in a Day!

MPPC 2023 PBI Carousel

Power Platform Conference-Power BI and Fabric Sessions

Join us Oct 1 - 6 in Las Vegas for the Microsoft Power Platform Conference.

Top Solution Authors