Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.
Hi,
Is there a way to remove duplicate strings from all cells in a column? I have a table with a column something like below.
Resources |
UserEmail SPN AAD UserEmail SPN AAD |
UserEmail |
UserEmail SPN AAD UserEmail SPN AAD |
UserEmail SPN AAD UserEmail SPN AAD |
UserEmail SPN AAD UserEmail SPN AAD |
UserEmail SPN AAD UserEmail SPN AAD |
UserEmail SPN AAD UserEmail SPN AAD |
Thanks
Shane
Solved! Go to Solution.
You can use the Table.TransformColumns function.
See the code below for an example:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLXLNTczMUQgO8FNwdHRRwBBRitVBUofKI1bXYFQXCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Resources = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Resources", type text}}),
#"Remove Duplicate Phrases" = Table.TransformColumns(#"Changed Type",{"Resources", each
Text.Combine(
List.Distinct(
Text.Split(_," ")
), " ")})
in
#"Remove Duplicate Phrases"
or, since you wrote you have multiple tables, you can write it as a separate function, where you call it by specifying the Table and Column to process:
Add as blank query and rename eg: fnDeDupPhrases
(tbl as table, col as text)=>
Table.TransformColumns(tbl,{col, each
Text.Combine(
List.Distinct(
Text.Split(_," ")
), " "), type text})
Then use like this
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLXLNTczMUQgO8FNwdHRRwBBRitVBUofKI1bXYFQXCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Resources = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Resources", type text}}),
#"Remove Duplicate Phrases" = fnDeDupPhrases(#"Changed Type","Resources")
in
#"Remove Duplicate Phrases"
@swalford thanks for sharing, just copy the below code into Advance editor
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLXLNTczMUQgO8FNwdHRRwBBRitVBUofKI1bXYFQXCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Resources = _t]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Combine(List.Distinct(Text.ToList([Resources]))))
in
#"Added Custom"
You can use the Table.TransformColumns function.
See the code below for an example:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLXLNTczMUQgO8FNwdHRRwBBRitVBUofKI1bXYFQXCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Resources = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Resources", type text}}),
#"Remove Duplicate Phrases" = Table.TransformColumns(#"Changed Type",{"Resources", each
Text.Combine(
List.Distinct(
Text.Split(_," ")
), " ")})
in
#"Remove Duplicate Phrases"
or, since you wrote you have multiple tables, you can write it as a separate function, where you call it by specifying the Table and Column to process:
Add as blank query and rename eg: fnDeDupPhrases
(tbl as table, col as text)=>
Table.TransformColumns(tbl,{col, each
Text.Combine(
List.Distinct(
Text.Split(_," ")
), " "), type text})
Then use like this
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLXLNTczMUQgO8FNwdHRRwBBRitVBUofKI1bXYFQXCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Resources = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Resources", type text}}),
#"Remove Duplicate Phrases" = fnDeDupPhrases(#"Changed Type","Resources")
in
#"Remove Duplicate Phrases"
Hi,
I have multiple tables with duplicates, I probably should have created a new thread for this but i was in the process of updating the example table data when you responded.
Hi @swalford
My proposed query steps should work for any table and any string duplicates, you just need to apply the same steps for each table / column that has this kind of issue.
Proud to be a Super User! | |
Either of the below will work
= Table.TransformColumns(Source, {"Resources", each if Text.Contains(_, "Yes Yes") then "Yes" else _})
= Table.ReplaceValue(Source, each [Resources], each if Text.Contains([Resources], "Yes Yes") then "Yes" else [Resources], Replacer.ReplaceValue, {"Resources"})
Hi @swalford
As long as your actual date is as clean as your example (meaning, every word is separated by a single space) You can achieve this by splitting the column by delimiter, unpivoting, removing duplicates, and then grouping back again with concatenation (if needed).
Please see my sample Power Query based on your data (I extended it to show the behaviour when your data is not only "Yes". If the result is incorrect please specify what the intended outcome is for those cases)
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkotzi8tSk4tVorViVaKTC1WgGIYH10cXR5dHBsmxQyCav3ywUw4pYDMUoCogGuAMWMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"(blank)" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"(blank)", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Resources", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type1", "Index", 1, 1, Int64.Type),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Resources"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Reordered Columns", "Resources", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Resources.1", "Resources.2", "Resources.3", "Resources.4", "Resources.5", "Resources.6", "Resources.7", "Resources.8", "Resources.9", "Resources.10"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Resources.1", type text}, {"Resources.2", type text}, {"Resources.3", type text}, {"Resources.4", type text}, {"Resources.5", type text}, {"Resources.6", type text}, {"Resources.7", type text}, {"Resources.8", type text}, {"Resources.9", type text}, {"Resources.10", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type2", {"Index"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Index"}, {{"Resources", each Text.Combine([Value]," "), type text}})
in
#"Grouped Rows"
The Group By text concatenation trick is from this blog post by gorilla.bi: https://gorilla.bi/power-query/group-by-to-concatenate-text/
Let me know if you have any questions!
Proud to be a Super User! | |
@swalford See below the code for your updated sample data:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkotzi8tSk4tVorViVYKLU4tcs1NzMxRCA7wU3B0dFHAEEFVR56uwaguFgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"(blank)" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"(blank)", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Resources", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type1", "Index", 1, 1, Int64.Type),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Resources"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Reordered Columns", "Resources", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Resources.1", "Resources.2", "Resources.3", "Resources.4", "Resources.5", "Resources.6", "Resources.7", "Resources.8", "Resources.9", "Resources.10"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Resources.1", type text}, {"Resources.2", type text}, {"Resources.3", type text}, {"Resources.4", type text}, {"Resources.5", type text}, {"Resources.6", type text}, {"Resources.7", type text}, {"Resources.8", type text}, {"Resources.9", type text}, {"Resources.10", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type2", {"Index"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Index"}, {{"Resources", each Text.Combine([Value]," "), type text}})
in
#"Grouped Rows"
Proud to be a Super User! | |