March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hello.
When using Power Query to cleanse data from a table, before normalizing it, I wonder how to do the following please:
(a) How is it possible in Power Query for a given column, to perform an action if a cell starts with a number, and a different action if a cell starts with a character? For example when adding a custom column to label the values as "number" or "text".
(b) Another question please: how is it possible to split a column based on a custom separator like the following?: "(Comma)(space)(integer)(dot)(space)"?
example:
1. Option A, 2. Option B, 71. Option C
would be split into:
Column1: 1. Option A
Column2: Option B
Column3: Option C
Thank you very much, kind regards
Solved! Go to Solution.
Hi @himynameisjuan, answer to question no. 1
Result
let
Source = #table({"Data"}, {{1}, {"A"}, {25}, {"B"}, {"10"}}),
v1 = Table.AddColumn(Source, "v1", each if [Data] is number then "Number" else "Text", type text),
v2 = Table.AddColumn(v1, "v2", each if (try Number.From([Data]) otherwise [Data]) is number then "Number" else "Text", type text)
in
v2
I have modified my code to
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtRT8C8oyczPU3DUUTCCc5x0FMwRUs5KsTrRSqYIgWQdBUMTOC8CyLOA86LAiokzF6jRwADOddFRAPIRil2VYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//Use regex to split
#"Run Python script" = Python.Execute("dataset['Split'] = dataset['Column1'].str.split(r',\s+\d+\.\s+')#(lf)",[dataset=#"Changed Type"]),
//remove unneeded columns and expand the table
#"Removed Columns" = Table.RemoveColumns(#"Run Python script",{"Name"}),
#"Expanded Value" = Table.ExpandTableColumn(#"Removed Columns", "Value", {"Split"}),
//Replace single quote with double quote to create a valid Json
#"Replaced Value" = Table.ReplaceValue(#"Expanded Value","'","""",Replacer.ReplaceText,{"Split"}),
//Convert the Python array to an M List
#"To List" = Table.TransformColumns(#"Replaced Value",{"Split", Json.Document}),
//Max number of Columns
numCols = List.Max(List.Transform(#"To List"[Split], each List.Count(_))),
#"Extracted Values" = Table.TransformColumns(#"To List", {"Split", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "Split",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), List.Transform({1..numCols}, each "Split." & Text.From(_)))
in
#"Split Column by Delimiter"
Source Data
Results
I have modified my code to
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtRT8C8oyczPU3DUUTCCc5x0FMwRUs5KsTrRSqYIgWQdBUMTOC8CyLOA86LAiokzF6jRwADOddFRAPIRil2VYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//Use regex to split
#"Run Python script" = Python.Execute("dataset['Split'] = dataset['Column1'].str.split(r',\s+\d+\.\s+')#(lf)",[dataset=#"Changed Type"]),
//remove unneeded columns and expand the table
#"Removed Columns" = Table.RemoveColumns(#"Run Python script",{"Name"}),
#"Expanded Value" = Table.ExpandTableColumn(#"Removed Columns", "Value", {"Split"}),
//Replace single quote with double quote to create a valid Json
#"Replaced Value" = Table.ReplaceValue(#"Expanded Value","'","""",Replacer.ReplaceText,{"Split"}),
//Convert the Python array to an M List
#"To List" = Table.TransformColumns(#"Replaced Value",{"Split", Json.Document}),
//Max number of Columns
numCols = List.Max(List.Transform(#"To List"[Split], each List.Count(_))),
#"Extracted Values" = Table.TransformColumns(#"To List", {"Split", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "Split",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), List.Transform({1..numCols}, each "Split." & Text.From(_)))
in
#"Split Column by Delimiter"
Source Data
Results
For splitting on your complex delimiter, you can use Regular Expressions.
I've not used Python much in Power Query, so there may be more efficient methods, but this does work on your data:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtRT8C8oyczPU3DUUTCCc5x0FMwRUs5KsTrRSqYIgWQdBUMTOC8CyLOA86KUYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//Use regex to split
#"Run Python script" = Python.Execute("dataset['Split'] = dataset['Column1'].str.split(r',\s\d+\.\s*')#(lf)",[dataset=#"Changed Type"]),
//remove unneeded columns and expand the table
#"Removed Columns" = Table.RemoveColumns(#"Run Python script",{"Name"}),
#"Expanded Value" = Table.ExpandTableColumn(#"Removed Columns", "Value", {"Split"}),
//Replace single quote with double quote to create a valid Json
#"Replaced Value" = Table.ReplaceValue(#"Expanded Value","'","""",Replacer.ReplaceText,{"Split"}),
//Convert the Python array to an M List
#"To List" = Table.TransformColumns(#"Replaced Value",{"Split", Json.Document}),
//Expand the list to new rows (could expand to columns if preferred
#"Expanded Split" = Table.ExpandListColumn(#"To List", "Split"),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded Split",{{"Split", type text}})
in
#"Changed Type1"
Source
Results
Hello, thank you very much @dufoq3 @jgeddes for the great responses.
For (2) It would be nice not to trust in the number of spaces (one or two spaces) or the number of integers (1 or 11 or 111). As you pointed out, the ideal approach would be one that dynamically adapts regardless of the number of spaces or numbers.
Doesn't Power Query have a means to find and separate into columns numbers from text and vice versa?
Maybe running a Python or R script for (2) would more dynamic?
Thank you very much, kind regards
Hi @himynameisjuan, you're welcome.
It is possible to do such split also in PQ, but it is useless for sample data you provided.
Column1 splitted:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSkxKNjQyTklNU4rViVZydHIG8lxc3ZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
Ad_Separated = Table.AddColumn(Source, "Separated", each
[ a = Splitter.SplitTextByCharacterTransition((x)=> not List.Contains({"0".."9"}, x), {"0".."9"})([Column1]),
b = List.Combine(List.Transform(a, Splitter.SplitTextByCharacterTransition({"0".."9"}, (x)=> not List.Contains({"0".."9"}, x)))),
c = Text.Combine(b, "|")
][c], type text),
Ad_SplitCount = Table.AddColumn(Ad_Separated, "SplitCount", each Text.Length(Text.Select([Separated], "|")) +1, Int64.Type),
SplitColumnByDelimiter = Table.SplitColumn(Ad_SplitCount, "Separated", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"1"..Text.From(List.Max(Ad_SplitCount[SplitCount]))} ),
RemovedColumns = Table.RemoveColumns(SplitColumnByDelimiter,{"SplitCount"})
in
RemovedColumns
For a) you can use Table.TransformColumns and check if the character in the first position is a letter or number. The code below is an example.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTwLyjJzM9TcFSK1YlWcoJxjcBcZxg3GcxVhHFNlGJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
changeDataType = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
Custom1 = Table.TransformColumns(changeDataType, {{"Column1", each if Text.PositionOfAny(_, {"0".."9"}) = 0 then "Starts With A Number" else if (Text.PositionOfAny(_, {"A".."Z", "a".."z"})) = 0 then "Starts With A Letter" else "What does it start with?"}})
in
Custom1
For b) you can split the column into rows by comma delimiter and then extract the text after the second space delimiter for rows that start with a space. The code below is and example.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtRT8C8oyczPU3DUUTCCc5x0FMwRUs5KsbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Extracted Text After Delimiter" = Table.TransformColumns(#"Split Column by Delimiter", {{"Column1", each if Text.StartsWith(_, " ") then Text.AfterDelimiter(_, " ", {1, RelativePosition.FromStart}) else _, type text}})
in
#"Extracted Text After Delimiter"
Proud to be a Super User! | |
@himynameisjuan, @jgeddes found great solution for 2nd question, but it is not easy to find a pattern. It will be even harder if you add 2 more value i.e.:
1. Option A, 2. Option B, 71. Option C, 100. Option D, 102. Option E
and you would like to have outcome like this:
Column1: 1. Option A
Column2: Option B
Column3: Option C
Column4: 100. Option D
Column5: Option E
What I'm trying to tell is that to do this there has to be such pattern. If there is a pattern, than there are usually many ways.
Hi @himynameisjuan, answer to question no. 1
Result
let
Source = #table({"Data"}, {{1}, {"A"}, {25}, {"B"}, {"10"}}),
v1 = Table.AddColumn(Source, "v1", each if [Data] is number then "Number" else "Text", type text),
v2 = Table.AddColumn(v1, "v2", each if (try Number.From([Data]) otherwise [Data]) is number then "Number" else "Text", type text)
in
v2
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.