Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Good afternoon community.
In a column I have texts, numbers and symbols. I try to extract only 4 digit numbers and separate them into rows but I can't achieve it
I tried using the formula List.Select(Text.Split(Text.Select([DATES],{"0".."9","-"}),"-"),(x)=>Text.Length ( x)=4) in power query. It extracts numbers from certain digits and I have the option to split it into rows. The result is the same as this image that I attached, it duplicates the value of the DATES column but the value of the "desired extraction" column changes. However, when reviewing the values I realize that some numbers are correct and others are not, and in some cases it fails to extract numbers (knowing that there are 4 digit numbers in the cell) showing it as null. I do not understand why.
I appreciate any help you can provide me. This has me crazy.
Solved! Go to Solution.
Hi @ercc026,
In that case let's use a splitter this one splits text into a list of texts at each whitespace.
List.Select( List.Transform( Splitter.SplitTextByWhitespace()( [DATES] ), each try Number.From( Text.Select(_, {"0".."9"})) otherwise null ), (x)=> Text.Length(Text.From( x)) =4 )
Ps. If this helps solve your query please mark this post as Solution, thanks!
Hi @ercc026
Give something like this a go
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VcwxCsAgFAPQq4Q/Fgf9aq2UQs/gat3qXAqC1+8Xp74MIUtypvsH1gfGsUeLR+Ai9P6OMoYZKeGUuLCx1pqKytRaHfoEwzZ6q1AnyGC/ysEMFgEnqJQP", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [DATES = _t]),
AddExtraction = Table.AddColumn(Source, "Extraction", each List.Select( List.Transform( Text.Split( [DATES], " "), each try Number.From(_) otherwise null ), (x)=> Text.Length(Text.From( x)) =4 )),
ExpandValues = Table.ExpandListColumn(AddExtraction, "Extraction")
in
ExpandValues
with this result
Note you'll have to see if there can be a punctuation at the end of a 4 length number, if so you'll need to deal with that as well. A possibility can be to include Text.Select - as shown here:
List.Select( List.Transform( Text.Split( [DATES], " "), each try Number.From( Text.Select(_, {"0".."9"})) otherwise null ), (x)=> Text.Length(Text.From( x)) =4 )
Ps. If this helps solve your query please mark this post as Solution, thanks!
The second function served me better. But I realized that 4-digit numbers with a space towards another line and followed by a consecutive number, does not extract it. Rows with this type of problem are marked in yellow. Will the function have to be modified to fix this?
@m_dekorte Thank you very much for the help.
Hi @ercc026,
In that case let's use a splitter this one splits text into a list of texts at each whitespace.
List.Select( List.Transform( Splitter.SplitTextByWhitespace()( [DATES] ), each try Number.From( Text.Select(_, {"0".."9"})) otherwise null ), (x)=> Text.Length(Text.From( x)) =4 )
Ps. If this helps solve your query please mark this post as Solution, thanks!
Heya @m_dekorte,
Great solution, and it works for me most of the time. It doesnt work for me where there are leading 0s in my number within the text string.
For example, the string "Lorem ipsum 0124 dolor sit amet 9567" returns only 9567 and not 0124. Could you please assist with modifying your second function to take this scenario into consideration?
Thanks in advance!
Hi @PBI_SG
Losing the "0" is due to the conversion to number, give this a go instead
I hope this is helpful