Solved: How to extract 5/6 length ID numbers from a text s...

Anonymous · ‎01-26-2019

Hi,

I'm searching for a best-practice method to solve the following problem. I'm trying to extract a 5/6 length ID number from the following Page column.

I want to extract the 5 & 6 length ID numbers from the page column for each unique URL and return the result in the VacancyId column.

Is there a best-practice method for this?

Zubair_Muhammad · ‎01-26-2019

@Anonymous

Try this..

I think it will be close.
Please see the attached file as well

Tricks I learnt from @ImkeF.
But I believe she still would have a better solution

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W0i9LTE4sKS1K1Tc0MDc0NdXNTU1JLS/KTi2yjyk1MDBONjI0NiwCM1ON9YxLS42MTdLS4u2VYnWQNcN1QU0hTzNCmxFMm76hhaGJqbE+adrhtupCtOuSq93YxAhFL8R7SrGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Page = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"Page", type text}}),
    CharactersToReplace = List.RemoveItems(List.Transform({1..126}, each Character.FromNumber(_)),{"0".."9"}),
    CharactersToReplaceWith=List.Repeat({"|"},List.Count(CharactersToReplace)),
    MyList=List.Zip({CharactersToReplace,CharactersToReplaceWith}),
    Convert=Table.AddColumn(ChangedType, "Custom1", each Text.Combine(
                            List.ReplaceMatchingItems
                            (Text.ToList([Page]),
                            MyList)
                            )),
    Convert2=Table.AddColumn(Convert, "Custom2", each Text.Combine(
                            List.Select(
                            List.RemoveItems(
                            Text.Split([Custom1],"|")
                            ,
                            {null,""})
                            ,
                            each Text.Length(_)>=5)
                            ,
                            ","
                            ))
in
    Convert2

Regards
Zubair

Please try my custom visuals

View solution in original post

ImkeF · ‎01-27-2019

That's a pretty nifty solution @Zubair_Muhammad!

Using Text.SplitAny, the code can be shortened considerably:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W0i9LTE4sKS1K1Tc0MDc0NdXNTU1JLS/KTi2yjyk1MDBONjI0NiwCM1ON9YxLS42MTdLS4u2VYnWQNcN1QU0hTzNCmxFMm76hhaGJqbE+adrhtupCtOuSq93YxAhFL8R7SrGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Page = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"Page", type text}}),
    CharactersToReplace = Text.Combine(List.RemoveItems(List.Transform({1..126}, each Character.FromNumber(_)),{"0".."9"})),
    Convert=Table.AddColumn(ChangedType, "Custom1", each Text.SplitAny(
                            [Page],
                            CharactersToReplace)
                            ),
    Convert2=Table.AddColumn(Convert, "Custom2", each List.Select(
			[Custom1], 
			(li) => Text.Length(li) >=5){0}
			)
in
    Convert2

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

Greg_Deckler · ‎01-26-2019

I don't see any real pattern to where those numbers appear.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎01-26-2019

Hi Greg,

Thnx for your reply. Correct, the pattern is very diffent in the Url's.

I'm thinking of a process like this;

Var VacancyId =

//1 List of possible delimiters
del = {" ".."/",":".."@","[".."'","{".."~", "#(cr)", "#(lf)"},

//2 List of possible delimiters
delreplacewith = List.Transform(del, each {_, " "}),

//3 Match each delimiter with blank space
delreplacewith = List.Transform(del, each {_, " "}),

//4 Replace each delimiter with blank space
delreplaced = List.ReplaceMatchingItems(Text.ToList(text), delreplacewith),

//5 split the text by blank space
tolist = Text.Split(backtotext, " "),

//6 Find the first value in the string

//7 If this value is of length 5, then select this value

//8 If this value <> length 5, then find the second value

//9 If this value is of length 5, then select this value

//10 Etc.

Return
Var VacancyId

Am I thinking in the right way? Are there better solutions?

Zubair_Muhammad · ‎01-26-2019

@Anonymous

Try this..

I think it will be close.
Please see the attached file as well

Tricks I learnt from @ImkeF.
But I believe she still would have a better solution

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W0i9LTE4sKS1K1Tc0MDc0NdXNTU1JLS/KTi2yjyk1MDBONjI0NiwCM1ON9YxLS42MTdLS4u2VYnWQNcN1QU0hTzNCmxFMm76hhaGJqbE+adrhtupCtOuSq93YxAhFL8R7SrGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Page = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"Page", type text}}),
    CharactersToReplace = List.RemoveItems(List.Transform({1..126}, each Character.FromNumber(_)),{"0".."9"}),
    CharactersToReplaceWith=List.Repeat({"|"},List.Count(CharactersToReplace)),
    MyList=List.Zip({CharactersToReplace,CharactersToReplaceWith}),
    Convert=Table.AddColumn(ChangedType, "Custom1", each Text.Combine(
                            List.ReplaceMatchingItems
                            (Text.ToList([Page]),
                            MyList)
                            )),
    Convert2=Table.AddColumn(Convert, "Custom2", each Text.Combine(
                            List.Select(
                            List.RemoveItems(
                            Text.Split([Custom1],"|")
                            ,
                            {null,""})
                            ,
                            each Text.Length(_)>=5)
                            ,
                            ","
                            ))
in
    Convert2

Regards
Zubair

Please try my custom visuals

ImkeF · ‎01-27-2019

That's a pretty nifty solution @Zubair_Muhammad!

Using Text.SplitAny, the code can be shortened considerably:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W0i9LTE4sKS1K1Tc0MDc0NdXNTU1JLS/KTi2yjyk1MDBONjI0NiwCM1ON9YxLS42MTdLS4u2VYnWQNcN1QU0hTzNCmxFMm76hhaGJqbE+adrhtupCtOuSq93YxAhFL8R7SrGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Page = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"Page", type text}}),
    CharactersToReplace = Text.Combine(List.RemoveItems(List.Transform({1..126}, each Character.FromNumber(_)),{"0".."9"})),
    Convert=Table.AddColumn(ChangedType, "Custom1", each Text.SplitAny(
                            [Page],
                            CharactersToReplace)
                            ),
    Convert2=Table.AddColumn(Convert, "Custom2", each List.Select(
			[Custom1], 
			(li) => Text.Length(li) >=5){0}
			)
in
    Convert2

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

ballofwax · ‎02-01-2023

This solution is great.. I am using it for a different application.
I have one column containing 4 and 5 digit numbers that I need to grab and add to a new column. This same column also has 10 digit numbers (in the same column as the 4 and 5 digit numbers) that I need to also grab and add to another column.
So, current column contains 4/5 digits as well as 10 digit numbers.... 4/5 digits I want to extract into a column.... 10 digits I want to extract into another column.
The solution you posted above works well in extracting them, except they are all in the same column... I can then do another transform from there. However, if the list doesn't contain either... I get "Errors" --- what is the best way for me to replace such cases with "null" if no match is found for >=4 ?

Anonymous · ‎01-27-2019

Thanks very much 🙂

@ImkeF
@Zubair_Muhammad

Both the solutions give the result I was looking for.

It also works on 5 and 6+ digit ID’s, so it gives exactly the result I was looking for.

I’m using these solutions to solve a slightly more complicated PBI problem.
I will post a new message for that one.

Anonymous · ‎01-26-2019

Hi,

I'm searching for a best-practice method to solve the following problem. I'm trying to extract a specific 4/5/6 length ID number for each line in the following Page column.

I want to extract the correct 4/5/6 length ID number for each line in the Page column and return the result in the VacancyId column.

Is there a best-practice way for this?

How to extract 5/6 length ID numbers from a text string?

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly