Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
agc1234
New Member

Extracting a Specific string format from a string of text

Good Day Everyone,

 

I am trying to extract a specific format of: ##a# or (number)(number)(letter)(number)

 

I need first 4 parts of the serial number. I've attached examples of source data and what I'd like to extract:

 

agc1234_0-1707419347433.png

 

Does anyone have any ideas? Really stuck trying to get a reliable formula for this.

 

Thanks Everyone!

 

6 REPLIES 6
AlienSx
Super User
Super User

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY4xCgIxEEWvMqQ2kMxMJoldomAhpLERYgqFLRa0Wdn7q5scwPLNPPivVmVNEiaPqu2quhSDhyCGqeO0zPcnlPX1mBaY37AHS+cYkXjo30MsJOy8BLitxqCAxZML/ip+c6xAAgkkfQFRZ+8Cx05WZybNTrprtpR8HPQrSyMtjecfw6q1Dw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Serial #" = _t]),
    pattern = {false, false, true, false},
    chars = List.Buffer({"0".."9", "A".."Z"}),    
    finder = (t as text) =>
        [lst = List.Buffer(List.Select(Text.ToList(t), (x) => List.Contains(chars, x))),
        gen = List.Generate(
            () => [ch = lst, okay = false, serial = {}],
            (x) => not List.IsEmpty(x[ch]) or x[okay], 
            (x) => 
                [okay = List.Transform(List.FirstN(x[ch], 4), (w) => (try Number.From(w))[HasError]) = pattern,
                serial =  if okay then List.FirstN(x[ch], 4) else {},
                ch = List.Skip(x[ch], if okay then 4 else 1)],
            (x) => Text.Combine(x[serial])
        ),
        z = Text.Combine(List.Select(gen, each _ <> ""), ", ")][z],
    out = Table.AddColumn(Source, "Desired Outcome", (x) => finder(x[#"Serial #"]))
in
    out
dufoq3
Super User
Super User

Hello @agc1234,
same result as @Vijay_A_Verma but different approach.

 

See line four. This one also matches the criteria.

dufoq3_0-1707514941652.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("LY4xCwIxDEb/SuhsoU3StHVrFRyELi5C7aBww8G5nNz/V68dX/Lge7Uqa5IweVTtUNWtGDwFMUwdp3V+LlC292taYf7AESxdY0Tiof8OsZCw8xLgsRmDAhYvLvi7+N2xAgkkkPQFRJ29Cxw7WZ2ZNDvprtlT8nnQvyyNtDSeqrUv", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Serial #" = _t]),
    UppercasedText = Table.TransformColumns(Source,{{"Serial #", Text.Upper, type text}}),
    Ad_Cleaned = Table.AddColumn(UppercasedText , "Cleaned", each Text.Select([#"Serial #"], List.Buffer({"0".."9", "A".."Z"}))),
    Ad_EachFourCombination = Table.AddColumn(Ad_Cleaned, "Each 4 Combination", each List.Accumulate( 
    {0..Text.Length([Cleaned])-1},
    {},
    (s,c)=> List.RemoveNulls(s & { try Text.Range([Cleaned], c, 4) otherwise null })
), type list),
    Ad_OnlyMatching = Table.AddColumn(Ad_EachFourCombination, "Only Matching", each List.RemoveNulls(List.Transform([Each 4 Combination], each 
      if  (try Value.Is(Number.From(Text.ToList(_){0}), Number.Type) otherwise false) //1st digit check if is number
      and (try Value.Is(Number.From(Text.ToList(_){1}), Number.Type) otherwise false) //2nd digit check if is number
      and (try not Value.Is(Number.From(Text.ToList(_){2}), Number.Type) otherwise true) //3rd digit check if is text
      and (try Value.Is(Number.From(Text.ToList(_){3}), Number.Type) otherwise false) //4th digit check if is number
      then _ else null)), type list),
    Ad_FourDigits = Table.AddColumn(Ad_OnlyMatching, "4 digits", each Text.Combine([Only Matching], ", "), type text),
    RemovedOtherColumns = Table.SelectColumns(Ad_FourDigits,{"Serial #", "4 digits"})
in
    RemovedOtherColumns

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

latimeria
Solution Specialist
Solution Specialist

Hi @agc1234 ,

Another approach

 

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Serial#", type text}}),
    #"Added Desired Output" = Table.AddColumn(
        #"Changed Type", 
        "Desired Output", 
        each 
            List.Transform(
                Text.Split(Text.Remove([#"Serial#"],{" ", "-"}),"&"), 
                each Text.Start(
                    Text.RemoveRange(_,
                    0,
                    Text.PositionOfAny( 
                        _,
                        {"0".."9"},Occurrence.First
                    )
                ),
                4
            )
        )
    ),
    #"Extracted Values" = Table.TransformColumns(
        #"Added Desired Output", 
        {"Desired Output", each Text.Combine(List.Transform(_, Text.From), ", "), type text}
        )
in
    #"Extracted Values"

 

Hi @latimeria,

 

These should be excluded.

dufoq3_1-1707515318711.png

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Vijay_A_Verma
Super User
Super User

Use below code

Note - In row 4, 87X6 is also a pattern matching your requirement.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("LY4xCwIxDEb/SuhsoU3StHVrFRyELi5C7aBww8G5nNz/V68dX/Lge7Uqa5IweVTtUNWtGDwFMUwdp3V+LlC292taYf7AESxdY0Tiof8OsZCw8xLgsRmDAhYvLvi7+N2xAgkkkPQFRJ29Cxw7WZ2ZNDvprtlT8nnQvyyNtDSeqrUv", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Serial #" = _t]),
    #"Added Custom2" = Table.AddColumn(Source, "Custom.1", (x)=> 
    let
        p = Text.Select(x[#"Serial #"], {"0".."9", "A".."Z"}),
        c = Text.Length(p)-3,
        r = List.Generate(()=>[i=0, a=Text.Start(p,4)], each [i]<c, each [i=[i]+1, a=Text.Middle(p,i,4)], each [a])
    in
        Text.Combine(List.Select(r, (x)=> List.ContainsAll({"0".."9"}, Text.ToList(Text.Start(x, 2) & Text.End(x, 1))) and Text.Select(x, {"A".."Z"}) = Text.At(x,2)), "; ")    )
in
    #"Added Custom2"

 

 

v-stephen-msft
Community Support
Community Support

Hi @agc1234 ,

 

In Power BI, when we want to process complex and unordered data, adding many steps using power query may not get the desired results. As we know, power query cannot use regular expressions. If we call Python script and use regular expressions to process the data this time, it will become simple. Please refer to

Using regular expressions in power bi desktop - Microsoft Fabric Community

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.           

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.