Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
monojchakrab
Resolver III
Resolver III

Extract text (sub) strings from a text string using a pre-specified list

Hiya good people of PQ,

 

I have a text column as containing various texts,e.g. xxyyzz, xxxabc, acdfgh.

 

I want to extract the the texts "x", "c" and "h" for example, using a pre-specified list {"x","c","h"}, if there is a match.

 

I am currently using if text.contains code, but its starting to get very complicated with every new data refresh adding more complexity to the text column.

 

If there was a simpler solution with List.accumulate, e..g, then I could just update the list.

 

Thanks and best regds.,

4 ACCEPTED SOLUTIONS

Table.AddColumn(Custom2, "Ingredient", each
            List.Skip(
                      {{{"50%","sugarlo"},"SugarLo"},
                       {{"Stevia","Green"},"Stevia"},
                       {{"classic","cook","zero calorie"},"Sucralose"},
                       {{"original","Low Calories"},"Aspartame"},
                       {{"xylitol"},"Xylitol"}
                      },
                      (x)=>not List.Contains(x{0},[Title],(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
                      ){0}?{1}?
               )

please  change List.ContainsAny to List.Contains

View solution in original post

1 if use flat list, you can not get the replacement when more than one substrings be assigned a replacement. the list can be re-write as this

{"50%","SugarLo"},
{"sugarlo","SugarLo"},
{"Stevia","Stevia"},
{"Green","Stevia"},
{"classic","Sucralose"},
{"cook","Sucralose"},
{"zero calorie""Sucralose"},
{"original","Aspartame"},
{"Low Calories","Aspartame"},
{"xylitol","Xylitol"}

 and re-write the second arguement as 

(x)=>Text.Contains([Title],x,Comparer.OrdinalIgnoreCase)

I just intergerate the same replacement into one item.

2 (ItemFromSupportingList)=>not List.Contains(ItemFromSupportingList{0},[Title],(1stArguement,2ndArgument)=>Text.Contains(2ndArgument,1stArguement,Comparer.OrdinalIgnoreCase))
){0}?{1}?

{0} is select the first value of the list after skiping all the items not match the condition. this value will be in the structure as {list,text}, then {1} to get the "text" as the replacement.

View solution in original post

wdx223_Daniel_0-1698967838387.png

so you get a blank list {}, so {}{0} is error, then {}{0}{1} is error.

can add a ? to tolerate the error, when there is no item on the index you want, it will give a null

 

View solution in original post

26 REPLIES 26
wdx223_Daniel
Super User
Super User

each item should like {{"50%","sugarlo"},"SugarLo"}

when all the value in first item's list can not be found in the Title column, then this item {{"50%","sugarlo"},"SugarLo"} will be skipped.

monojchakrab
Resolver III
Resolver III

Hey @Daryl-Lynch-Bzy ,

 

Below is a snapshot of the text column :

 

monojchakrab_0-1698379984092.png

And here is the code I am using for extracting the key words like Stevia or Aspartame and I use similar code to extract the pack size of type (like sachets or tablets) :

 

Table.AddColumn(Custom2, "Ingredient", each if 
Text.Contains([Title],"50%",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"sugarlo",Comparer.OrdinalIgnoreCase)then "SugarLo" else 
if Text.Contains([Title],"Stevia",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"Green",Comparer.OrdinalIgnoreCase)
then "Stevia" else 
        if Text.Contains([Title],"classic",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"cook",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"zero calorie",Comparer.OrdinalIgnoreCase) then "Sucralose" else 
        if Text.Contains([Title],"original" ,Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"Low Calories", Comparer.OrdinalIgnoreCase) 
        then "Aspartame" else 
        if Text.Contains([Title],"xylitol",Comparer.OrdinalIgnoreCase) then "Xylitol" else null)

 

This code keeps getting longer and more complex with every refresh as the texts start getting new additions everytime so I  have to mody this code post every refresh. 

But what remains invariant is the list of actives or the list of pack sizes or the type etc - hence if I can work off a list, it will be a simple task just to update the relevant list

Thanks and appreciate the leg-up

Table.AddColumn(Custom2, "Ingredient", each
            List.Skip(
                      {{{"50%","sugarlo"},"SugarLo"},
                       {{"Stevia","Green"},"Stevia"},
                       {{"classic","cook","zero calorie"},"Sucralose"},
                       {{"original","Low Calories"},"Aspartame"},
                       {{"xylitol"},"Xylitol"}
                      },
                      (x)=>not List.ContainsAny(x{0},[Title],(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
                      ){0}?{1}?
               )

Hey @wdx223_Daniel - this is the error I am getting :

 

monojchakrab_0-1698404876045.png

 

I typed in your code verbatim.

Table.AddColumn(Custom2, "Ingredient", each
            List.Skip(
                      {{{"50%","sugarlo"},"SugarLo"},
                       {{"Stevia","Green"},"Stevia"},
                       {{"classic","cook","zero calorie"},"Sucralose"},
                       {{"original","Low Calories"},"Aspartame"},
                       {{"xylitol"},"Xylitol"}
                      },
                      (x)=>not List.Contains(x{0},[Title],(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
                      ){0}?{1}?
               )

please  change List.ContainsAny to List.Contains

@wdx223_Daniel ,

 

Sorry to be a pest here but I am havig trouble working thru' the logic of the code. Appreciate your being patient with my dumb questions :

1. Why are we using a nested list - why cannot we use a flat list, e.g. {"50%", "sugarlo", "low calories","Aspartame"} and so on? The reaso I am asking is this is that the [Title] column also has some other information like forms, pack size etc for which we may not need a nested list as in this code. 

2. I am also not clear as to how do you skip a nested list

3. In the last part of the code, 

(x)=>not List.Contains(x{0},[Title],(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
                      ){0}?{1}?

I can see you are using the logic for skipping the list, but I did not understand the following :

a. where is variable (x) getting its value from? and why are using the 1st positing by using x{0}?

b. I am not able to understand the 3rd parameter within the list.contains

c. where is variable (y) getting its value from

d. Then in the last part why are we using {0} and then {1} positions and of which list?

Possibly all dumb questions but if I get this logis straight, I can reconstruct the code for extracting the other text strings :

a. Form - "tablets","jars","pouch","sachets"

b. Pack size - "100","300","500","50","100","150" etc

c. Multi pack - "1","2","3","4","5","6"

Appreciate the leg-up and apologies for the long-winded question!

 

1 if use flat list, you can not get the replacement when more than one substrings be assigned a replacement. the list can be re-write as this

{"50%","SugarLo"},
{"sugarlo","SugarLo"},
{"Stevia","Stevia"},
{"Green","Stevia"},
{"classic","Sucralose"},
{"cook","Sucralose"},
{"zero calorie""Sucralose"},
{"original","Aspartame"},
{"Low Calories","Aspartame"},
{"xylitol","Xylitol"}

 and re-write the second arguement as 

(x)=>Text.Contains([Title],x,Comparer.OrdinalIgnoreCase)

I just intergerate the same replacement into one item.

2 (ItemFromSupportingList)=>not List.Contains(ItemFromSupportingList{0},[Title],(1stArguement,2ndArgument)=>Text.Contains(2ndArgument,1stArguement,Comparer.OrdinalIgnoreCase))
){0}?{1}?

{0} is select the first value of the list after skiping all the items not match the condition. this value will be in the structure as {list,text}, then {1} to get the "text" as the replacement.

Thanks a lot @wdx223_Daniel  for the clear explanation. 

Suppose the text string also contains sub-strings like 50,500 and 5 - is there a way we can extrat the "5" from the single "5", but not from "50 and "500"? That along with a remodification of this code can help extract the pack sizes and selling units from the [Title] string?

could you provide some sample data along with the desire output?

The source data remains same...

The code I am using to extract the form is as below :

Table.AddColumn(Ingredients, "Form", each if Text.Contains([Title],"tablets",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"tablet",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"500 pcs",Comparer.OrdinalIgnoreCase) 
        or Text.Contains([Title],"330",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"440",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"550",Comparer.OrdinalIgnoreCase)
        then "Tablets" else 
        if Text.Contains([Title],"sachet",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"sachets",Comparer.OrdinalIgnoreCase) then "Sachets" else 
        if Text.Contains([Title],"500 g",Comparer.OrdinalIgnoreCase) or  Text.Contains([Title],"500g",Comparer.OrdinalIgnoreCase)
            or Text.Contains([Title],"xylitol",Comparer.OrdinalIgnoreCase) then "Pouch" else 
        if Text.Contains([Title],"80",Comparer.OrdinalIgnoreCase)  or 
        Text.Contains([Title],"150g",Comparer.OrdinalIgnoreCase) or 
        Text.Contains([Title],"150 g",Comparer.OrdinalIgnoreCase) or 
        Text.Contains([Title],"150gm",Comparer.OrdinalIgnoreCase) or 
        Text.Contains([Title],"150 GRAM",Comparer.OrdinalIgnoreCase)
        
        then "Jar" else "Tablets"
                
        )

The code I am using to extract the sizes is as below :

a. First I remove all text from the [Title] string :

Table.AddColumn(MPU, "Remove Text", each Text.Remove([Title],{"A".."Z","a".."z"}))

b. This returns a column as below :

monojchakrab_0-1698824935381.png

c. from this column now, I feed a pre-prepared list :

monojchakrab_1-1698825030161.png

into the following code to extract the matching strings :

 Table.AddColumn(RemoveTextFromTitle, "Pack", each List.Select(PackList,
            (x)=>
            Text.Contains([Remove Text],x,Comparer.OrdinalIgnoreCase)
            ){0}?)

d. This returns a column as below :

monojchakrab_2-1698825292242.png

e. I now clean up the above the column, by feeding another list with corrected packlist :

monojchakrab_3-1698825429803.png

by using the following code :

List.Accumulate({0..Table.RowCount(PackSize)-1},
                                PackPosition,
                                        (x,y)=>
                                            Table.ReplaceValue(x, 
                                                PackSize[OldValue]{y}, PackSize[NewValue]{y},
                                                Replacer.ReplaceValue, {"Pack"})
                                
    )

Little long-winded, but I could not find a simpler way to get to this in fewer steps.

Any help appreciated.

it looks like more complicated.

i can not get the whole story depending on what you showed

@wdx223_Daniel ,

 

Sorry I confused you unintentionally. let me try and simplify this for you :

 

1. We are starting with the same column as above :

 

monojchakrab_0-1698830479484.png

2. This text string, contains terms like sachets, tablets and jars - which are the forms. I want to extract these sub-strings from this column and the desired output should be to add a column headed "Form" with the strings as below :

 

monojchakrab_1-1698830626655.png

3. Next if you notice, the text string column also includes number strings as - 100,300, 500, 80,50 etc. These are pack sizes. The desired output would be to extract these strings into a separate column headed "pack Size" :

monojchakrab_2-1698830865514.png

Not sure if that has simplified things a bit and helps you better with the solution I am seeking.

 

Table.SplitColumn(Custom2, "Title", each
            let 
                  fx=(x,y)=>List.RemoveItems(Splitter.SplitTextByAnyDelimiter(x)(y),{"",null}),
                  Ingredient=List.Skip(
                      {{{"50%","sugarlo"},"SugarLo"},
                       {{"Stevia","Green"},"Stevia"},
                       {{"classic","cook","zero calorie"},"Sucralose"},
                       {{"original","Low Calories"},"Aspartame"},
                       {{"xylitol"},"Xylitol"}
                      },
                      (x)=>not List.Contains(x{0},_,(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
                      ){0}?{1}?,
                  Form=fx(fx(YourFormList,_),_){0}?,
                  PackSize=fx(fx(YourPackSizeList,_),_){0}?,
                  MultiPack=fx(fx(YourMulitPackList,_),){0}?
                in {Ingredient,Form,PackSize,MultiPack}
               {"Ingredient","Form","PackSize","MultiPack"})

@wdx223_Daniel , I tried the code above only for ingredients :

Table.SplitColumn(ActiveNewWay.1, "Title", each
            let 
                  fx=(x,y)=>List.RemoveItems(Splitter.SplitTextByAnyDelimiter(x)(y),{"",null}),
                  Ingredient=List.Skip(
                      {{{"50%","sugarlo"},"SugarLo"},
                       {{"Stevia","Green"},"Stevia"},
                       {{"classic","cook","zero calorie"},"Sucralose"},
                       {{"original","Low Calories"},"Aspartame"},
                       {{"xylitol"},"Xylitol"}
                      },
                      (x)=>not List.Contains(x{0},_,(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
                      ){0}?{1}?
                //   Form=fx(fx(YourFormList,_),_){0}?,
                //   PackSize=fx(fx(YourPackSizeList,_),_){0}?,
                //   MultiPack=fx(fx(YourMulitPackList,_),){0}?
                in {Ingredient}
               {"Ingredient"})

And this is returning the following error :

monojchakrab_0-1698911243822.png

Am I missing something here?

regds.,

wdx223_Daniel_1-1698913925924.png

 

Thanks @wdx223_Daniel - the missing underscore probably is not causing the error since that is commented out. Will try the comma after {Ingredient} and check the result. 

One question though :

when you are feeding the variable (x){0} into the list.contains function, will it not always check against the 1st item in the list (which in this case is "SugarLo"? I am slightly confused here. Should not the list.contains check each item in the list within the [Title] column? but (x){0} will always check for the 1st item and not list.contains will always return false, right? I am not able to figure out that if its not a recursive function, then how for each row in column [Title], each item of the list will be checked for true or false?

this function will be ran in each row of your table.

it firstly get the value of [Title] in this row

then, test if it contains any value in the first list-item of each row in your pre-prepared list

it will skip all the row of which the first list-item did not be found in Title,

then get the first row of the remain pre-prepared list, and get the 2nd item of the row as the replacement.   

@wdx223_Daniel - I was trying out the code in a smaller bit as follows :

1. I have prepared a list called [strings} as follows :

let
    Source = {
{{"SugarLo","50%"},"SugarLo"},
{{"Original", "Low Calories"},"Aspartame"},
{{"Classic","cook","zero calorie"},"Sucralose"},
{{"Stevia","green"},"Stevia"},
{{"Xylitol"},"Xylitol"}
}
in
    Source

2. Then I am trying out your code on this list as follows :

 List.Skip(Strings, (x)=>not List.Contains(x{0},"SugarLo"))

Since the condition is false it returns all the 5 lists and if we use {0}{1} on this list it correctly returns "SugarLo"

3. the problem starts when I change "SugarLo" to "Aspartame" e.g. - 

List.Skip(Strings, (x)=>not List.Contains(x{0},"Aspartame"))

which returns an empty list and hence the {0}{1} does not work and returns an error.

Am I missing something in the logic here? I am really sorry but I am not able to work out how will (x){0} work on the list recursively?

can you help?

thanks and really appreciate and apologies for the bother!

 

wdx223_Daniel_0-1698967838387.png

so you get a blank list {}, so {}{0} is error, then {}{0}{1} is error.

can add a ? to tolerate the error, when there is no item on the index you want, it will give a null

 

Thanks @wdx223_Daniel - its a bit clearer now.

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.