Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now

Reply
carlpaul153
Helper I
Helper I

How to remove text strings stored in a list from a column

I'm trying to remove all the substrings from the "substring" list, in column [column1] by adding the following custom column:

List.Accumulate(substring, [Column1],(state, current) => Text.Replace(state, substring{current}, ""))

but i get the follow error:

Expression.Error: We cannot apply indexing to the type List.

 

note that I'm not using Text.remove because it only works for single characters, not strings.

In the same sense, I also want a function that keeps me ONLY the substrings of the list, which could be done with text.select if instead of strings they were unique characters.
So once I have the desired function with list.accumulate (or whatever if there is a better one), I plan to conceal it twice as shown here to keep only those strings. Even so, it seems to me that there has to be a more efficient way, if you come up with something better, please let me know.

In short, I need an efficient way to remove the strings from a list from one column and another to keep only those strings.

1 ACCEPTED SOLUTION
Jimmy801
Community Champion
Community Champion

Hello @carlpaul153 

 

check out this solution. this code replaces all substrings defined in the step SubstringList transforms the column1 into a list with two items. first is new value, second is the removed part. Then duplicated the column and transform them again. Extracting in the column1 the first item, in the changes-column the 2nd item

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WckxKVorViVZKSU0D0+kZmWC6AgTArCoQUIqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    SubstringList = {"bc", "xx", "zzz"},
    TranformColumn1 = Table.TransformColumns
    (
        #"Changed Type",
        {
            {
                "Column1",
                (cell)=> List.Accumulate(SubstringList, {cell, ""}, (old, current)=> 
                let 
                    CreateNew = Text.Replace(old{0},current,""),
                    CreateSubstringsRemoved = try Text.Combine(List.Difference(Text.ToList(old{0}), Text.ToList(CreateNew))) otherwise ""
                in 
                    {CreateNew, if CreateSubstringsRemoved <> "" and old{1}<> "" then old{1} & ", " & CreateSubstringsRemoved else if old{1} <>"" and CreateSubstringsRemoved = "" then  old{1} else CreateSubstringsRemoved }
                )
            }
        }
    ),
    Duplicated = Table.DuplicateColumn(TranformColumn1, "Column1", "changes"),
    
    Final = Table.TransformColumns
    (
        Duplicated,
        {
            {
                "Column1",
                each _{0}
            },
            {
                "changes",
                each _{1}
            }
        }
    )
in
    Final

 

Copy paste this code to the advanced editor in a new blank query to see how the solution works.

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

View solution in original post

9 REPLIES 9
MichaelJ64
Frequent Visitor

Thanks @carlpaul153 for the question, @Anonymous for comment and a solution, and @Jimmy801 for the code above.

I do something very similar.  I am processing e-mails to count topics defined by the subject line.  I want to remove duplicates - e-mails that contain either "FW: " or "RE:" ("Fw: " and "Re:"), place the trimmed subject in a new column, and then use 'Table.Distinct' on the new column to remove duplicates.

 

I have adapted @Jimmy801 code to a 'Table.Addcolumn'.  Here is the relevant code - [Subject] is the column with the subject line from the e-mails.  I found that 'List.Accumulate' returned a list, the first element was the processed code, the second was from the substring list.  Consequence of using @Jimmy801's code as his does a few extra things.

 

    SubstringList = {"FW: ", "Fw: ", "RE: ", "Re: "},

    AddTitle = Table.AddColumn(SetTypeCompactTable, "Title", each

                 List.Accumulate( SubstringList, {[Subject], ""}, (old, current) =>

                   let

                      CreateNew = Text.Replace(old{0}, current,""),

                      CreateSubstringsRemoved =

                        try Text.Combine (List.Difference(Text.ToList (old{0}), Text.ToList(CreateNew) )  )

                        otherwise ""

                   in

                      {CreateNew, if CreateSubstringsRemoved <> "" and old{1} <> "" then

                                       old{1} & ", " & CreateSubstringsRemoved

                                  else

                                    if old{1} <> "" and CreateSubstringsRemoved = "" then

                                       old{1}

                                    else

                                       CreateSubstringsRemoved

                      }

                 ) {0}, Text.Type   ),

 

This works. 

 

carlpaul153
Helper I
Helper I

@CNENFRNL I appreciate your effort. The problem with your solution is that for each row the function returns a list, when the expected result is a text.
@Jimmy801 thank you very much. Honestly, I've found a way to deal with the problem in another way that works for me, so I haven't been able to test your algorithm. Either way I mark it as a solution to serve others. If anyone finds an error in it, please report it.
@Anonymous I appreciate your effort to understand the problem with List.Accumulate. As I mentioned to Jimmy I already managed to solve the problem, but answering your question, substring is not a list of lists, just a list of texts.

Anonymous
Not applicable

hi @carlpaul153 

 

about your answer:

"@Rocco_sprmnt21 I appreciate your effort to understand the problem with List.Accumulate. As I mentioned to Jimmy I already managed to solve the problem, but answering your question, substring is not a list of lists, just a list of texts."

 

let me have doubts about what you say.
While waiting for you to document that type of error that you mentioned in correspondence with the code you posted in the first message, I explain the reasons for my doubts, hoping that they will be useful.

 

the variable substring=list of string as you claim:

image.png

 

 

image.png

 

the error message you got with your code and string list is different from the one you mentioned.

 

If you had modified your code as suggested by me (it was enough to delete only a small part of the text), you would have obtained this result:

 

image.png

 

if instead the variable substring contains lists inside it:

 

image.png

you get exactly the error you mentioned:

 

"List.Accumulate(substring, [Column1],(state, current) => Text.Replace(state, substring{current}, ""))

but i get the follow error:

Expression.Error: We cannot apply indexing to the type List."

 

image.png

 

 

PS

Apart from all these details, the solution with list.accumulate is very beautiful and elegant.

 

Anonymous
Not applicable

Hi @carlpaul153 

the error you got with the following expression

List.Accumulate(substring, [Column1],(state, current) => Text.Replace(state, substring{current}, ""))

is most likely due to the following fact.
Mine, for lack of complete information, is an attempt to reconstruct what you got based on what I know to be how the List.Accumulate function works.
the third parameter of the function, which is itself a function, takes the value of the avriable "current" from the list which is the first parameter of the function, in our case "substring".
If, as it likely is but we are waiting for your confirmation, the substring list is a list of lists, then you get the error you reported.
The syntactically correct form would be the following:

 

List.Accumulate(substring, [Column1],(state, current) => Text.Replace(state, current, ""))

in this way, in each step, you replace the i-th substring with "" inside the string contained in the current row of column1.
But this, if my supposition is correct, is not enough.
You probably need to change the structure of the list as well:
substring, which must be a list of string, like this:

{"ad", "b","hg"}

 

not a list of lists,

 

like this

{{"ad"}, {"b"},"hg"}

 

But all this is my guess.

Jimmy801
Community Champion
Community Champion

Hello @carlpaul153 

 

check out this solution. this code replaces all substrings defined in the step SubstringList transforms the column1 into a list with two items. first is new value, second is the removed part. Then duplicated the column and transform them again. Extracting in the column1 the first item, in the changes-column the 2nd item

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WckxKVorViVZKSU0D0+kZmWC6AgTArCoQUIqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    SubstringList = {"bc", "xx", "zzz"},
    TranformColumn1 = Table.TransformColumns
    (
        #"Changed Type",
        {
            {
                "Column1",
                (cell)=> List.Accumulate(SubstringList, {cell, ""}, (old, current)=> 
                let 
                    CreateNew = Text.Replace(old{0},current,""),
                    CreateSubstringsRemoved = try Text.Combine(List.Difference(Text.ToList(old{0}), Text.ToList(CreateNew))) otherwise ""
                in 
                    {CreateNew, if CreateSubstringsRemoved <> "" and old{1}<> "" then old{1} & ", " & CreateSubstringsRemoved else if old{1} <>"" and CreateSubstringsRemoved = "" then  old{1} else CreateSubstringsRemoved }
                )
            }
        }
    ),
    Duplicated = Table.DuplicateColumn(TranformColumn1, "Column1", "changes"),
    
    Final = Table.TransformColumns
    (
        Duplicated,
        {
            {
                "Column1",
                each _{0}
            },
            {
                "changes",
                each _{1}
            }
        }
    )
in
    Final

 

Copy paste this code to the advanced editor in a new blank query to see how the solution works.

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

edhans
Super User
Super User

@carlpaul153 Consider using Text.Replace and replace with an empty space.

Text.Replace([Column1], "100", "")

That will turn "This Is The Number 100." to "This Is the Number ."



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

the problem is that in each cell of the column I want to delete a list of strings, not a single string.
For example, I would like Text.Replace ([Column1], {"100", "This", "The"}, "") to return "Is Number" in your example.
Of course, this is not possible because text.replace does not support lists as a parameter.

The procedure to replace each substring in a list is to be defined as a function. Pls refer to the code as follows,

 

let
    fn = (Str as text, SubstrList as list) =>

    let
        Replacing = List.Generate(
            () => [str = Str, counter = 0],
            each [counter] <= List.Count(SubstrList),
            each [str = Text.Replace([str], SubstrList{[counter]}, ""), counter = [counter] + 1],
            each Text.Trim([str])
        )
    in
        Replacing,

    SubstrList = {"Is", "This", "The"},
    Str = "This Is The Number 100.",
    Invocation = fn(Str, SubstrList)
in
    Invocation

 

result reads like this,

List
This Is The Number 100.
This  The Number 100.
The Number 100.
Number 100.

Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!

Greg_Deckler
Super User
Super User

@edhans @ImkeF 



Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
OCT PBI Update Carousel

Power BI Monthly Update - October 2024

Check out the October 2024 Power BI update to learn about new features.

September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

October NL Carousel

Fabric Community Update - October 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors