Solved: Re: How to remove text strings stored in a list fr...

carlpaul153 · ‎09-09-2020

I'm trying to remove all the substrings from the "substring" list, in column [column1] by adding the following custom column:

List.Accumulate(substring, [Column1],(state, current) => Text.Replace(state, substring{current}, ""))

but i get the follow error:

Expression.Error: We cannot apply indexing to the type List.

note that I'm not using Text.remove because it only works for single characters, not strings.

In the same sense, I also want a function that keeps me ONLY the substrings of the list, which could be done with text.select if instead of strings they were unique characters.
So once I have the desired function with list.accumulate (or whatever if there is a better one), I plan to conceal it twice as shown here to keep only those strings. Even so, it seems to me that there has to be a more efficient way, if you come up with something better, please let me know.

In short, I need an efficient way to remove the strings from a list from one column and another to keep only those strings.

Jimmy801 · ‎09-09-2020

Hello @carlpaul153

check out this solution. this code replaces all substrings defined in the step SubstringList transforms the column1 into a list with two items. first is new value, second is the removed part. Then duplicated the column and transform them again. Extracting in the column1 the first item, in the changes-column the 2nd item

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WckxKVorViVZKSU0D0+kZmWC6AgTArCoQUIqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    SubstringList = {"bc", "xx", "zzz"},
    TranformColumn1 = Table.TransformColumns
    (
        #"Changed Type",
        {
            {
                "Column1",
                (cell)=> List.Accumulate(SubstringList, {cell, ""}, (old, current)=> 
                let 
                    CreateNew = Text.Replace(old{0},current,""),
                    CreateSubstringsRemoved = try Text.Combine(List.Difference(Text.ToList(old{0}), Text.ToList(CreateNew))) otherwise ""
                in 
                    {CreateNew, if CreateSubstringsRemoved <> "" and old{1}<> "" then old{1} & ", " & CreateSubstringsRemoved else if old{1} <>"" and CreateSubstringsRemoved = "" then  old{1} else CreateSubstringsRemoved }
                )
            }
        }
    ),
    Duplicated = Table.DuplicateColumn(TranformColumn1, "Column1", "changes"),
    
    Final = Table.TransformColumns
    (
        Duplicated,
        {
            {
                "Column1",
                each _{0}
            },
            {
                "changes",
                each _{1}
            }
        }
    )
in
    Final

Copy paste this code to the advanced editor in a new blank query to see how the solution works.

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

View solution in original post

MichaelJ64 · ‎01-20-2021

Thanks @carlpaul153 for the question, @Anonymous for comment and a solution, and @Jimmy801 for the code above.

I do something very similar. I am processing e-mails to count topics defined by the subject line. I want to remove duplicates - e-mails that contain either "FW: " or "RE:" ("Fw: " and "Re:"), place the trimmed subject in a new column, and then use 'Table.Distinct' on the new column to remove duplicates.

I have adapted @Jimmy801 code to a 'Table.Addcolumn'. Here is the relevant code - [Subject] is the column with the subject line from the e-mails. I found that 'List.Accumulate' returned a list, the first element was the processed code, the second was from the substring list. Consequence of using @Jimmy801's code as his does a few extra things.

SubstringList = {"FW: ", "Fw: ", "RE: ", "Re: "},

AddTitle = Table.AddColumn(SetTypeCompactTable, "Title", each

List.Accumulate( SubstringList, {[Subject], ""}, (old, current) =>

let

CreateNew = Text.Replace(old{0}, current,""),

CreateSubstringsRemoved =

try Text.Combine (List.Difference(Text.ToList (old{0}), Text.ToList(CreateNew) ) )

otherwise ""

in

{CreateNew, if CreateSubstringsRemoved <> "" and old{1} <> "" then

old{1} & ", " & CreateSubstringsRemoved

else

if old{1} <> "" and CreateSubstringsRemoved = "" then

old{1}

else

CreateSubstringsRemoved

}

) {0}, Text.Type ),

This works.

carlpaul153 · ‎09-10-2020

@CNENFRNL I appreciate your effort. The problem with your solution is that for each row the function returns a list, when the expected result is a text.
@Jimmy801 thank you very much. Honestly, I've found a way to deal with the problem in another way that works for me, so I haven't been able to test your algorithm. Either way I mark it as a solution to serve others. If anyone finds an error in it, please report it.
@Anonymous I appreciate your effort to understand the problem with List.Accumulate. As I mentioned to Jimmy I already managed to solve the problem, but answering your question, substring is not a list of lists, just a list of texts.

Anonymous · ‎09-10-2020

hi @carlpaul153

about your answer:

"@Rocco_sprmnt21 I appreciate your effort to understand the problem with List.Accumulate. As I mentioned to Jimmy I already managed to solve the problem, but answering your question, substring is not a list of lists, just a list of texts."

let me have doubts about what you say.
While waiting for you to document that type of error that you mentioned in correspondence with the code you posted in the first message, I explain the reasons for my doubts, hoping that they will be useful.

the variable substring=list of string as you claim:

the error message you got with your code and string list is different from the one you mentioned.

If you had modified your code as suggested by me (it was enough to delete only a small part of the text), you would have obtained this result:

if instead the variable substring contains lists inside it:

you get exactly the error you mentioned:

"List.Accumulate(substring, [Column1],(state, current) => Text.Replace(state, substring{current}, ""))

but i get the follow error:

Expression.Error: We cannot apply indexing to the type List."

PS

Apart from all these details, the solution with list.accumulate is very beautiful and elegant.

Anonymous · ‎09-09-2020

Hi @carlpaul153

the error you got with the following expression

List.Accumulate(substring, [Column1],(state, current) => Text.Replace(state, substring{current}, ""))

is most likely due to the following fact.
Mine, for lack of complete information, is an attempt to reconstruct what you got based on what I know to be how the List.Accumulate function works.
the third parameter of the function, which is itself a function, takes the value of the avriable "current" from the list which is the first parameter of the function, in our case "substring".
If, as it likely is but we are waiting for your confirmation, the substring list is a list of lists, then you get the error you reported.
The syntactically correct form would be the following:

List.Accumulate(substring, [Column1],(state, current) => Text.Replace(state, current, ""))

in this way, in each step, you replace the i-th substring with "" inside the string contained in the current row of column1.
But this, if my supposition is correct, is not enough.
You probably need to change the structure of the list as well:
substring, which must be a list of string, like this:

{"ad", "b","hg"}

not a list of lists,

like this

{{"ad"}, {"b"},"hg"}

But all this is my guess.

Jimmy801 · ‎09-09-2020

Hello @carlpaul153

check out this solution. this code replaces all substrings defined in the step SubstringList transforms the column1 into a list with two items. first is new value, second is the removed part. Then duplicated the column and transform them again. Extracting in the column1 the first item, in the changes-column the 2nd item

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WckxKVorViVZKSU0D0+kZmWC6AgTArCoQUIqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    SubstringList = {"bc", "xx", "zzz"},
    TranformColumn1 = Table.TransformColumns
    (
        #"Changed Type",
        {
            {
                "Column1",
                (cell)=> List.Accumulate(SubstringList, {cell, ""}, (old, current)=> 
                let 
                    CreateNew = Text.Replace(old{0},current,""),
                    CreateSubstringsRemoved = try Text.Combine(List.Difference(Text.ToList(old{0}), Text.ToList(CreateNew))) otherwise ""
                in 
                    {CreateNew, if CreateSubstringsRemoved <> "" and old{1}<> "" then old{1} & ", " & CreateSubstringsRemoved else if old{1} <>"" and CreateSubstringsRemoved = "" then  old{1} else CreateSubstringsRemoved }
                )
            }
        }
    ),
    Duplicated = Table.DuplicateColumn(TranformColumn1, "Column1", "changes"),
    
    Final = Table.TransformColumns
    (
        Duplicated,
        {
            {
                "Column1",
                each _{0}
            },
            {
                "changes",
                each _{1}
            }
        }
    )
in
    Final

Copy paste this code to the advanced editor in a new blank query to see how the solution works.

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

edhans · ‎09-09-2020

@carlpaul153 Consider using Text.Replace and replace with an empty space.

Text.Replace([Column1], "100", "")

That will turn "This Is The Number 100." to "This Is the Number ."

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

carlpaul153 · ‎09-09-2020

the problem is that in each cell of the column I want to delete a list of strings, not a single string.

For example, I would like Text.Replace ([Column1], {"100", "This", "The"}, "") to return "Is Number" in your example.

Of course, this is not possible because text.replace does not support lists as a parameter.

CNENFRNL · ‎09-09-2020

The procedure to replace each substring in a list is to be defined as a function. Pls refer to the code as follows,

let
    fn = (Str as text, SubstrList as list) =>

    let
        Replacing = List.Generate(
            () => [str = Str, counter = 0],
            each [counter] <= List.Count(SubstrList),
            each [str = Text.Replace([str], SubstrList{[counter]}, ""), counter = [counter] + 1],
            each Text.Trim([str])
        )
    in
        Replacing,

    SubstrList = {"Is", "This", "The"},
    Str = "This Is The Number 100.",
    Invocation = fn(Str, SubstrList)
in
    Invocation

result reads like this,

List

This Is The Number 100.

This The Number 100.

The Number 100.

Number 100.

Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!

Greg_Deckler · ‎09-09-2020

@edhans @ImkeF

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: DAX For Humans

DAX is easy, CALCULATE makes DAX hard...