Solved: list to filter column using Text.Contains function

Anonymous · ‎11-05-2021

I have a query with a column that I want to filter based on a list that has multiple items. I want to use the filter logic "contains any" (including partial matches of the 'searchfor' list's items). The step is:

= Table.SelectRows(Source, each List.Contains(searchfor, [precinct_code]))

Why's the column filtering only for two items in my list (which has more items)? Thanks.

Anonymous · ‎03-13-2022

this is the code of the funcion not_cnt(...).

open an empty query and copy and paste inside this code.

The same if you want the cnt(...) function.

let
    tca = (txt, lst)=>List.Accumulate(lst,true, (s,c)=>not Text.Contains(txt,c) and s )
in
    tca

read the comment inside (text after "//")

"sf" is the name I used for your list "searchfor". Change it as need

    Origine = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\sprmn\Downloads\STATEWIDE_PRECINCT_SORT.txt"), null, null, 1252)}),
    #"Suddividi colonna in base al delimitatore" = Table.SplitColumn(Origine, "Column1", Splitter.SplitTextByDelimiter("#(tab)", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11", "Column1.12", "Column1.13", "Column1.14", "Column1.15", "Column1.16", "Column1.17", "Column1.18", "Column1.19"}),
    #"Modificato tipo" = Table.TransformColumnTypes(#"Suddividi colonna in base al delimitatore",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}, {"Column1.11", type text}, {"Column1.12", type text}, {"Column1.13", type text}, {"Column1.14", type text}, {"Column1.15", type text}, {"Column1.16", type text}, {"Column1.17", type text}, {"Column1.18", type text}, {"Column1.19", type text}}),
    #"Intestazioni alzate di livello" = Table.PromoteHeaders(#"Modificato tipo", [PromoteAllScalars=true]),
    #"Modificato tipo1" = Table.TransformColumnTypes(#"Intestazioni alzate di livello",{{"county_id", Int64.Type}, {"county", type text}, {"election_dt", type date}, {"result_type_lbl", type text}, {"result_type_desc", type text}, {"contest_id", Int64.Type}, {"contest_title", type text}, {"contest_party_lbl", type text}, {"contest_vote_for", Int64.Type}, {"precinct_code", type text}, {"precinct_name", type text}, {"candidate_id", Int64.Type}, {"candidate_name", type text}, {"candidate_party_lbl", type text}, {"group_num", Int64.Type}, {"group_name", type text}, {"voting_method_lbl", type text}, {"voting_method_rslt_desc", type text}, {"vote_ct", Int64.Type}}),
    #"Filtrate righe" = Table.SelectRows(#"Modificato tipo1", each ([county] = "BERTIE" or [county] = "DURHAM" or [county] = "MARTIN" or [county] = "NEW HANOVER" or [county] = "NORTHAMPTON" or [county] = "PITT" or [county] = "STOKES" or [county] = "SURRY" or [county] = "TYRRELL" or [county] = "WILKES")),
    #"Rimosse altre colonne" = Table.SelectColumns(#"Filtrate righe",{"precinct_code"}),
    #"Rimossi duplicati" = Table.Distinct(#"Rimosse altre colonne"),

// I changed only the followings lines 
// here not_cnt is called. If you want cnt(...), juats change the name of teh function called

    tsr=Table.SelectRows(#"Rimossi duplicati", each not_cnt(_[precinct_code],sf))
in
    tsr

View solution in original post

HotChilli · ‎03-13-2022

Delete the last step (the table select rows)

Add a column with the search i.e.

(x) => List.AnyTrue(List.Transform(searchfor, each Text.Contains(x[precinct_code], _)))

that should show true/false depending on if a list item is found in the text.

Retain the false records by filtering using the dropdown from the header of the column.

Anonymous · ‎03-13-2022

Ok @HotChili, I'm deleting the "Custom1" step. Then I'm adding a custom column with the step:

= Table.AddColumn(#"Removed Duplicates", "Custom", each (x) => List.AnyTrue(List.Transform(searchfor, each Text.Contains(x[precinct_code], _))))

This step creates a column with "function" as values. However, the custom column's format type won't let me change to "True/False". So I can't filter the column at all.

HotChilli · ‎03-14-2022

That's not the same as the code I provided. You've added something. Remove it and it'll work.

Anonymous · ‎03-14-2022

No @HotChilli, the rest was automatically added by PQ. My changes are:

Delete "Custom1" step.
Click on button for "custom column" in "add column" menu.
Copy-and-paste your formula into the custom column's dialog box.
Click on the format icon for the "custom" column.

Anonymous · ‎03-13-2022

see if any of these queries do what you are looking for.

Anonymous · ‎03-14-2022

Thanks @Anonymous, your functions work but extremely slow. So I'm hoping for a faster solution.

Anonymous · ‎03-15-2022

I don't think most of the time is spent running the lines I added.
This part of the code intervenes to search a table of 264 rows (after having filtered the initial table and removed the duplicates) using a list of 20 elements. This is done in no time.
Most of the time it is in the downloading of the huge table from the site.
Try to download the table locally and have the code run locally.

Anonymous · ‎03-13-2022

@Anonymous, list-to-filter-column-using-Text-Contains-function.pbix is correct. Although I don't understand the variables in your functions, I'm trying to recreate your functions. There's no parameter in your file when opened with PQ. When I right-click on the list and select "create function" in the Queries pane, the list and the created function are moved to another folder. Am I recreating your function in the right way? Thanks!

Anonymous · ‎03-13-2022

this is the code of the funcion not_cnt(...).

open an empty query and copy and paste inside this code.

The same if you want the cnt(...) function.

let
    tca = (txt, lst)=>List.Accumulate(lst,true, (s,c)=>not Text.Contains(txt,c) and s )
in
    tca

read the comment inside (text after "//")

"sf" is the name I used for your list "searchfor". Change it as need

    Origine = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\sprmn\Downloads\STATEWIDE_PRECINCT_SORT.txt"), null, null, 1252)}),
    #"Suddividi colonna in base al delimitatore" = Table.SplitColumn(Origine, "Column1", Splitter.SplitTextByDelimiter("#(tab)", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11", "Column1.12", "Column1.13", "Column1.14", "Column1.15", "Column1.16", "Column1.17", "Column1.18", "Column1.19"}),
    #"Modificato tipo" = Table.TransformColumnTypes(#"Suddividi colonna in base al delimitatore",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}, {"Column1.11", type text}, {"Column1.12", type text}, {"Column1.13", type text}, {"Column1.14", type text}, {"Column1.15", type text}, {"Column1.16", type text}, {"Column1.17", type text}, {"Column1.18", type text}, {"Column1.19", type text}}),
    #"Intestazioni alzate di livello" = Table.PromoteHeaders(#"Modificato tipo", [PromoteAllScalars=true]),
    #"Modificato tipo1" = Table.TransformColumnTypes(#"Intestazioni alzate di livello",{{"county_id", Int64.Type}, {"county", type text}, {"election_dt", type date}, {"result_type_lbl", type text}, {"result_type_desc", type text}, {"contest_id", Int64.Type}, {"contest_title", type text}, {"contest_party_lbl", type text}, {"contest_vote_for", Int64.Type}, {"precinct_code", type text}, {"precinct_name", type text}, {"candidate_id", Int64.Type}, {"candidate_name", type text}, {"candidate_party_lbl", type text}, {"group_num", Int64.Type}, {"group_name", type text}, {"voting_method_lbl", type text}, {"voting_method_rslt_desc", type text}, {"vote_ct", Int64.Type}}),
    #"Filtrate righe" = Table.SelectRows(#"Modificato tipo1", each ([county] = "BERTIE" or [county] = "DURHAM" or [county] = "MARTIN" or [county] = "NEW HANOVER" or [county] = "NORTHAMPTON" or [county] = "PITT" or [county] = "STOKES" or [county] = "SURRY" or [county] = "TYRRELL" or [county] = "WILKES")),
    #"Rimosse altre colonne" = Table.SelectColumns(#"Filtrate righe",{"precinct_code"}),
    #"Rimossi duplicati" = Table.Distinct(#"Rimosse altre colonne"),

// I changed only the followings lines 
// here not_cnt is called. If you want cnt(...), juats change the name of teh function called

    tsr=Table.SelectRows(#"Rimossi duplicati", each not_cnt(_[precinct_code],sf))
in
    tsr

HotChilli · ‎11-05-2021

The function from the initial post:

List.Contains(searchfor, [precinct_code]

is going to look in the list searchfor for exact matches of the precinct_code field but I think you want to look in the precinct_code field (text) in each row for any of the items in the list searchfor (I could be getting us both confused at this stage!)

Let me give you this M and see if it's what you need:

Table.SelectRows(#"Changed Type1", (x) => List.AnyTrue(List.Transform(searchFor, each Text.Contains(x[group_name], _))))

you will edit the Changed Type1 to be the previous step at your side. group_name was the field from the 2nd file you provided (I think you edited your post to remove it but I was fast) so you might need to change that.

I'm logging off now but if you need more help just post back and I'll look over the weekend.

All the best.

Anonymous · ‎11-08-2021

Thanks @HotChilli, my M knowledge is too simple so I didn't realize that I didn't even pick the correct function. I'm trying to understand just one of the functions in your solution. What are the arguments in Text.Contains?

AlexisOlson · ‎11-08-2021

Since there are two contexts/environments here (the row context and the list context), you have to distinguish them. The x variable is representing the row context and the _ is the default variable for the "each" shortcut expression.

You could rewrite it as:

(row) => List.AnyTrue(List.Transform(searchFor, (item) => Text.Contains(row[group_name], item)))

Here, row[group_name] is the value for the [group_name] column within that row and item represents each item in the list searchFor and the transformation says that for each item in this list, check if row[group_name] contains that item.

Recommended reading (part 3 in particular):
https://bengribaudo.com/blog/2017/11/17/4107

HotChilli · ‎11-05-2021

Is the requirement to look in the text of the column precinct_code for any of the items in the searchfor list? So using Text.Contains with each item in the list?

(I didn't download the file - too big)

Anonymous · ‎03-11-2022

@HotChilli, the data source is on the web. I've inserted the "not" operator into your solution. By doing so, I've inverted the filter logic to be: "not" and "contains any" (including partial matches of the items in the 'searchfor' list. The filter logic is the last transformation step ("Custom1") in the following M code:

let
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://s3.amazonaws.com/dl.ncsbe.gov/ENRS/2020_11_03/results_precinct_sort/STATEWIDE_PRECINCT_SORT.txt"), null, null, 1252)}),
    #"Split Column by Delimiter" = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByDelimiter("#(tab)", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11", "Column1.12", "Column1.13", "Column1.14", "Column1.15", "Column1.16", "Column1.17", "Column1.18", "Column1.19"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Split Column by Delimiter", [PromoteAllScalars=true]),
    #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each [county] = "BERTIE" or [county] = "DURHAM" or [county] = "MARTIN" or [county] = "NEW HANOVER" or [county] = "NORTHAMPTON" or [county] = "PITT" or [county] = "STOKES" or [county] = "SURRY" or [county] = "TYRRELL" or [county] = "WILKES"),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"precinct_code"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns"),
    Custom1 = Table.SelectRows(#"Removed Duplicates", (x) => List.AnyTrue(List.Transform(searchfor, each not Text.Contains(x[precinct_code], _))))
in
    Custom1

The 'searchfor' list (item 'PROVI ' has a trailing space character) is:

OS

ABS

CURBSIDE

ONE STOP

OS

OS-

OSAP

OSCA

OSCH

OSKD

OSLL

OSLOB

OSNR

OSOP

OSTA

OSWA

PROVI

PROVISIOINAL

PROVISIONAL

TRANS

However, it's not correctly filtering out the partial matches below:

precinct_code

ABSENTEE

ABSENTEE BY MAIL

DOB OS

ELK OS

MA OS

MCGEE OS

OFFICE OS

ONE STOP

ONE STOP ARB

ONE STOP CBC

ONE STOP CBR

ONE STOP CFC

ONE STOP CFN

ONE STOP DANBURY

ONE STOP EES

ONE STOP GVT

ONE STOP KING

ONE STOP MSL

ONE STOP NLB

ONE STOP PVL

ONE STOP SRC

ONE STOP WALNUT COVE

OS AG CENTER

OS CJRC

OS COMM SCHOOLS

OS DTM

OS DUKE

OS ENO

OS EPPES

OS ERL

OS FARMVILLE

OS GASTON

OS GETG

OS HVBC

OS JACKSON

OS LOB

OS LW

OS ML

OS MSUMC

OS NCCU

OS NRL

OS POW

OS RICHSQUARE

OS SEVERN

OS SHS

OS SRL

OS STUDENT CENTER

OS TRC

OS WILLIS BUILDING

OS WINTERVILLE

PM OS

PROVISIONAL

TRANSFER

These partial matches would also result if I were to delete the "not" operator from the last transformation step. So why isn't the "not" operator working in the last transformation step? Thanks (and same thanks if @AlexisOlson can troubleshoot)!

AlexisOlson · ‎11-05-2021

Bear in mind that M is case-sensitive. Without seeing an example, I can't say that this is the problem, but it does come up frequently with this sort of thing.

HotChilli · ‎11-05-2021

can you give an example of a precinct which is not found in the list, and show us the part of the list it should match with, please? Pictures are fine.