Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
@AlexisOlson or @BA_Pete Any chance you would have a thought on this?
I am trying to add a new column and new records to a table based on whether a list of keywords exists in the 'Title cleaned' column. The result I'm trying to create is the lower table. I'm new to writing M code and struggling with the way to do this, any help will be greatly appreciated.
Title Index | Title cleaned |
1 | big brown cat |
2 | shaggy golden dog |
3 | orange cat |
4 | black cat and brown dog |
Keywords (list)
cat
dog
brown
Title Index | Title cleaned | Contains Keyword |
1 | big brown cat | cat |
1 | big brown cat | brown |
2 | shaggy golden dog | dog |
3 | orange cat | cat |
4 | black cat and brown dog | cat |
4 | black cat and brown dog | dog |
4 | black cat and brown dog | brown |
Solved! Go to Solution.
You can link to a cloud file using e.g. Dropbox/Google Drive/SharePoint/OneDrive. Make sure the file doesn't contain any sensitive data before you make it public.
Hi @UBComma ,
Please refer to the following steps:
let
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText(
"i45WMlTSUUrKTFdIKsovz1NITixRitWJVjICihZnJKanVyqk5+ekpOYppOSng2WMgTL5RYl56alwxSYgI3ISk7NBIgqJeSlQw8BaYgE=",
BinaryEncoding.Base64
),
Compression.Deflate
)
),
let
_t = ((type nullable text) meta [Serialized.Text = true])
in
type table [#"Title Index" = _t, #"Title cleaned" = _t]
),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Title Index", Int64.Type}, {"Title cleaned", type text}}
),
#"Added Custom" = Table.AddColumn(
#"Changed Type",
"Contains Keyword",
each List.Transform(
List.Transform(
{
List.Transform(
{0 .. List.Count(Keywords) - 1},
(x) => Number.From(List.Contains(Text.Split([Title cleaned], " "), Keywords{x}))
)
},
each List.PositionOf(_, 1, 2)
){0},
each Keywords{_}
)
),
#"Expanded Contains Keyword" = Table.ExpandListColumn(#"Added Custom", "Contains Keyword")
in
#"Expanded Contains Keyword"
Best Regards,
Gao
Community Support Team
If there is any post helps, then please consider Accept it as the solution to help the other members find it more quickly. If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!
How to get your questions answered quickly -- How to provide sample data in the Power BI Forum
There's no need to create an index to iterate over. You can replace
each List.Transform(
List.Transform(
{
List.Transform(
{0 .. List.Count(Keywords) - 1},
(x) => Number.From(List.Contains(Text.Split([Title cleaned], " "), Keywords{x}))
)
},
each List.PositionOf(_, 1, 2)
){0},
each Keywords{_}
)
with
each List.Intersect({Keywords, Text.Split([Title cleaned], " ")})
That you very much for your insights and suggestions. This produces the results that I am looking for but it is causing the model to explode in size. This is a test model and only has 2,400 records. The model is under 1 megabyte but when I run the query it says it's loading for hours and hours and grows very large. I let the query run overnight, it finished. I tried loading to the model this morning and it has been running for about 3.5 hours and is currently almost 80 gigabytes.
Any ideas why this is happening and how to fix it?
Since your data is small, try buffering the table and list into memory before defining the custom column.
Working from cgao's code, this might look like:
let
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText(
"i45WMlTSUUrKTFdIKsovz1NITixRitWJVjICihZnJKanVyqk5+ekpOYppOSng2WMgTL5RYl56alwxSYgI3ISk7NBIgqJeSlQw8BaYgE=",
BinaryEncoding.Base64
),
Compression.Deflate
)
),
let
_t = ((type nullable text) meta [Serialized.Text = true])
in
type table [#"Title Index" = _t, #"Title cleaned" = _t]
),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Title Index", Int64.Type}, {"Title cleaned", type text}}
),
BufferedKeywords = List.Buffer(Keywords),
BufferedTable = Table.Buffer(#"Changed Type"),
#"Added Custom" = Table.AddColumn(
BufferedTable,
"Contains Keyword",
each List.Intersect({BufferedKeywords, Text.Split([Title cleaned], " ")}),
type list
),
#"Expanded Contains Keyword" = Table.ExpandListColumn(#"Added Custom", "Contains Keyword")
in
#"Expanded Contains Keyword"
Thank you again @AlexisOlson - even with the buffering I'm finding Power Query too slow to work with on this query. The query works but takes hours and hours to run, and then hours to load. I'm going to try just using DAX which I'm more comfortable writing. I also might try doing it in Python which I just started dabbling in. I really appreciate your help with M.
Hmm. That's too bad. I'm pretty sure I can get it to work efficiently in Power Query but I'd need an example file to test against.
How can I send you a test file?
Here's an example test file that is really slow without buffering but fast with buffering.
Original source of data here:
https://www.kaggle.com/datasets/vstepanenko/disaster-tweets
That's a great example and very close to what I'm trying to accomplish. I'm going to try to mimic it. thanks!
Did you see the sample file I sent you via private message? I'll take a look at these that you sent.
You can link to a cloud file using e.g. Dropbox/Google Drive/SharePoint/OneDrive. Make sure the file doesn't contain any sensitive data before you make it public.
IN Dax Table
Table =
GENERATE (
src,
VAR str = src[Title cleaned]
RETURN
SELECTCOLUMNS (
FILTER ( Keywords, CONTAINSSTRING ( str, Keywords[Column1] ) ),
"val", Keywords[Column1]
)
)
in DAX MEasure
Measure =
CONCATENATEX (
GENERATE (
src,
VAR str = src[Title cleaned]
RETURN
SELECTCOLUMNS (
FILTER ( Keywords, CONTAINSSTRING ( str, Keywords[Column1] ) ),
"val", Keywords[Column1]
)
),
[val],
",",
[val], ASC
)
Thank you, I am looking for a Power Query solution
if you turn that list into a one column table, you can then add a column which uses each Table.FindText. This will bring in the whole row if it finds your text anywhere in the row.
--Nate