Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
UBComma
Helper III
Helper III

Create new records from keyword list

@AlexisOlson or @BA_Pete   Any chance you would have a thought on this?

I am trying to add a new column and new records to a table based on whether a list of keywords exists in the 'Title cleaned' column. The result I'm trying to create is the lower table. I'm new to writing M code and struggling with the way to do this, any help will be greatly appreciated.

 

Title Index

Title cleaned

1

big brown cat

2

shaggy golden dog

3

orange cat

4

black cat and brown dog

 

Keywords (list)

cat

dog

brown

 

Title Index

Title cleaned

Contains Keyword

1

big brown cat

cat

1

big brown cat

brown

2

shaggy golden dog

dog

3

orange cat

cat

4

black cat and brown dog

cat

4

black cat and brown dog

dog

4

black cat and brown dog

brown

 

1 ACCEPTED SOLUTION

You can link to a cloud file using e.g. Dropbox/Google Drive/SharePoint/OneDrive. Make sure the file doesn't contain any sensitive data before you make it public.

View solution in original post

14 REPLIES 14
v-cgao-msft
Community Support
Community Support

Hi @UBComma ,

 

Please refer to the following steps:

let
  Source = Table.FromRows(
    Json.Document(
      Binary.Decompress(
        Binary.FromText(
          "i45WMlTSUUrKTFdIKsovz1NITixRitWJVjICihZnJKanVyqk5+ekpOYppOSng2WMgTL5RYl56alwxSYgI3ISk7NBIgqJeSlQw8BaYgE=", 
          BinaryEncoding.Base64
        ), 
        Compression.Deflate
      )
    ), 
    let
      _t = ((type nullable text) meta [Serialized.Text = true])
    in
      type table [#"Title Index" = _t, #"Title cleaned" = _t]
  ), 
  #"Changed Type" = Table.TransformColumnTypes(
    Source, 
    {{"Title Index", Int64.Type}, {"Title cleaned", type text}}
  ), 
  #"Added Custom" = Table.AddColumn(
    #"Changed Type", 
    "Contains Keyword", 
    each List.Transform(
      List.Transform(
        {
          List.Transform(
            {0 .. List.Count(Keywords) - 1}, 
            (x) => Number.From(List.Contains(Text.Split([Title cleaned], " "), Keywords{x}))
          )
        }, 
        each List.PositionOf(_, 1, 2)
      ){0}, 
      each Keywords{_}
    )
  ), 
  #"Expanded Contains Keyword" = Table.ExpandListColumn(#"Added Custom", "Contains Keyword")
in
  #"Expanded Contains Keyword"

vcgaomsft_0-1675846279656.png

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly. If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly --  How to provide sample data in the Power BI Forum

There's no need to create an index to iterate over. You can replace

each List.Transform(
  List.Transform(
    {
      List.Transform(
        {0 .. List.Count(Keywords) - 1}, 
        (x) => Number.From(List.Contains(Text.Split([Title cleaned], " "), Keywords{x}))
      )
    }, 
    each List.PositionOf(_, 1, 2)
  ){0}, 
  each Keywords{_}
)

with

each List.Intersect({Keywords, Text.Split([Title cleaned], " ")})

 

That you very much for your insights and suggestions. This produces the results that I am looking for but it is causing the model to explode in size. This is a test model and only has 2,400 records. The model is under 1 megabyte but when I run the query it says it's loading for hours and hours and grows very large. I let the query run overnight, it finished. I tried loading to the model this morning and it has been running for about 3.5 hours and is currently almost 80 gigabytes.

 

Any ideas why this is happening and how to fix it?

UBComma_0-1676130486803.png

 

Since your data is small, try buffering the table and list into memory before defining the custom column.

 

Working from cgao's code, this might look like:

let
  Source = Table.FromRows(
    Json.Document(
      Binary.Decompress(
        Binary.FromText(
          "i45WMlTSUUrKTFdIKsovz1NITixRitWJVjICihZnJKanVyqk5+ekpOYppOSng2WMgTL5RYl56alwxSYgI3ISk7NBIgqJeSlQw8BaYgE=",
          BinaryEncoding.Base64
        ),
        Compression.Deflate
      )
    ),
    let
      _t = ((type nullable text) meta [Serialized.Text = true])
    in
      type table [#"Title Index" = _t, #"Title cleaned" = _t]
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    Source,
    {{"Title Index", Int64.Type}, {"Title cleaned", type text}}
  ),
  BufferedKeywords = List.Buffer(Keywords),
  BufferedTable = Table.Buffer(#"Changed Type"),
  #"Added Custom" = Table.AddColumn(
    BufferedTable,
    "Contains Keyword",
    each List.Intersect({BufferedKeywords, Text.Split([Title cleaned], " ")}),
    type list
  ),
  #"Expanded Contains Keyword" = Table.ExpandListColumn(#"Added Custom", "Contains Keyword")
in
  #"Expanded Contains Keyword"

 

Thank you again @AlexisOlson - even with the buffering I'm finding Power Query too slow to work with on this query. The query works but takes hours and hours to run, and then hours to load. I'm going to try just using DAX which I'm more comfortable writing. I also might try doing it in Python which I just started dabbling in. I really appreciate your help with M.

Hmm. That's too bad. I'm pretty sure I can get it to work efficiently in Power Query but I'd need an example file to test against.

How can I send you a test file?

Here's an example test file that is really slow without buffering but fast with buffering.

Original source of data here:
https://www.kaggle.com/datasets/vstepanenko/disaster-tweets

That's a great example and very close to what I'm trying to accomplish. I'm going to try to mimic it.  thanks!

Did you see the sample file I sent you via private message? I'll take a look at these that you sent.

You can link to a cloud file using e.g. Dropbox/Google Drive/SharePoint/OneDrive. Make sure the file doesn't contain any sensitive data before you make it public.

smpa01
Super User
Super User

IN Dax Table

 

Table =
GENERATE (
    src,
    VAR str = src[Title cleaned]
    RETURN
        SELECTCOLUMNS (
            FILTER ( Keywords, CONTAINSSTRING ( str, Keywords[Column1] ) ),
            "val", Keywords[Column1]
        )
)

 

in DAX MEasure

Measure = 
CONCATENATEX (
    GENERATE (
        src,
        VAR str = src[Title cleaned]
        RETURN
            SELECTCOLUMNS (
                FILTER ( Keywords, CONTAINSSTRING ( str, Keywords[Column1] ) ),
                "val", Keywords[Column1]
            )
    ),
    [val],
    ",",
    [val], ASC
)

 

smpa01_0-1675808337704.png

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Thank you, I am looking for a Power Query solution

watkinnc
Super User
Super User

if you turn that list into a one column table, you can then add a column which uses each Table.FindText. This will bring in the whole row if it finds your text anywhere in the row.

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors
Top Kudoed Authors