Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
NickDSL
Helper I
Helper I

Text Analytics extraction based on a defined list of words

Hello.

 

I was wondering if it would be possible to give Power BI a long list of freeform text and also give it a "dictionary" of predefined values then have it search our long list of freeform text and extract those values from the "dictionary".

 

Example: A survey is sent out asking about competitors and a longform text is returned to us. We want to only extract that competitors name from that longform text.

 

If theres a why for it to "learn" the format that the competitors are put in and then adapt to learn new ones that would also be adventageous. 

 

 

1 ACCEPTED SOLUTION

See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hVJBbsJADPyKlXNVIVBVcYRWapHghnqhHJysQyw2a+R1iOhr+pa+rJtAWqkq7c2enZn1jnezyWY1Htlhtr3ZZBgcaZQANdWiJyhQCXIqsIkEbBEKL5EUTGDH6ktWCi6JoMSa/an3WFM0Druzn+ai4BOjGaBZD61JFQuCxfIv0CeRSfh4P/cx1fCE6ijEa8gR/ZdgTrqPsCSOTXrGko/DDHPfEDyTSvjRDsJ/gVx5V1mLRtr3T0IRoil2cBfPywq6XKhG9uDSaXhtRqPxvUErur+4PEgwLAw4lHKV9Jt1W1GARVqP9xfWinRHDlq2CgbbxSNM7ibT6fT2fFuVdgdzFdk79N97lRKsIoj8NtSsMFvCODEcTCDH5IkHVKspWG8VDU8pzDQ4LIJjDOfv46VA47QSjhDEUgIghw7ItttP", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Data = _t]),
    DictList = List.Buffer(Dict[Word List]),
    #"Added Custom" = Table.AddColumn(Source, "Result", each List.First(List.RemoveNulls(List.Transform(DictList,(x)=>if Text.Contains([Data],x, Comparer.OrdinalIgnoreCase) then x else null))))
in
    #"Added Custom"

Code for table named Dict

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bVLbbsIwDP0Vi+f9BN1VE9UQTLwgHlxqSlQ3QU5a1r+fO5LSor3Z5zi2z3H2+8WyKKiHF2LYYA9b14bz4vA04CSWwrmHFbZC7CFHU5K9kSXaQDEsja2Cs/MMPtuLCSQP6JrxSP9h8MyEorPqO+0nTf1Q6IwNEXGCMRKDwRxjdjoZa0I/SyB3uqyIytwZZqziBMbGqWh7jIJZFYdOKxJftM2MV/UPuTRo4Qtrf0d8hckZph/jYakN7RR49UFL0hOGLaGqG3OqxE1jyFxvB58ywnH0jZl5HLE1Ss1oyxm4IYvGe909rcaDo39iIScs3TXNv5xRKIToJwfcmRLhzUmAvFd5EW+T/3bQnA4lhRtuaIfTTpBvdX+8u/D0v/hB2TuKehtbd8iRzEhqDysyXj8grEyXumbcEnyQpDox1TlccbSiwU6XXhwOvw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Word List" = _t])
in
    Source

 

View solution in original post

6 REPLIES 6
Vijay_A_Verma
Super User
Super User

Please post your data for us to have a look. 

I cannot post my exact data but here is some dummy data:

 

Data:

Longform DataExpected Result From Query
AmavidaAmavida
anderson memory care because its closer to girlfirendand familyAnderson
Testing 
arbor landuingArbor Landing
Arbor Terrace ILArbor Terrace
Arbor Terrace ILArbor Terrace
Arlington Arlington
Aston GardensAston Gardens
Aston GardensAston Gardens
Avalon Avalon
Berks Leisure LivingBerks Leisure Living
Blue HeronBlue Heron
Blue Heron Blue Heron
Blue Heron Blue Heron
Blue Heron Blue Heron
brightwaterBrightwater
Goes straight to VM and email doesn't work  
Contact info doesn't work  
Goes straight to VM when I call  
Merged with Contact ID 353999. 
Chose Brookdale because of the size of their AL 2 bed 3 bath apartmentBrookdale
staying in Indiana 
location is not an option 

 

"Dictionary"

This would be a long list of competitor/community names. Here is a short list for an example. It would be MUCH longer (1000+ items)

Abbey Del Ray South 
Abernethy Laurels Maiden
Adante
Addington
Addington Jupiter
Addington Place
Addington Place Clear Lake
Addison
Addison Point
Adora
Adriatica
Affinity
Affinity Monterrey Village
Alamo Ranch
Albertville
Album Ranch
Alden Ranch
Alderman Oaks
Aldersgate
Alexis Allen
Alexis Estates
All Seasons
Allegro
Allegro Boynton Beach
Allegro Jupiter
Allegro Parkland
Allegro Renaissance
Allisonville Meadows
Alpharetta
AltaVida Fort Myers
Alura

See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hVJBbsJADPyKlXNVIVBVcYRWapHghnqhHJysQyw2a+R1iOhr+pa+rJtAWqkq7c2enZn1jnezyWY1Htlhtr3ZZBgcaZQANdWiJyhQCXIqsIkEbBEKL5EUTGDH6ktWCi6JoMSa/an3WFM0Druzn+ai4BOjGaBZD61JFQuCxfIv0CeRSfh4P/cx1fCE6ijEa8gR/ZdgTrqPsCSOTXrGko/DDHPfEDyTSvjRDsJ/gVx5V1mLRtr3T0IRoil2cBfPywq6XKhG9uDSaXhtRqPxvUErur+4PEgwLAw4lHKV9Jt1W1GARVqP9xfWinRHDlq2CgbbxSNM7ibT6fT2fFuVdgdzFdk79N97lRKsIoj8NtSsMFvCODEcTCDH5IkHVKspWG8VDU8pzDQ4LIJjDOfv46VA47QSjhDEUgIghw7ItttP", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Data = _t]),
    DictList = List.Buffer(Dict[Word List]),
    #"Added Custom" = Table.AddColumn(Source, "Result", each List.First(List.RemoveNulls(List.Transform(DictList,(x)=>if Text.Contains([Data],x, Comparer.OrdinalIgnoreCase) then x else null))))
in
    #"Added Custom"

Code for table named Dict

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bVLbbsIwDP0Vi+f9BN1VE9UQTLwgHlxqSlQ3QU5a1r+fO5LSor3Z5zi2z3H2+8WyKKiHF2LYYA9b14bz4vA04CSWwrmHFbZC7CFHU5K9kSXaQDEsja2Cs/MMPtuLCSQP6JrxSP9h8MyEorPqO+0nTf1Q6IwNEXGCMRKDwRxjdjoZa0I/SyB3uqyIytwZZqziBMbGqWh7jIJZFYdOKxJftM2MV/UPuTRo4Qtrf0d8hckZph/jYakN7RR49UFL0hOGLaGqG3OqxE1jyFxvB58ywnH0jZl5HLE1Ss1oyxm4IYvGe909rcaDo39iIScs3TXNv5xRKIToJwfcmRLhzUmAvFd5EW+T/3bQnA4lhRtuaIfTTpBvdX+8u/D0v/hB2TuKehtbd8iRzEhqDysyXj8grEyXumbcEnyQpDox1TlccbSiwU6XXhwOvw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Word List" = _t])
in
    Source

 

I'm sorry I dont think I really understand how to implement this solution. I'm not the most advanced with power BI and could use some guidance in the right direction.

NickDSL_0-1654782289341.png

In the above screenshot I have pasted in the first block of code and it seems to be working kinda. But this is the first time I've used the advanced editor so I'm not too sure on how it works.

 

How would I change the longform data in the "Data" column. My data is data that would change month to month. 

 

Where would I paste the second block of code? If I just paste it right below I'm left with a "Token Eof expected." error. How would I also update this list?

NickDSL_1-1654782401203.png

 

You need to get data from your sources into PQ..Delete everything after Source statement.
Let's assume you got it from Excel. Then Source statement generated will be like this

Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content]

Put a comma after source statement.

Copy the entire code provided by me starting DictList. 

Paste the code.

So your code will become like this

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    DictList = List.Buffer(Dict[Word List]),
    #"Added Custom" = Table.AddColumn(Source, "Result", each List.First(List.RemoveNulls(List.Transform(DictList,(x)=>if Text.Contains([Data],x, Comparer.OrdinalIgnoreCase) then x else null))))
in
    #"Added Custom"

You will need to change column names here. If your column name is not Word List, say it is MyColumn. Then [Word List] will need to be replaced with [MyColumn].

@NickDSL This solution looks good.

 

If you want to match multiple dictionary words, then you can update the #"Added Custom" step to concatenate them:

Table.AddColumn(Source, "Result", each Text.Combine(List.Select(DictList, (x) => Text.Contains([Data], x, Comparer.OrdinalIgnoreCase))))

Helpful resources

Announcements
July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.

PBI_Carousel_NL_June

Fabric Community Update - June 2024

Get the latest Fabric updates from Build 2024, key Skills Challenge voucher deadlines, top blogs, forum posts, and product ideas.

Top Solution Authors
Top Kudoed Authors