Microsoft Fabric Community Conference 2025, March 31 - April 2, Las Vegas, Nevada. Use code FABINSIDER for a $400 discount.
Register nowGet inspired! Check out the entries from the Power BI DataViz World Championships preliminary rounds and give kudos to your favorites. View the vizzies.
Hi
I have a table 'TwitterPosts' containing the text of Tweet messages in column [MIDText]. I have a second table 'MyKeywords' containing the columns [KeywordText] and [KeywordScore] - the former contains keyword text strings and the later decimal numbers.
I want to search the text of each Tweet in 'TwitterPosts'[MIDText] for any of the keywords in 'MyKeywords'[KeywordText] and return the keywords found in a new column. If a Tweet contains multiple keywords I want each keyword to be returned and separated from other returned keywords from the same Tweet by a comma delimiter. The process should not be case-sensitive and should find exact matches i.e. <Oxford> should not return the keyword <ford> etc.
Furthermore, if a distinct keyword appears two or more times in a Tweet it should only be returned (counted) once i.e. the return confirms a keyword is present in a Tweet but does not reflect how many times it appeared within the tweet.
Finally, each keyword has a corresponding score contained within 'MyKeywords'[KeywordScore]. Ideally would like another column that contains the SUM of all the scores for each keyword found within a Tweet, but a column containing comma delimited keyword scores would suffice. e.g.
'TwitterPosts'[MIDText]: "The quick brown fox jumps over the lazy fox"
Keywords detected:
Fox [0.7]
Brown [1.2]
Quick [0.3]
Desired output A
Keyword Score SUM = 0.7+1.2+0.3 = 2.2
Desired output B [0.7,1.2,0.3]
My ideal solution would be a custom function that operates exclusively in Power Query. I have a bunch of DAX formulas and Power Query functions that perform some of these tasks, but it's beyond my current capabilities to create something more elegant that ties everything together into a more efficient process.
Any help would be much appreciated!
Jason
Hi @Colourtext ,
There might be mor elegant solution, but here is one alternative:
1) in power query, copy column of twitter post and then split it by delimited (space delimiter)
2) select all new columns except original column and go to transform, choose unpivot columns. In this step, we created row per each word
3) exit Power Query, go to relationships and create join between main table and Keywords Score table; make join between [TwitterPost].Value (new column with distinct words) and [Keywords Score].Keyword
4) in table [TwitterPost] add several columns:
- Keyword value = RELATED(Keywords[Score]) -- to get score for each word
- IsDuplicate = CALCULATE(COUNT(TwitterPost[Value]),ALLEXCEPT(TwitterPost,TwitterPost[MidText],TwitterPost[Value])) -- to check if words is shown multiple times
- Final Score = TwitterPost[Keyword value]/TwitterPost[IsDuplicate] -- final calculation to be displayed (measure)
Cheers,
Nemanja
Hi Nemanja
Thanks so much for looking at this problem. Your solution works well for single word keywords but not with multi-word entities like "brown dog" or "quick fox". I get a random number of multi-word entities in the keyword lists I work with, so sadly I'll have to keep looking.
My best solution for far in Dax works as follows:
OnTopic Keyword Scores =
CONCATENATEX(
//Iterate through keyword table
'My Keywords',
IF(
SEARCH(FIRSTNONBLANK('My Keywords'[Keywords],1),'My Table'[TwitterPost],,999) <> 999,
//Add a delimiter to separate individual keyword scores within the output
CONCATENATE('My Keywords'[Keywords Score],","),
""
)
)
Once this calculated column is done I then have to output the table to a .csv via DaxStudio, then re-import this .csv file and split the previously calculated column in Query Editor by the comma delimiters. I have to do this because I can't split a calculatedcoloumn directly within Dax or the Query Editor.
If you can think of any improvements on this process I would be delighted to receive them!
Best
Jason
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Check out the February 2025 Power BI update to learn about new features.
User | Count |
---|---|
29 | |
26 | |
23 | |
13 | |
10 |
User | Count |
---|---|
24 | |
21 | |
17 | |
11 | |
9 |