Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

View all the Fabric Data Days sessions on demand. View schedule

Reply
TheOriginalDeeb
Frequent Visitor

Is Word Co-Occurence possible in Power Query?

I'm hoping to re-create a solution that was achieved using an R script (because R makes no sense to me) : https://stackoverflow.com/questions/27153320/build-word-co-occurence-edge-list-in-r

Here's the original input and desired output from the R solution

 

DF:

sentence_id text
1           a b c d e
2           a b b e
3           b c d
4           a e
5           a
6           a a a

OUTPUT

word1 word2 freq
a     b     2
a     c     1
a     d     1
a     e     3
b     c     2
b     d     2
b     e     2
c     d     2
c     e     1
d     e     1

 

The only difference in my data is that it would be a table column of actual words/phrases but the desied output is basically the same.

Call all experts to see if this would be possible and any tips on how to do it.

Thanks!

1 ACCEPTED SOLUTION
RobertSlattery
Responsive Resident
Responsive Resident

Here is a solution that is in one query in M.

 image.png

 

 

let
    Source = DF,
    sentences = List.Transform(DF[text], each Text.Lower(_)),
    minSentences = List.Transform(sentences, 
        each List.Distinct(Text.Split(_, " "))
    ),
    wordsList = List.Combine(minSentences),
    wordsDistinct = List.Distinct(wordsList),
    #"Converted to Table" = Table.FromList(wordsDistinct, Splitter.SplitByNothing(), {"word"}, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"word", type text}}),
    crossTab = List.Accumulate(wordsDistinct, #"Changed Type", 
        (table, col) => Table.AddColumn(table, col, each 0, Int64.Type)
    ),
    records = Table.TransformRows(crossTab, each
        let
            currentWord = [word],
            currentWordIndex = List.PositionOf(wordsDistinct, currentWord)
        in
            // input the current row into an accumulator that iterates over the sentences
            List.Accumulate(minSentences, _, (r, s) =>
                // if the current word is in the sentence
                if List.Contains(s, currentWord) then
                    let
                        coWords = List.RemoveMatchingItems(s, {currentWord})
                    in
                    //  increment co-word columns
                        if not List.IsEmpty(coWords) then
                            // iterate the co-words operating on the row to increment the column
                            List.Accumulate(coWords, r, (row, coWord) =>
                                if List.PositionOf(wordsDistinct, coWord) > currentWordIndex then
                                    Record.TransformFields(row, {coWord, each _ + 1})
                                else
                                    row
                            )
                        else
                            r
                else
                    r
            )
    ),
    crossTabPopulated = Table.FromRecords(records),
    #"Changed Type1" = Table.TransformColumnTypes(crossTabPopulated,{{"word", type text}, {"a", Int64.Type}, {"b", Int64.Type}, {"c", Int64.Type}, {"d", Int64.Type}, {"e", Int64.Type}}),
    #"Unpivoted Only Selected Columns" = Table.Unpivot(#"Changed Type1", wordsDistinct, "coWord", "Count"),
    output = Table.SelectRows(#"Unpivoted Only Selected Columns", each [word] <> [coWord] and [Count] <> 0)
in
    output

 

View solution in original post

6 REPLIES 6
TheOriginalDeeb
Frequent Visitor

Lots of great solutions. I'm going with the M code.

 

But many thanks to everybody who took the time to provide help.

RobertSlattery
Responsive Resident
Responsive Resident

Here is a solution that is in one query in M.

 image.png

 

 

let
    Source = DF,
    sentences = List.Transform(DF[text], each Text.Lower(_)),
    minSentences = List.Transform(sentences, 
        each List.Distinct(Text.Split(_, " "))
    ),
    wordsList = List.Combine(minSentences),
    wordsDistinct = List.Distinct(wordsList),
    #"Converted to Table" = Table.FromList(wordsDistinct, Splitter.SplitByNothing(), {"word"}, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"word", type text}}),
    crossTab = List.Accumulate(wordsDistinct, #"Changed Type", 
        (table, col) => Table.AddColumn(table, col, each 0, Int64.Type)
    ),
    records = Table.TransformRows(crossTab, each
        let
            currentWord = [word],
            currentWordIndex = List.PositionOf(wordsDistinct, currentWord)
        in
            // input the current row into an accumulator that iterates over the sentences
            List.Accumulate(minSentences, _, (r, s) =>
                // if the current word is in the sentence
                if List.Contains(s, currentWord) then
                    let
                        coWords = List.RemoveMatchingItems(s, {currentWord})
                    in
                    //  increment co-word columns
                        if not List.IsEmpty(coWords) then
                            // iterate the co-words operating on the row to increment the column
                            List.Accumulate(coWords, r, (row, coWord) =>
                                if List.PositionOf(wordsDistinct, coWord) > currentWordIndex then
                                    Record.TransformFields(row, {coWord, each _ + 1})
                                else
                                    row
                            )
                        else
                            r
                else
                    r
            )
    ),
    crossTabPopulated = Table.FromRecords(records),
    #"Changed Type1" = Table.TransformColumnTypes(crossTabPopulated,{{"word", type text}, {"a", Int64.Type}, {"b", Int64.Type}, {"c", Int64.Type}, {"d", Int64.Type}, {"e", Int64.Type}}),
    #"Unpivoted Only Selected Columns" = Table.Unpivot(#"Changed Type1", wordsDistinct, "coWord", "Count"),
    output = Table.SelectRows(#"Unpivoted Only Selected Columns", each [word] <> [coWord] and [Count] <> 0)
in
    output

 

Amazing. Thank you so much!

Ashish_Mathur
Super User
Super User

Hi,

 

You may refer to my solution here.

 

Hope this helps.

 

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
v-yulgu-msft
Microsoft Employee
Microsoft Employee

Hi @TheOriginalDeeb,

 

After loading original table into desktop, first duplicate it in Query Editor mode.

 

1. Split column [text].

    1.PNG2.PNG

 

2. You need to convert above table into below structure via Power Query (select columns one by one and union). Unfortunately, I haven't worked it out as I am not very familiar with M code. And remove duplicate values.

 3.PNG  4.PNG

 

3. Remove unnecessary columns. Sort [Text] in Ascending order. Then, add an index column.

6.PNG

 

4. Apply all above changes. Switch to report view mode. You then need to create several auxilliary tables like below:

   

Test Table3 = 
FILTER (
    CROSSJOIN (
        SELECTCOLUMNS (
            'Test Table2',
            "Word1", 'Test Table2'[Text],
            "Index1", 'Test Table2'[Index]
        ),
        SELECTCOLUMNS (
            'Test Table2',
            "Word2", 'Test Table2'[Text],
            "Index2", 'Test Table2'[Index]
        )
    ),
    [Index1] < [Index2]
)

7.PNG

 

test Table4 =
ADDCOLUMNS (
    CROSSJOIN (
        'Test Table',
        SELECTCOLUMNS (
            'Test Table3',
            "Word1", 'Test Table3'[Word1],
            "Word2", 'Test Table3'[Word2]
        )
    ),
    "Find Word1", AND (
        NOT ( ISERROR ( FIND ( [Word1], 'Test Table'[text] ) ) ),
        NOT ( ISERROR ( FIND ( [Word2], 'Test Table'[text] ) ) )
    )
)

8.PNG

 

Test Table5 =
SUMMARIZE (
    'test Table4',
    'test Table4'[Word1],
    'test Table4'[Word2],
    "freq", CALCULATE (
        COUNTROWS ( 'test Table4' ),
        FILTER ( 'test Table4', 'test Table4'[Find Word1] = TRUE () )
    )
)

9.PNG

 

Best regards,

Yuliana Gu

Community Support Team _ Yuliana Gu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks so much for replying. BUt... I'm having a very hard time following your steps.

 

Can you upload your smple file?

Helpful resources

Announcements
November Power BI Update Carousel

Power BI Monthly Update - November 2025

Check out the November 2025 Power BI update to learn about new features.

Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors
Top Kudoed Authors