Solved: Re: Is Word Co-Occurence possible in Power Query?

TheOriginalDeeb · ‎02-28-2018

I'm hoping to re-create a solution that was achieved using an R script (because R makes no sense to me) : https://stackoverflow.com/questions/27153320/build-word-co-occurence-edge-list-in-r

Here's the original input and desired output from the R solution

DF:

sentence_id text
1           a b c d e
2           a b b e
3           b c d
4           a e
5           a
6           a a a

OUTPUT

word1 word2 freq
a     b     2
a     c     1
a     d     1
a     e     3
b     c     2
b     d     2
b     e     2
c     d     2
c     e     1
d     e     1

The only difference in my data is that it would be a table column of actual words/phrases but the desied output is basically the same.

Call all experts to see if this would be possible and any tips on how to do it.

Thanks!

RobertSlattery · ‎03-03-2018

Here is a solution that is in one query in M.

let
    Source = DF,
    sentences = List.Transform(DF[text], each Text.Lower(_)),
    minSentences = List.Transform(sentences, 
        each List.Distinct(Text.Split(_, " "))
    ),
    wordsList = List.Combine(minSentences),
    wordsDistinct = List.Distinct(wordsList),
    #"Converted to Table" = Table.FromList(wordsDistinct, Splitter.SplitByNothing(), {"word"}, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"word", type text}}),
    crossTab = List.Accumulate(wordsDistinct, #"Changed Type", 
        (table, col) => Table.AddColumn(table, col, each 0, Int64.Type)
    ),
    records = Table.TransformRows(crossTab, each
        let
            currentWord = [word],
            currentWordIndex = List.PositionOf(wordsDistinct, currentWord)
        in
            // input the current row into an accumulator that iterates over the sentences
            List.Accumulate(minSentences, _, (r, s) =>
                // if the current word is in the sentence
                if List.Contains(s, currentWord) then
                    let
                        coWords = List.RemoveMatchingItems(s, {currentWord})
                    in
                    //  increment co-word columns
                        if not List.IsEmpty(coWords) then
                            // iterate the co-words operating on the row to increment the column
                            List.Accumulate(coWords, r, (row, coWord) =>
                                if List.PositionOf(wordsDistinct, coWord) > currentWordIndex then
                                    Record.TransformFields(row, {coWord, each _ + 1})
                                else
                                    row
                            )
                        else
                            r
                else
                    r
            )
    ),
    crossTabPopulated = Table.FromRecords(records),
    #"Changed Type1" = Table.TransformColumnTypes(crossTabPopulated,{{"word", type text}, {"a", Int64.Type}, {"b", Int64.Type}, {"c", Int64.Type}, {"d", Int64.Type}, {"e", Int64.Type}}),
    #"Unpivoted Only Selected Columns" = Table.Unpivot(#"Changed Type1", wordsDistinct, "coWord", "Count"),
    output = Table.SelectRows(#"Unpivoted Only Selected Columns", each [word] <> [coWord] and [Count] <> 0)
in
    output

View solution in original post

TheOriginalDeeb · ‎03-05-2018

Lots of great solutions. I'm going with the M code.

But many thanks to everybody who took the time to provide help.

RobertSlattery · ‎03-03-2018

Here is a solution that is in one query in M.

let
    Source = DF,
    sentences = List.Transform(DF[text], each Text.Lower(_)),
    minSentences = List.Transform(sentences, 
        each List.Distinct(Text.Split(_, " "))
    ),
    wordsList = List.Combine(minSentences),
    wordsDistinct = List.Distinct(wordsList),
    #"Converted to Table" = Table.FromList(wordsDistinct, Splitter.SplitByNothing(), {"word"}, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"word", type text}}),
    crossTab = List.Accumulate(wordsDistinct, #"Changed Type", 
        (table, col) => Table.AddColumn(table, col, each 0, Int64.Type)
    ),
    records = Table.TransformRows(crossTab, each
        let
            currentWord = [word],
            currentWordIndex = List.PositionOf(wordsDistinct, currentWord)
        in
            // input the current row into an accumulator that iterates over the sentences
            List.Accumulate(minSentences, _, (r, s) =>
                // if the current word is in the sentence
                if List.Contains(s, currentWord) then
                    let
                        coWords = List.RemoveMatchingItems(s, {currentWord})
                    in
                    //  increment co-word columns
                        if not List.IsEmpty(coWords) then
                            // iterate the co-words operating on the row to increment the column
                            List.Accumulate(coWords, r, (row, coWord) =>
                                if List.PositionOf(wordsDistinct, coWord) > currentWordIndex then
                                    Record.TransformFields(row, {coWord, each _ + 1})
                                else
                                    row
                            )
                        else
                            r
                else
                    r
            )
    ),
    crossTabPopulated = Table.FromRecords(records),
    #"Changed Type1" = Table.TransformColumnTypes(crossTabPopulated,{{"word", type text}, {"a", Int64.Type}, {"b", Int64.Type}, {"c", Int64.Type}, {"d", Int64.Type}, {"e", Int64.Type}}),
    #"Unpivoted Only Selected Columns" = Table.Unpivot(#"Changed Type1", wordsDistinct, "coWord", "Count"),
    output = Table.SelectRows(#"Unpivoted Only Selected Columns", each [word] <> [coWord] and [Count] <> 0)
in
    output

TheOriginalDeeb · ‎03-05-2018

Amazing. Thank you so much!

Ashish_Mathur · ‎03-02-2018

Hi,

You may refer to my solution here.

Hope this helps.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

v-yulgu-msft · ‎03-02-2018

Hi @TheOriginalDeeb,

After loading original table into desktop, first duplicate it in Query Editor mode.

1. Split column [text].

2. You need to convert above table into below structure via Power Query (select columns one by one and union). Unfortunately, I haven't worked it out as I am not very familiar with M code. And remove duplicate values.

3. Remove unnecessary columns. Sort [Text] in Ascending order. Then, add an index column.

4. Apply all above changes. Switch to report view mode. You then need to create several auxilliary tables like below:

Test Table3 = 
FILTER (
    CROSSJOIN (
        SELECTCOLUMNS (
            'Test Table2',
            "Word1", 'Test Table2'[Text],
            "Index1", 'Test Table2'[Index]
        ),
        SELECTCOLUMNS (
            'Test Table2',
            "Word2", 'Test Table2'[Text],
            "Index2", 'Test Table2'[Index]
        )
    ),
    [Index1] < [Index2]
)

test Table4 =
ADDCOLUMNS (
    CROSSJOIN (
        'Test Table',
        SELECTCOLUMNS (
            'Test Table3',
            "Word1", 'Test Table3'[Word1],
            "Word2", 'Test Table3'[Word2]
        )
    ),
    "Find Word1", AND (
        NOT ( ISERROR ( FIND ( [Word1], 'Test Table'[text] ) ) ),
        NOT ( ISERROR ( FIND ( [Word2], 'Test Table'[text] ) ) )
    )
)

Test Table5 =
SUMMARIZE (
    'test Table4',
    'test Table4'[Word1],
    'test Table4'[Word2],
    "freq", CALCULATE (
        COUNTROWS ( 'test Table4' ),
        FILTER ( 'test Table4', 'test Table4'[Find Word1] = TRUE () )
    )
)

Best regards,

Yuliana Gu

Community Support Team _ Yuliana Gu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

TheOriginalDeeb · ‎03-02-2018

Thanks so much for replying. BUt... I'm having a very hard time following your steps.

Can you upload your smple file?

Is Word Co-Occurence possible in Power Query?

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026

How to Get Your Question Answered Quickly

FabCon is coming to Atlanta