Power BI is turning 10! Tune in for a special live episode on July 24 with behind-the-scenes stories, product evolution highlights, and a sneak peek at what’s in store for the future.
Save the dateEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hello, I need some guidance on creating a correlation matrix between two multi-value variables.
Sample data:
1. Desired end-result (this was manually counted, incomplete and might have errors, just a rough illustration for clarity):
I want to have a correlation matrix which displays the number of times a Y-value (e.g. Y1) appears when an X-value (e.g. X1) appears.
What I have tried: I separated the two columns into two new tables (Table_X, Table_Y) and unpivoted each:
I also created an Index_Lookup Table with the indices. Then, I made a relationship between the index columns of Table_X / Table_Y and the Index_Lookup Table. These relationships are one-to-many from Index_Lookup (one) to Table_X / Table_Y (many). There're also Variable X_Lookup and Variable Y_Lookup that lists the possible values of X and Y respectively. Again, one-to-many relationship made from Variable X_Lookup / Variable Y_Lookup (one) to Table_X / Table_Y (many).
I selected VariableX_Name for rows and VariableY_Name for columns, then tried to put Count of Index from either Table_X or Table_Y as value, but it doesn't seem to work. I understand that in this sense the "flow" of the relationship arrows do not line up, so the matrix wouldn't work. How should I resolve this? Would a many-to-many relationship between the index columns of Table_X and Table_Y resolve the issue?
---
Another form of correlation I am trying to achieve is within-variable correlation.
2. Desired end result:
What I have tried: I followed this https://www.daxpatterns.com/basket-analysis/ and https://www.youtube.com/watch?v=8oJ7U0KvAI8.
Basket analysis might have worked (I got percentage values in the matrix) but I can't figure out how to modify the code to do a count instead of percentage. Also, not too sure if I'm inputting the right columns into the codes. Generally, I replaced:
Sales[ProductKey] = Table_Y[Variable_Y]
Sales[Order Number] = Table_Y[Index]
'And Product'[And ProductKey] = 'And Variable Y_Lookup'[And VariableY_Key]
The YouTube example didn't work as I couldn't figure out which column to put into value.
Highly appreciate any form of directions!
You're on the right track, sort of. First step is to create all permutations between x and y values. The index column is interesting but not really needed (except maybe for the self correlation you mentioned).
So you go from
to
by using two simple splits
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("NU7LFcAwCNrFM4dG82lnycv+azQQc9AnCOicVgxWELs7AtUWpjnRrsCAiwmhLlUXU+lTEbU9kalouae2wdVP6mBG3jmu9zjkJf5ScfflyUcUun4=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Index = _t, #"Variable X" = _t, #"Variable Y" = _t]),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Variable X", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Variable X"),
#"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Split Column by Delimiter", {{"Variable Y", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Variable Y"),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Index", Int64.Type}, {"Variable X", Int64.Type}, {"Variable Y", Int64.Type}})
in
#"Changed Type"
Then in DAX you can do a simple Pearson correlation from a quick measure if you like.
, you can modify the auto generated code, or you can roll your own.
Check out the July 2025 Power BI update to learn about new features.
User | Count |
---|---|
73 | |
72 | |
39 | |
31 | |
26 |