- Microsoft Power BI Community
- Welcome to the Community!
- News & Announcements
- Get Help with Power BI
- Desktop
- Service
- Report Server
- Power Query
- Mobile Apps
- Developer
- DAX Commands and Tips
- Custom Visuals Development Discussion
- Health and Life Sciences
- Power BI Spanish Community
- Translated Spanish Desktop
- Power Platform Integration - Better Together!
- Power Platform Integrations
- Power Platform and Dynamics 365 Integrations
- Training and Consulting
- Instructor Led Training
- Galleries
- Community Connections & How-To Videos
- COVID-19 Data Stories Gallery
- Themes Gallery
- Data Stories Gallery
- R Script Showcase
- Webinars and Video Gallery
- Quick Measures Gallery
- 2021 MSBizAppsSummit Gallery
- 2020 MSBizAppsSummit Gallery
- 2019 MSBizAppsSummit Gallery
- Events
- Ideas
- Custom Visuals Ideas
- Issues
- Issues
- Events
- Upcoming Events
- Community Engagement
- T-Shirt Design Challenge 2023
- Community Blog
- Power BI Community Blog
- Custom Visuals Community Blog
- Community Support
- Community Accounts & Registration
- Using the Community
- Community Feedback

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Microsoft Power BI Community
- Forums
- Get Help with Power BI
- Desktop
- Re: Correlation between same column, different ite...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Correlation between same column, different items for time periods

04-23-2018
10:03 PM

Afternoon,

I'm trying to find the correlation between the [close] column values of the 'StockbarDataExample' table for different companies in the 'StockSymbolExchangeCode' column.

I'm not sure how to compare the two time series given they share the same column for values.

Pbix & Source Link: https://1drv.ms/f/s!At8Q-ZbRnAj8iF5lSSIdn2oi2tGj

Solved! Go to Solution.

1 ACCEPTED SOLUTION

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

04-25-2018
06:13 AM

In this sort of "pairwise comparison" scenario, where you have multiple entities in the same table (in this case distinguished by **StockSymbolCurrency**) I would normally follow the below steps.

My modifed copy of your pbix is here:

https://www.dropbox.com/s/y9h2ncitj46ee6a/PowerBiForumExample2%20Owen%20edit.pbix?dl=0

- Create a copy of the entity dimension table, in your case a copy of
**ReferenceTable**which I would call**ReferenceTableComparison** - Create a relationship between
**StockBarDatExample**and**ReferenceTableComparison**, but make it inactive - Create the appropriate value measure that will be used for the company selected in
**ReferenceTable**. For testing purposes I createdAverage Close = AVERAGE ( StockBarDatExample[close] )

- Create the same measure for the Comparison Company, which activates the inactive relationship, and clears the filter on
**ReferenceTable**:Average Close Comparison = CALCULATE ( [Average Close], ALL ( ReferenceTable ), USERELATIONSHIP ( StockBarDatExample[StockSymbolCurrency], ReferenceTableComparison[StockSymbolCurrency] ) )

- You can then selected Company & Comparison Company using slicers, and use
**Average Close**&**Average Close Comparison**together in visuals. - The Pearson Correlation Coefficient can be calculated using a method similar to that used here. This relies on having the above two measures set up.

Here is the measure I tested with your data:Pearson Correlation Coefficient = VAR DateTimes = // Create a table of date/times where both stocks have a Close value FILTER ( SUMMARIZE ( StockBarDatExample, DateTable[DateKey], TimeTable[Column1] ), // Date & Time columns AND ( NOT ( ISBLANK ( [Average Close] ) ), NOT ( ISBLANK ( [Average Close Comparison] ) ) ) ) // Construct table of pairs of Close values VAR Known = SELECTCOLUMNS ( DateTimes, "Known[X]", [Average Close], "Known[Y]", [Average Close Comparison] ) // Calculate correlation coefficient VAR Count_Items = COUNTROWS ( Known ) VAR Average_X = AVERAGEX ( Known, Known[X] ) VAR Average_X2 = AVERAGEX ( Known, Known[X] ^ 2 ) VAR Average_Y = AVERAGEX ( Known, Known[Y] ) VAR Average_Y2 = AVERAGEX ( Known, Known[Y] ^ 2 ) VAR Average_XY = AVERAGEX ( Known, Known[X] * Known[Y] ) VAR CorrelationCoefficient = DIVIDE ( Average_XY - Average_X * Average_Y, SQRT ( ( Average_X2 - Average_X ^ 2 ) * ( Average_Y2 - Average_Y ^ 2 ) ) ) RETURN CorrelationCoefficient

The test report page in the above pbix looks like this:

Hopefully that helps. 🙂

Regards,

Owen

7 REPLIES 7

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

04-24-2018
07:54 PM

HI @ElliotP,

You can use below formula to get diff between two legend:

Diff = VAR c1 = FIRSTNONBLANK ( ALL ( ReferenceTable[CompanyName] ), [CompanyName] ) VAR c2 = LASTNONBLANK ( ALL ( ReferenceTable[CompanyName] ), [CompanyName] ) RETURN ABS ( CALCULATE ( SUM ( StockBarDatExample[close] ), ReferenceTable[CompanyName] = c1 ) - CALCULATE ( SUM ( StockBarDatExample[close] ), ReferenceTable[CompanyName] = c2 ) )

AFAIK, current line chart not support use multiple value field and legend at same time, I'd like to suggest you use 'line and clustered column chart' to instead.

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin

If this post**helps**, please consider **accept as solution** to help other members find it more quickly.

If this post

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

04-25-2018
03:42 AM

@v-shex-msftThanks for the response.

I'm not trying to calculate the difference between the different company's closes, but I think the principles might be able to work. I would like to be able to calculate the correlation between the time series data for each company; but I'm unable to at the moment as both time series values are in the same column ([close]), but they two time series are distinguishable by the 'StockSymbolCurrency' column string value.

The idea of using variables to seperate the time series values before returning a correlation figure is interesting, but I'm not sure how to get each variable to filter over subsequent different string values in the 'StockSymbolCurrency' column.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

04-25-2018
06:13 AM

In this sort of "pairwise comparison" scenario, where you have multiple entities in the same table (in this case distinguished by **StockSymbolCurrency**) I would normally follow the below steps.

My modifed copy of your pbix is here:

https://www.dropbox.com/s/y9h2ncitj46ee6a/PowerBiForumExample2%20Owen%20edit.pbix?dl=0

- Create a copy of the entity dimension table, in your case a copy of
**ReferenceTable**which I would call**ReferenceTableComparison** - Create a relationship between
**StockBarDatExample**and**ReferenceTableComparison**, but make it inactive - Create the appropriate value measure that will be used for the company selected in
**ReferenceTable**. For testing purposes I createdAverage Close = AVERAGE ( StockBarDatExample[close] )

- Create the same measure for the Comparison Company, which activates the inactive relationship, and clears the filter on
**ReferenceTable**:Average Close Comparison = CALCULATE ( [Average Close], ALL ( ReferenceTable ), USERELATIONSHIP ( StockBarDatExample[StockSymbolCurrency], ReferenceTableComparison[StockSymbolCurrency] ) )

- You can then selected Company & Comparison Company using slicers, and use
**Average Close**&**Average Close Comparison**together in visuals. - The Pearson Correlation Coefficient can be calculated using a method similar to that used here. This relies on having the above two measures set up.

Here is the measure I tested with your data:Pearson Correlation Coefficient = VAR DateTimes = // Create a table of date/times where both stocks have a Close value FILTER ( SUMMARIZE ( StockBarDatExample, DateTable[DateKey], TimeTable[Column1] ), // Date & Time columns AND ( NOT ( ISBLANK ( [Average Close] ) ), NOT ( ISBLANK ( [Average Close Comparison] ) ) ) ) // Construct table of pairs of Close values VAR Known = SELECTCOLUMNS ( DateTimes, "Known[X]", [Average Close], "Known[Y]", [Average Close Comparison] ) // Calculate correlation coefficient VAR Count_Items = COUNTROWS ( Known ) VAR Average_X = AVERAGEX ( Known, Known[X] ) VAR Average_X2 = AVERAGEX ( Known, Known[X] ^ 2 ) VAR Average_Y = AVERAGEX ( Known, Known[Y] ) VAR Average_Y2 = AVERAGEX ( Known, Known[Y] ^ 2 ) VAR Average_XY = AVERAGEX ( Known, Known[X] * Known[Y] ) VAR CorrelationCoefficient = DIVIDE ( Average_XY - Average_X * Average_Y, SQRT ( ( Average_X2 - Average_X ^ 2 ) * ( Average_Y2 - Average_Y ^ 2 ) ) ) RETURN CorrelationCoefficient

The test report page in the above pbix looks like this:

Hopefully that helps. 🙂

Regards,

Owen

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

06-23-2022
09:17 AM

Hi @OwenAuger,

Thank you for posting this dashboard!

I am trying to do something similar with multiple stocks. Your appraoch looks promising however I think the result you are getting is not correct in your example I believe the correlation should be 0.97714 instead of 0.94. It seems to be linked to the issue discussed in this thread

When adjusting some of the underlying calculations I get the correct results for some of them but not for all. (for instance

VAR Average_X2 = IF(COUNTROWS(Known)<>1,AVERAGEX(Known, Known[X] ^ 2 )) ) gives the expected result but the same adjustment on VAR Average_XY is not working.

Do you have any idea how to fix the calculation?

I have been looking into it but cannot wrap my head around it..

Léo

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

07-04-2022
06:25 AM

Thanks for your reply on this topic 🙂

My original reply created a measure that calculation the correlation coefficient based only on records where X and Y are both nonblank.

With that assumption, I believe the correlation coefficient of 0.94 is correct (can be tested by pasting data to Excel, removing rows where either X or Y is blank, and applying the CORREL function).

For your particular case, I'm not sure where the calculation is going wrong. It could possibly relate to how the **Known** table is constructed. **Known** should include only the rows to be included in the correlation coefficient calculation.

Could you post some more detail, even dummy data in a PBIX that illustrates when the calculation doesn't produce the expected result?

Regards,

Owen

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

04-26-2018
02:23 AM

@OwenAugerThank you so much, that's amazing.

As an extension, from an idea, would it be possible to do this for a large number of comparisions? I get the feeling that might be better done with python and then exploring the resultant table and data from that instead of trying to use a tabular model to achieve that?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

04-27-2018
02:36 AM

You're welcome 🙂

There's nothing to stop you having an arbitrary number of stocks in your source table...and you could use a matrix visual to show the correlation of every combination, or use that Correlation Plot custom visual. Or create your own R visual I guess

Perhaps performance might be better if you prepare the data with python (or R?) rather than computing on the fly - though don't have much experience with that myself.

Featured Topics

Top Solution Authors

User | Count |
---|---|

139 | |

84 | |

62 | |

60 | |

55 |

Top Kudoed Authors

User | Count |
---|---|

211 | |

108 | |

88 | |

75 | |

72 |