Serious Issue with New Correlation Coefficient Quick Measure

Status: New
Comments
v-jiascu-msft
Employee

Hi @Greg_Deckler,

 

This is indeed a concern. However, I think handling it could be left to the end users. I created a simpler example to illustrate.

1. Because we assign a Category to calculate over, it isn't appropriate to remove the NULLs by default.


2. More importantly, what if the original data itself contains nulls? In a large dataset, some null values are likely. For instance, consider a daily forecast over one year:

Day           Forecast   Actual
2018-01-01    100        99
2018-01-02    100        52
2018-01-03    100        (null)
2018-01-04    100        150
...           ...
2018-01-23    100        (null)

 

So I think it's better to leave this flexibility to the end users.

 

Best Regards,

Dale
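A minimal Python sketch of why the choice matters: it compares a complete-pairs Pearson correlation (drop any day where a value is missing) with one where missing values are silently treated as 0. The function names and the six days of data are hypothetical, made up for illustration. The forecast is deliberately non-constant, because a constant column like the Forecast above has zero variance and therefore no defined correlation. Treating nulls as 0 is an assumed failure mode used for comparison, not a description of what the quick measure actually does.

```python
import math
from typing import Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no variance, so the correlation is undefined.
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def pearson_complete_pairs(xs: Sequence[Optional[float]],
                           ys: Sequence[Optional[float]]) -> float:
    """Drop every pair where either value is missing (None), then correlate."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return pearson([x for x, _ in pairs], [y for _, y in pairs])


def pearson_nulls_as_zero(xs: Sequence[Optional[float]],
                          ys: Sequence[Optional[float]]) -> float:
    """Replace missing values with 0 before correlating (hypothetical bad approach)."""
    return pearson([0.0 if x is None else x for x in xs],
                   [0.0 if y is None else y for y in ys])


# Hypothetical daily data: None marks a day with no recorded Actual.
forecast = [100, 120, 90, 110, 105, 95]
actual = [98, 118, None, 112, None, 93]

print(round(pearson_complete_pairs(forecast, actual), 2))  # 0.99
print(round(pearson_nulls_as_zero(forecast, actual), 2))   # 0.54
```

Zero-filling treats each missing Actual as a real observation of 0, which on this made-up data drags the coefficient from about 0.99 down to about 0.54.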

Greg_Deckler
Super User

I don't think it is valid to include null values in a Pearson correlation; you will get incorrect results. The standard ways to handle missing values in a Pearson correlation are either to impute/interpolate them or to drop them from the calculation:

 

https://stats.stackexchange.com/questions/188432/pearson-correlation-with-missing-values

 

What I do not believe is valid is keeping the null (unmatched) values, because that will absolutely produce an incorrect result.
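For the impute/interpolate route, here is a minimal Python sketch for a daily series, assuming the gaps are interior and that a straight line between the neighbouring days is a reasonable estimate. Whether that assumption holds depends on the data. The function `interpolate_linear` and the numbers are hypothetical, for illustration only.

```python
import math
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def interpolate_linear(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps with straight-line values between known neighbours.

    Leading or trailing gaps stay None: there is no second neighbour to
    interpolate towards, so those rows would still need to be dropped.
    """
    out: List[Optional[float]] = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    # Walk each pair of consecutive known points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


# Hypothetical daily series; None marks a day with no recorded Actual.
forecast = [100, 104, 108, 112, 116, 120]
actual = [99, 105, None, 113, None, 121]

filled = interpolate_linear(actual)
print(filled)                              # [99, 105, 109.0, 113, 117.0, 121]
print(round(pearson(forecast, filled), 3))  # 0.997
```

If a series can start or end with nulls, those positions remain None after interpolation and the affected pairs still have to be dropped before calling `pearson`.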