Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
I have seen some DAX for doing Pearson's Correlation, but it is not really designed for comparing two time series.
What would you suggest for two columns (same table) with values that look highly correlated (I want a single metric as output).
Solved! Go to Solution.
Here is my rolling Pearsons Correlation Coefficient solution:
CorrTable =
ADDCOLUMNS(
SUMMARIZE(
'Traces',
'Traces'[CorrID],
'Traces'[DateStamp]
),
"OutlookLatency", CALCULATE(
AVERAGE('Traces'[ResponseTime]),
'Traces'[AppName] = "Microsoft Outlook"
),
"TokenLatency", CALCULATE(
AVERAGE('Traces'[ResponseTime]),
'Traces'[AppName] = "AAD Token Broker Plugin"
),
"CredentialLatency", CALCULATE(
AVERAGE('Traces'[ResponseTime]),
'Traces'[AppName] = "Credential Manager UI Host (Windows)"
),
"EntraLatency", CALCULATE(
AVERAGE('Traces'[ResponseTime]),
'Traces'[AppName] = "Microsoft Entra"
)
)
TokenCorr =
VAR cTable =
FILTER(
ADDCOLUMNS(
VALUES('CorrTable'[CorrID]),
"X", CALCULATE(AVERAGE('CorrTable'[OutlookLatency])),
"Y", CALCULATE(AVERAGE('CorrTable'[TokenLatency]))
),
AND(
NOT (ISBLANK([X])),
NOT (ISBLANK([Y]))
)
)
VAR Count_Items = COUNTROWS(cTable)
VAR Sum_X = SUMX(cTable,[X])
VAR Sum_X2 = SUMX(cTable,[X] ^ 2)
VAR Sum_Y = SUMX(cTable,[Y])
VAR Sum_Y2 =SUMX(cTable,[Y] ^ 2)
VAR Sum_XY =SUMX(cTable,[X] * [Y])
VAR Pearson_Numerator = Count_Items * Sum_XY - Sum_X * Sum_Y
VAR Pearson_Denominator_X = Count_Items * Sum_X2 - Sum_X ^ 2
VAR Pearson_Denominator_Y = Count_Items * Sum_Y2 - Sum_Y ^ 2
VAR Pearson_Denominator = SQRT(Pearson_Denominator_X * Pearson_Denominator_Y)
VAR TokenCorr =
DIVIDE(
Pearson_Numerator,
Pearson_Denominator
)
RETURN TokenCorr
Hi @DoctorYSG ,
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.
Thank you.
It cannot be an aggregate over time. In narritive form the question is: At the point in time (approximately) when there is a Outlook wait event, what is the related token wait (on it's timeline). That is we are doing sliding windows on both timelines and comparing.
Hi @DoctorYSG ,
Thank you for providing additional context to your scenario, but as you have mentioned there's a significant amount of coding with DAX, and it is not ideally set up for this scenario.
So as an alternative please try to utilize Python/R visual in Power BI and try to create visualization using advanced statistics
Create Power BI visuals using Python in Power BI Desktop - Power BI | Microsoft Learn
Thank you
Hi @DoctorYSG ,
Thanks for reaching out to the Microsoft fabric community forum.
I came across a few correlation metrics that I believe are well suited for your needs.
Please try out Cosine similarity and spearman correlation, and let us know if they help in checking correlation of your data.
If you find this post helpful, please mark it as an "Accept as Solution" and consider giving a KUDOS. Feel free to reach out if you need further assistance.
Thank you
thank you Madadi,
I searched in the DAX documentation for Spearmen correlation, but I don't see any function for it. DAX function reference - DAX | Microsoft Learn nor a special Cosine correlation. There are some articles in the user forums. I did see CoPilot suggest the below code, but I don't think that is really the correct mathematical way to tackle this. (we don't have sine waves here).
what would be best is a cross-correlation function, but that is a good deal of coding to ge there, and DAX is not really well setup for this:
Cross-correlation - Wikipedia
```
The cosine correlation (or cosine similarity) is a measure used to determine the similarity between two vectors in a multi-dimensional space. It is widely used in fields like machine learning, natural language processing, and recommendation systems. The formula for cosine similarity is:
$$\text{Cosine Similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{|\mathbf{A}| |\mathbf{B}|}$$
Where:
In DAX (Data Analysis Expressions), commonly used in Power BI or Excel, you can calculate cosine similarity between two vectors (e.g., columns of data) by following these steps:
Assume you have two columns, ColumnA and ColumnB, in a table called DataTable.
Dot Product:
Copy the codeDotProduct = SUMX(DataTable, DataTable[ColumnA] * DataTable[ColumnB])
Magnitude of Each Vector:
Copy the codeMagnitudeA = SQRT(SUMX(DataTable, DataTable[ColumnA] * DataTable[ColumnA]))
MagnitudeB = SQRT(SUMX(DataTable, DataTable[ColumnB] * DataTable[ColumnB]))
Cosine Similarity:
Copy the codeCosineSimilarity = DIVIDE([DotProduct], [MagnitudeA] * [MagnitudeB], 0)
If you need further clarification or help implementing this, feel free to ask! 😊
```
Hi @DoctorYSG ,
Thanks for reaching out to the Microsoft fabric community forum.
If DAX doesn't meet your data transformation needs, consider using a Fabric Notebook with PySpark. Notebooks offer an environment where you can process, filter, and reshape data before visualizing it in Power BI. This is useful when you're working with logic or operations that are difficult to implement in DAX.
If this post helps, then please consider Accepting as solution to help the other members find it more quickly and consider giving a KUDOS. Feel free to reach out if you need further assistance.
Thank you
Right now, I am in the GovCloud version of PowerBI (FedRamp aproval of Fabric is still in the process). But I tend to agree with you. What I am probably looking for is rolling correlations, or cross correlations which is available in Pandas for Python. I am going to leave this open so that others who have tried to do rolling correlations in DAX can reply (and there are some use cases where delaying the processing till DAX makes sense).
Hi @DoctorYSG ,
Accepting the helpful reply as solution may help some community members in the future who may come across this post.
You could always reopen this thread or create a new one later.
Thank you.
I will post my rolling correlations coeficients when it is done.
Here is my rolling Pearsons Correlation Coefficient solution:
CorrTable =
ADDCOLUMNS(
SUMMARIZE(
'Traces',
'Traces'[CorrID],
'Traces'[DateStamp]
),
"OutlookLatency", CALCULATE(
AVERAGE('Traces'[ResponseTime]),
'Traces'[AppName] = "Microsoft Outlook"
),
"TokenLatency", CALCULATE(
AVERAGE('Traces'[ResponseTime]),
'Traces'[AppName] = "AAD Token Broker Plugin"
),
"CredentialLatency", CALCULATE(
AVERAGE('Traces'[ResponseTime]),
'Traces'[AppName] = "Credential Manager UI Host (Windows)"
),
"EntraLatency", CALCULATE(
AVERAGE('Traces'[ResponseTime]),
'Traces'[AppName] = "Microsoft Entra"
)
)
TokenCorr =
VAR cTable =
FILTER(
ADDCOLUMNS(
VALUES('CorrTable'[CorrID]),
"X", CALCULATE(AVERAGE('CorrTable'[OutlookLatency])),
"Y", CALCULATE(AVERAGE('CorrTable'[TokenLatency]))
),
AND(
NOT (ISBLANK([X])),
NOT (ISBLANK([Y]))
)
)
VAR Count_Items = COUNTROWS(cTable)
VAR Sum_X = SUMX(cTable,[X])
VAR Sum_X2 = SUMX(cTable,[X] ^ 2)
VAR Sum_Y = SUMX(cTable,[Y])
VAR Sum_Y2 =SUMX(cTable,[Y] ^ 2)
VAR Sum_XY =SUMX(cTable,[X] * [Y])
VAR Pearson_Numerator = Count_Items * Sum_XY - Sum_X * Sum_Y
VAR Pearson_Denominator_X = Count_Items * Sum_X2 - Sum_X ^ 2
VAR Pearson_Denominator_Y = Count_Items * Sum_Y2 - Sum_Y ^ 2
VAR Pearson_Denominator = SQRT(Pearson_Denominator_X * Pearson_Denominator_Y)
VAR TokenCorr =
DIVIDE(
Pearson_Numerator,
Pearson_Denominator
)
RETURN TokenCorr
Hi @DoctorYSG ,
Thank you for the update, we are going to keep this thread open as per your request.
For any further discussions or questions, please post in this post or create a new post in the Microsoft Fabric Community Forum, we’ll be happy to assist.
Thank you for being part of the Microsoft Fabric Community.
Averages are not good. One needs to see if one curve tracks the peaks and valleys of the other in a scale free manner.
hello @DoctorYSG
how about percent of normalized value?
something like percent of Token Wait to Outlook Wait or to total value (Outlook Wait + Token Wait).
Thank you.
User | Count |
---|---|
16 | |
14 | |
13 | |
12 | |
11 |
User | Count |
---|---|
19 | |
15 | |
15 | |
11 | |
10 |