Solved: Find 95% Confidence Interval for Difference of Mea...

Nankaina · ‎04-03-2023

Client has asked for a graph of two columns' distributions to change color if the difference in means is ever statistically significant (or by my understanding, the 95% confidence interval doesn't contain 0).

The changing color part is fairly idiomatic to Power BI. The 95% CI for Difference of Means? Not so much.

I've got more math background than stats background. If nothing else, I can make one big fat DAX function using the mathematical formula for difference of means Confidence Interval, which is a doozie. That feels like it's pretty computation-heavy, requiring two averages, two standard deviations, a T.INV, and a square root to be done just as the precursors to a final calculation. I'm wondering if there's a more efficient DAX approach to this than just brute-forcing the equation.

I looked into CONFIDENCE.T and CONFIDENCE.NORM, which are single-mean tests, and my gut tells me there might be a way to manipulate one of those into working, but I don't have the stats background to know what it would be.

Anonymous · ‎04-05-2023

Hi @Nankaina ,

Please try:

Statistical Significance = 
    VAR mean1 = calculate(average('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR mean2 = calculate(average('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR count1 = calculate(count('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR count2 = calculate(count('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR stdv1 = calculate(STDEV.P('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR stdv2 = calculate(STDEV.P('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR alpha = 0.05
    VAR diffmean = ABS(mean1-mean2)
    VAR stderror = T.INV(alpha/2, count1+count2-2)*SQRT((stdv1*stdv1/count1)+(stdv2*stdv2/count2))
    VAR CI_min = min((diffmean-stderror), (diffmean+stderror))
    VAR CI_max = max((diffmean-stderror), (diffmean+stderror))
RETURN
    IF(CImin<0, if(0<CImax, False, True), True)

For details on the use of DEFINE, please refer to

DAX Queries - DAX | Microsoft Learn

DEFINE keyword (DAX) - DAX | Microsoft Learn

Best Regards,

Neeko Tang

If this post  helps, then please consider Accept it as the solution  to help the other members find it more quickly.

View solution in original post

Nankaina · ‎04-05-2023

I decided to make a brute force version, assigning variables so that I could copy/paste and edit this later down the line if I want to. For some reason it's now telling me that my syntax with DEFINE is incorrect? I'm not sure where the mistake is.

Statistical Significance = 
DEFINE
    VAR mean1 = calculate(average('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR mean2 = calculate(average('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR count1 = calculate(count('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR count2 = calculate(count('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR stdv1 = calculate(STDEV.P('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR stdv2 = calculate(STDEV.P('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR alpha = 0.05
    VAR diffmean = ABS(mean1-mean2)
    VAR stderror = T.INV(alpha/2, count1+count2-2)*SQRT((stdv1*stdv1/count1)+(stdv2*stdv2/count2))
    VAR CI_min = min((diffmean-stderror), (diffmean+stderror))
    VAR CI_max = max((diffmean-stderror), (diffmean+stderror))
RETURN
    IF(CImin<0, if(0<CImax, False, True), True)

Anonymous · ‎04-05-2023

Hi @Nankaina ,

Please try:

Statistical Significance = 
    VAR mean1 = calculate(average('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR mean2 = calculate(average('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR count1 = calculate(count('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR count2 = calculate(count('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR stdv1 = calculate(STDEV.P('Data'[Column 1]), 'Data'[Column 2]="No")
    VAR stdv2 = calculate(STDEV.P('Data'[Column 1]), 'Data'[Column 2]="Yes")
    VAR alpha = 0.05
    VAR diffmean = ABS(mean1-mean2)
    VAR stderror = T.INV(alpha/2, count1+count2-2)*SQRT((stdv1*stdv1/count1)+(stdv2*stdv2/count2))
    VAR CI_min = min((diffmean-stderror), (diffmean+stderror))
    VAR CI_max = max((diffmean-stderror), (diffmean+stderror))
RETURN
    IF(CImin<0, if(0<CImax, False, True), True)

For details on the use of DEFINE, please refer to

DAX Queries - DAX | Microsoft Learn

DEFINE keyword (DAX) - DAX | Microsoft Learn

Best Regards,

Neeko Tang

If this post  helps, then please consider Accept it as the solution  to help the other members find it more quickly.

Anonymous · ‎04-04-2023

Hi @Nankaina ,

Confidence functions in DAX are used to help compute confidence intervals about the average value called the sample mean. The value that gets returned is the margin of error (ME ) defined as:

For a 95% confidence interval, alpha is set to 0.05.

Please refer to the following documents for more information.

CONFIDENCE.NORM function (DAX) - DAX | Microsoft Learn

Confidence Interval for the Difference Between Means - Statology

Confidence Functions in Power BI | Dash-Intel

Finding Confidence Interval - Microsoft Power BI Community

Best Regards,

Neeko Tang

If this post  helps, then please consider Accept it as the solution  to help the other members find it more quickly.