Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
SO
Helper III
Helper III

Standard Deviation - HELP Please!! Challenge with Z Scores

Hello All, 

Any help with Standard Deviation is appreciated.  As non-expert, I have literally spent days trying to navigate this challenge.


I am trying to see which locations have the lowest / highest Z-scores.  I'm doing this because when people look at Rankx, they don't often appreciate the severity of the locations with high R counts (aka outliers).   In this situation, I'm examining that some locations have high R counts, in comparison to their total count.  I'm working with Live data so I can not create new tables or columns like in excel.  Below is a screen shot, but not the full table, some locations have much lower counts, but but higher R counts. 

The Z scores (in this table below is the measure {'* Measure Z#s Score'} and its results should roughly be between +2 and -2 (aka +/- 2 Standard Deviations) so I can tell the numbers are off.    

I know that the R Count is correct.  I know that the Average R Count (333.8) is correct, but I do not think the standard deviation is correct. 

 

SO_0-1654466683965.png
Here are my Measures and I've adjusted the field names becuase I am in a public sector organization. 

 

* Measure Count =COUNTROWS ('SAchievement' )  // renamed "Count"

*R Count = CALCULATE ( [* Measure Count],'SAchievement'[NumericEquivalent] < 50 ) // renamed "R Count"

* Measure Avg R in #s = AVERAGEX ( ALLSELECTED( 'Location'[LocationName] ), [* Measure Rs] )

* Measure Std Dev R2s =CALCULATE(STDEV.P(SAchievement[NumericEquivalent]), ALLEXCEPT(SAchievement, SAchievement[year], SAchievement[ReportPeriod]), SAchievement[NumericEquivalent] <50)

* Measure Z#s Score = DIVIDE ([* Measure Rs] - [* Measure Avg R in #s], [* Measure Std Dev R2s])

 

 

Any help would be appreciated.   People often discount data rank becuase they see the last entries as close to the next entries.  These outliers need to be considered to ensure equitable resources are allocated.  
 
Many thanks 
6 REPLIES 6
SO
Helper III
Helper III

@v-chenwuz-msft thanks again for the support.  As I'm working with live data a few of the essential PBI functions to support statisitcs e.g., percentile do not work (e.g., Percentile.inc, or Earlier).  Is there a simple work around to achieve percentile?  I can use rankx, but not the Earlier function which is often used in the work arounds to manually work out percentiles.    Thanks and please let me know if this should be a new discussion?    

Hi @SO ,

 

I suggest you start a new discussion.

Or you can refer this function PERCENTILEX.INC

 

Best Regards

Community Support Team _ chenwu zhu

 

 

SO
Helper III
Helper III

I also tried to derive the Standard Deviation formula from scratch, but the PBI wheel keeps spinning without giving a result.  We have 32 locations so I just wrote in the total N=32


 *SD = //Standard Deviation Formula (From Scratch) VAR _PopVal = [* Measure Rs] VAR _Mean = [* Measure Avg R in #s] RETURN SQRT ( DIVIDE ( SUMX ( ALLEXCEPT( 'SAchievement', SAchievement[Year], SAchievement[ReportPeriod]), ( _PopVal - _Mean ) ^ 2 ), 32))  

The data for the measures is in the table above.  

 

Any thoughts on why this wouldn't work?  When placing this measure in my table, I thought it would go through each value:  e.g. for Location 1:  (483-333.8)^2/32  and then SQRTed = 26.37, then Location 2, ....etc...

Correction to above... I do get a result in about 15 minutes... however, the results are not accurrate, for example, locaiton 1 has a result of 101280.6, location 2, 6931.7,  etc... 

 

 

 

 

 

Hi @SO ,

 

According to your description, I have written the following formula for you to try.

*SD =
VAR _sum =
    SUMMARIZE (
        ALLSELECTED ( 'SAchievement' ),
        SAchievement[Location],
        SAchievement[Year],
        "_xn", CALCULATE ( [* Measure Count], 'SAchievement'[NumericEquivalent] < 50 ),
        "_x",
            AVERAGEX (
                FILTER (
                    ALLEXCEPT ( SAchievement, SAchievement[Location], SAchievement[Year] ),
                    [NumericEquivalent] < 50
                ),
                [* Measure Count]
            )
    )
RETURN
    SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ), 32 ) )

 

Best Regards

Community Support Team _ chenwu zhu

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Hello Again, 

@v-chenwuz-msft 

 

So Sorry again.  

I was looking at my results and comparing Excel and PBI with different results.  The measure was either showing Zero or the a number that was not possible.  To simply the data I was only looking at 3 lines.   I have slicers that filter the Year, Manager, and Location, and LocationType  

In Excel I have 

Location R12FTE#%# R12/FTEMean of R12/FTE*StandDev {STDEV.P(D2:D4)}Z Score {STANDARDIZE(D2,E2,$F$2)}
BR1952260.86280.7322666670.0932439571.399911986
MG2613820.68320.7322666670.093243957-0.526218193
ML1922950.65080.7322666670.093243957-0.873693793

 

In PowerBi I can generate the Standard Deviation, but to get the Z Score, I have to physically type in the value 0.0932.  

Location R12FTE#%# R12/FTEZ score Mean for R12/FTE*SD Z Score 
BR1952260.86280.7322666670 
MG2613820.68320.7322666670 
ML1922950.65080.7322666670 
     0.093243957 

 

Also to get the Standard Deviation to work I needed to alter the code otherwise PBI would not stop spinning.  

 

First I needed to create a measure to count the locations.  I realize I had manually typed this in my first example (32).  There are hundreds of locations, but I was only looking at 32.  In the case above, I'm slicing to 3.
 

 

* Location count =
CALCULATE(COUNTA('Location'[Location]), ALLSELECTED('Location'[Location]))

 

 

 

*SD = 
VAR _sum =
SUMMARIZE ('AProfile',
Location[Location],
"_xn", [#%# R12/FTE],
"_x",[** Z score Mean for $/FTE]
)
RETURN

//SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ) ,2 )) // not working
SQRT ( DIVIDE ( SUMX ( _sum, ( [#%# R12/FTE] - [** Z score Mean for $/FTE] ) ^ 2 ), [* location count]))

 


Sorry again... I think my issue was with the allexcept, or that I'm working In Service? ?  Here is the code I used.  I've modified the field names for confidentialty. 

 

 

SD = 

Var _sum = SUMMARIZE (
        ALLSELECTED ( 'AProfile' ),
        Location[Location],
        A[Year],
        "_xn", [#%# R12/FTE] ,
        "_x",
            AVERAGEX (
                FILTER (
                    ALLEXCEPT( AProfile, Location[Location], AProfile[Year]), AProfile[NumericEquivalent] <70  // In this scenario the value [#%# R12/FTE] has already taken into account <70 so this is not needed here.  I've tried with it in and out.
                ),
                [#%# R12/FTE]
            )
    )

RETURN

SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ), 3 ) )

 

 

Any reason why this might not work?  

 

 

 

Thank you very much !!  This is great!  I am still trying to get to a point where I can use the standard deviation funtion, but this seems to be my only work around.  You are very kind (and skilled)!

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Prices go up Feb. 11th.

Jan25PBI_Carousel

Power BI Monthly Update - January 2025

Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.