Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Reply
SO
Helper III
Helper III

Standard Deviation - HELP Please!! Challenge with Z Scores

Hello All, 

Any help with Standard Deviation is appreciated.  As non-expert, I have literally spent days trying to navigate this challenge.


I am trying to see which locations have the lowest / highest Z-scores.  I'm doing this because when people look at Rankx, they don't often appreciate the severity of the locations with high R counts (aka outliers).   In this situation, I'm examining that some locations have high R counts, in comparison to their total count.  I'm working with Live data so I can not create new tables or columns like in excel.  Below is a screen shot, but not the full table, some locations have much lower counts, but but higher R counts. 

The Z scores (in this table below is the measure {'* Measure Z#s Score'} and its results should roughly be between +2 and -2 (aka +/- 2 Standard Deviations) so I can tell the numbers are off.    

I know that the R Count is correct.  I know that the Average R Count (333.8) is correct, but I do not think the standard deviation is correct. 

 

SO_0-1654466683965.png
Here are my Measures and I've adjusted the field names becuase I am in a public sector organization. 

 

* Measure Count =COUNTROWS ('SAchievement' )  // renamed "Count"

*R Count = CALCULATE ( [* Measure Count],'SAchievement'[NumericEquivalent] < 50 ) // renamed "R Count"

* Measure Avg R in #s = AVERAGEX ( ALLSELECTED( 'Location'[LocationName] ), [* Measure Rs] )

* Measure Std Dev R2s =CALCULATE(STDEV.P(SAchievement[NumericEquivalent]), ALLEXCEPT(SAchievement, SAchievement[year], SAchievement[ReportPeriod]), SAchievement[NumericEquivalent] <50)

* Measure Z#s Score = DIVIDE ([* Measure Rs] - [* Measure Avg R in #s], [* Measure Std Dev R2s])

 

 

Any help would be appreciated.   People often discount data rank becuase they see the last entries as close to the next entries.  These outliers need to be considered to ensure equitable resources are allocated.  
 
Many thanks 
6 REPLIES 6
SO
Helper III
Helper III

@v-chenwuz-msft thanks again for the support.  As I'm working with live data a few of the essential PBI functions to support statisitcs e.g., percentile do not work (e.g., Percentile.inc, or Earlier).  Is there a simple work around to achieve percentile?  I can use rankx, but not the Earlier function which is often used in the work arounds to manually work out percentiles.    Thanks and please let me know if this should be a new discussion?    

Hi @SO ,

 

I suggest you start a new discussion.

Or you can refer this function PERCENTILEX.INC

 

Best Regards

Community Support Team _ chenwu zhu

 

 

SO
Helper III
Helper III

I also tried to derive the Standard Deviation formula from scratch, but the PBI wheel keeps spinning without giving a result.  We have 32 locations so I just wrote in the total N=32


 *SD = //Standard Deviation Formula (From Scratch) VAR _PopVal = [* Measure Rs] VAR _Mean = [* Measure Avg R in #s] RETURN SQRT ( DIVIDE ( SUMX ( ALLEXCEPT( 'SAchievement', SAchievement[Year], SAchievement[ReportPeriod]), ( _PopVal - _Mean ) ^ 2 ), 32))  

The data for the measures is in the table above.  

 

Any thoughts on why this wouldn't work?  When placing this measure in my table, I thought it would go through each value:  e.g. for Location 1:  (483-333.8)^2/32  and then SQRTed = 26.37, then Location 2, ....etc...

Correction to above... I do get a result in about 15 minutes... however, the results are not accurrate, for example, locaiton 1 has a result of 101280.6, location 2, 6931.7,  etc... 

 

 

 

 

 

Hi @SO ,

 

According to your description, I have written the following formula for you to try.

*SD =
VAR _sum =
    SUMMARIZE (
        ALLSELECTED ( 'SAchievement' ),
        SAchievement[Location],
        SAchievement[Year],
        "_xn", CALCULATE ( [* Measure Count], 'SAchievement'[NumericEquivalent] < 50 ),
        "_x",
            AVERAGEX (
                FILTER (
                    ALLEXCEPT ( SAchievement, SAchievement[Location], SAchievement[Year] ),
                    [NumericEquivalent] < 50
                ),
                [* Measure Count]
            )
    )
RETURN
    SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ), 32 ) )

 

Best Regards

Community Support Team _ chenwu zhu

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Hello Again, 

@v-chenwuz-msft 

 

So Sorry again.  

I was looking at my results and comparing Excel and PBI with different results.  The measure was either showing Zero or the a number that was not possible.  To simply the data I was only looking at 3 lines.   I have slicers that filter the Year, Manager, and Location, and LocationType  

In Excel I have 

Location R12FTE#%# R12/FTEMean of R12/FTE*StandDev {STDEV.P(D2:D4)}Z Score {STANDARDIZE(D2,E2,$F$2)}
BR1952260.86280.7322666670.0932439571.399911986
MG2613820.68320.7322666670.093243957-0.526218193
ML1922950.65080.7322666670.093243957-0.873693793

 

In PowerBi I can generate the Standard Deviation, but to get the Z Score, I have to physically type in the value 0.0932.  

Location R12FTE#%# R12/FTEZ score Mean for R12/FTE*SD Z Score 
BR1952260.86280.7322666670 
MG2613820.68320.7322666670 
ML1922950.65080.7322666670 
     0.093243957 

 

Also to get the Standard Deviation to work I needed to alter the code otherwise PBI would not stop spinning.  

 

First I needed to create a measure to count the locations.  I realize I had manually typed this in my first example (32).  There are hundreds of locations, but I was only looking at 32.  In the case above, I'm slicing to 3.
 

 

* Location count =
CALCULATE(COUNTA('Location'[Location]), ALLSELECTED('Location'[Location]))

 

 

 

*SD = 
VAR _sum =
SUMMARIZE ('AProfile',
Location[Location],
"_xn", [#%# R12/FTE],
"_x",[** Z score Mean for $/FTE]
)
RETURN

//SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ) ,2 )) // not working
SQRT ( DIVIDE ( SUMX ( _sum, ( [#%# R12/FTE] - [** Z score Mean for $/FTE] ) ^ 2 ), [* location count]))

 


Sorry again... I think my issue was with the allexcept, or that I'm working In Service? ?  Here is the code I used.  I've modified the field names for confidentialty. 

 

 

SD = 

Var _sum = SUMMARIZE (
        ALLSELECTED ( 'AProfile' ),
        Location[Location],
        A[Year],
        "_xn", [#%# R12/FTE] ,
        "_x",
            AVERAGEX (
                FILTER (
                    ALLEXCEPT( AProfile, Location[Location], AProfile[Year]), AProfile[NumericEquivalent] <70  // In this scenario the value [#%# R12/FTE] has already taken into account <70 so this is not needed here.  I've tried with it in and out.
                ),
                [#%# R12/FTE]
            )
    )

RETURN

SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ), 3 ) )

 

 

Any reason why this might not work?  

 

 

 

Thank you very much !!  This is great!  I am still trying to get to a point where I can use the standard deviation funtion, but this seems to be my only work around.  You are very kind (and skilled)!

Helpful resources

Announcements
Fabric Community Conference

Microsoft Fabric Community Conference

Join us at our first-ever Microsoft Fabric Community Conference, March 26-28, 2024 in Las Vegas with 100+ sessions by community experts and Microsoft engineering.

February 2024 Update Carousel

Power BI Monthly Update - February 2024

Check out the February 2024 Power BI update to learn about new features.

Fabric Career Hub

Microsoft Fabric Career Hub

Explore career paths and learn resources in Fabric.

Fabric Partner Community

Microsoft Fabric Partner Community

Engage with the Fabric engineering team, hear of product updates, business opportunities, and resources in the Fabric Partner Community.

Top Solution Authors