cancel
Showing results for
Did you mean:

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a \$100 discount. Register Now

Helper III

Standard Deviation - HELP Please!! Challenge with Z Scores

Hello All,

Any help with Standard Deviation is appreciated.  As non-expert, I have literally spent days trying to navigate this challenge.

I am trying to see which locations have the lowest / highest Z-scores.  I'm doing this because when people look at Rankx, they don't often appreciate the severity of the locations with high R counts (aka outliers).   In this situation, I'm examining that some locations have high R counts, in comparison to their total count.  I'm working with Live data so I can not create new tables or columns like in excel.  Below is a screen shot, but not the full table, some locations have much lower counts, but but higher R counts.

The Z scores (in this table below is the measure {'* Measure Z#s Score'} and its results should roughly be between +2 and -2 (aka +/- 2 Standard Deviations) so I can tell the numbers are off.

I know that the R Count is correct.  I know that the Average R Count (333.8) is correct, but I do not think the standard deviation is correct.

Here are my Measures and I've adjusted the field names becuase I am in a public sector organization.

``````* Measure Count =COUNTROWS ('SAchievement' )  // renamed "Count"

*R Count = CALCULATE ( [* Measure Count],'SAchievement'[NumericEquivalent] < 50 ) // renamed "R Count"

* Measure Avg R in #s = AVERAGEX ( ALLSELECTED( 'Location'[LocationName] ), [* Measure Rs] )

* Measure Std Dev R2s =CALCULATE(STDEV.P(SAchievement[NumericEquivalent]), ALLEXCEPT(SAchievement, SAchievement[year], SAchievement[ReportPeriod]), SAchievement[NumericEquivalent] <50)

* Measure Z#s Score = DIVIDE ([* Measure Rs] - [* Measure Avg R in #s], [* Measure Std Dev R2s])``````

Any help would be appreciated.   People often discount data rank becuase they see the last entries as close to the next entries.  These outliers need to be considered to ensure equitable resources are allocated.

Many thanks
6 REPLIES 6
Helper III

@v-chenwuz-msft thanks again for the support.  As I'm working with live data a few of the essential PBI functions to support statisitcs e.g., percentile do not work (e.g., Percentile.inc, or Earlier).  Is there a simple work around to achieve percentile?  I can use rankx, but not the Earlier function which is often used in the work arounds to manually work out percentiles.    Thanks and please let me know if this should be a new discussion?

Community Support

Hi @SO ,

I suggest you start a new discussion.

Or you can refer this function PERCENTILEX.INC

Best Regards

Community Support Team _ chenwu zhu

Helper III

I also tried to derive the Standard Deviation formula from scratch, but the PBI wheel keeps spinning without giving a result.  We have 32 locations so I just wrote in the total N=32

*SD = //Standard Deviation Formula (From Scratch) VAR _PopVal = [* Measure Rs] VAR _Mean = [* Measure Avg R in #s] RETURN SQRT ( DIVIDE ( SUMX ( ALLEXCEPT( 'SAchievement', SAchievement[Year], SAchievement[ReportPeriod]), ( _PopVal - _Mean ) ^ 2 ), 32))

The data for the measures is in the table above.

Any thoughts on why this wouldn't work?  When placing this measure in my table, I thought it would go through each value:  e.g. for Location 1:  (483-333.8)^2/32  and then SQRTed = 26.37, then Location 2, ....etc...

Correction to above... I do get a result in about 15 minutes... however, the results are not accurrate, for example, locaiton 1 has a result of 101280.6, location 2, 6931.7,  etc...

Community Support

Hi @SO ,

According to your description, I have written the following formula for you to try.

``````*SD =
VAR _sum =
SUMMARIZE (
ALLSELECTED ( 'SAchievement' ),
SAchievement[Location],
SAchievement[Year],
"_xn", CALCULATE ( [* Measure Count], 'SAchievement'[NumericEquivalent] < 50 ),
"_x",
AVERAGEX (
FILTER (
ALLEXCEPT ( SAchievement, SAchievement[Location], SAchievement[Year] ),
[NumericEquivalent] < 50
),
[* Measure Count]
)
)
RETURN
SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ), 32 ) )
``````

Best Regards

Community Support Team _ chenwu zhu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helper III

Hello Again,

So Sorry again.

I was looking at my results and comparing Excel and PBI with different results.  The measure was either showing Zero or the a number that was not possible.  To simply the data I was only looking at 3 lines.   I have slicers that filter the Year, Manager, and Location, and LocationType

In Excel I have

 Location R12 FTE #%# R12/FTE Mean of R12/FTE *StandDev {STDEV.P(D2:D4)} Z Score {STANDARDIZE(D2,E2,\$F\$2)} BR 195 226 0.8628 0.732266667 0.093243957 1.399911986 MG 261 382 0.6832 0.732266667 0.093243957 -0.526218193 ML 192 295 0.6508 0.732266667 0.093243957 -0.873693793

In PowerBi I can generate the Standard Deviation, but to get the Z Score, I have to physically type in the value 0.0932.

 Location R12 FTE #%# R12/FTE Z score Mean for R12/FTE *SD Z Score BR 195 226 0.8628 0.732266667 0 MG 261 382 0.6832 0.732266667 0 ML 192 295 0.6508 0.732266667 0 0.093243957

Also to get the Standard Deviation to work I needed to alter the code otherwise PBI would not stop spinning.

First I needed to create a measure to count the locations.  I realize I had manually typed this in my first example (32).  There are hundreds of locations, but I was only looking at 32.  In the case above, I'm slicing to 3.

``````* Location count =
CALCULATE(COUNTA('Location'[Location]), ALLSELECTED('Location'[Location]))``````

``````*SD =
VAR _sum =
SUMMARIZE ('AProfile',
Location[Location],
"_xn", [#%# R12/FTE],
"_x",[** Z score Mean for \$/FTE]
)
RETURN

//SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ) ,2 )) // not working
SQRT ( DIVIDE ( SUMX ( _sum, ( [#%# R12/FTE] - [** Z score Mean for \$/FTE] ) ^ 2 ), [* location count]))``````

Sorry again... I think my issue was with the allexcept, or that I'm working In Service? ?  Here is the code I used.  I've modified the field names for confidentialty.

``````SD =

Var _sum = SUMMARIZE (
ALLSELECTED ( 'AProfile' ),
Location[Location],
A[Year],
"_xn", [#%# R12/FTE] ,
"_x",
AVERAGEX (
FILTER (
ALLEXCEPT( AProfile, Location[Location], AProfile[Year]), AProfile[NumericEquivalent] <70  // In this scenario the value [#%# R12/FTE] has already taken into account <70 so this is not needed here.  I've tried with it in and out.
),
[#%# R12/FTE]
)
)

RETURN

SQRT ( DIVIDE ( SUMX ( _sum, ( [_xn] - [_x] ) ^ 2 ), 3 ) )``````

Any reason why this might not work?

Helper III

Thank you very much !!  This is great!  I am still trying to get to a point where I can use the standard deviation funtion, but this seems to be my only work around.  You are very kind (and skilled)!

Announcements

Microsoft Fabric Community Conference

Join us at our first-ever Microsoft Fabric Community Conference, March 26-28, 2024 in Las Vegas with 100+ sessions by community experts and Microsoft engineering.

Power BI Monthly Update - February 2024

Check out the February 2024 Power BI update to learn about new features.

Microsoft Fabric Career Hub

Explore career paths and learn resources in Fabric.

Microsoft Fabric Partner Community

Engage with the Fabric engineering team, hear of product updates, business opportunities, and resources in the Fabric Partner Community.

Top Solution Authors
Top Kudoed Authors