Solved: Create Outlier Flag Column with Date Range Filter

cverink · ‎01-09-2017

I have 2 columns: DateSent (Date/Time) and TurnaroundTime (Integer).

I want to create a 3rd column that would flag rows (1 or 0) whos turnaround times are outliers from rows that were sent within the last 30 days from the rows date sent.

i.e flag the row if the turnaround time is > 1 standard deviation of all reports sent within 30 days from this rows DateSent.

I have tried a number of calculate functions with filters, but I just can't seem to get this working.

Any help would be greatly appreciated!

TomMartens · ‎01-11-2017

Hey,

maybe this measure calculates what you want

StdDev = if(and(hasonevalue('DimDate'[FullDateAlternateKey]),not(ISBLANK(sum('FactInternetSales'[SalesAmount])))),

calculate(
STDEV.P('FactInternetSales'[SalesAmount])
,allexcept('FactInternetSales','FactInternetSales'[DueDate]),
DATESBETWEEN('DimDate'[FullDateAlternateKey],
  FIRSTDATE(
   datesinperiod('DimDate'[FullDateAlternateKey],FIRSTDATE('FactInternetSales'[DueDate]),-30,DAY)
  ),
  lastdate('FactInternetSales'[DueDate]))
)
,blank())

Please be aware that the dax statement contains an if clause, the sole purpose of this clause prevents unnecessary calculations.

The allexcept clause may not be nessary if your facttable just contains the date-column and the the measure.

Please be aware that I'm using a dedicated Date-Table (Dimension), this is always a good idea, as @Greg_Deckler already mentioned.

This date relates to your fact table, make sure you the value of the relationship-property "cross filter direction" is set to single.

So

whenever you read 'DimDate' replace DimDate with the name of your Calendar-Table

whenever you read 'FactInterSales' replace FactInternetSales with the table-name from your example

Whenever you read [FullDateAlternateKey] replace FullDateAlternateKey with column name from you date-table that contains the day (make sure that the data type is set date) this column relates to the date column in your fact table.

Whenever you read [DueDate] replace DueDate with the name of the column that contains the datecolumn in your facttable.

Whenever you read [SalesAmount] replace SalesAmount with the column from your fact-table turnaround...

Hope this helps

P.S.:

I guess you are aware of the fact that there are books outside (from the statistics / data minig / data science section 🙂 ) where the Standard Deviation is not necessarily considered for detecting outliers as the premier method, due to the implicit assumptions that are made about the distribution of values in your dataset and the amount of rows (nowadays often called observations). But maybe using the Standard Deviation can be used to flag "interesting" values for your dataset.

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

View solution in original post

Baskar · ‎01-09-2017

Can u please some sample data with your expected result. that will help us to solve your problem

cverink · ‎01-09-2017

Thanks for the Replies. Here is some sample data. I have the two left columns and I want the two right columns.

v-ljerr-msft · ‎01-10-2017

Hi @cverink,

Could you explain a little more on how you get values for the right two columns with the sample data? I just cannot figure out the logic of these calculations with the sample data.

Regards

cverink · ‎01-11-2017

I wasn't sure how best to post the sample data, because even this short snippet is really 1000+ rows. Basically for each row, I am calculating the standard deviation in turnaround time of all the rows from the last 30 days. So for the first row of data (Row 2), the formula in the 30 day standard deviation column is "=STDEV.S(A2:A315)" where A315 is the last row containing the date 30 days from the row 2's Date_sent.

The reason I want to do it this way, is because the normal turnaround time varries quite a bit throughout the year, but within a 30 day window it doesn't varry nearly as much. So this allows for a tighter standard deviation, thus being better at eliminating the outliers.

The "Outlier" column is just checking to see if the TurnaroundWorkDays is greater than the 30 Day Standard Deviation. So if the turnaround time is greater than 1 standard devication of the samples within the last 30 days, then it is an outlier.

TomMartens · ‎01-11-2017

Hey,

maybe this measure calculates what you want

StdDev = if(and(hasonevalue('DimDate'[FullDateAlternateKey]),not(ISBLANK(sum('FactInternetSales'[SalesAmount])))),

calculate(
STDEV.P('FactInternetSales'[SalesAmount])
,allexcept('FactInternetSales','FactInternetSales'[DueDate]),
DATESBETWEEN('DimDate'[FullDateAlternateKey],
  FIRSTDATE(
   datesinperiod('DimDate'[FullDateAlternateKey],FIRSTDATE('FactInternetSales'[DueDate]),-30,DAY)
  ),
  lastdate('FactInternetSales'[DueDate]))
)
,blank())

Please be aware that the dax statement contains an if clause, the sole purpose of this clause prevents unnecessary calculations.

The allexcept clause may not be nessary if your facttable just contains the date-column and the the measure.

Please be aware that I'm using a dedicated Date-Table (Dimension), this is always a good idea, as @Greg_Deckler already mentioned.

This date relates to your fact table, make sure you the value of the relationship-property "cross filter direction" is set to single.

So

whenever you read 'DimDate' replace DimDate with the name of your Calendar-Table

whenever you read 'FactInterSales' replace FactInternetSales with the table-name from your example

Whenever you read [FullDateAlternateKey] replace FullDateAlternateKey with column name from you date-table that contains the day (make sure that the data type is set date) this column relates to the date column in your fact table.

Whenever you read [DueDate] replace DueDate with the name of the column that contains the datecolumn in your facttable.

Whenever you read [SalesAmount] replace SalesAmount with the column from your fact-table turnaround...

Hope this helps

P.S.:

I guess you are aware of the fact that there are books outside (from the statistics / data minig / data science section 🙂 ) where the Standard Deviation is not necessarily considered for detecting outliers as the premier method, due to the implicit assumptions that are made about the distribution of values in your dataset and the amount of rows (nowadays often called observations). But maybe using the Standard Deviation can be used to flag "interesting" values for your dataset.

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Greg_Deckler · ‎01-09-2017

I *think* that you would need a separate Date table essentially and calculate the standard deviations there, otherwise I think you would run into a circular dependency. Then, if you related the tables on the DateSent column, you could pull the relevant standard deviation into your visual.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

cverink · ‎01-09-2017

Hmm interesting thought. I do have a Calendar table that is linked so I might play around with this idea.

Create Outlier Flag Column with Date Range Filter

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly