Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
cverink
Regular Visitor

Create Outlier Flag Column with Date Range Filter

I have 2 columns: DateSent (Date/Time) and TurnaroundTime (Integer).

 

I want to create a 3rd column that would flag rows (1 or 0) whos turnaround times are outliers from rows that were sent within the last 30 days from the rows date sent.

 

i.e flag the row if the turnaround time is > 1 standard deviation of all reports sent within 30 days from this rows DateSent.

 

I have tried a number of calculate functions with filters, but I just can't seem to get this working.

 

Any help would be greatly appreciated!

1 ACCEPTED SOLUTION

Hey,

 

maybe this measure calculates what you want

 

StdDev = if(and(hasonevalue('DimDate'[FullDateAlternateKey]),not(ISBLANK(sum('FactInternetSales'[SalesAmount])))),

calculate(
 STDEV.P('FactInternetSales'[SalesAmount])
 ,allexcept('FactInternetSales','FactInternetSales'[DueDate]),
 DATESBETWEEN('DimDate'[FullDateAlternateKey],
  FIRSTDATE(
   datesinperiod('DimDate'[FullDateAlternateKey],FIRSTDATE('FactInternetSales'[DueDate]),-30,DAY)
  ),
  lastdate('FactInternetSales'[DueDate]))
)
,blank())

 

Please be aware that the dax statement contains an if clause, the sole purpose of this clause prevents unnecessary calculations.

The allexcept clause may not be nessary if your facttable just contains the date-column and the the measure.

 

Please be aware that I'm using a dedicated Date-Table (Dimension), this is always a good idea, as @Greg_Deckler already mentioned.

This date relates to your fact table, make sure you the value of the relationship-property "cross filter direction" is set to single.

 

So

whenever you read 'DimDate' replace DimDate with the name of your Calendar-Table

whenever you read 'FactInterSales' replace FactInternetSales with the table-name from your example

 

Whenever you read [FullDateAlternateKey] replace FullDateAlternateKey with column name from you date-table that contains the day (make sure that the data type is set date) this column relates to the date column in your fact table.

 

Whenever you read [DueDate] replace DueDate with the name of the column that contains the datecolumn in your facttable.

 

Whenever you read [SalesAmount] replace SalesAmount with the column from your fact-table turnaround...

 

Hope this helps

 

P.S.:

I guess you are aware of the fact that there are books outside (from the statistics / data minig / data science section 🙂 ) where the Standard Deviation is not necessarily considered for detecting outliers as the premier method, due to the implicit assumptions that are made about the distribution of values in your dataset and the amount of rows (nowadays often called observations). But maybe using the Standard Deviation can be used to flag "interesting" values for your dataset. 



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

View solution in original post

7 REPLIES 7
Baskar
Resident Rockstar
Resident Rockstar

Can u please some sample data with your expected result. that will help us to solve your problem

Thanks for the Replies. Here is some sample data. I have the two left columns and I want the two right columns.

Capture.PNG

Hi @cverink,

 

Could you explain a little more on how you get values for the right two columns with the sample data? I just cannot figure out the logic of these calculations with the sample data.Smiley Happy

 

Regards

I wasn't sure how best to post the sample data, because even this short snippet is really 1000+ rows. Basically for each row, I am calculating the standard deviation in turnaround time of all the rows from the last 30 days. So for the first row of data (Row 2), the formula in the 30 day standard deviation column is "=STDEV.S(A2:A315)" where A315 is the last row containing the date 30 days from the row 2's Date_sent.

 

The reason I want to do it this way, is because the normal turnaround time varries quite a bit throughout the year, but within a 30 day window it doesn't varry nearly as much. So this allows for a tighter standard deviation, thus being better at eliminating the outliers.

 

The "Outlier" column is just checking to see if the TurnaroundWorkDays is greater than the 30 Day Standard Deviation. So if the turnaround time is greater than 1 standard devication of the samples within the last 30 days, then it is an outlier.

Hey,

 

maybe this measure calculates what you want

 

StdDev = if(and(hasonevalue('DimDate'[FullDateAlternateKey]),not(ISBLANK(sum('FactInternetSales'[SalesAmount])))),

calculate(
 STDEV.P('FactInternetSales'[SalesAmount])
 ,allexcept('FactInternetSales','FactInternetSales'[DueDate]),
 DATESBETWEEN('DimDate'[FullDateAlternateKey],
  FIRSTDATE(
   datesinperiod('DimDate'[FullDateAlternateKey],FIRSTDATE('FactInternetSales'[DueDate]),-30,DAY)
  ),
  lastdate('FactInternetSales'[DueDate]))
)
,blank())

 

Please be aware that the dax statement contains an if clause, the sole purpose of this clause prevents unnecessary calculations.

The allexcept clause may not be nessary if your facttable just contains the date-column and the the measure.

 

Please be aware that I'm using a dedicated Date-Table (Dimension), this is always a good idea, as @Greg_Deckler already mentioned.

This date relates to your fact table, make sure you the value of the relationship-property "cross filter direction" is set to single.

 

So

whenever you read 'DimDate' replace DimDate with the name of your Calendar-Table

whenever you read 'FactInterSales' replace FactInternetSales with the table-name from your example

 

Whenever you read [FullDateAlternateKey] replace FullDateAlternateKey with column name from you date-table that contains the day (make sure that the data type is set date) this column relates to the date column in your fact table.

 

Whenever you read [DueDate] replace DueDate with the name of the column that contains the datecolumn in your facttable.

 

Whenever you read [SalesAmount] replace SalesAmount with the column from your fact-table turnaround...

 

Hope this helps

 

P.S.:

I guess you are aware of the fact that there are books outside (from the statistics / data minig / data science section 🙂 ) where the Standard Deviation is not necessarily considered for detecting outliers as the premier method, due to the implicit assumptions that are made about the distribution of values in your dataset and the amount of rows (nowadays often called observations). But maybe using the Standard Deviation can be used to flag "interesting" values for your dataset. 



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany
Greg_Deckler
Super User
Super User

I *think* that you would need a separate Date table essentially and calculate the standard deviations there, otherwise I think you would run into a circular dependency. Then, if you related the tables on the DateSent column, you could pull the relevant standard deviation into your visual.

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Hmm interesting thought. I do have a Calendar table that is linked so I might play around with this idea.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.