Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I have 2 columns: DateSent (Date/Time) and TurnaroundTime (Integer).
I want to create a 3rd column that would flag rows (1 or 0) whos turnaround times are outliers from rows that were sent within the last 30 days from the rows date sent.
i.e flag the row if the turnaround time is > 1 standard deviation of all reports sent within 30 days from this rows DateSent.
I have tried a number of calculate functions with filters, but I just can't seem to get this working.
Any help would be greatly appreciated!
Solved! Go to Solution.
Hey,
maybe this measure calculates what you want
StdDev = if(and(hasonevalue('DimDate'[FullDateAlternateKey]),not(ISBLANK(sum('FactInternetSales'[SalesAmount])))),
calculate(
STDEV.P('FactInternetSales'[SalesAmount])
,allexcept('FactInternetSales','FactInternetSales'[DueDate]),
DATESBETWEEN('DimDate'[FullDateAlternateKey],
FIRSTDATE(
datesinperiod('DimDate'[FullDateAlternateKey],FIRSTDATE('FactInternetSales'[DueDate]),-30,DAY)
),
lastdate('FactInternetSales'[DueDate]))
)
,blank())
Please be aware that the dax statement contains an if clause, the sole purpose of this clause prevents unnecessary calculations.
The allexcept clause may not be nessary if your facttable just contains the date-column and the the measure.
Please be aware that I'm using a dedicated Date-Table (Dimension), this is always a good idea, as @Greg_Deckler already mentioned.
This date relates to your fact table, make sure you the value of the relationship-property "cross filter direction" is set to single.
So
whenever you read 'DimDate' replace DimDate with the name of your Calendar-Table
whenever you read 'FactInterSales' replace FactInternetSales with the table-name from your example
Whenever you read [FullDateAlternateKey] replace FullDateAlternateKey with column name from you date-table that contains the day (make sure that the data type is set date) this column relates to the date column in your fact table.
Whenever you read [DueDate] replace DueDate with the name of the column that contains the datecolumn in your facttable.
Whenever you read [SalesAmount] replace SalesAmount with the column from your fact-table turnaround...
Hope this helps
P.S.:
I guess you are aware of the fact that there are books outside (from the statistics / data minig / data science section 🙂 ) where the Standard Deviation is not necessarily considered for detecting outliers as the premier method, due to the implicit assumptions that are made about the distribution of values in your dataset and the amount of rows (nowadays often called observations). But maybe using the Standard Deviation can be used to flag "interesting" values for your dataset.
Can u please some sample data with your expected result. that will help us to solve your problem
Thanks for the Replies. Here is some sample data. I have the two left columns and I want the two right columns.
Hi @cverink,
Could you explain a little more on how you get values for the right two columns with the sample data? I just cannot figure out the logic of these calculations with the sample data.
Regards
I wasn't sure how best to post the sample data, because even this short snippet is really 1000+ rows. Basically for each row, I am calculating the standard deviation in turnaround time of all the rows from the last 30 days. So for the first row of data (Row 2), the formula in the 30 day standard deviation column is "=STDEV.S(A2:A315)" where A315 is the last row containing the date 30 days from the row 2's Date_sent.
The reason I want to do it this way, is because the normal turnaround time varries quite a bit throughout the year, but within a 30 day window it doesn't varry nearly as much. So this allows for a tighter standard deviation, thus being better at eliminating the outliers.
The "Outlier" column is just checking to see if the TurnaroundWorkDays is greater than the 30 Day Standard Deviation. So if the turnaround time is greater than 1 standard devication of the samples within the last 30 days, then it is an outlier.
Hey,
maybe this measure calculates what you want
StdDev = if(and(hasonevalue('DimDate'[FullDateAlternateKey]),not(ISBLANK(sum('FactInternetSales'[SalesAmount])))),
calculate(
STDEV.P('FactInternetSales'[SalesAmount])
,allexcept('FactInternetSales','FactInternetSales'[DueDate]),
DATESBETWEEN('DimDate'[FullDateAlternateKey],
FIRSTDATE(
datesinperiod('DimDate'[FullDateAlternateKey],FIRSTDATE('FactInternetSales'[DueDate]),-30,DAY)
),
lastdate('FactInternetSales'[DueDate]))
)
,blank())
Please be aware that the dax statement contains an if clause, the sole purpose of this clause prevents unnecessary calculations.
The allexcept clause may not be nessary if your facttable just contains the date-column and the the measure.
Please be aware that I'm using a dedicated Date-Table (Dimension), this is always a good idea, as @Greg_Deckler already mentioned.
This date relates to your fact table, make sure you the value of the relationship-property "cross filter direction" is set to single.
So
whenever you read 'DimDate' replace DimDate with the name of your Calendar-Table
whenever you read 'FactInterSales' replace FactInternetSales with the table-name from your example
Whenever you read [FullDateAlternateKey] replace FullDateAlternateKey with column name from you date-table that contains the day (make sure that the data type is set date) this column relates to the date column in your fact table.
Whenever you read [DueDate] replace DueDate with the name of the column that contains the datecolumn in your facttable.
Whenever you read [SalesAmount] replace SalesAmount with the column from your fact-table turnaround...
Hope this helps
P.S.:
I guess you are aware of the fact that there are books outside (from the statistics / data minig / data science section 🙂 ) where the Standard Deviation is not necessarily considered for detecting outliers as the premier method, due to the implicit assumptions that are made about the distribution of values in your dataset and the amount of rows (nowadays often called observations). But maybe using the Standard Deviation can be used to flag "interesting" values for your dataset.
I *think* that you would need a separate Date table essentially and calculate the standard deviations there, otherwise I think you would run into a circular dependency. Then, if you related the tables on the DateSent column, you could pull the relevant standard deviation into your visual.
Hmm interesting thought. I do have a Calendar table that is linked so I might play around with this idea.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
112 | |
100 | |
76 | |
74 | |
49 |
User | Count |
---|---|
145 | |
108 | |
107 | |
89 | |
61 |