The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi Guys!
I'm trying to calculate cumulative attrition rate by month before cancellation of subscription (what percentage of the subscriptors cancelled after certain number of months).
attrition = number of cancelled/number of established
Sample file: https://we.tl/t-uYaSyAvWXZ
I think the nominator - number of cancelled is calculated correctly in the file.
However, I think there's a problem with my calculation of denominator as I do not disregard the donations that were established before they could have been cancelled in the specific timeframe.
E.g. if I'm looking at the attrition for 12 months, it does make sense to only take into account the donations that were established more than (or at) 12 months ago and disregard all the donations that were established later than that ago as they did not have a full chance to get cancelled within 12 months period.
Could you please advise on how to fix the formula?
Thank you!
Agnieszka
Solved! Go to Solution.
Here's the file:
Probabilities of Cancelling Subscriptions
Of course, it's easy to create 2 other measures. One will give you the numerator and the other the denominator of the probability calculated for a given cell. Just dissect the probabilities and extract from the formula the numerator and denominator as 2 distinct measures. I would warn you against using these probabilities where the number in the denominator is smaller than at least 30. Ideally, it should be much much greater than this. I'm saying "ideally" because 30 is about the size of a sample where the Law(s) of Large Numbers starts to kick in.
Please have a look at this file and tell me how it goes. Bear in mind that the probabilities are in fact conditional because for each cancellation month the denominator changes (the populations differ). If you, for instance, consider one establishment month (like 2020-07), then you'll be only able to calculate the probability that subscriptions that originated in this very month were cancelled in the same month. For 1 year, you'll be able to calculate such probs for month=0,1,....,11 but the estimates will be getting worse and worse as the month number grows. This does show in the pivot table I created in the file.
Best
D
Hi Dan!
Thank you so much for your help!
I made minor adjustements to your code and it now works for me.
Thank you,
Agnieszka
Something like this? Sorry, I can't right now retrieve your file as I'm at work... but it should be pretty understandable. You have to pay a very close attention to the defined quantities in order to get this right. You might need to adjust this to you model or, better still, change your model to suit this description.
// Say you have a parameter table,
// call it 'Cancellation Months', that
// lists the numbers that are the
// month numbers of cancellation after
// establishment.
// 0, 1, 2,..., N
// where N is as big as you define.
// So, selecting 2 means that cancellation
// happened in the 2nd month after
// establishment.
// Say that this number is in your table
// with subscriptions in the column
// [Cancellation Month]. If a subscription
// is still active, the value is BLANK(),
// Also there is a 1:* relationship
// from 'Cancellation Months' to 'Subscriptions'
// on the columns mentioned above.
// Probability that a subscription was
// cancelled in exactly the n'th month
// after subscription.
[Prob of cancellation in Nth month] =
var __maxDate = MAX( EstablishmentCalendar[Date] )
var __month = SELECTEDVALUE( 'Cancellation Months'[Month] )
return
if( NOT ISBLANK( __month ) ,
var __estblshmntBndry =
DATEADD( __maxDate, -__month, MONTH)
var __universe =
FILTER(
// These Subscriptions have had enough
// time to get cancelled in the nth
// month after their establishment.
ALL( Subscriptions ),
Subscriptions[Established]
<= __estblshmntBndry
)
var __cancelled =
COUNTROWS(
FILTER(
__universe,
Subscriptions[Cancellation Month] = __month
)
)
var __all = COUNTROWS( __universe )
return
divide( __cancelled, __all )
)
From this, you can derive other measures like [Probability of cancellation happening before or in the Nth month after establishement]. This would be your YTD...
Best
D
Thank you so much for this!
I've adjusted the code a little (e.g. selectedvalue function was not recognised in this context) and added additional column with Update Date equal to max([Date Established]). The calculation works now if I do not use any dimensions for filtering. However, I was not able to make it work when I use filters (e.g. by date established, country etc.) as ALL(Managed) disregards them. I have tried disregarding filtering only on the 'Months before cancellation' using ALL(Managed[Months before cancellation]), but it led to error with identifying single value in the Managed[Date Established] in the subsequent line and in the end I was not able to make it work. Could you please have a look and advise how I should adjust the code?
Prob of cancellation in Nth month:=var __month = max('Managed'[Months before cancellation])
return
if( NOT ISBLANK( __month ) ,
var __estblshmntBndry =
DATE(YEAR(max(Managed[Update Date])),MONTH(max(Managed[Update Date]))-__month,DAY(max(Managed[Update Date])))
var __universe =
FILTER(
// These Subscriptions have had enough
// time to get cancelled in the nth
// month after their establishment.
ALL( Managed),
Managed[Date Established]
<= __estblshmntBndry
)
var __cancelled =
calculate(COUNT(Managed[Months before cancellation]),
FILTER(
__universe,
Managed[Months before cancellation] = __month
)
)
var __all = COUNTROWS( __universe )
return
divide(__cancelled,__all))
Hi.
First, SELECTEDVALUE is a function:
The function SELECTEDVALUE returns the value of the column reference passed as the first argument if it is the only value available in the filter context; otherwise, it returns blank or the default value passed as the second argument.
So, in my formula the use of it was deliberate. If it returned BLANK, it meant nothing was selected in the dimension or more than 1 value was selected.
Second, you have to have a calendar table in your model. Bear in mind that you should never, ever, ever expose your fact tables to the user. All slicing MUST BE done through dimensions. All columns in the fact table must be hidden. ALWAYS. No exceptions. Only measures can be made visible in the fact tables. YOU HAVE TO STICK TO THIS DESIGN if you want to have an easy time working with DAX and tabular models.
Your changed formula does not work because you're not doing it right.
So, let me recap what your model has to look like. First, you have to have a date table, marked as such in the model, called [Establishment Calendar] connected to Subscriptions (the fact table) via the Date field (1:*). There can't be subscriptions without an establishment date. Second, you have to have a dimension called [Cancellation Month] which will have 2 columns: one, hidden, will be an int storing numbers -1 (yes, -1), 0, 1, 2...., N, (call the column: MonthNumber) and the second (call it Month) will be a text column that will store the corresponding values: No Cancellation Month, 0th Month, 1st Month, 2nd Month, 3rd Month, 4th Month, 5th Month,..., Nth Month. Of course 'N' in the text is a number---the maximum number of cancellation month you want to consider. This one column will be exposed to the users. The first column, with integers, will be connected to Subscriptions (1:*) and hidden.
You should adjust your model to the above, not the other way round. This is a piece of good advice. Trust me.
Here's the measure you're after:
// This is a simple measure that just counts
// the number of subscriptions in the current
// context.
[Subscription Count] = COUNTROWS( Subscriptions )
// Probability that a subscription was
// cancelled in exactly the n'th month
// after subscription.
// 'visible' means: in the current context.
[Prob of Cancellation in Nth Month] =
// __maxDate is the last visible date in the
// 'Establishment Calendar' table.
var __maxDate = MAX( 'Establishment Calendar'[Date] )
var __month = SELECTEDVALUE( 'Cancellation Month'[MonthNumber] )
// This condition checks if one and only one value
// has been selected in the 'Cancellation Month'
// dimension and that it's not -1. -1 means
// No Cancellation Month.
var __shouldCalculate =
NOT ISBLANK( __month )
&& __month <> -1
return
if( __shouldCalculate,
var __establishUpperBound =
DATEADD( __maxDate, -__month, MONTH)
var __cancelled =
CALCULATE(
[Subscription Count] + 0, // this 0 must be here
'Establishment Calendar'[Date]
<= __establishUpperBound,
// This condition below makes sure
// that only subscriptions in the
// current context are filtered by the
// above condition. If you want to
// consider all subscriptions before
// the upper bound disregarding the selection
// in the calendar, then remove this
// condition below.
VALUES( 'Establishment Calendar'[Date] )
)
var __all =
CALCULATE(
[Subscription Count],
'Establishment Calendar'[Date]
<= __establishUpperBound,
// Same remark applies to this VALUES
// condition as above.
VALUES( 'Establishment Calendar'[Date] ),
ALL( 'Cancellation Month' )
)
return
divide( __cancelled, __all )
)
This should calculate what you need... Well, almost. If you need to have the probability that cancellation will happen in no more than N months, then you'll have to do this:
Pr(Cancellation happens in no more than N months) = Summation of Pr(Cancellation happens in k-th month) over k=0,1,2,...,N.
This can be coded as:
[Prob Cancellation happens before or in Nth month] =
var __month = SELECTEDVALUE( 'Cancellation Month'[MonthNumber] )
var __shouldCalculate =
NOT ISBLANK( __month )
&& __month <> -1
var __monthsToIterate =
FILTER(
ALL( 'Cancellation Month' ),
AND(
'Cancellation Month'[MonthNumber] <= __month,
'Cancellation Month'[MonthNumber] > -1
)
)
var __probs =
addcolumns(
__monthsToIterate,
"Prob", [Prod of Cancellation in Nth Month]
)
var __allProbsExist =
ISEMPTY(
FILTER(
__probs,
ISBLANK([Prob])
)
)
var __cumulativeProb = sumx( __probs, [Prob] )
return
if( __shouldCalculate && _allProbsExist,
__cumulativeProb
)
Of course, bear in mind that the probs are only ESTIMATES and it's possible that the sum might at times be above 1. This is because the universe for each k is different. We could make it the same but that would reduce the number of cases and hence the quality of the estimates of almost all the constituent probabilities.
Hint: If you get a value > 1, then add an IF condition to the above measure to cap it at 1.
Let me know how it goes.
Best
D
Hi!
Thank you for your message.
First, I am aware SELECTEDVALUE is a function and was used deliberately. However, in the context that the function is used, I am getting an error: Semantic Error: Failed to resolve name 'SELECTEDVALUE'. It is not a valid table, variable, or function name.
Second, I have already had a calendar table in my model and it is connected to Subscriptions on Date field (1:*). There are no subscriptions without an establishement date. I also already had the dimension table with Months before cancellation. I have changed the names of certain columns/metrics now and added the 'Month' column to the dimension table so that it now exactly reflects the model as you describe it.
It still does not work.
When I try to include the [Prob of Cancellation in Nth Month] measure, I'm getting an error as below:
Additonally, the SELECTEDVALUE and __maxDate are underlined in red.
I have uplaoded the model with the new nomenclature here.
Here's the file:
Probabilities of Cancelling Subscriptions
Of course, it's easy to create 2 other measures. One will give you the numerator and the other the denominator of the probability calculated for a given cell. Just dissect the probabilities and extract from the formula the numerator and denominator as 2 distinct measures. I would warn you against using these probabilities where the number in the denominator is smaller than at least 30. Ideally, it should be much much greater than this. I'm saying "ideally" because 30 is about the size of a sample where the Law(s) of Large Numbers starts to kick in.
Please have a look at this file and tell me how it goes. Bear in mind that the probabilities are in fact conditional because for each cancellation month the denominator changes (the populations differ). If you, for instance, consider one establishment month (like 2020-07), then you'll be only able to calculate the probability that subscriptions that originated in this very month were cancelled in the same month. For 1 year, you'll be able to calculate such probs for month=0,1,....,11 but the estimates will be getting worse and worse as the month number grows. This does show in the pivot table I created in the file.
Best
D
Agnieszko, sadly, you are using Excel instead of Power BI and this is code for Power BI. In Excel you'll need to reformulate the code as it is temperamental. SELECTEDVALUE indeed does not exist in Excel (which version do you have?) and hence you'll need to replicate this function using others. Also please bear in mind that Excel does not like comments interspersed with DAX.
I'll have a look at your file and try to adjust the measures. Please bear with me...
Best
D
Agnieszko,
First of all, you have to know exactly what you want to calculate. In maths there is no room for being sloppy or guessing - either you know exactly the definitions of your entities and operations... or you don't and can't do anything.
Here's how I understand it. Let's say you've selected the year of 2020 as the year for which you consider all the donations having been made in this very year. This is the universe of donations for this period of time. Now, you select a number, say, 4 to be the number of months that a donation had lasted for before it got canceled. With these definitions in place you can calculate:
1. The number of established donations A =
calculate(
countrows(Managed),
all('Months before cancelation')
)
2. The number of canceled donations B =
calculate(
countrows(Managed),
keepfilters(not(isblank(Managed[Cancellation Date])))
)
3. Attrition = divide(B, A)
Attrition is the rate for any selected period of time. If you want to have [Attrition YTD], then it's easy:
Attrition YTD =
calculate(
[Attrition],
datesytd('Calendar'[Date])
)
and Attrition Cumulative =
var __lastVisibleDay = max('Calendar'[Date])
return
calculate(
[Attrition],
'Calendar'[Date] <= __lastVisibleDay
)
In the pivot table you have in the file, you should mark the possibility that a donation has not been canceled at all. Please remember that good models are those in which there are no BLANKS exposed to the user. Everything should have a value. For instance, you could have created a table 'Months before cancellation' with 2 columns where one would store a number (hidden) and the second the name of the number which you'd expose in the model. The number would join to Managed[Months before cancelation] and you then could mark the still active donations with a number like -1 and its name would be, say, Active or Not Canceled. But there's not enough room here to explain all the best practices of creating good tabular/multidimensional models.
Best
D
Hi!
Thank you for the reply and suggestions.
However, what I meant by cumulative attrition by month before cancellation is different. For 'month before cancellation = 4' what I need is a metric that accumulates the cancellations which happened within 0, 1, 2, 3, 4 months after establishment and divides it by the number of subscriptions that had a full chance to be cancelled within that period. I. e. For year 2020 this metric would not be applicable at all, since we are only in the second month of the year and the largest 'month before cancellation' possible is 1 -- the subscriptions established this year didn't have a full chance to be cancelled within 4 months period and hence should not be considered -- for subscription to be considered for the 'month before cancellation = 4' it would have to be established in October 2019 or earlier.
Thank you again,
Agnieszka
@Anonymous wrote:
Hi!
Thank you for the reply and suggestions.
However, what I meant by cumulative attrition by month before cancellation is different. For 'month before cancellation = 4' what I need is a metric that accumulates the cancellations which happened within 0, 1, 2, 3, 4 months after establishment and divides it by the number of subscriptions that had a full chance to be cancelled within that period. I. e. For year 2020 this metric would not be applicable at all, since we are only in the second month of the year and the largest 'month before cancellation' possible is 1 -- the subscriptions established this year didn't have a full chance to be cancelled within 4 months period and hence should not be considered -- for subscription to be considered for the 'month before cancellation = 4' it would have to be established in October 2019 or earlier.
I don't fully agree with this way of calculation. Let's say you have selected year 2020 and cancellation month number is 4. For YTD you say you want all the cancellations that happened in month number 0, 1, 2, 3 and 4 after the establishment. It does not matter at all that in the year in question there can't be cancellations where month number is 2 or 3. You are just summing up the subscriptions in 2020 that have resulted in cancellations within AT MOST 4 months after their establishment (it's a well-defined number). That would be your numerator. Now for the denominator. It stands to reason that the denominator should in this case be the count of ALL the subscriptions within the selected period of time regardless of their status (canceled or not). Doesn't it?
If you want to calculate according to the rules you are trying to push through... there's no point in selecting years at all. Can you see it? That's because in your example you'd like to disregard the year of establishment and only base your analysis on the cancellation month.
Well?
Best
D
@Anonymous wrote:I don't fully agree with this way of calculation. Let's say you have selected year 2020 and cancellation month number is 4. For YTD you say you want all the cancellations that happened in month number 0, 1, 2, 3 and 4 after the establishment. It does not matter at all that in the year in question there can't be cancellations where month number is 2 or 3. You are just summing up the subscriptions in 2020 that have resulted in cancellations within AT MOST 4 months after their establishment (it's a well-defined number). That would be your numerator. Now for the denominator. It stands to reason that the denominator should in this case be the count of ALL the subscriptions within the selected period of time regardless of their status (canceled or not). Doesn't it?
The way described above is how the measures are currently calculated in the file. However, I would like the cumulative attrition to reflect the probability of subscription being in a cancelled state after certain number of months. As I include in the denominator all the subscriptions regardless if they had a full chance of being cancelled after certain number of months, the results are not what I am looking for.
@Anonymous wrote:If you want to calculate according to the rules you are trying to push through... there's no point in selecting years at all. Can you see it? That's because in your example you'd like to disregard the year of establishment and only base your analysis on the cancellation month.
As you rightfully noticed, I do not use the years in my calculations -- there are added to the pivot as that is the view requested by client, but my measures are based on cancellation months, not year of establishment.
Thank you again!
Unfortunately not, I'm still struggling to get the right results...
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
21 | |
21 | |
18 | |
17 | |
13 |
User | Count |
---|---|
41 | |
38 | |
24 | |
20 | |
20 |