Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
I'm having an issue with a DAX calc for lost customers and I'm hoping someone can spot where we went wrong. I have followed the pattern described in http://www.daxpatterns.com/new-and-returning-customers/ to get lost customers, but it is returning unexpected results. I'm hoping there is just a mistake in my formula that I am overlooking. I'm curious if anyone else has tried this and experienced a similar issue.
I have tables for Customer, Date, Sales, which are relevant here. Sales are at the invoice line level, and each line is related to customer by CustomerKey (surrogate key to a type 2 SCD) and date through InvoiceDateKey (surrogate key to date dim). The InvoiceDate field was added as a calculated column to the Sales table.
We are defining lost customers as any customer who hasn't made a purchase in 60 consecutive days ending in the selected timeframe. I have a customer that had purchases on 18-Dec-15, 29-Jan-16, and 12-Feb-16, but is somehow being counted as a lost customer in February. The only thing I noticed about them is that the Dec sales records are related to a different customer key than the Jan and Feb sales because someone corrected the customer name. I don't think this should happen since I'm counting unique [Customer No] values rather than [CustomerKey] values.
Here's my formula:
Lost Customers := IF ( NOT ( MIN ( 'Date'[Full Date] ) > CALCULATE ( MAX ( Sales[Invoice Date] ), ALL ( Sales ) ) ), COUNTROWS ( FILTER ( ADDCOLUMNS ( FILTER ( CALCULATETABLE ( ADDCOLUMNS ( VALUES ( Sales[CustomerNO] ), "CustomerLostDate", CALCULATE ( MAX ( Sales[Invoice Date] ) ) + [Lost Days Limit] ), FILTER ( ALL ( 'Date' ), AND ( 'Date'[Full Date] < MIN ( 'Date'[Full Date] ), 'Date'[Full Date] >= MIN ( 'Date'[Full Date] ) - [Lost Days Limit] ) ) ), AND ( AND ( [CustomerLostDate] >= MIN ( 'Date'[Full Date] ), [CustomerLostDate] <= MAX ( 'Date'[Full Date] ) ), [CustomerLostDate] <= CALCULATE ( MAX ( Sales[Invoice Date] ), ALL ( Sales ) ) ) ), "FirstBuyInPeriod", CALCULATE ( MIN ( Sales[Invoice Date] ) ) ), OR ( ISBLANK ( [FirstBuyInPeriod] ), [FirstBuyInPeriod] > [CustomerLostDate] ) ) ) )
Any ideas where my issue is?
Edit:
Here are the results in a pivot table. Notice the first pivot has only Customer No and seems to work. The second includes customer name and provides unexpected results.
Solved! Go to Solution.
Good point - you have to remove some filter from the CustomerLostDate column expression.
Try this one replacing the ADDCOLUMNS in original formula:
ADDCOLUMNS ( CALCULATETABLE ( VALUES ( Customer[CustomerCode] ), Sales ), "CustomerLostDate", CALCULATE ( MAX ( Sales[Invoice Date] ), ALLEXCEPT ( Customer, Customer[CustomerCode] ) ) + [Lost Days Limit] )
I have followed the pattern and it worked for me. The only obvious thing I see is that you have an extract IF statement at the start. Have you tried it without this?
I removed it and got the same results. Thanks for the suggestion, though.
I'm thinking that it has to do with the customer dimension being a type 2 SCD. The Dec sales are on one customer key (surrogate key) and the Jan and Feb sales are on another. I don't really understand how that could affect this since I never use the customer key anywhere, and the customer number (business key) is the same for all the customer keys for that customer.
So you have 2 IDs for the customer? That sounds suspicious. I think (eg I am not 100% sure but give it a try) that your dim table needs to connect to a primary key for the customer, and then use the same primary key for all your calcs. If I understand you correctly (which I may not be), you are joining on one key (SCD Surrogate) and then doing the calc on a different ID. If this is what you are doing, I am thinking this could be the culprit.
Slowly changing dimensions (SCDs) are a common data warehousing practice used to capture history of what an entity looked like at a particular time. http://www.kimballgroup.com/2008/09/slowly-changing-dimensions-part-2/
I cannot connect to just the customer number (the natural/business key) because there are multiple rows in my customer table that have this key. The customer table is tied to the sales fact table via the surrogate key (an autoincremented number that provides no real value other than to uniquely identify a row and serve as a primary key for the dimension table). So I cannot simply count customer keys because that would result in incorrect numbers. To count the number of lost/recovered/new/any customers, I must do a distinct count on the customer number, which would normally be the same as the customer key if I weren't keeping history (and is the same in the DAX pattern).
So basically, I'm trying to figure out how to alter the pattern to work with a slowly changing dimension.
Yes I know what a SCO is and I know why they exist. My point is that I think this is causing the issue. I believe the lost customer formula expects the dim table to have 1 record per customer and not multiple (best guess). @marcorusso would have to tell you for sure.
I am not saying it is right or wrong, but when I create models, I always take the last view of the customer as my dim table. If I need a historical view of the customer then I assume you can just load a fact table with the historical records for those customers that have changed and use DAX to extract the historical data.
Once again, this is just an assumption and hopefully Marco can give the definitive answer.
If you have an SCD2, instead of
VALUES ( Sales[CustomerNO] )
you should use
CALCULATETABLE ( VALUES ( Customer[CustomerCode] ), Sales )
where CustomerCode is the key for the customer that is the same across all the versions (surrogate keys) of the customer itself.
Please note this will slow down the performance and it is the reason why we used the column in Sales.
A best practice is to denormalize the customer code in the fact table, just to use that in this calculation.
Please note that if you are using Excel 2016 or Power BI Desktop, it is possible to write the same pattern using the new set functions (INTERSECT, EXCEPT, UNION) in a much faster way. We'll update the patterns for these versions, but I'm worried we'll not have time until this autumn.
Thank you, Marco. That gives me the correct answer if I only have the Customer No in a pivot table. If I include the Customer Name in the pivot table (this is the attribute that changed over time), it counts them as lost. Is there a way to alter the calculation to change this behavior so I get consistent results in both pivots?
For example, I have a customer who bought something in December and then their name changed, then they had purchases in January and February.
Customer CustomerKey Customer No Customer Name EffectiveStart EffectiveEnd RowIsCurrent 1 1 Person A 1/1/1900 1 2/31/2015 0 2 1 Person A2 1/1/2016 1
SalesInvoice InvoiceDateKey InvoiceDate CustomerKey ProductKey SalesAmount 20151218 12/18/2015 1 3 10 20160129 1/29/2016 2 4 15 20160216 2/12/2016 2 5 20
Their pivot with just Customer No looks good (they are not counted as lost). But if I include Customer Name after the Customer Number, they are counted as lost in February, despite the fact that I am not using the CustomerKey (surrogate key) anywhere in the formula.
Good point - you have to remove some filter from the CustomerLostDate column expression.
Try this one replacing the ADDCOLUMNS in original formula:
ADDCOLUMNS ( CALCULATETABLE ( VALUES ( Customer[CustomerCode] ), Sales ), "CustomerLostDate", CALCULATE ( MAX ( Sales[Invoice Date] ), ALLEXCEPT ( Customer, Customer[CustomerCode] ) ) + [Lost Days Limit] )
This is a great calc. The problem Im having is that its counting a customer as lost that didnt buy for 12 consecutive months more than once in their 5 year history. Only the last 12 months with no sales shoudl be counted as lost. How do I get it to only grab the last purchase date. Wehn I wabt to displayit by month it shows as lost in prior years. thanks
There is a new version of the pattern that considers new, lost, and recovered customers - it is more flexible and you should be able to adapt it to your requirements:
Thanks so much, Marco. That did the trick. For anyone else who happens to need it, here is my full DAX calculation:
Lost Customers:=IF ( NOT ( MIN ( 'Date'[Full Date] ) > CALCULATE ( MAX ( Sales[Invoice Date] ), ALL ( Sales ) ) ), COUNTROWS ( FILTER ( ADDCOLUMNS ( FILTER ( CALCULATETABLE ( ADDCOLUMNS ( CALCULATETABLE ( VALUES ( Customer[Customer No] ), Sales ), "CustomerLostDate", CALCULATE ( MAX ( Sales[Invoice Date] ), ALLEXCEPT ( Customer, Customer[Customer No] ) ) + [Lost Days Limit] ), FILTER ( ALL ( 'Date' ), AND ( 'Date'[Full Date] < MIN ( 'Date'[Full Date] ), 'Date'[Full Date] >= MIN ( 'Date'[Full Date] ) - [Lost Days Limit] ) ) ), AND ( AND ( [CustomerLostDate] >= MIN ( 'Date'[Full Date] ), [CustomerLostDate] <= MAX ( 'Date'[Full Date] ) ), [CustomerLostDate] <= CALCULATE ( MAX ( Sales[Invoice Date] ), ALL ( Sales ) ) ) ), "FirstBuyInPeriod", CALCULATE ( MIN ( Sales[Invoice Date] ) ) ), OR ( ISBLANK ( [FirstBuyInPeriod] ), [FirstBuyInPeriod] > [CustomerLostDate] ) ) ) )
Hi Guys,
I need help understanding how to find lost customers. All I need is to find out customers who had sales last period, but not currnet period depending on the relative date filter selected. I don't understand concept used with "Lost Days Limit".
I have read http://www.daxpatterns.com/new-and-returning-customers/ multiple times. I have no issues with Return and New Customers Concept.
Thank you. Appreciated.
Hi @zkazimov,
Share a dataset and show the expected result.
Hi @Ashish_Mathur,
I want lost customers to be like in the image below for new customers. Links below for sample file and data source.
https://1drv.ms/u/s!ArZK-htpGmgl7iruP47UKkZYs0yR
Hi @zkazimov,
To keep things simple, try this calculated field formula
=if(AND([TOTAL SALES PREV PERIOD]>0,[TOTAL SALES]=0),1,0)
In the Visual level filters section, apply a criteria of 1.
Hope this helps.
That works as a workaround, but not able to aggregate that measure to show on for example usign Card Visual. Also in the table see below how it is not aggreagted for total lost customers.
I need same effect as i have in the table for new customers.
Hi @zkazimov
Try this calculated field formula
=COUNTROWS(FILTER(SUMMARIZE(CALCULATETABLE(VALUES(SALES[CUSTOMER]),DATESBETWEEN('CALENDAR'[Date],EDATE(MIN('CALENDAR'[Date]),-1),max('CALENDAR'[Date]))),[CUSTOMER],"ABCD",[TOTAL SALES],"EFGH",[TOTAL SALES PREV PERIOD]),[EFGH]>0&&ISBLANK([ABCD])))
Hope this helps.
This is great! Summarize Function did the job. Thanks a lot for time spent on this Appreciated.
I am having same issue, could you please share the DAX you created ?
Hi @zkazimov,
If my reply helped, please mark it as Answer.
Check out the September 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.