Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi all,
I have written my problem below. I hope with enough clarity. I would prefer a solution in Power Query/M, but I'll gladly settle for a solution in DAX.
The use case:
My organization manages "subscriptions" for health care products in various categories and I need to determine whether clients are NEW, RETURN or CONTINUED clients. A subscription is the period from Start Date up to and including the End Date for a particular product within a specific product category. A client is NEW when there (s)he has not had a previous subscription with the organization. A client is a RETURN customer when the date difference between the latest Start Date and the previous End Date is more than 31 days. A client is CONTINUED when this difference is less than 31 days.
My issue:
A client can have subscriptions to multiple products within a specific product category with overlapping Start/End Dates. For example. A client can have a subscription for a electric bicycle from 1-1-2021 to 30-6-2021 and later have a concurrent subscription for a taxi service from 1-4-2021 to 31-12-2021. From a business perspective we consider this subscription to last from 1-1-2021 to 31-12-2021.
If these data issues didn't exist, I would Merge the table onto itself using the Index / Index - 1 trick, and then calculate the DateDiff between the previous End Date and the later Start Date. However, this does not work without smoothing over the above mentioned issue.
Sample data
My data looks like this. Note that column 'Product' is irrelevant for my requirements but kept in for clarity (as to how the issue is introduced into the data). Column 'Serial#' is not native to the data, but a group index I added to join the table onto itself. Maybe somewhat unintuitively it indexes from the most recent subscription. This can be changed of course.
ID | Client ID | ProdCat ID | Prod ID | Start Date | End Date | Serial# |
1 | 007 | A | A01 | 1-8-2022 | 31-12-2022 | 1 |
2 | 007 | A | A01 | 1-1-2022 | 31-7-2022 | 2 |
3 | 007 | A | A01 | 1-1-2021 | 30-6-2021 | 3 |
4 | 007 | A | A02 | 1-6-2020 | 1-3-2021 | 4 |
5 | 008 | A | A01 | 1-1-2022 | 31-12-2022 | 1 |
6 | 008 | A | A01 | 1-1-2021 | 31-12-2021 | 2 |
7 | 008 | A | A02 | 1-6-2020 | 30-9-2020 | 3 |
My issue is illustrated by the rows with ID 3 and 4. From a business perspective these rows are actually one single subscription starting 1-6-2020 (row 4) and ending 30-6-2021 (row 3). In this example there are two overlapping row. In practice there is theoretically no limit to the number of overlapping products.
Any idea/suggestions on how to achieve the transformation/modelling of these overlapping rows to show the subscription length of the overarching product category rather than the individual products?
To be sure, I ultimately need to determine whether clients are NEW, RETURN or CONTINUED based on Product Category subscriptions. So if there are alternative solutions, I'm open to those.
Thanks in advance!
Solved! Go to Solution.
Hi @Anonymous ,
Please add these custom columns in Power Query Editor.
= List.Dates([Start Date], Duration.Days([End Date]-[Start Date])+1 ,#duration(1, 0, 0, 0))
= let CID = [Client ID], LDate = [List_Dates],
mylist = Table.SelectRows(#"Added Custom", each [Client ID] = CID and
List.ContainsAny([List_Dates], LDate))[#" Serial#"]
in Text.Combine(List.Transform(mylist, Text.From), ",")
Then calculate the maximum and minimum dates by grouping in [Client ID] and [Custom] columns.
If the problem is still not resolved, please provide detailed error information or the expected result you expect. Let me know immediately, looking forward to your reply.
Best Regards,
Winniz
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @Anonymous ,
Please add these custom columns in Power Query Editor.
= List.Dates([Start Date], Duration.Days([End Date]-[Start Date])+1 ,#duration(1, 0, 0, 0))
= let CID = [Client ID], LDate = [List_Dates],
mylist = Table.SelectRows(#"Added Custom", each [Client ID] = CID and
List.ContainsAny([List_Dates], LDate))[#" Serial#"]
in Text.Combine(List.Transform(mylist, Text.From), ",")
Then calculate the maximum and minimum dates by grouping in [Client ID] and [Custom] columns.
If the problem is still not resolved, please provide detailed error information or the expected result you expect. Let me know immediately, looking forward to your reply.
Best Regards,
Winniz
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thanks @v-kkf-msft
It solves the problem, so I accept your solution. I also learned a lot from it, so that's even better. It does wreak havoc on the data load/memory with my dataset, eventually loading 9GB of data on a 4mb source file, with calculation lasting almost a day.
But that's a secondary concern. Primary concern has been solved. Thanks!
Hi @Anonymous ,
I will try to solve it, please be patient.
Check out the September 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
112 | |
105 | |
95 | |
38 | |
30 |