Supplies are limited. Contact info@espc.tech right away to save your spot before the conference sells out.
Get your discountScore big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount
Hi guys!
I need some help solving an issue on PowerBI/Powerquery. I need to clean data and I can't work directly into the datasource so it must be done with Powerquery. Basically I have several time intervals that overlap and I would like to replace the start date/end date so they stop overlaping.
Here is a simplified version of my dataset :
user ID membership type start date end date
1 1 02/01/2020 03/01/2020
1 2 02/20/2020 03/20/2020
2 3 02/05/2020 02/05/2021
3 2 etc.
Users have a membership for a given period (start date, end date), there are several kind of memberships identifiable by their type (1,2 or 3). Every time that a membership is created, a line is added in the database. In theory a user can only have one membership at a time, but this is not what is happening in the database. For example in the sample above when user 1 bought a membership of type 2 he was already under a membership 1, the database created a new line. As a result in the database we read that between the 20/02/2020 (start date of membership 2) and 03/01/2020 (end date of membership 1), user 1 have two memberships (one type 1 and one type 2), but in reality this is not what happens, the new one take the place of the previous one and I need that to appear in the database also. So I need to clean the data in order to have the unicity 1 user = 1 membership at a time.
I see two options to achieve that (maybe there are more if you see them please don't hesitate to share!) :
1/ finding a way with PowerQuery to identify these duplicates and replace the dates so the intervals of the memberships for a given user never overlap (in the example the end date of membership 1 would become 19/02/2020 - one day before the start day of membership 2) --> I have no clue on how to achieve that, that would be the best solution in my opinion
2/ as it is mostly ascending changes in membership types (from membership 1 to 2 to 3 but not 3 to 2 or 1), I tried to create a new column that find the maximum value per user (creating a new table grouped by user ID with the max value in Membership type) BUT it was not dynamic in PowerBI. Indeed I use a date filter based on start date and end date columns and the max value created did not adapt to this filter.
Does somebody have a solution to clean the data and have a unicity 1 user = 1 membership at time T (start date<T<end date)?
If you need any precision don't hesitate to ask I realize the issue may not be well described!
Thanks a lot!!!
Léa
@LéaGr , check if this file can help
https://www.dropbox.com/s/1mlq21o1xjuw8il/overlappingdates.pbix?dl=0
Hi @amitchandak ,
Thanks a lot for the answer.
It helps as it allows me to identify the overlaps but I still do not know how to clean them. Would you know a way to replace the end date of an event in conflict with another by the start date of this other event for example?
Thanks a lot,
Best,
Léa
Hi @amitchandak ,
Thanks for your answer! Yes it heps because it allows me to identify clearly overlaps but now I would need to "clean" these overlaps by replacing the dates when there is a conflict. I would need to automatically be able to replace the end date of the oldest event by the start date minus one day of the other event. Would you know how to achieve that?
Thanks a lot!
Best,
Léa