Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello, I used this WONDERFUL youtube video to find duplicate/repeat records in my data.
https://www.youtube.com/watch?v=xN2IRXQ2CvI
High level it Looks up rows with the same value and tells me if there is a repeated issue within 5 days.
Issue | Date | Repeat? | Previous Record |
Issue 1 | 12/5/19 | No | (null |
Issue 2 | 12/6/19 | Yes | Issue 1 |
Issue 3 | 12/25/2019 | No | Issue 2 |
It works perfectly. But I'm trying to add some logic that will add a column that tells me what issue caused the repeat. (see example below) I tried performing another merge AND using List Contains but i'm working with 100K+ rows. It either crashes excel or takes 30+ minutes to run. IS there any other work around?
Issue | Date | Repeat? | Previous Record | Caused a Repeat |
Issue 1 | 12/5/19 | No | (null) | Yes |
Issue 2 | 12/6/19 | Yes | Issue 1 | No |
Issue 3 | 12/25/2019 | No | Issue 2 | No |
Solved! Go to Solution.
Hi @Anonymous
If my previous reply helped you, could you kindly accept it as a solution so people may find the solutions quickly?
If the problem still exists, please check the workarounds below:
Create calcualated columns below:
repeated =
VAR re =
CALCULATE (
COUNT ( 'Table 2'[Issue] ),
FILTER (
'Table 2',
DATEDIFF ( 'Table 2'[Date], EARLIER ( 'Table 2'[Date] ), DAY ) <= 5
&& 'Table 2'[Date] <= EARLIER ( 'Table 2'[Date] )
)
)
RETURN
IF ( re > 1, "Yes", "No" )
previous record = CALCULATE(MAX('Table 2'[Issue]),FILTER('Table 2','Table 2'[Date]<EARLIER('Table 2'[Date])))
cause a repeat =
VAR c =
CALCULATE (
MAX ( 'Table 2'[previous record] ),
FILTER ( 'Table 2', 'Table 2'[repeated] = "Yes" )
)
RETURN
IF ( [Issue] = c, c, "No" )
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @Anonymous
If my previous reply helped you, could you kindly accept it as a solution so people may find the solutions quickly?
If the problem still exists, please check the workarounds below:
Create calcualated columns below:
repeated =
VAR re =
CALCULATE (
COUNT ( 'Table 2'[Issue] ),
FILTER (
'Table 2',
DATEDIFF ( 'Table 2'[Date], EARLIER ( 'Table 2'[Date] ), DAY ) <= 5
&& 'Table 2'[Date] <= EARLIER ( 'Table 2'[Date] )
)
)
RETURN
IF ( re > 1, "Yes", "No" )
previous record = CALCULATE(MAX('Table 2'[Issue]),FILTER('Table 2','Table 2'[Date]<EARLIER('Table 2'[Date])))
cause a repeat =
VAR c =
CALCULATE (
MAX ( 'Table 2'[previous record] ),
FILTER ( 'Table 2', 'Table 2'[repeated] = "Yes" )
)
RETURN
IF ( [Issue] = c, c, "No" )
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
THANK YOU SO MUCH!
When you extract, try extracting additional columns. You would then be able to compare the prior record's columns to the current record's columns with an add column. You would not have to perform additional lookups.
Also, I would be wary of using the technique in the video. PowerBI behavoir only guarantees a sort order if Sort is the last step within a query OR you issue a Table.Buffer after the Table.Sort.
Regards,
Mike
Hi @Anonymous
Beside the merge as the video shown, merge as below,
After expand, we can rename the columns as "next issue" and "next date",
Then add custom column
cause a repeat=if Duration.Days([next date]-[Date])<=7 and [next date]<>null then "yes" else "no"
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thank you! @v-juanli-msft But because there are so many rows of data, adding another index and merging again causes Power Query/PBI to load for a long time. I was hoping there was a custom column that could be added to perform this
Hello @Anonymous,
Not able to help, as I didn't really get the idea what the database is about and what exactly should be calculated
BR
Jimmy
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.