ConnieMaldonado
Responsive Resident

Filter by Dups to Perform Analysis

I have a table that includes the following:

 

| Last_Register | Email | No | Version | Status | Account | Last_Trip | EE | Count Emails | Remove |
|---|---|---|---|---|---|---|---|---|---|
| 11/15/2022 13:42 | person1@gmail.com | 12345678 | 1 | not avail | 12345 | | 123 | 2 | FALSE |
| 3/9/2023 7:09 | person1@gmail.com | 12345678 | 2 | registered | 23456 | 3/9/2023 11:39 | 123 | 2 | FALSE |
| 2/11/2023 13:05 | person2@gmail.com | 23456789 | 1 | not avail | 34567 | 2/11/2023 17:45 | 345 | 3 | FALSE |
| 3/9/2023 7:28 | person2@gmail.com | 23456789 | 1 | registered | 45678 | 3/9/2023 11:59 | 345 | 3 | FALSE |
| 3/9/2023 7:30 | person2@gmail.com | 34567890 | 2 | registered | 56789 | | 345 | 3 | FALSE |
| 3/7/2023 16:40 | person3@gmail.com | 45678901 | 1 | pending | 67890 | 3/7/2023 21:16 | 678 | 2 | FALSE |
| 3/9/2023 7:26 | person3@gmail.com | 45678901 | 1 | registered | 78901 | | 678 | 2 | FALSE |

 

I am trying to create a "Remove" filter to determine whether the record should be removed from the data.

 

So first I need to identify duplicate records based on email. Then, based on certain criteria involving No, Status and Last_Register, I need to build logic to determine which record(s) to remove. I created a "Remove" column, which is FALSE for now (until I build the logic).

 

I was able to identify records with duplicate emails by creating the following column:

 

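Count Emails =
// Capture the current row's email before CALCULATE changes the filter context
VAR Emails = 'Table'[Email]
RETURN
    // Count every row in the table that shares this email
    CALCULATE(
        COUNTROWS('Table'),
        ALL('Table'),
        'Table'[Email] = Emails
    )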
 
I have no idea where to begin to isolate a "set" of dups and determine which to remove.
 
For example, for a set of "dups", let's say person1@gmail.com, I want to keep the record with the latest "Last_Register" date and remove the others.  So I would set Remove = TRUE for the first record with Last_Register = 11/15/2022 13:42. 
 
Here's what the results would look like:
| Last_Register | Email | No | Version | Status | Account | Last_Trip | Employee No | Count Emails | Remove |
|---|---|---|---|---|---|---|---|---|---|
| 11/15/2022 13:42 | person1@gmail.com | 12345678 | 1 | not avail | 12345 | | 123 | 2 | TRUE |
| 3/9/2023 7:09 | person1@gmail.com | 12345678 | 2 | registered | 23456 | 3/9/2023 11:39 | 123 | 2 | FALSE |
| 2/11/2023 13:05 | person2@gmail.com | 23456789 | 1 | not avail | 34567 | 2/11/2023 17:45 | 345 | 3 | TRUE |
| 3/9/2023 7:28 | person2@gmail.com | 23456789 | 1 | registered | 45678 | 3/9/2023 11:59 | 345 | 3 | TRUE |
| 3/9/2023 7:30 | person2@gmail.com | 34567890 | 2 | registered | 56789 | | 345 | 3 | FALSE |
| 3/7/2023 16:40 | person3@gmail.com | 45678901 | 1 | pending | 67890 | 3/7/2023 21:16 | 678 | 2 | TRUE |
| 3/9/2023 7:26 | person3@gmail.com | 45678901 | 1 | registered | 78901 | | 678 | 2 | FALSE |
 
How would I build that logic - i.e., to set Remove = "TRUE" for record(s) with the earlier "Last_Register" date for dups based on email.  If I can build that logic, I can figure out the rest.  Just not sure where to start.
 
Any help would be appreciated.  Thank you.
1 ACCEPTED SOLUTION
CNENFRNL
Community Champion

[Image attachment: CNENFRNL_0-1678412149691.png]
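The image attached to this reply is not reproduced as text here, so the following is only a minimal sketch of one way to get the result the question asks for, not necessarily the approach shown in the image. It is written as a calculated column and assumes the table is named 'Table' (as in the question's Count Emails column) and that Last_Register is stored as a date/time type rather than text. The variable names are illustrative.

```dax
Remove =
// Values from the current row, captured before iterating the table
VAR CurrentEmail = 'Table'[Email]
VAR CurrentRegister = 'Table'[Last_Register]
// Latest Last_Register among all rows that share this email
VAR LatestRegister =
    MAXX(
        FILTER( 'Table', 'Table'[Email] = CurrentEmail ),
        'Table'[Last_Register]
    )
RETURN
    // TRUE for every row that is older than the latest one in its email group
    CurrentRegister < LatestRegister
```

With the sample data in the question, this should mark the 11/15/2022 row for person1, the 2/11/2023 and 3/9/2023 7:28 rows for person2, and the 3/7/2023 row for person3 as TRUE. Emails that appear only once always return FALSE. If two rows for the same email share the exact same latest Last_Register value, both are kept, so a tie-breaker (for example on No or Version) would need to be added for that case.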


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!
