Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Identify duplicates and show up records

I've got contact table with columns: names, emails. 

 

I want to identify duplicates in emails and show up their names. 

for example, show me the table like this.   any ideas, thanks. CRM.PNG

 

1 ACCEPTED SOLUTION
Anonymous
Not applicable

You could create a calculated column with the following formula:

Count Emails = 
Var Emails = [Emails]
RETURN

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails] = Emails
)

From here, you could simply look for any rows where the Count Emails is greater than 1.

View solution in original post

30 REPLIES 30
mkcongrove1
Frequent Visitor

This thread is the first to pop up when searching for how to solve this issue, so I thought I would share the non-DAX solution. Open Power Query, on the Home ribbon - select 'Keep Rows' and choose 'Keep Duplicates'.Keep Duplicates.jpg

wkasaval
New Member

Hi, I'm not getting any luck with the first solution:

 

It says cannot find name '[AFS_CAT]' on the var line...

 

Count AFS_CAT =
Var AFS_CAT = [AFS_CAT]
RETURN

CALCULATE(
COUNTROWS(MEMBER_RECON),
all(MEMBER_RECON),
MEMBER_RECON[AFS_CAT] = AFS_CAT
)
Olanna
Regular Visitor

Works perfectly

Anonymous
Not applicable

You could create a calculated column with the following formula:

Count Emails = 
Var Emails = [Emails]
RETURN

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails] = Emails
)

From here, you could simply look for any rows where the Count Emails is greater than 1.

Hi @Anonymous  

 

The function only works by Import Storage Mode, Do you have any idea how this could work in Direct Query?

 

Thanks for the help

Im not sure if this is better, wors or the same but in later versions of Power Bi you can use "Earlier" to refere to "in same line" values... hence i used it like this:

 

count entry = CALCULATE(COUNT(yourtable[email]);
                          FILTER(yourtable;yourtable[email] = EARLIER(yourtable[email])))
 
 
count entry = CALCULATE(COUNT(FinansRecords[F.Contry+Entry]);FILTER(FinansRecords;FinansRecords[F.Contry+Entry] = EARLIER(FinansRecords[F.Contry+Entry])))
Anonymous
Not applicable

Well look at that, this thread has been open for 2 years and the day I come to post my solution someone has beaten me by 4 days! The EARLIER() solution is cleaner but should be the same complexity as declaring a variable. Essentially, creating a variable outside of the calculate function is one way to pass row context into the calculate, and using the EARLIER function in the filter is the cleaner way of doing the same thing.

Anonymous
Not applicable

Declaring variables has become a habit as it works in all cases.  Yes Earlier works in this instance but i've run into a number of instances where i need to pass information into a calculate because the corresponding function for whatever i'm doing isn't compatable with calculate.  For ease of development i just use variables 100% of the time now.

Amazing! Thank you. This saves me so much time.

This works great for me.  How would i amend this to only perform the duplicate check for the year a record is related to?

Anonymous
Not applicable

I'll have to make some assumptions about your data but it could be something like:

 

Count Emails = 
Var Emails = [Emails]
Var YearVal = YEAR([EmailDate])
RETURN

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails] = Emails,
    'Your Table'[EmailDate] => DATE(YearVal, 1, 1),
    'Your Table'[EmailDate] <= DATE(YearVal, 12, 31)
)
Anonymous
Not applicable

Hello all,

 

The suggested solution works perfectly, However, in my case I need to count how many times a certain value has appear so if it appears 3 time I need to get an output for each of those 3 lines like 1 or 2 or 3 by descending order.... Anyone can suggest??

 

Thanks in advance 

Hi you can do this in M query add both line of code individually in your steps.

 

= Table.Group(#"Previous Step", {"Column_name"}, {{"Tablewithindex", each Table.AddIndexColumn(_,"Running Total",1,1), Int64.Type}})
= Table.Combine(#"Grouped Rows"[Tablewithindex])

 

Anonymous
Not applicable

@Anonymous  hi there, I just came across this post and it works exactly as suggested. thanks for that. My question is how I could use this or an equivalent in power query editor. I would prefer to add the calculated column in this environment because of subsequent logic I want to build into my table. 

 

thanks

Anonymous
Not applicable

A find and replace of null might do what you are chasing.

Go to your "Replace Values" and write 'null'  (without quotes) into the top box and then put 0 in the other box.

 

Is that the part of the post you wanted a Power Query version for?

Anonymous
Not applicable

@Anonymous 

hmm, no I don't think so. I'm looking for a query editor version to count the number of occurrences of a value of all rows in the table.  It's this part of the post I am looking for an alternative for in query editor. 

 

Count Emails = 
Var Emails = [Emails]
RETURN

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails] = Emails

 

Anonymous
Not applicable

If you are wanting to do a count, you could use the Grouping function.  That will produce a count for you, but will summarize the table you use it in.  If you don't want to upset your existing data, just create a new query to reference the old query and do the grouping in the new table.  This table could be joined in the model.

 

The grouping function is found on the ribbon as "Group By".  Tell it to group by the email address and use the "Count" function to create the summarised field.

Anonymous
Not applicable

@Anonymous not toally understand this DAX, but it's toally work!!!.  many thanks!!!

Anonymous
Not applicable

Here is a quick explaination for your learning:


This formula is run on a row by row basis, so keepthat in mind.

Var Emails = [Emails]
RETURN

This part takes the answer on the current row in the field [Emails] and places it in memory to be used later on (called a variable).

CALCULATE() is used to run a statement, but with extra conditions

COUNTROWS('Your Table') is going to get us the count we care about, but since we are doing this calculation row by row, there is only 1 row that could be counted

 

all('Your Table') is used to tell the CALCULATE to look at every row in the table, instead of just the current row.  Of course we don't actually want every row, or else we'd get the same number

'Your Table'[Emails] = Emails  is a filter that is going to be applied to the all('Your Table').  This will filter the ALL down until it only has in it rows where the [Email] field contains the value we put into that variable in the first step.

@Anonymous - I'm new to PowerBI. I'm curious to know if the order of the other functions matter when using the CALCULATE function.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.