cancel
Showing results for 
Search instead for 
Did you mean: 

Fabric is Generally Available. Browse Fabric Presentations. Work towards your Fabric certification with the Cloud Skills Challenge.

Reply
Dbjerring
Regular Visitor

Remove duplicates based on values from multiple cells

Hi Community,

 

We have dataset based on logs - here we would like to create a new column with a true/false value if the line is a duplicate.

We have UserID, LogTimeDate, LogTime, and number of logged lines. 

The Dax query should check if there are more entries with the same date and number of logged lines, and if so, flag the line with a true/false marking.

 

How would we go about formulating such an query?

 

Inputs are much appreciated.

 

Good day.

 

/David

1 ACCEPTED SOLUTION

@Dbjerring,

 

You could modify the formula as shown below.

IsDuplicate =
IF (
    COUNTROWS (
        FILTER (
            Logs,
            Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
                && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] )
                && Logs[LogID] < EARLIER ( Logs[LogID] )
        )
    )
        > 0,
    TRUE (),
    FALSE ()
)
Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

16 REPLIES 16
Anonymous
Not applicable

Hi @Dbjerring

 

Are you aware of query editor? I think you can use this tool to reach your goal.

 

Best,

Martin

Hi @Anonymous,

 

Yes, but I'm not sure how to perform an advanced query like this - any tips?

 

/David

Anonymous
Not applicable

Hi @Dbjerring

 

You go to the menu on the left where there is 3 buttons, and choose the one called "Data". You then right-click your data and choose "Edit query". Then you right-click the column-header of the column where you want to remove duplicates and choose "Remove duplicates". Let me know if this solves your problem.

 

I'll also gladly elaborate.

@Anonymous - I don't need to remove them, I have to tag them. And since it's based on values from two different cells, the remove duplicate function isn't enough. Thank you for your elaboration though.

 

Good day to you sir 

Anonymous
Not applicable

@Dbjerring

Sorry I misesd that part 😄 No problem at all. Did @v-chuncz-msft solve your problem?

 

@Anonymous - No, I still hope he'll reply.

 

Do you have any inputs?

@Dbjerring,

 

Study the RANKX Function.

Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

@Dbjerring

 

Yea I have a solution in mind. Could you create a sample dataset with 5 rows? I'd like to test it before suggesting it.

@Anonymous - The dataset: 

 

LogIDLogCNumberLogTime.1LogCountIsDuplicate
2123456720-06-2017 FALSE
3117269020-06-2017 FALSE
4 21-06-2017-1FALSE
5114524321-06-201772TRUE
6114524321-06-201772TRUE
7114524321-06-201772TRUE
8117269028-06-201794FALSE
9117269030-06-201755FALSE
10114524307-07-2017115TRUE
11114524307-07-2017115TRUE
1229745812-07-201733FALSE
13106157113-07-2017163FALSE
1433887313-07-201714FALSE
1546973613-07-20179FALSE
16116681813-07-2017148FALSE
17110724513-07-201724TRUE
18110724513-07-201724TRUE
19116536913-07-201751FALSE
20111786713-07-201722FALSE

 

Thank you

@Dbjerring,

 

You could modify the formula as shown below.

IsDuplicate =
IF (
    COUNTROWS (
        FILTER (
            Logs,
            Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
                && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] )
                && Logs[LogID] < EARLIER ( Logs[LogID] )
        )
    )
        > 0,
    TRUE (),
    FALSE ()
)
Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hlelo @v-chuncz-msft ,

 

How would you go about identifying and keeping the latest row from multiple files datasource containing duplicates?

 

Cheers,

Antonio Santos

@v-chuncz-msft: I get a "Too many arguments were passed to the IF function. The maximum argument count for the function is 3.":

 

IsDuplicate = 
IF (
    COUNTROWS (
        FILTER (
            Logs;
            Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
                && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] )
                && Logs[LogID] < EARLIER ( Logs[LogID] )
        )
    )
        > 0;
    TRUE ();
    FALSE ();
)

Any thoughts?

 

Thank you so much for the help

 

//David

 

@Dbjerring,

 

Remove the last semicolon after "FALSE ()".

Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@v-chuncz-msft - Thank you so much - That did the trick.

 

Have a nice day!

@Dbjerring,

 

You may use the following DAX to add a calculated column.

IsDuplicate =
IF (
    COUNTROWS (
        FILTER (
            Table1,
            Table1[UserID] = EARLIER ( Table1[UserID] )
                && Table1[LogTimeDate] = EARLIER ( Table1[LogTimeDate] )
        )
    )
        > 1,
    TRUE ()
)
Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@v-chuncz-msft

 

The DAX expression works, but for some reason both lines (also the the one without EARLIER value) gets tagged as TRUE.

So all lines with a possible duplicate gets TRUE VALUE.

 

Any thoughts?

 

The adjusted DAX calc I use:

 

IsDuplicate =
IF (
COUNTROWS (
FILTER (
Logs;
Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
&& Logs[LogTime.1] = EARLIER( Logs[LogTime.1] )
)
)
> 1;
True();FALSE()
)

Helpful resources

Announcements
PBI November 2023 Update Carousel

Power BI Monthly Update - November 2023

Check out the November 2023 Power BI update to learn about new features.

Community News

Fabric Community News unified experience

Read the latest Fabric Community announcements, including updates on Power BI, Synapse, Data Factory and Data Activator.

Dashboard in a day with date

Exclusive opportunity for Women!

Join us for a free, hands-on Microsoft workshop led by women trainers for women where you will learn how to build a Dashboard in a Day!

Power BI Fabric Summit Carousel

The largest Power BI and Fabric virtual conference

130+ sessions, 130+ speakers, Product managers, MVPs, and experts. All about Power BI and Fabric. Attend online or watch the recordings.

Top Solution Authors
Top Kudoed Authors