Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
Dbjerring
Regular Visitor

Remove duplicates based on values from multiple cells

Hi Community,

 

We have dataset based on logs - here we would like to create a new column with a true/false value if the line is a duplicate.

We have UserID, LogTimeDate, LogTime, and number of logged lines. 

The Dax query should check if there are more entries with the same date and number of logged lines, and if so, flag the line with a true/false marking.

 

How would we go about formulating such an query?

 

Inputs are much appreciated.

 

Good day.

 

/David

1 ACCEPTED SOLUTION

@Dbjerring,

 

You could modify the formula as shown below.

IsDuplicate =
IF (
    COUNTROWS (
        FILTER (
            Logs,
            Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
                && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] )
                && Logs[LogID] < EARLIER ( Logs[LogID] )
        )
    )
        > 0,
    TRUE (),
    FALSE ()
)
Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

16 REPLIES 16
Anonymous
Not applicable

Hi @Dbjerring

 

Are you aware of query editor? I think you can use this tool to reach your goal.

 

Best,

Martin

Hi @Anonymous,

 

Yes, but I'm not sure how to perform an advanced query like this - any tips?

 

/David

Anonymous
Not applicable

Hi @Dbjerring

 

You go to the menu on the left where there is 3 buttons, and choose the one called "Data". You then right-click your data and choose "Edit query". Then you right-click the column-header of the column where you want to remove duplicates and choose "Remove duplicates". Let me know if this solves your problem.

 

I'll also gladly elaborate.

@Anonymous - I don't need to remove them, I have to tag them. And since it's based on values from two different cells, the remove duplicate function isn't enough. Thank you for your elaboration though.

 

Good day to you sir 

Anonymous
Not applicable

@Dbjerring

Sorry I misesd that part 😄 No problem at all. Did @v-chuncz-msft solve your problem?

 

@Anonymous - No, I still hope he'll reply.

 

Do you have any inputs?

@Dbjerring,

 

Study the RANKX Function.

Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

@Dbjerring

 

Yea I have a solution in mind. Could you create a sample dataset with 5 rows? I'd like to test it before suggesting it.

@Anonymous - The dataset: 

 

LogIDLogCNumberLogTime.1LogCountIsDuplicate
2123456720-06-2017 FALSE
3117269020-06-2017 FALSE
4 21-06-2017-1FALSE
5114524321-06-201772TRUE
6114524321-06-201772TRUE
7114524321-06-201772TRUE
8117269028-06-201794FALSE
9117269030-06-201755FALSE
10114524307-07-2017115TRUE
11114524307-07-2017115TRUE
1229745812-07-201733FALSE
13106157113-07-2017163FALSE
1433887313-07-201714FALSE
1546973613-07-20179FALSE
16116681813-07-2017148FALSE
17110724513-07-201724TRUE
18110724513-07-201724TRUE
19116536913-07-201751FALSE
20111786713-07-201722FALSE

 

Thank you

@Dbjerring,

 

You could modify the formula as shown below.

IsDuplicate =
IF (
    COUNTROWS (
        FILTER (
            Logs,
            Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
                && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] )
                && Logs[LogID] < EARLIER ( Logs[LogID] )
        )
    )
        > 0,
    TRUE (),
    FALSE ()
)
Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hlelo @v-chuncz-msft ,

 

How would you go about identifying and keeping the latest row from multiple files datasource containing duplicates?

 

Cheers,

Antonio Santos

@v-chuncz-msft: I get a "Too many arguments were passed to the IF function. The maximum argument count for the function is 3.":

 

IsDuplicate = 
IF (
    COUNTROWS (
        FILTER (
            Logs;
            Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
                && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] )
                && Logs[LogID] < EARLIER ( Logs[LogID] )
        )
    )
        > 0;
    TRUE ();
    FALSE ();
)

Any thoughts?

 

Thank you so much for the help

 

//David

 

@Dbjerring,

 

Remove the last semicolon after "FALSE ()".

Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@v-chuncz-msft - Thank you so much - That did the trick.

 

Have a nice day!

@Dbjerring,

 

You may use the following DAX to add a calculated column.

IsDuplicate =
IF (
    COUNTROWS (
        FILTER (
            Table1,
            Table1[UserID] = EARLIER ( Table1[UserID] )
                && Table1[LogTimeDate] = EARLIER ( Table1[LogTimeDate] )
        )
    )
        > 1,
    TRUE ()
)
Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@v-chuncz-msft

 

The DAX expression works, but for some reason both lines (also the the one without EARLIER value) gets tagged as TRUE.

So all lines with a possible duplicate gets TRUE VALUE.

 

Any thoughts?

 

The adjusted DAX calc I use:

 

IsDuplicate =
IF (
COUNTROWS (
FILTER (
Logs;
Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
&& Logs[LogTime.1] = EARLIER( Logs[LogTime.1] )
)
)
> 1;
True();FALSE()
)

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.