Starting December 3, join live sessions with database experts and the Microsoft product team to learn just how easy it is to get started
Learn moreGet certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now
Hi Community,
We have dataset based on logs - here we would like to create a new column with a true/false value if the line is a duplicate.
We have UserID, LogTimeDate, LogTime, and number of logged lines.
The Dax query should check if there are more entries with the same date and number of logged lines, and if so, flag the line with a true/false marking.
How would we go about formulating such an query?
Inputs are much appreciated.
Good day.
/David
Solved! Go to Solution.
You could modify the formula as shown below.
IsDuplicate = IF ( COUNTROWS ( FILTER ( Logs, Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] ) && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] ) && Logs[LogID] < EARLIER ( Logs[LogID] ) ) ) > 0, TRUE (), FALSE () )
Hi @Dbjerring
Are you aware of query editor? I think you can use this tool to reach your goal.
Best,
Martin
Hi @Anonymous,
Yes, but I'm not sure how to perform an advanced query like this - any tips?
/David
Hi @Dbjerring
You go to the menu on the left where there is 3 buttons, and choose the one called "Data". You then right-click your data and choose "Edit query". Then you right-click the column-header of the column where you want to remove duplicates and choose "Remove duplicates". Let me know if this solves your problem.
I'll also gladly elaborate.
@Anonymous - I don't need to remove them, I have to tag them. And since it's based on values from two different cells, the remove duplicate function isn't enough. Thank you for your elaboration though.
Good day to you sir
@Anonymous - No, I still hope he'll reply.
Do you have any inputs?
Study the RANKX Function.
Yea I have a solution in mind. Could you create a sample dataset with 5 rows? I'd like to test it before suggesting it.
@Anonymous - The dataset:
LogID | LogCNumber | LogTime.1 | LogCount | IsDuplicate |
2 | 1234567 | 20-06-2017 | FALSE | |
3 | 1172690 | 20-06-2017 | FALSE | |
4 | 21-06-2017 | -1 | FALSE | |
5 | 1145243 | 21-06-2017 | 72 | TRUE |
6 | 1145243 | 21-06-2017 | 72 | TRUE |
7 | 1145243 | 21-06-2017 | 72 | TRUE |
8 | 1172690 | 28-06-2017 | 94 | FALSE |
9 | 1172690 | 30-06-2017 | 55 | FALSE |
10 | 1145243 | 07-07-2017 | 115 | TRUE |
11 | 1145243 | 07-07-2017 | 115 | TRUE |
12 | 297458 | 12-07-2017 | 33 | FALSE |
13 | 1061571 | 13-07-2017 | 163 | FALSE |
14 | 338873 | 13-07-2017 | 14 | FALSE |
15 | 469736 | 13-07-2017 | 9 | FALSE |
16 | 1166818 | 13-07-2017 | 148 | FALSE |
17 | 1107245 | 13-07-2017 | 24 | TRUE |
18 | 1107245 | 13-07-2017 | 24 | TRUE |
19 | 1165369 | 13-07-2017 | 51 | FALSE |
20 | 1117867 | 13-07-2017 | 22 | FALSE |
Thank you
You could modify the formula as shown below.
IsDuplicate = IF ( COUNTROWS ( FILTER ( Logs, Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] ) && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] ) && Logs[LogID] < EARLIER ( Logs[LogID] ) ) ) > 0, TRUE (), FALSE () )
Hlelo @v-chuncz-msft ,
How would you go about identifying and keeping the latest row from multiple files datasource containing duplicates?
Cheers,
Antonio Santos
@v-chuncz-msft: I get a "Too many arguments were passed to the IF function. The maximum argument count for the function is 3.":
IsDuplicate = IF ( COUNTROWS ( FILTER ( Logs; Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] ) && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] ) && Logs[LogID] < EARLIER ( Logs[LogID] ) ) ) > 0; TRUE (); FALSE (); )
Any thoughts?
Thank you so much for the help
//David
Remove the last semicolon after "FALSE ()".
You may use the following DAX to add a calculated column.
IsDuplicate = IF ( COUNTROWS ( FILTER ( Table1, Table1[UserID] = EARLIER ( Table1[UserID] ) && Table1[LogTimeDate] = EARLIER ( Table1[LogTimeDate] ) ) ) > 1, TRUE () )
The DAX expression works, but for some reason both lines (also the the one without EARLIER value) gets tagged as TRUE.
So all lines with a possible duplicate gets TRUE VALUE.
Any thoughts?
The adjusted DAX calc I use:
IsDuplicate =
IF (
COUNTROWS (
FILTER (
Logs;
Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
&& Logs[LogTime.1] = EARLIER( Logs[LogTime.1] )
)
)
> 1;
True();FALSE()
)
Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.
User | Count |
---|---|
94 | |
90 | |
84 | |
70 | |
49 |
User | Count |
---|---|
143 | |
121 | |
112 | |
58 | |
57 |