Fabric is Generally Available. Browse Fabric Presentations. Work towards your Fabric certification with the Cloud Skills Challenge.
Hi Community,
We have dataset based on logs - here we would like to create a new column with a true/false value if the line is a duplicate.
We have UserID, LogTimeDate, LogTime, and number of logged lines.
The Dax query should check if there are more entries with the same date and number of logged lines, and if so, flag the line with a true/false marking.
How would we go about formulating such an query?
Inputs are much appreciated.
Good day.
/David
Solved! Go to Solution.
You could modify the formula as shown below.
IsDuplicate = IF ( COUNTROWS ( FILTER ( Logs, Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] ) && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] ) && Logs[LogID] < EARLIER ( Logs[LogID] ) ) ) > 0, TRUE (), FALSE () )
Hi @Dbjerring
Are you aware of query editor? I think you can use this tool to reach your goal.
Best,
Martin
Hi @Anonymous,
Yes, but I'm not sure how to perform an advanced query like this - any tips?
/David
Hi @Dbjerring
You go to the menu on the left where there is 3 buttons, and choose the one called "Data". You then right-click your data and choose "Edit query". Then you right-click the column-header of the column where you want to remove duplicates and choose "Remove duplicates". Let me know if this solves your problem.
I'll also gladly elaborate.
@Anonymous - I don't need to remove them, I have to tag them. And since it's based on values from two different cells, the remove duplicate function isn't enough. Thank you for your elaboration though.
Good day to you sir
@Anonymous - No, I still hope he'll reply.
Do you have any inputs?
Study the RANKX Function.
Yea I have a solution in mind. Could you create a sample dataset with 5 rows? I'd like to test it before suggesting it.
@Anonymous - The dataset:
LogID | LogCNumber | LogTime.1 | LogCount | IsDuplicate |
2 | 1234567 | 20-06-2017 | FALSE | |
3 | 1172690 | 20-06-2017 | FALSE | |
4 | 21-06-2017 | -1 | FALSE | |
5 | 1145243 | 21-06-2017 | 72 | TRUE |
6 | 1145243 | 21-06-2017 | 72 | TRUE |
7 | 1145243 | 21-06-2017 | 72 | TRUE |
8 | 1172690 | 28-06-2017 | 94 | FALSE |
9 | 1172690 | 30-06-2017 | 55 | FALSE |
10 | 1145243 | 07-07-2017 | 115 | TRUE |
11 | 1145243 | 07-07-2017 | 115 | TRUE |
12 | 297458 | 12-07-2017 | 33 | FALSE |
13 | 1061571 | 13-07-2017 | 163 | FALSE |
14 | 338873 | 13-07-2017 | 14 | FALSE |
15 | 469736 | 13-07-2017 | 9 | FALSE |
16 | 1166818 | 13-07-2017 | 148 | FALSE |
17 | 1107245 | 13-07-2017 | 24 | TRUE |
18 | 1107245 | 13-07-2017 | 24 | TRUE |
19 | 1165369 | 13-07-2017 | 51 | FALSE |
20 | 1117867 | 13-07-2017 | 22 | FALSE |
Thank you
You could modify the formula as shown below.
IsDuplicate = IF ( COUNTROWS ( FILTER ( Logs, Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] ) && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] ) && Logs[LogID] < EARLIER ( Logs[LogID] ) ) ) > 0, TRUE (), FALSE () )
Hlelo @v-chuncz-msft ,
How would you go about identifying and keeping the latest row from multiple files datasource containing duplicates?
Cheers,
Antonio Santos
@v-chuncz-msft: I get a "Too many arguments were passed to the IF function. The maximum argument count for the function is 3.":
IsDuplicate = IF ( COUNTROWS ( FILTER ( Logs; Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] ) && Logs[LogTime.1] = EARLIER ( Logs[LogTime.1] ) && Logs[LogID] < EARLIER ( Logs[LogID] ) ) ) > 0; TRUE (); FALSE (); )
Any thoughts?
Thank you so much for the help
//David
Remove the last semicolon after "FALSE ()".
You may use the following DAX to add a calculated column.
IsDuplicate = IF ( COUNTROWS ( FILTER ( Table1, Table1[UserID] = EARLIER ( Table1[UserID] ) && Table1[LogTimeDate] = EARLIER ( Table1[LogTimeDate] ) ) ) > 1, TRUE () )
The DAX expression works, but for some reason both lines (also the the one without EARLIER value) gets tagged as TRUE.
So all lines with a possible duplicate gets TRUE VALUE.
Any thoughts?
The adjusted DAX calc I use:
IsDuplicate =
IF (
COUNTROWS (
FILTER (
Logs;
Logs[LogCNumber] = EARLIER ( Logs[LogCNumber] )
&& Logs[LogTime.1] = EARLIER( Logs[LogTime.1] )
)
)
> 1;
True();FALSE()
)
Check out the November 2023 Power BI update to learn about new features.
Read the latest Fabric Community announcements, including updates on Power BI, Synapse, Data Factory and Data Activator.
Join us for a free, hands-on Microsoft workshop led by women trainers for women where you will learn how to build a Dashboard in a Day!