cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Anonymous
Not applicable

Remove duplicates based on specific value

Hi Everyone

 

I have a datatable that I get through a power query from a SQL-server that consists of queue information for a robot that I am running.

The table looks something like this

 

TransactionStatusitemReferenceexceptionReasontransactionEnd
Failed w/ Bus. Exception1-54-631-9Business Rule Exception27-02-2020 01:42:20
Failed w/ App. Exception1-37-71-7Cannot find UI element27-02-2020 01:42:10
Failed w/ Bus. Exception1-13-45-5Business Rule Exception27-02-2020 01:42:00
Failed w/ Bus. Exception1-13-322-1Error in tenant number27-02-2020 01:41:50
Failed w/ App. Exception1-25-404-33Cannot find UI element27-02-2020 01:41:40
Retried1-13-45-5 27-02-2020 01:41:42
Retried1-23-139-2 27-02-2020 01:22:05
Retried1-25-404-33 27-02-2020 01:21:15
New1-25-404-33 26-02-2020 03:41:42
Successful1-37-71-7 25-02-2020 07:22:31
Deleted1-106-172-8 26-02-2020 10:30:00
Successful1-42-100-14 27-02-2020 01:40:40
Successful1-23-139-2 27-02-2020 01:28:32

 

I would like 2 things to happen.

First if the TransactionStatus is "Successful", then delete all other rows with the same "itemReference".

Second if a process does not have a "Succesful" then I only want to keep the one with the latest transaction date. Meaning delete all other rows with the same "itemReference" and only keep the one with the latest "transactionEnd".

 

Note that just because a process has been "Succesful" it can sometimes run again and fail. But if it was a success the first time and failed the second time, then I only care about the one where it succeeded. Meaning that a later fail run should also be deleted.

 

This would mean the table would be transformed into one looking like this:

TransactionStatusitemReferenceexceptionReasontransactionEnd
Failed w/ Bus. Exception1-54-631-9Business Rule Exception27-02-2020 01:42:20
Failed w/ Bus. Exception1-13-45-5Business Rule Exception27-02-2020 01:42:00
Failed w/ Bus. Exception1-13-322-1Error in tenant number27-02-2020 01:41:50
Failed w/ App. Exception1-25-404-33Cannot find UI element27-02-2020 01:41:40
Successful1-37-71-7 25-02-2020 07:22:31
Deleted1-106-172-8 26-02-2020 10:30:00
Successful1-42-100-14 27-02-2020 01:40:40
Successful1-23-139-2 27-02-2020 01:28:32

 

Any suggestions to how this can be implemented?

 

Thanks

 

2 ACCEPTED SOLUTIONS
edhans
Super User
Super User

See the attached PBIX file. This is what I returned using Power Query.

EDIT: I see for this image I inadvertently formated TransactionEnd as Date. Should have been DateTime. I've fixed in the PBIX link, but not redoing the image. You may only want dates in your model, not times as well, so format that however you like. But Times are necessary for this to work since some of your records are on the same day and it needs a time to find the true latest value.

 

2020-03-17 08_03_14-Untitled - Power Query Editor.png

Here is what I did:

  1. Made sure the source table is not set to load.
  2. Created a reference to that table and called it "Successful Only" and kept only the Successful records and the ItemReference field.
  3. Created another reference to the source and called it "Final Table." Here is where the work starts.
  4. Starting with the Final Table, I merged the Successful Only table using the Item Reference field then expanded just the TransactionStatus field, which is now TransactionStatus.1. Now I either have null or Successful.
  5. Added a new "TransactionStatus Match" field that returns true or false if the status from the original table matches the new field. = [TransactionStatus] = [TransactionStatus.1]
  6. Filtered based on the TransactionStatus.1 field and the TransactionStatus Match field: each ([TransactionStatus.1] = null or [TransactionStatus Match] = true) 
  7. I then grouped rows by the ItemReference and used the AllRows aggregation. You can see each embedded table has only those records related to the itemreference for that row.
    1. 2020-03-17 08_10_15-Untitled - Power Query Editor.png
  8. Then I filtered out to only return the record with the maximum date. This is a new column with the below formula:
    1. = Table.Max([All Rows],"transactionEnd")
  9. This returns a single record. I expanded that record and all fields except the TransactionStatus. We already have that.
  10. Removed all unnecessary columns, then changed the data types again. Using nested tables destroys the data types and you want them properly typed before you load.
  11. Load the Final Table into DAX for use.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

Hi @Anonymous  

you can group on "itemReference", select "All" in the Operations and name the group-column "ItemData".

Then add this column: 

 

if List.Contains([ItemData][TransactionStatus], "Successful") 
    then [ItemData]{[TransactionStatus = "Successful"]} 
    else Table.SelectRows([ItemData], (x) => x[transactionEnd] = List.Max([ItemData][transactionEnd])){0}

 Also see attached file

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

5 REPLIES 5
edhans
Super User
Super User

See the attached PBIX file. This is what I returned using Power Query.

EDIT: I see for this image I inadvertently formated TransactionEnd as Date. Should have been DateTime. I've fixed in the PBIX link, but not redoing the image. You may only want dates in your model, not times as well, so format that however you like. But Times are necessary for this to work since some of your records are on the same day and it needs a time to find the true latest value.

 

2020-03-17 08_03_14-Untitled - Power Query Editor.png

Here is what I did:

  1. Made sure the source table is not set to load.
  2. Created a reference to that table and called it "Successful Only" and kept only the Successful records and the ItemReference field.
  3. Created another reference to the source and called it "Final Table." Here is where the work starts.
  4. Starting with the Final Table, I merged the Successful Only table using the Item Reference field then expanded just the TransactionStatus field, which is now TransactionStatus.1. Now I either have null or Successful.
  5. Added a new "TransactionStatus Match" field that returns true or false if the status from the original table matches the new field. = [TransactionStatus] = [TransactionStatus.1]
  6. Filtered based on the TransactionStatus.1 field and the TransactionStatus Match field: each ([TransactionStatus.1] = null or [TransactionStatus Match] = true) 
  7. I then grouped rows by the ItemReference and used the AllRows aggregation. You can see each embedded table has only those records related to the itemreference for that row.
    1. 2020-03-17 08_10_15-Untitled - Power Query Editor.png
  8. Then I filtered out to only return the record with the maximum date. This is a new column with the below formula:
    1. = Table.Max([All Rows],"transactionEnd")
  9. This returns a single record. I expanded that record and all fields except the TransactionStatus. We already have that.
  10. Removed all unnecessary columns, then changed the data types again. Using nested tables destroys the data types and you want them properly typed before you load.
  11. Load the Final Table into DAX for use.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
amitchandak
Super User
Super User

Hi @Anonymous  

you can group on "itemReference", select "All" in the Operations and name the group-column "ItemData".

Then add this column: 

 

if List.Contains([ItemData][TransactionStatus], "Successful") 
    then [ItemData]{[TransactionStatus = "Successful"]} 
    else Table.SelectRows([ItemData], (x) => x[transactionEnd] = List.Max([ItemData][transactionEnd])){0}

 Also see attached file

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

@ImkeF can you explain what the following is doing?

(x) => x[transactionEnd]

 

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Hi @edhans 

it's equivalent to:

each [transactionEnd]

so referencing the "transactionEnd"-field in the currently iterated row/record.

But I cannot use this syntax sugar there, as this expression is written in an "AddColumns"-window. There it will be subject to an (outer) each already. So to to make it unambigous, I've used the "standard" function definition syntax ()=>.

 

https://docs.microsoft.com/en-us/powerquery-m/understanding-power-query-m-functions

Another practical example here: https://www.thebiccountant.com/2017/05/29/performance-tip-partition-tables-crossjoins-possible-power...

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Helpful resources

Announcements
PBI Sept Update Carousel

Power BI September 2023 Update

Take a look at the September 2023 Power BI update to learn more.

Learn Live

Learn Live: Event Series

Join Microsoft Reactor and learn from developers.

Dashboard in a day with date

Exclusive opportunity for Women!

Join us for a free, hands-on Microsoft workshop led by women trainers for women where you will learn how to build a Dashboard in a Day!

MPPC 2023 PBI Carousel

Power Platform Conference-Power BI and Fabric Sessions

Join us Oct 1 - 6 in Las Vegas for the Microsoft Power Platform Conference.

Top Solution Authors