Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
lroush13189
Frequent Visitor

Removing duplicates based on time response criteria

I have a report that has me stumped. I am looking to filter out duplicate records but I need to keep the record that has the quickest response time by the driver (Duration). Due to the large file size I have been performing mostly out of Query Editor and in Visual. I can provide a basic example below of how the data looks but I am unable to upload the file. 

 

TrackingNumber     CallTime         ArrivalTime        Duration

21-00794                  2:44:00 AM     02:54:47AM       00:10:47

21-11000                  9:14:00 AM     09:23:47 AM      00:09:47

21-11000                  9:14:00 AM     09:25:11 AM      00:11:11

21-15423                  6:56:00 AM     07:03:24 AM      00:07:24

21-4311                    5:40:00 PM     05:51:09 PM       00:11:09

21-4311                    5:40:00 PM     05:47:32 PM       00:07:32

21-4311                    5:40:00 PM     05:53:18 PM       00:13:18

 

I have put the duplicate runs in red that I need to keep and the other duplicate calls (TrackingNumber) will need to be filtered out. Not all calls are duplicated as shown above. If someone has a solution please let me know as I am stuck. 

2 ACCEPTED SOLUTIONS
v-jingzhang
Community Support
Community Support

Hi @lroush13189 , 

You could add these two steps into your query. Modify column names and step names per your need. The first step is to add a new column "MinDurationRow" containing the row with the minimum duration value for each tracking number. The second step is to expand the result of previous step.

Custom1 = Table.Group(#"Added Custom", {"TrackingNumber"}, {{"MinDurationRow", each Table.FirstN(Table.Sort(_,{{"Duration", Order.Ascending}}),1), type any }}),
#"Expanded Count" = Table.ExpandTableColumn(Custom1, "MinDurationRow", {"CallTime", "ArrivalTime", "Duration"}, {"CallTime", "ArrivalTime", "Duration"})

022302.jpg

Kindly let me know if this helps.
Community Support Team _ Jing Zhang
If this post helps, please consider Accept it as the solution to help other members find it.

View solution in original post

Hi @lroush13189 You can achieve this by using the UI interface. Please follow below steps. Here is a PBIX file for your reference.

1. Right click the last step in Applied Steps and select Insert Step After option, this will add a new Custom1 step.

2. In Custom1 step, copy and paste below codes into the formula bar above the table. In my example, "Changed Type" is its previous step's name. Change it according to your previous step. "MinDurationRow" is the name for a new column.

3. Click Expand icon on the new "MinDurationRow" column, select the columns you need and click OK. You will get the rows you want.

= Table.Group(#"Changed Type", {"TrackingNumber"}, {{"MinDurationRow", each Table.FirstN(Table.Sort(_,{{"Duration", Order.Ascending}}),1), type any }})

022401.jpg

022402.jpg022403.jpg

Regards,
Community Support Team _ Jing Zhang
If this post helps, please Accept it as the solution to help other members find it.

View solution in original post

10 REPLIES 10
lbendlin
Super User
Super User

You create a blank query, then go to Advanced Editor, and replace the entire contents with what I posted for the second query.

v-jingzhang
Community Support
Community Support

Hi @lroush13189 , 

You could add these two steps into your query. Modify column names and step names per your need. The first step is to add a new column "MinDurationRow" containing the row with the minimum duration value for each tracking number. The second step is to expand the result of previous step.

Custom1 = Table.Group(#"Added Custom", {"TrackingNumber"}, {{"MinDurationRow", each Table.FirstN(Table.Sort(_,{{"Duration", Order.Ascending}}),1), type any }}),
#"Expanded Count" = Table.ExpandTableColumn(Custom1, "MinDurationRow", {"CallTime", "ArrivalTime", "Duration"}, {"CallTime", "ArrivalTime", "Duration"})

022302.jpg

Kindly let me know if this helps.
Community Support Team _ Jing Zhang
If this post helps, please consider Accept it as the solution to help other members find it.

Sorry, I had to step away from this for a few days but am back at it now. I am newer to PowerBi so I am not quite up to speed with everything that you have listed. I tried copying and pasting the custom column into my project and changing the Column Names as I needed to. I received an error stating that the "syntax for the table is incorrect". I am not the best at figuring out how to fix these issues when I am not sure exactly what I am doing, to begin with 🙂 If you have anything to help me out that would be great. 

Hi @lroush13189 You can achieve this by using the UI interface. Please follow below steps. Here is a PBIX file for your reference.

1. Right click the last step in Applied Steps and select Insert Step After option, this will add a new Custom1 step.

2. In Custom1 step, copy and paste below codes into the formula bar above the table. In my example, "Changed Type" is its previous step's name. Change it according to your previous step. "MinDurationRow" is the name for a new column.

3. Click Expand icon on the new "MinDurationRow" column, select the columns you need and click OK. You will get the rows you want.

= Table.Group(#"Changed Type", {"TrackingNumber"}, {{"MinDurationRow", each Table.FirstN(Table.Sort(_,{{"Duration", Order.Ascending}}),1), type any }})

022401.jpg

022402.jpg022403.jpg

Regards,
Community Support Team _ Jing Zhang
If this post helps, please Accept it as the solution to help other members find it.

this seems to work at first glance but when I hit save and apply, the data disappears on the desktop table. Is this some sort of error on my part or an issue with the program? I have tried several ways to fix this but have not found a solution.

What is the connection mode of this query? If it is DQ, you will not see data in Desktop data view by design. If it is Import, is there any error message in the Load dialogue when it is applying the changes? If there is no error message and it seems applied successfully, how about clicking Refresh button next to Transform data? And can you see the column headers in data view?

 

Sometimes I don't see the data after applying the changes in Query Editor, but I can see the column headers. If I click Refresh, or click either column header and wait for a while, it will show the data. I guess the data size or the computer's memory might be a potential cause. 

I was able to get this to work! Now my next question is how do I take the data from there and use it to get averages and totals of the duration for display on the display side? I am looking to show how long it took for things to arrive and the average time by year.

Hi @lroush13189 

You can use measures to calculate the average and total of durations. Use functions like AVERAGE(), SUM() in measures, and populate the charts/visuals with measures.

 

One thing to pay attention to is that there is no duration data type in report view (display side). If you use Duration type in Query Editor, it’s converted into a Decimal Number Type (with day as unit) when loaded into the model. As a Decimal Number type, you can easily use them to calculate the average and total values. Or you can convert the column into values with minute or second as unit before loading them into model.

 

After calculating the average and total of durations, you can format the result per your need. Here are some exmaples for your reference:

To get the duration in the right format

How can I calculate average duration by period and format the result as "X Days, Y Hours, Z Minutes"

lbendlin
Super User
Super User

You didn't specify your expected output.  Here's a minimalistic version that you can adjust as needed.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY5BCsMwDAS/UnxOYSVLcb23PKDQe/D/v1HZpHFuKfgyi4bxvieVJ1CqpSUpzQg8tncAlG60chAoCEptGYoIgNgr5aJUar4qMfyhOEVmReKdipvm2Ff6OpVCZKrNSgn6KZZDX5LT0I3PuHG6xGcOGhHUG8MKs54GOt01MuU1G51Sa18=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TrackingNumber = _t, CallTime = _t, ArrivalTime = _t, Duration = _t]),
    #"Sorted Rows" = Table.Sort(Source,{{"TrackingNumber", Order.Ascending},{"Duration", Order.Ascending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"TrackingNumber"}, {{"mindur", each List.Min([Duration]), type nullable text}})
in
    #"Grouped Rows"

 

Here is a version that preserves the other columns too.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY5BCsMwDAS/UnxOYSVLcb23PKDQe/D/v1HZpHFuKfgyi4bxvieVJ1CqpSUpzQg8tncAlG60chAoCEptGYoIgNgr5aJUar4qMfyhOEVmReKdipvm2Ff6OpVCZKrNSgn6KZZDX5LT0I3PuHG6xGcOGhHUG8MKs54GOt01MuU1G51Sa18=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TrackingNumber = _t, CallTime = _t, ArrivalTime = _t, Duration = _t]),
    #"Sorted Rows" = Table.Sort(Source,{{"TrackingNumber", Order.Ascending},{"Duration", Order.Ascending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"TrackingNumber"}, {{"mindur", each _, type table [TrackingNumber=nullable text, CallTime=nullable text, ArrivalTime=nullable text, Duration=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "mindurrow", each [mindur]{0}),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"TrackingNumber", "mindurrow"}),
    #"Expanded mindurrow" = Table.ExpandRecordColumn(#"Removed Other Columns", "mindurrow", {"CallTime", "ArrivalTime", "Duration"}, {"CallTime", "ArrivalTime", "Duration"})
in
    #"Expanded mindurrow"

 

I am not sure what you mean by expected output, as I am fairly new to this, but I am looking to remove just the duplicate data that has a higher response time. I tried to implement the version that preserves the other columns and it states that the "syntax for the table is incorrect". I added it as a column and then changed the column names, as I had to change them for the example. 

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.