
Aizhanz
Frequent Visitor

Remove duplicates while rounding time | clean time data

Hello, 

I have an issue cleaning my time data. It contains duplicates, as highlighted in the screenshot. For example, rider_id = 25642 has start times of 24/08/2020 7:31:11 AM (row 5) and 23/08/2020 7:33:10 AM (row 2), which I want to treat as duplicates. How can I clean data like this?

 

I tried using "Start of Hour" and "End of Hour" to round the times and then "Remove Duplicates", but it doesn't work (I suppose because Power BI doesn't see that I rounded the time).

[Screenshot: Aizhanz_0-1651823085683.jpeg]

 

rider_id  actual_start_at          actual_end_at
25642     2020-08-23 03:10:11 UTC  2020-08-23 06:30:10 UTC
25642     2020-08-23 07:33:10 UTC  2020-08-23 11:30:11 UTC
48972     2020-08-24 01:30:10 UTC  2020-08-24 04:38:11 UTC
25642     2020-08-24 03:17:10 UTC  2020-08-24 06:30:10 UTC
25642     2020-08-24 07:31:11 UTC  2020-08-24 11:30:10 UTC

 

1 ACCEPTED SOLUTION
sanalytics
Super User

@Aizhanz 

 

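The rows you highlighted are not duplicates as they stand, because they fall on different dates. If you want to compare the rows on time of day only, the PQ steps below drop the date, round the times to the hour, and then remove duplicates: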

let
    // RiderTrips is a placeholder name: point Source at your own query or previous step
    // (the original "Source = Source" would be a cyclic reference)
    Source = RiderTrips,
    // Strip the "UTC" suffix from both timestamp columns
    #"Replaced Value" = Table.ReplaceValue(Source,"UTC","",Replacer.ReplaceText,{"actual_start_at"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","UTC","",Replacer.ReplaceText,{"actual_end_at"}),
    // Split each timestamp on spaces into date (.1), time (.2) and an empty remainder (.3)
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value1", "actual_start_at", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"actual_start_at.1", "actual_start_at.2", "actual_start_at.3"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"rider_id", Int64.Type}, {"actual_start_at.1", type date}, {"actual_start_at.2", type time}, {"actual_start_at.3", type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "actual_end_at", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"actual_end_at.1", "actual_end_at.2", "actual_end_at.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"actual_end_at.1", type date}, {"actual_end_at.2", type time}, {"actual_end_at.3", type text}}),
    // Keep only the rider and the two time-of-day columns, so the date no longer affects the comparison
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type1",{"rider_id", "actual_start_at.2", "actual_end_at.2"}),
    // Round the start down and the end up to the hour, then drop identical rows
    #"Calculated Start of Hour" = Table.TransformColumns(#"Removed Other Columns",{{"actual_start_at.2", Time.StartOfHour, type time}}),
    #"Calculated End of Hour" = Table.TransformColumns(#"Calculated Start of Hour",{{"actual_end_at.2", Time.EndOfHour, type time}}),
    #"Removed Duplicates" = Table.Distinct(#"Calculated End of Hour")
in
    #"Removed Duplicates"
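
To try the same idea without connecting a source, here is a minimal sketch that builds the five sample rows from the question inline. It is only an illustration, not the steps above verbatim: the helper name ToHourOfDay is made up for this example, the timestamps are assumed to always follow the "yyyy-MM-dd HH:mm:ss UTC" pattern, and both columns are rounded down to the start of the hour for simplicity.

```powerquery
let
    // Sample rows copied from the question; timestamps are still text with a " UTC" suffix
    Source = #table(
        type table [rider_id = Int64.Type, actual_start_at = text, actual_end_at = text],
        {
            {25642, "2020-08-23 03:10:11 UTC", "2020-08-23 06:30:10 UTC"},
            {25642, "2020-08-23 07:33:10 UTC", "2020-08-23 11:30:11 UTC"},
            {48972, "2020-08-24 01:30:10 UTC", "2020-08-24 04:38:11 UTC"},
            {25642, "2020-08-24 03:17:10 UTC", "2020-08-24 06:30:10 UTC"},
            {25642, "2020-08-24 07:31:11 UTC", "2020-08-24 11:30:10 UTC"}
        }
    ),
    // Hypothetical helper: drop " UTC", turn the space into "T" so the text is ISO 8601,
    // parse it as a datetime, keep the time part, and round down to the hour
    ToHourOfDay = (t as text) as time =>
        Time.StartOfHour(
            Time.From(
                DateTime.FromText(Text.Replace(Text.BeforeDelimiter(t, " UTC"), " ", "T"))
            )
        ),
    // Replace both timestamp columns with their rounded time of day (the date is discarded)
    Rounded = Table.TransformColumns(
        Source,
        {{"actual_start_at", ToHourOfDay, type time}, {"actual_end_at", ToHourOfDay, type time}}
    ),
    // Rows that agree on rider and both rounded times are now identical and collapse to one
    Deduplicated = Table.Distinct(Rounded)
in
    Deduplicated
```

On the sample data this should leave three rows: two for rider 25642 (03:00 to 06:00 and 07:00 to 11:00) and one for rider 48972. Note that the date is lost in the result, which is only acceptable if time of day is all you need.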

 

Hope this will help you.

 

Regards

Sanalytics

 


2 REPLIES 2

Vijay_A_Verma
Most Valuable Professional

Actually, the dates differ between the rows you are comparing: the first is 23-8-20 and the second is 24-8-20. Start of Hour and End of Hour keep the date, so these rows are not treated as duplicate records, and PQ is behaving correctly. If you want to check for duplicates on the time of day only, use the formulas below for the Start of Hour and End of Hour columns:

 

= Time.From(Time.StartOfHour([actual_start_at]))
= Time.From(Time.EndOfHour([actual_end_at]))
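
As a rough sketch of how these two formulas could fit into a full query, the example below adds them as helper columns, removes duplicates on the rider and the rounded times only, and then drops the helpers so the original datetimes survive. The column names start_hour and end_hour are made up for this example, and it assumes actual_start_at and actual_end_at are already typed as datetime (the sample values are taken from the question).

```powerquery
let
    // Assumption: the timestamp columns are already datetime values
    Source = #table(
        type table [rider_id = Int64.Type, actual_start_at = datetime, actual_end_at = datetime],
        {
            {25642, #datetime(2020, 8, 23, 3, 10, 11), #datetime(2020, 8, 23, 6, 30, 10)},
            {25642, #datetime(2020, 8, 23, 7, 33, 10), #datetime(2020, 8, 23, 11, 30, 11)},
            {48972, #datetime(2020, 8, 24, 1, 30, 10), #datetime(2020, 8, 24, 4, 38, 11)},
            {25642, #datetime(2020, 8, 24, 3, 17, 10), #datetime(2020, 8, 24, 6, 30, 10)},
            {25642, #datetime(2020, 8, 24, 7, 31, 11), #datetime(2020, 8, 24, 11, 30, 10)}
        }
    ),
    // Helper columns (hypothetical names) holding only the rounded time of day
    AddedStartHour = Table.AddColumn(
        Source, "start_hour", each Time.From(Time.StartOfHour([actual_start_at])), type time
    ),
    AddedEndHour = Table.AddColumn(
        AddedStartHour, "end_hour", each Time.From(Time.EndOfHour([actual_end_at])), type time
    ),
    // Compare only rider and rounded times; one row per combination is kept
    Deduplicated = Table.Distinct(AddedEndHour, {"rider_id", "start_hour", "end_hour"}),
    // Drop the helpers so the result keeps the original datetime columns
    Result = Table.RemoveColumns(Deduplicated, {"start_hour", "end_hour"})
in
    Result
```

Passing a column list to Table.Distinct limits the comparison to those columns, so the differing dates no longer prevent a match. Which row of each duplicate group survives is not something to rely on; if that matters, sort the table and buffer it first.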

 

 
