Hello,
I have an issue with cleaning my time data. I have duplicates, as highlighted in the screenshot. For example, rider_id = 25642 has duplicate start times: row 5 (24/08/2020 7:31:11 AM) and row 2 (23/08/2020 7:33:10 AM). How can I clean such data?
I tried using "Start of Hour" and "End of Hour" to round the times and then "Remove Duplicates", but it doesn't work (I suppose because Power BI doesn't see that I rounded the time).
| rider_id | actual_start_at | actual_end_at |
|---|---|---|
| 25642 | 2020-08-23 03:10:11 UTC | 2020-08-23 06:30:10 UTC |
| 25642 | 2020-08-23 07:33:10 UTC | 2020-08-23 11:30:11 UTC |
| 48972 | 2020-08-24 01:30:10 UTC | 2020-08-24 04:38:11 UTC |
| 25642 | 2020-08-24 03:17:10 UTC | 2020-08-24 06:30:10 UTC |
| 25642 | 2020-08-24 07:31:11 UTC | 2020-08-24 11:30:10 UTC |
The given rows cannot be treated as duplicates because they fall on different dates. If you want to compare the rows by time of day only, the Power Query steps are below.
let
    // Placeholder: point this step at your own source table.
    Source = Source,
    // Strip the "UTC" suffix from both timestamp columns.
    #"Replaced Value" = Table.ReplaceValue(Source,"UTC","",Replacer.ReplaceText,{"actual_start_at"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","UTC","",Replacer.ReplaceText,{"actual_end_at"}),
    // Split each timestamp into date, time, and a leftover text part.
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value1", "actual_start_at", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"actual_start_at.1", "actual_start_at.2", "actual_start_at.3"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"rider_id", Int64.Type}, {"actual_start_at.1", type date}, {"actual_start_at.2", type time}, {"actual_start_at.3", type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "actual_end_at", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"actual_end_at.1", "actual_end_at.2", "actual_end_at.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"actual_end_at.1", type date}, {"actual_end_at.2", type time}, {"actual_end_at.3", type text}}),
    // Keep only the rider and the two time-of-day columns, so the date no longer affects the comparison.
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type1",{"rider_id", "actual_start_at.2", "actual_end_at.2"}),
    // Round the start down and the end up to the hour, then drop identical rows.
    #"Calculated Start of Hour" = Table.TransformColumns(#"Removed Other Columns",{{"actual_start_at.2", Time.StartOfHour, type time}}),
    #"Calculated End of Hour" = Table.TransformColumns(#"Calculated Start of Hour",{{"actual_end_at.2", Time.EndOfHour, type time}}),
    #"Removed Duplicates" = Table.Distinct(#"Calculated End of Hour")
in
    #"Removed Duplicates"
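For anyone who wants to try the idea in a blank query, here is a minimal sketch of the same steps on the five sample rows from the question. It is not the exact query above: the inline `#table`, the helper names `ParseUtc`, `StartHour`, and `EndHour`, and the `"en-US"` culture are assumptions made for illustration, and the text is parsed with `DateTime.FromText` instead of splitting columns.

```powerquery
let
    // Sample rows copied from the question; replace with your own source step.
    Source = #table(
        type table [rider_id = Int64.Type, actual_start_at = text, actual_end_at = text],
        {
            {25642, "2020-08-23 03:10:11 UTC", "2020-08-23 06:30:10 UTC"},
            {25642, "2020-08-23 07:33:10 UTC", "2020-08-23 11:30:11 UTC"},
            {48972, "2020-08-24 01:30:10 UTC", "2020-08-24 04:38:11 UTC"},
            {25642, "2020-08-24 03:17:10 UTC", "2020-08-24 06:30:10 UTC"},
            {25642, "2020-08-24 07:31:11 UTC", "2020-08-24 11:30:10 UTC"}
        }
    ),
    // Hypothetical helper: drop the " UTC" suffix and parse the rest as a datetime.
    ParseUtc = (t as text) as datetime =>
        DateTime.FromText(Text.Replace(t, " UTC", ""), "en-US"),
    // Hypothetical helpers: keep only the time of day, rounded to the hour.
    StartHour = (t as text) as time => Time.StartOfHour(DateTime.Time(ParseUtc(t))),
    EndHour = (t as text) as time => Time.EndOfHour(DateTime.Time(ParseUtc(t))),
    Rounded = Table.TransformColumns(
        Source,
        {{"actual_start_at", StartHour, type time}, {"actual_end_at", EndHour, type time}}
    ),
    // Rows that now match on rider and both rounded times collapse into one.
    Deduplicated = Table.Distinct(Rounded)
in
    Deduplicated
```

On the sample data this should leave three rows: rows 1 and 4 collapse into one, as do rows 2 and 5, because they differ only by date and by minutes within the hour.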
Hope this will help you.
Regards
Sanalytics
Your start and end values fall on different dates: the first is 23-8-20 and the second is 24-8-20. That is why these rows are not treated as duplicates, and Power Query is behaving correctly. If you want to compare on the time only, use the formulas below for the Start of Hour and End of Hour columns:
= Time.From(Time.StartOfHour([actual_start_at]))
= Time.From(Time.EndOfHour([actual_end_at]))
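As a rough sketch of how these two formulas could be used while keeping the original timestamps, the query below adds them as helper columns and removes duplicates on those columns only. It assumes `actual_start_at` and `actual_end_at` are already typed as datetime. The inline sample table and the column names `start_hour` and `end_hour` are made up for illustration.

```powerquery
let
    // Two of the question's rows, typed as datetime; replace with your own source step.
    Source = #table(
        type table [rider_id = Int64.Type, actual_start_at = datetime, actual_end_at = datetime],
        {
            {25642, #datetime(2020, 8, 23, 7, 33, 10), #datetime(2020, 8, 23, 11, 30, 11)},
            {25642, #datetime(2020, 8, 24, 7, 31, 11), #datetime(2020, 8, 24, 11, 30, 10)}
        }
    ),
    // Helper columns holding only the time of day, rounded to the hour.
    AddStartHour = Table.AddColumn(
        Source, "start_hour", each Time.From(Time.StartOfHour([actual_start_at])), type time
    ),
    AddEndHour = Table.AddColumn(
        AddStartHour, "end_hour", each Time.From(Time.EndOfHour([actual_end_at])), type time
    ),
    // Compare rows on the rider and the helper columns only; the date is ignored.
    Deduplicated = Table.Distinct(AddEndHour, {"rider_id", "start_hour", "end_hour"})
in
    Deduplicated
```

Because the comparison uses only the listed columns, one of the two sample rows is dropped while the surviving row keeps its full original timestamps. Which of the duplicate rows survives is not something to rely on; if that matters, sort the table and buffer it with `Table.Buffer` before calling `Table.Distinct`.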