Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
LeeGrantGeek
Regular Visitor

Getting odd results from 'to_timestamp()'

I am working with an open source Dataset (LA Crime Data from Kaggle). I am trying to clean up the data a little. I want a true timestamp for the Date of the reported crime. The Date (Date Rptd) and Time (TIME OCC) are text fields. I want to combine them to be a timestamp. I have removed the time that is in the date field,  all are 12:00:00 and transformed the Time field with zero padding and inserting a ':' between hours and minutes (e.g. 45 -> 00:45, 1424 -> 14:24. So all go so far. When I concat the two values into a new column I get this:

+----------------+
| TimeStamp_temp|
+----------------+
|02/20/2010 13:50|
|09/13/2010 00:45|
|08/09/2010 15:15|
|01/05/2010 01:50|
|01/03/2010 21:00|
|01/05/2010 16:50|
|01/08/2010 20:05|
|01/09/2010 21:00|
|01/09/2010 02:30|
|01/09/2010 21:00|
|01/14/2010 14:45|
|01/15/2010 20:00|
|01/15/2010 02:45|
|01/16/2010 17:45|
|01/16/2010 20:30|
|01/17/2010 17:35|
|01/23/2010 12:25|
|01/23/2010 11:00|
|01/23/2010 20:00|
|01/26/2010 18:20|
+----------------+
only showing top 20 rows

Then I call to_timestamp with 'MM/dd/YYYY HH:mm' as the format, I get this:

+-------------------+
| TimeStamp_Rptd|
+-------------------+
|2009-12-27 13:50:00|
|2009-12-27 00:45:00|
|2009-12-27 15:15:00|
|2009-12-27 01:50:00|
|2009-12-27 21:00:00|
|2009-12-27 16:50:00|
|2009-12-27 20:05:00|
|2009-12-27 21:00:00|
|2009-12-27 02:30:00|
|2009-12-27 21:00:00|
|2009-12-27 14:45:00|
|2009-12-27 20:00:00|
|2009-12-27 02:45:00|
|2009-12-27 17:45:00|
|2009-12-27 20:30:00|
|2009-12-27 17:35:00|
|2009-12-27 12:25:00|
|2009-12-27 11:00:00|
|2009-12-27 20:00:00|
|2009-12-27 18:20:00|
+-------------------+
only showing top 20 rows

All the dates are the same and note that this data starts in January 2010. So I decide to build a small data set with a Date and a Time:

Date, Time,
12/31/1961, 1147,
11/08/2019, 2032,
12/20/2021, 45

 

I run this through the code and get this:

+----------+-----------+-------------------+
| Date            |Padded_Time | New_Date|
+----------+-----------+-------------------+
|12/31/1961   | 11:47           | 1961-01-31 00:12:00|
|11/08/2019   | 20:32           | 2019-01-08 00:11:00|
|12/20/2021   | 00:45           | 2021-01-20 00:12:00|
+----------+-----------+-------------------+

Note that the new date is 11 month before the actual date and the hours/minutes are not correct. This has me baffled. Here is the code and csv data for the test:

 

#start code

from pyspark.sql.types import DateType
from pyspark.sql.functions import unix_timestamp, from_unixtime, concat, concat_ws, col, substring, to_timestamp, to_date, date_format, lit, lpad, regexp_replace
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df_test = spark.read.format("csv").option("header","true").load("Files/newtestdata-timestamp.csv")
df_test.drop(' _c2')
#df now is a Spark DataFrame containing CSV data from "Files/newtestdata-timestamp.csv".
display(df_test)
 
df_test_time = df_test.withColumn('New_Date', to_timestamp(col('Date'), 'mm/dd/yy'))
df_test_time = df_test_time.withColumn('Padded_Time', lpad(col('Time'), 4, '0'))
df_test_time = df_test_time.withColumn('Padded_Time', regexp_replace(col('Padded_Time'), '(\\d{2})(\\d{2})', '$1:$2'))
df_test_time.select('Date', 'Padded_Time', 'New_Date').show()
#end code
 
CSV Data:
Date,Time,
12/31/1961,1147,
11/08/2019,2032,
12/20/2021,45
 
Any help is appreciated.
Thanks,
Lee
 
 

The date is still in the yyyy-mm-dd format, but the dates displayed are 11 months before the actual dates

 

 

 

4 REPLIES 4
v-cboorla-msft
Microsoft Employee
Microsoft Employee

Hi @LeeGrantGeek 

 

Thanks for using Microsoft Fabric Community.
At this time, we are reaching out to the internal team to get some help on this.
We will update you once we hear back from them.

Appreciate your patience.

 

Thanks

Hi @LeeGrantGeek 

 

Apologize for the inconvenience.

Please reach out to our support team to gain deeper insights and explore potential solutions. It's highly recommended that you reach out to our support team. Their expertise will be invaluable in suggesting the most appropriate approach.

Please go ahead and raise a support ticket to reach our support team:

https://support.fabric.microsoft.com/support

After creating a Support ticket please provide the ticket number as it would help us to track for more information.

 

Thank you.

I had opened a ticket earlier in the week and am working with the support team. Thank you for looking a this.

Hi @LeeGrantGeek 

 

Thank you for creating support ticket. Could you please share the ticket number here as it would help us to track for more information.

 

Thanks.

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

June FBC25 Carousel

Fabric Monthly Update - June 2025

Check out the June 2025 Fabric update to learn about new features.