Hi All,
I have been getting the following error when reading some parquet files in a PySpark notebook:
Illegal Parquet type: INT64 (TIME(NANOS,true))
The parquet files are loaded by a copy activity in a pipeline, inside a ForEach loop, so it's not easy to pull them out and manually map the column to, say, a string for later conversion.
I have done a bit of searching, and it seems this was a known Spark issue some time ago that was apparently rectified in Spark 3.2 ([SPARK-40819] Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of aut...)
I have tried running the below in the first cell of the notebook.
Just an update...
I have successfully read the offending files using pandas and the fastparquet engine (after setting up a new environment to load that library).
Once the data is in a pandas DataFrame, I convert it to a Spark DataFrame so the rest of the notebook can continue without refactoring. I found I do need to run the spark.conf.set() calls above in order to write to delta tables (obviously parquet underneath).
Not elegant, but it is a workaround, unless anyone has something else?
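For anyone wanting to try the same route, here is a minimal sketch of the workaround described above. It is not the original poster's code: the function names, the path argument, and the decision to store the time column as a string are all assumptions made for illustration. Parquet stores TIME(NANOS) as an INT64 count of nanoseconds since midnight, so the helper formats that count as text. How fastparquet surfaces the column is not stated in this thread; the sketch assumes it arrives as timedelta64, so check `pdf.dtypes` before relying on it.

```python
NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND


def nanos_to_time_string(nanos: int) -> str:
    """Format nanoseconds since midnight as HH:MM:SS.fffffffff."""
    if not 0 <= nanos < NANOS_PER_DAY:
        raise ValueError(f"not a valid time of day: {nanos}")
    seconds, frac = divmod(nanos, NANOS_PER_SECOND)
    minutes, sec = divmod(seconds, 60)
    hours, minute = divmod(minutes, 60)
    return f"{hours:02d}:{minute:02d}:{sec:02d}.{frac:09d}"


def read_parquet_via_pandas(spark, path):
    """Hypothetical helper: read `path` with pandas + fastparquet, return a Spark DataFrame.

    `spark` is the notebook's SparkSession; `path` is a file path that pandas can open.
    """
    # Imported lazily: needs pandas and fastparquet installed in the environment.
    import pandas as pd

    pdf = pd.read_parquet(path, engine="fastparquet")

    for col in pdf.columns:
        # Assumption: the TIME(NANOS) column is decoded as timedelta64.
        if pd.api.types.is_timedelta64_dtype(pdf[col]):
            # Store as text so the column can be converted later in Spark.
            pdf[col] = pdf[col].map(
                lambda td: None if pd.isna(td) else nanos_to_time_string(td.value)
            )

    return spark.createDataFrame(pdf)
```

In a notebook this would be called as something like `df = read_parquet_via_pandas(spark, some_path)`, after which the rest of the notebook works on `df` as before. Note that pandas loads the whole file into driver memory, so this only suits files that fit there.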
Hi guys! I don't think this issue is solved. We faced the same problem too, and reading into a pandas DataFrame is not an option for us.
Is there any response from the Fabric team?
Not that I've seen. Unfortunately forcing a read with Pandas was the only solution I came up with.
Hi @Kesahli,
I'm glad to hear you found a workaround. Would you mind sharing the code here? I think it would help others who face a similar scenario.
Regards,
Xiaoxin Sheng