Sureshmannem
Frequent Visitor

Reading Parquet files into a Spark DataFrame throws a data type error

Dear All,

 

I have a requirement to read Parquet files from a directory into a DataFrame to prepare data from the Bronze Lakehouse for the Silver Lakehouse. While reading the files, Spark throws this error:

org.apache.spark.SparkException: Parquet column cannot be converted in file filepath/SRV0001148_20250819065539974.parquet. Column: [Syncxxx.xxx:ApplicationArea.xxx:CreationDateTime], Expected: string, Found: INT96.

 

#1) sample script:

from pyspark.sql import SparkSession
from pyspark.sql.types import *

# Read every matching file in one pass; Spark infers the schema across files
source_df = spark.read.parquet("filepath/SRV0001148_*.parquet")
source_df.show()
 

#2) sample script:

from pyspark.sql import SparkSession
from pyspark.sql.types import *

# Declare every expected column as string up front
schema = StructType([
    StructField("column1", StringType(), True),
    StructField("column2", StringType(), True)
])

# Apply the explicit schema instead of relying on inference
source_df = spark.read.schema(schema).parquet("filepath/SRV0001148_*.parquet")
source_df.show()
 
Some of the files load correctly. I was looking for an approach to load the data with every attribute treated as a string, but that is not working either, hence this request for support. If anyone is experiencing a similar issue, please share your insight; it would be a great help. Thanks in advance.
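For concreteness, a sketch of the all-string attempt (column names are placeholders). As far as I understand, a user-supplied schema cannot coerce Parquet's physical INT96 type to string at scan time, which is why this pattern still fails on the affected files:

from pyspark.sql.types import StructType, StructField, StringType

# Hypothetical all-string schema: every field declared StringType so no
# inference happens (placeholder column names)
all_string_schema = StructType([
    StructField(name, StringType(), True)
    for name in ["column1", "column2"]
])
source_df = spark.read.schema(all_string_schema).parquet("filepath/SRV0001148_*.parquet")
# Fails on the first action with: "Parquet column cannot be converted ... Found: INT96"
source_df.show()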
 
Regards,
Suresh
 
1 ACCEPTED SOLUTION

Dear Community,

Thank you for your continued support.

I’m happy to share that I’ve resolved the issue I was facing, and I’d like to outline the approach I followed in case it helps others encountering similar challenges.

Initial Observation

The issue occurred when I attempted to load over 50 Parquet files into a single PySpark DataFrame using a wildcard path. PySpark inferred the schema from the data in each file, but inconsistencies arose: some files stored the affected attribute as an INT96 timestamp, while others stored the same attribute as a string.

This led to data type mismatch errors during the read operation.

Testing

To investigate further, I loaded each file individually into a DataFrame. This worked as expected, confirming that the wildcard-based bulk load was failing due to schema inference conflicts across files.

Solution

I modified my script to iterate through each file individually, applying the full processing logic per file. This approach bypasses the schema inference conflict and successfully loads and processes all files.
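For reference, a minimal sketch of that per-file loop, assuming Fabric's mssparkutils for listing files; the folder, file prefix, and target table name are placeholders rather than the exact production script:

from pyspark.sql.functions import col

# List the matching files one by one (mssparkutils is available in Fabric
# notebooks; folder and prefix are illustrative)
files = [f.path for f in mssparkutils.fs.ls("Files/bronze")
         if f.name.startswith("SRV0001148_") and f.name.endswith(".parquet")]

for path in files:
    # Reading a single file means schema inference never has to reconcile
    # conflicting types across files
    df = spark.read.parquet(path)
    # Normalize every column to string; backticks guard column names that
    # contain dots or colons
    df = df.select([col(f"`{c}`").cast("string").alias(c) for c in df.columns])
    # Append into the Silver table (placeholder name)
    df.write.mode("append").saveAsTable("silver_table")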

 


3 REPLIES
v-ssriganesh
Community Support

Hello @Sureshmannem,
Thank you for reaching out to the Microsoft Fabric Community Forum.

I have reproduced your scenario in a Fabric Notebook, and I got the expected results. Below I’ll share the steps, the code I used and screenshots of the outputs for clarity.

  • Created a DataFrame with sample data
from datetime import datetime
from pyspark.sql import Row

# Build CreationDateTime explicitly as a formatted string
# (trim microseconds down to milliseconds)
data = [
    Row(ID="1", Name="Ganesh", CreationDateTime=datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]),
    Row(ID="2", Name="Ravi",   CreationDateTime=datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])
]

df = spark.createDataFrame(data)
df.printSchema()
df.show(truncate=False)

 

Output (Screenshot 1 – Schema & Screenshot 2 – Data):


 

  • Saved DataFrame as a Lakehouse table
# Persist the DataFrame as a managed Lakehouse table
df.write.mode("overwrite").saveAsTable("DemoTable")

 

  • Verified the table in catalog
# Confirm the table is registered in the default catalog
spark.catalog.listTables("default")

 

Output (Screenshot 3 – Table Catalog):


 

With this approach, the table DemoTable was successfully created in the Lakehouse with the expected schema, and the data was retrieved correctly with CreationDateTime as a string. It worked in my case because I explicitly formatted the CreationDateTime column as a string before saving to the Lakehouse table. By default, Spark can infer a different data type (such as timestamp) depending on how the value is created; converting it to a string ensures consistency and prevents schema mismatch issues.
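As an illustration of that conversion (a sketch reusing the df and column name from the sample above; the format string is an assumption):

from pyspark.sql.functions import date_format

# If CreationDateTime had instead been inferred as a timestamp, it could be
# converted to a string explicitly before saving
df_normalized = df.withColumn(
    "CreationDateTime",
    date_format("CreationDateTime", "yyyy-MM-dd HH:mm:ss.SSS")
)
df_normalized.write.mode("overwrite").saveAsTable("DemoTable")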

 

Best Regards,
Ganesh Singamshetty.



Hi Ganesh,

 

Thanks for your kind support and explanation.

My scenario is slightly different; I am sharing a sample script with masking.

 

I have a scenario where I need to read Parquet files stored in the Lakehouse into a DataFrame to prepare my data, and the issue happens at the very first step:

source_df = spark.read.parquet("abfss://xxxxxx@onelake.dfs.fabric.microsoft.com/xxxx.Lakehouse/Files/xxxx/SRV0001148_*.parquet")

 

error: org.apache.spark.SparkException: Parquet column cannot be converted in file xxxxxxx Expected: string, Found: INT96.

 

I have tried defining my schema explicitly, but Spark still ignores it and uses only the types found in the Parquet files.
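A quick way to confirm a per-file mismatch (a sketch; the folder path is masked as in the post above, and mssparkutils is assumed) is to print each file's own inferred schema:

# Print each file's inferred schema individually to spot which files
# carry the INT96 column
for f in mssparkutils.fs.ls("Files/xxxx"):
    if f.name.startswith("SRV0001148_") and f.name.endswith(".parquet"):
        print(f.name)
        spark.read.parquet(f.path).printSchema()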

