I'm using a medallion architecture where one pipeline ingests data into the Bronze layer as parquet files and another pipeline validates data types and copies the parquet files into the Silver layer as a table.
Right now I'm quite confused about how to manage data type checks in the pipeline. It seems simple, but I'm not sure how I should be doing data type checks going from a parquet file (no data types), to a PySpark DataFrame, and then to a Fabric table.
My Silver pipeline uses a PySpark notebook to read the parquet file into a PySpark DataFrame and then casts columns to specific PySpark data types (https://spark.apache.org/docs/latest/sql-ref-datatypes.html). However, once the PySpark DataFrame is written to a Fabric table, the data types become Fabric data types (https://learn.microsoft.com/en-us/fabric/data-warehouse/data-types). So far I've been casting the columns to PySpark data types based on what they will convert to when they reach Fabric.
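For illustration, the current Silver notebook flow looks roughly like this (paths, table, and column names are just placeholders):

    from pyspark.sql.functions import col
    from pyspark.sql.types import IntegerType, StringType, TimestampType

    # 'spark' is the session predefined in a Fabric notebook
    df = spark.read.parquet("Files/bronze/my_dataset")

    # Cast to the PySpark types that should map to the Fabric types I want
    df = (df
          .withColumn("customer_id", col("customer_id").cast(IntegerType()))
          .withColumn("customer_name", col("customer_name").cast(StringType()))
          .withColumn("created_at", col("created_at").cast(TimestampType())))

    # Write to the Silver table; the columns then surface with Fabric data types
    df.write.format("delta").mode("overwrite").saveAsTable("silver_my_dataset")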
Is this the best practice? The data type checks would rely on the conversions between PySpark and Fabric, which could change. They also rely on the Fabric data type the column had when it was first copied into Silver, so if I intentionally change a data type in the notebook due to business rules, that change would carry through into the Fabric table.
Is there a better way to manage data types for Silver in Fabric?
Hi @BriefStop,
A few ways to handle type conversion:
Notebook:
1. Keep a mapping sheet of the data types between PySpark and Fabric.
2. Create the table up front with the proper data types required in Fabric.
3. Define your schema and read the parquet files, casting only where required, or read the parquet files with schema inference (see the sketch after this list).
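A minimal sketch of options 2 and 3 in a notebook, assuming a Fabric lakehouse (table, path, and column names are illustrative):

    from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DecimalType, TimestampType
    from pyspark.sql.functions import col

    # Option 3: define the schema explicitly instead of inferring it
    expected_schema = StructType([
        StructField("order_id", IntegerType(), False),
        StructField("customer_name", StringType(), True),
        StructField("amount", DecimalType(18, 2), True),
        StructField("order_ts", TimestampType(), True),
    ])

    df = spark.read.schema(expected_schema).parquet("Files/bronze/orders")

    # Cast only where the target type differs from the source type
    df = df.withColumn("amount", col("amount").cast(DecimalType(19, 4)))

    # Option 2: the Silver Delta table was created beforehand with the required types,
    # so appended data must conform to that schema
    df.write.format("delta").mode("append").saveAsTable("silver_orders")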
Copy Activity:
1. If no transformation is required, you can use a Copy activity; under Mapping you can see the type conversion settings for the datetime data type.
2. For other types, if you want to convert, use Import schemas and change the data type for the destination table.
Regards,
Srisakthi
Hi @BriefStop
Here are some practices that may help you:
Maintain a mapping table between PySpark data types and Fabric data types. This can be used as a reference for conversion to ensure that you are converting to the right type.
Suppose you have data type mappings like the following (illustrative, based on common Spark-to-Fabric conversions; confirm against the Fabric data types documentation):
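    # Illustrative PySpark-to-Fabric type mapping; verify against the Fabric data types documentation
    TYPE_MAPPING = {
        "StringType": "varchar",
        "IntegerType": "int",
        "LongType": "bigint",
        "DoubleType": "float",
        "BooleanType": "bit",
        "DateType": "date",
        "TimestampType": "datetime2",
        "DecimalType(p,s)": "decimal(p,s)",
    }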
Before writing data to the Fabric table, implement validation functions in the PySpark notebook to check that the DataFrame columns match the expected data types based on the mapping. This can help you catch any inconsistencies early.
You can write a simple validation function to check the column data type in the DataFrame. For example,
    from pyspark.sql.types import StringType, IntegerType, FloatType

    def validate_data_types(df):
        # Expected PySpark types per column (adjust to your own schema)
        expected_types = {
            'name': StringType(),
            'age': IntegerType(),
            'salary': FloatType()
        }
        for column, expected_type in expected_types.items():
            actual_type = df.schema[column].dataType
            if actual_type != expected_type:
                raise ValueError(f"Column '{column}' has type '{actual_type}' but expected '{expected_type}'")

    validate_data_types(my_dataframe)
If you need to support schema evolution in Fabric, you can use the following code to handle possible schema changes,
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("DataTypeExample").getOrCreate()

    df = spark.read.parquet("path/to/parquet")

    # Intentional business-rule cast: store age as a string
    df = df.withColumn("age", df["age"].cast(StringType()))

    # Fabric lakehouse tables are Delta tables; overwriteSchema lets the type change be applied
    df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("path/to/fabric_table")
You can use a simple script to automate data quality checks and ensure that data meets expectations before being written to Fabric tables,
    def check_data_quality(df):
        # Example rule: 'age' must not contain nulls before writing to Silver
        if df.filter(df.age.isNull()).count() > 0:
            raise ValueError("Data quality check failed: 'age' column contains null values.")

    check_data_quality(my_dataframe)
These examples may help you understand how to manage data types in the Silver pipeline. Ensuring validation and mapping at each step can help you reduce potential errors and improve data quality.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.