IAMCS
Helper I

Common issue when writing data to warehouse from notebook

I was writing a DataFrame to a warehouse, where I created a column as shown in the first image below. In the next image you can see that the schema for this column is (nullable = false), while all the other columns are (nullable = true). How can I make this column nullable = true as well?

[Image: notebook code that creates the column]

[Image: printSchema output showing the column as nullable = false]

The problem appears when backfilling this table with historical data: I am using an Excel file with the same schema as the source, and when writing the historical data to the same table it says:

[Image: error raised when writing the historical data]
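For reference, here is a minimal sketch of how such a column can end up non-nullable (this assumes the column was added with withColumn and a non-null literal, which the screenshots are not reproduced here to confirm):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Columns read from a source are typically inferred as nullable = true
df = spark.createDataFrame([("a",), ("b",)], ["source_col"])

# A column built from a non-null literal is inferred as nullable = false
df = df.withColumn("One-off vs recurring", F.lit("Recurring"))

df.printSchema()
# root
#  |-- source_col: string (nullable = true)
#  |-- One-off vs recurring: string (nullable = false)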


7 REPLIES
v-achippa
Community Support

Hi @IAMCS,

Thank you for reaching out to the Microsoft Fabric Community.

Thank you @rohit1991 for the prompt response.

As we haven't heard back from you, we wanted to kindly follow up and check whether the solution provided above worked for you. Please let us know if you need any further assistance.

Thanks and regards,
Anjan Kumar Chippa


rohit1991
Super User

Hi @IAMCS,

The error happens because the column you created in Spark is marked as not nullable (nullable = false), while in your Excel source the same column is nullable (nullable = true). This difference makes the schemas inconsistent, so when you try to write the historical data back into the same table, Spark rejects the write.

To fix it, make sure both schemas match. The easiest way is to recreate or cast the column in Spark with nullable = true, so it can accept empty values just like your Excel data. Once the schemas are aligned, the write will succeed. It is worth checking the schema before every write, because Spark is strict about column types and nullability. In short, the table column needs to be made nullable to match your source file so the backfill can go through.
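For example, a quick pre-write check along these lines will surface the mismatch (df_target and df_backfill are placeholder names for the DataFrame you first wrote and the Excel-sourced one):

# Hypothetical names: df_target is the DataFrame first written to the table,
# df_backfill is the one built from the Excel file
for f_new, f_old in zip(df_backfill.schema.fields, df_target.schema.fields):
    if f_new.name == f_old.name and f_new.nullable != f_old.nullable:
        print(f"Nullability mismatch on '{f_new.name}': "
              f"{f_new.nullable} vs {f_old.nullable}")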


Did it work? ✔ Give a Kudo • Mark as Solution – help others too!

The column is not present in the Excel sheet; it is a column I create at the end, after reading the data from Excel. So I don't think it is related to the Excel sheet.

Hi @IAMCS,

You're correct that this isn't about the Excel sheet; the real cause is how Spark infers nullability for new columns. When you add a column with withColumn from a non-null literal (for example F.lit(...)), Spark marks it as not nullable (nullable = false) unless you define it differently. That's why this specific column shows up as not nullable while the columns read from Excel stay nullable = true.

To fix this, explicitly allow nulls when creating or casting the column. One option is to rebuild the DataFrame with a schema where that field is set to nullable = true, for example by defining a StructType with nullable=True; another is to create the column through an expression Spark treats as nullable. Either way, the new column then aligns with the rest of your data, the schema mismatch error goes away, and you can backfill or write historical data without issues.
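As a quick illustration (a minimal sketch; the sample data and column name are placeholders), one common trick is to wrap the literal in when() without an otherwise(), which Spark has to treat as nullable:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",)], ["source_col"])

# Plain literal: the new column is inferred as nullable = false
df.withColumn("flag", F.lit("recurring")).printSchema()

# when() without otherwise() can produce null, so the new column
# is inferred as nullable = true
df.withColumn("flag", F.when(F.lit(True), F.lit("recurring"))).printSchema()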


Did it work? ✔ Give a Kudo • Mark as Solution – help others too!

I understood the theoretical explanation to my post, but I was looking for an easy method to solve it. Below is what I created: a script which can change the nullability of any column.

# Use this script to change the nullability of any column

from pyspark.sql.types import StructField, StructType

# --- Assuming your df_final2 DataFrame is already initialized ---

# 1. Define the target column name
COLUMN_NAME = "One-off vs recurring"

# 2. Get the current schema
current_schema = df_final2.schema

# 3. Build the list of new fields, relaxing the target column's nullability
new_fields = []
for field in current_schema.fields:
    if field.name == COLUMN_NAME:
        # Create a new StructField with the same properties but nullable=True
        new_fields.append(
            StructField(
                name=field.name,
                dataType=field.dataType,
                nullable=True,  # <--- THIS IS THE KEY CHANGE
                metadata=field.metadata,
            )
        )
    else:
        # Keep all other fields as they are
        new_fields.append(field)

# 4. Create the new StructType (the updated schema)
new_schema = StructType(new_fields)

# 5. Rebuild the DataFrame with the new schema; round-tripping through
#    the RDD forces Spark to adopt the updated nullability
df_final2 = df_final2.rdd.toDF(new_schema)

# 6. Verify the change (optional)
df_final2.printSchema()
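One note on this approach: the .rdd round trip rebuilds every row outside the optimized DataFrame path, so on large DataFrames it can be noticeably slower than fixing nullability at the point where the column is created (for example with the when() trick shown earlier in the thread).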
