IAMCS
Helper I

Common issue when writing data to warehouse from notebook

I was writing a DataFrame to a warehouse, where I created a column as shown in the first image below. In the next image you can see that the schema for this column is (nullable = false), while all the other columns are (nullable = true). How can I make this column nullable = true as well?

[Image: notebook code that creates the column]

[Image: printSchema output showing the column as nullable = false]

The problem appears when backfilling this table with historical data: I am using an Excel file with the same schema as the source, and when writing the historical data to the same table it says:

[Image: error raised when writing the historical data]
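For reference, here is a minimal sketch of how such a column can end up non-nullable (this assumes the column was added with withColumn and a non-null literal, which the screenshots are not reproduced here to confirm):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Columns read from a source are typically inferred as nullable = true
df = spark.createDataFrame([("a",), ("b",)], ["source_col"])

# A column built from a non-null literal is inferred as nullable = false
df = df.withColumn("One-off vs recurring", F.lit("Recurring"))

df.printSchema()
# root
#  |-- source_col: string (nullable = true)
#  |-- One-off vs recurring: string (nullable = false)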


7 REPLIES
v-achippa
Community Support

Hi @IAMCS,

Thank you for reaching out to the Microsoft Fabric Community.

Thank you @rohit1991 for the prompt response.

As we haven't heard back from you, we wanted to kindly follow up and check whether the solution provided above worked for you. Please let us know if you need any further assistance.

Thanks and regards,
Anjan Kumar Chippa


rohit1991
Super User

Hi @IAMCS,

The error happens because the column you created in Spark is marked as not nullable (nullable = false), while in your Excel source the same column is nullable (nullable = true). This difference makes the schemas inconsistent, so when you try to write the historical data back into the same table, Spark rejects the write.

To fix it, make sure both schemas match. The easiest way is to recreate or cast the column in Spark with nullable = true, so it can accept empty values just like your Excel data. Once the schemas are aligned, the write will succeed. It is worth checking the schema before every write, because Spark is strict about column types and nullability. In short, the table column needs to be made nullable to match your source file so the backfill can go through.
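For example, a quick pre-write check along these lines will surface the mismatch (df_target and df_backfill are placeholder names for the DataFrame you first wrote and the Excel-sourced one):

# Hypothetical names: df_target is the DataFrame first written to the table,
# df_backfill is the one built from the Excel file
for f_new, f_old in zip(df_backfill.schema.fields, df_target.schema.fields):
    if f_new.name == f_old.name and f_new.nullable != f_old.nullable:
        print(f"Nullability mismatch on '{f_new.name}': "
              f"{f_new.nullable} vs {f_old.nullable}")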


Did it work? ✔ Give a Kudo • Mark as Solution – help others too!

The column is not present in the Excel sheet; it is a column I create at the end, after reading the data from Excel. So I don't think it is related to the Excel sheet.

Hi @IAMCS,

You're correct that this isn't about the Excel sheet; the real cause is how Spark infers nullability for new columns. When you add a column with withColumn from a non-null literal (for example F.lit(...)), Spark marks it as not nullable (nullable = false) unless you define it differently. That's why this specific column shows up as not nullable while the columns read from Excel stay nullable = true.

To fix this, explicitly allow nulls when creating or casting the column. One option is to rebuild the DataFrame with a schema where that field is set to nullable = true, for example by defining a StructType with nullable=True; another is to create the column through an expression Spark treats as nullable. Either way, the new column then aligns with the rest of your data, the schema mismatch error goes away, and you can backfill or write historical data without issues.
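As a quick illustration (a minimal sketch; the sample data and column name are placeholders), one common trick is to wrap the literal in when() without an otherwise(), which Spark has to treat as nullable:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",)], ["source_col"])

# Plain literal: the new column is inferred as nullable = false
df.withColumn("flag", F.lit("recurring")).printSchema()

# when() without otherwise() can produce null, so the new column
# is inferred as nullable = true
df.withColumn("flag", F.when(F.lit(True), F.lit("recurring"))).printSchema()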


Did it work? ✔ Give a Kudo • Mark as Solution – help others too!

I understood the theoretical explanation to my post, but I was looking for an easy method to solve it. Below is what I created: a script which can change the nullability of any column.

# Use this script to change the nullability of any column

from pyspark.sql.types import StructField, StructType

# --- Assuming your df_final2 DataFrame is already initialized ---

# 1. Define the target column name
COLUMN_NAME = "One-off vs recurring"

# 2. Get the current schema
current_schema = df_final2.schema

# 3. Build the list of new fields, relaxing the target column's nullability
new_fields = []
for field in current_schema.fields:
    if field.name == COLUMN_NAME:
        # Create a new StructField with the same properties but nullable=True
        new_fields.append(
            StructField(
                name=field.name,
                dataType=field.dataType,
                nullable=True,  # <--- THIS IS THE KEY CHANGE
                metadata=field.metadata,
            )
        )
    else:
        # Keep all other fields as they are
        new_fields.append(field)

# 4. Create the new StructType (the updated schema)
new_schema = StructType(new_fields)

# 5. Rebuild the DataFrame with the new schema; round-tripping through
#    the RDD forces Spark to adopt the updated nullability
df_final2 = df_final2.rdd.toDF(new_schema)

# 6. Verify the change (optional)
df_final2.printSchema()
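One note on this approach: the .rdd round trip rebuilds every row outside the optimized DataFrame path, so on large DataFrames it can be noticeably slower than fixing nullability at the point where the column is created (for example with the when() trick shown earlier in the thread).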
