fabconmvp
Helper I

Spark Connector for Microsoft Fabric Warehouse MergeSchema

Hello everyone,
 
I recently implemented the Fabric Spark Connector as described in this community blog post:
 
My architecture follows the classic Silver Lakehouse (Test workspace) → Gold Warehouse (Production workspace) pattern. An automated ETL job runs daily and performs as expected, except when the schema in the Silver layer changes.
 
 
Issue #1: Schema evolution from Lakehouse to Warehouse
 
option("mergeSchema", "true") works for Delta tables in the Lakehouse but is not applied to Warehouse tables. Consequently, whenever a new column is added in the Silver Lakehouse, the write to the Gold Warehouse fails unless I manually run the gold warehouse:
ALTER TABLE <table> ADD <column> <datatype>;
 
Because Spark notebooks cannot execute T-SQL directly against a Warehouse (there is no "Execute SQL" equivalent), this step becomes manual and error-prone.
 
Issue #2: Automating the ALTER TABLE step
 
To eliminate manual maintenance, I am considering a stored procedure called from a Fabric Pipeline. The idea is:
1. The notebook compares the DataFrame schema with INFORMATION_SCHEMA.COLUMNS to identify missing columns.
2. It passes the list of missing columns to a pipeline parameter.
3. The pipeline executes a stored procedure (or SQL script) that iterates through the list and issues the necessary ALTER TABLE commands. (However, this appears to support only Warehouses in the current workspace, and I use different workspaces!)
 
I have not found documentation on how to pass notebook output as a parameter into a pipeline activity. If anyone has implemented a similar pattern—or can point me to relevant guidance—your input would be greatly appreciated.
2 REPLIES

Hi @fabconmvp ,

Thank you for reaching out to the Microsoft Community Forum.

 

Issue 1: mergeSchema not working for Warehouse

Option("mergeSchema", "true") is a Delta Lake feature and only applies within Lakehouse environments (which are backed by Delta). Microsoft Fabric Warehouses are SQL-based, so schema evolution must be handled with explicit T-SQL ALTER TABLE statements.

Solution: Because Spark can’t execute SQL directly against a Fabric Warehouse (and cross-workspace execution is limited), you need an external orchestrator (like Fabric Pipelines or Azure Data Factory) to bridge that gap.

Issue 2: Automating ALTER TABLE via Fabric Pipeline

You have outlined a promising approach. Let me break it down with more detail and suggestions:

Please follow the steps below.

1. Compare Schemas in Notebook

Inside your Fabric Spark notebook (Lakehouse workspace):
- Read the Silver Delta table into a DataFrame.
- Query the Warehouse's INFORMATION_SCHEMA.COLUMNS via the SQL analytics endpoint (JDBC) or the REST API to get the current schema of the Warehouse table.
- Compare the two schemas to detect new columns.

Example schema comparison logic:

# Read the Silver Delta table and capture its columns and data types
df_silver = spark.read.format("delta").load("/path/to/silver/table")
silver_schema = {field.name: field.dataType.simpleString() for field in df_silver.schema}

# Assuming you pull the Warehouse schema into a DataFrame `df_gold_schema`
# with columns column_name and data_type
warehouse_schema = {row['column_name']: row['data_type'] for row in df_gold_schema.collect()}

# Identify columns present in Silver but missing from the Warehouse table
new_columns = [
    (name, dtype) for name, dtype in silver_schema.items()
    if name not in warehouse_schema
]
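
One possible way to obtain df_gold_schema is a JDBC read against the Warehouse's SQL connection endpoint from the notebook. This is only a sketch; the endpoint, database, table name, and token audience below are assumptions you would need to adapt:

from notebookutils import mssparkutils  # built into Fabric Spark notebooks

# Acquire an AAD token for the SQL endpoint (the audience is an assumption; adjust if needed)
token = mssparkutils.credentials.getToken("https://database.windows.net/")

schema_query = """
    SELECT COLUMN_NAME AS column_name, DATA_TYPE AS data_type
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'MyGoldTable'
"""

# Hypothetical connection endpoint; copy the real one from the Warehouse settings
df_gold_schema = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<warehouse-connection-string>:1433;database=GoldWarehouse")
    .option("query", schema_query)
    .option("accessToken", token)  # passed through to the SQL Server JDBC driver
    .load()
)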

2. Pass List to Pipeline

The challenge here is how to get this list into a Fabric Pipeline parameter.

Solution: Write the list of new columns to a Delta/Parquet/CSV file in OneLake. Use a Pipeline activity to read that file and pass its contents to a custom parameter in a stored procedure. Alternatively, write it to a SQL table in the Lakehouse that acts as a staging config.
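
A minimal sketch of both options follows; the staging table name is hypothetical, and the exit-value route assumes the pipeline reads the Notebook activity output (typically via an expression like @activity('Notebook1').output.result.exitValue):

import json

if new_columns:
    # Option A: persist the detected columns to a staging table (hypothetical name) that
    # the pipeline or the stored procedure can read from.
    spark.createDataFrame(new_columns, ["column_name", "data_type"]) \
        .write.mode("overwrite") \
        .saveAsTable("StagingSchemaChanges")

# Option B: return the list as the notebook's exit value so a pipeline parameter or
# stored procedure activity can consume it downstream.
mssparkutils.notebook.exit(json.dumps(new_columns))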

3. Execute Stored Procedure in Warehouse

In the target Gold Warehouse, use a Fabric Pipeline SQL activity to:
- Read the list of new columns (from a staging table or dataset).
- Dynamically execute the ALTER TABLE statements.

Example SQL Script inside a Stored Procedure

DECLARE @columnName NVARCHAR(MAX), @columnType NVARCHAR(MAX);

DECLARE column_cursor CURSOR FOR
    SELECT column_name, data_type FROM StagingSchemaChanges;

OPEN column_cursor;
FETCH NEXT FROM column_cursor INTO @columnName, @columnType;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME protects the dynamic statement against invalid identifiers
    EXEC('ALTER TABLE dbo.MyGoldTable ADD ' + QUOTENAME(@columnName) + ' ' + @columnType);
    FETCH NEXT FROM column_cursor INTO @columnName, @columnType;
END;

CLOSE column_cursor;
DEALLOCATE column_cursor;

 

Please refer to the following community threads and official Microsoft articles and documentation.

Solved: Spark Connector for MS Fabric Warehouse - Microsoft Fabric Community

Solved: Re: Pyspark notebook connect to MS Fabric Warehou... - Microsoft Fabric Community

SparkConfigurationOperations interface | Microsoft Learn

Azure Databricks Automatic Schema Evolution - Microsoft Q&A

Spark connector for Microsoft Fabric Data Warehouse - Microsoft Fabric | Microsoft Learn

Get data from Apache Spark - Microsoft Fabric | Microsoft Learn

Spark Connector for Fabric Data Warehouse (DW) – Preview | Microsoft Fabric Blog | Microsoft Fabric

 

If my response has resolved your query, please mark it as the "Accepted Solution" to assist others. Additionally, a "Kudos" would be appreciated if you found my response helpful.

Thank you
