<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Spark Connector for Microsoft Fabric Warehouse MergeSchema in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Connector-for-Microsoft-Fabric-Warehouse-MergeSchema/m-p/4697128#M9359</link>
<description>&lt;DIV&gt;Hello everyone,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I recently implemented the Fabric Spark Connector as described in this community blog post:&lt;/DIV&gt;&lt;DIV&gt;&lt;A href="https://community.fabric.microsoft.com/t5/Data-Engineering-Community-Blog/Spark-Connector-for-Fabric-Warehouse-Unified-Analytics/ba-p/4611309" target="_self"&gt;https://community.fabric.microsoft.com/t5/Data-Engineering-Community-Blog/Spark-Connector-for-Fabric-Warehouse-Unified-Analytics/ba-p/4611309&lt;/A&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;My architecture follows the classic Silver Lakehouse (Test workspace) → Gold Warehouse (Production workspace) pattern. An automated ETL job runs daily and performs as expected, except when the schema in the Silver layer changes.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;⸻&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Issue #1: Schema evolution from Lakehouse to Warehouse&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;option("mergeSchema", "true") works for Delta tables in the Lakehouse but is not applied to Warehouse tables. Consequently, whenever a new column is added in the Silver Lakehouse, the write to the Gold Warehouse fails unless I manually run the following against the Gold Warehouse:&lt;/DIV&gt;&lt;DIV&gt;ALTER TABLE &amp;lt;table&amp;gt; ADD &amp;lt;column&amp;gt; &amp;lt;datatype&amp;gt;;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Because Spark notebooks cannot execute T-SQL directly against a Warehouse, this step becomes manual and error-prone (Execute SQL).&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Issue #2: Automating the ALTER TABLE step&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;To eliminate manual maintenance, I am considering a stored procedure called from a Fabric Pipeline. The idea is:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;1. The notebook compares the DataFrame schema with INFORMATION_SCHEMA.COLUMNS to identify missing columns.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;2. It passes the list of missing columns to a pipeline parameter.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;3. The pipeline executes a stored procedure (or SQL Script) that iterates through the list and issues the necessary ALTER TABLE commands. (&lt;U&gt;&lt;EM&gt;&lt;STRONG&gt;However, this only supports Warehouses in the current workspace, and I use different workspaces!&lt;/STRONG&gt;&lt;/EM&gt;&lt;/U&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I have not found documentation on how to pass notebook output as a parameter into a pipeline activity. If anyone has implemented a similar pattern, or can point me to relevant guidance, your input would be greatly appreciated.&lt;/DIV&gt;</description>
    <pubDate>Sun, 18 May 2025 08:52:28 GMT</pubDate>
    <dc:creator>fabconmvp</dc:creator>
    <dc:date>2025-05-18T08:52:28Z</dc:date>
    <item>
      <title>Spark Connector for Microsoft Fabric Warehouse MergeSchema</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Connector-for-Microsoft-Fabric-Warehouse-MergeSchema/m-p/4697128#M9359</link>
<description>&lt;DIV&gt;Hello everyone,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I recently implemented the Fabric Spark Connector as described in this community blog post:&lt;/DIV&gt;&lt;DIV&gt;&lt;A href="https://community.fabric.microsoft.com/t5/Data-Engineering-Community-Blog/Spark-Connector-for-Fabric-Warehouse-Unified-Analytics/ba-p/4611309" target="_self"&gt;https://community.fabric.microsoft.com/t5/Data-Engineering-Community-Blog/Spark-Connector-for-Fabric-Warehouse-Unified-Analytics/ba-p/4611309&lt;/A&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;My architecture follows the classic Silver Lakehouse (Test workspace) → Gold Warehouse (Production workspace) pattern. An automated ETL job runs daily and performs as expected, except when the schema in the Silver layer changes.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;⸻&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Issue #1: Schema evolution from Lakehouse to Warehouse&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;option("mergeSchema", "true") works for Delta tables in the Lakehouse but is not applied to Warehouse tables. Consequently, whenever a new column is added in the Silver Lakehouse, the write to the Gold Warehouse fails unless I manually run the following against the Gold Warehouse:&lt;/DIV&gt;&lt;DIV&gt;ALTER TABLE &amp;lt;table&amp;gt; ADD &amp;lt;column&amp;gt; &amp;lt;datatype&amp;gt;;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Because Spark notebooks cannot execute T-SQL directly against a Warehouse, this step becomes manual and error-prone (Execute SQL).&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Issue #2: Automating the ALTER TABLE step&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;To eliminate manual maintenance, I am considering a stored procedure called from a Fabric Pipeline. The idea is:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;1. The notebook compares the DataFrame schema with INFORMATION_SCHEMA.COLUMNS to identify missing columns.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;2. It passes the list of missing columns to a pipeline parameter.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;3. The pipeline executes a stored procedure (or SQL Script) that iterates through the list and issues the necessary ALTER TABLE commands. (&lt;U&gt;&lt;EM&gt;&lt;STRONG&gt;However, this only supports Warehouses in the current workspace, and I use different workspaces!&lt;/STRONG&gt;&lt;/EM&gt;&lt;/U&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I have not found documentation on how to pass notebook output as a parameter into a pipeline activity. If anyone has implemented a similar pattern, or can point me to relevant guidance, your input would be greatly appreciated.&lt;/DIV&gt;</description>
      <pubDate>Sun, 18 May 2025 08:52:28 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Connector-for-Microsoft-Fabric-Warehouse-MergeSchema/m-p/4697128#M9359</guid>
      <dc:creator>fabconmvp</dc:creator>
      <dc:date>2025-05-18T08:52:28Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Connector for Microsoft Fabric Warehouse MergeSchema</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Connector-for-Microsoft-Fabric-Warehouse-MergeSchema/m-p/4698420#M9460</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/837054"&gt;@fabconmvp&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for reaching out to the Microsoft Community Forum.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Issue 1: mergeSchema not working for Warehouse&lt;/P&gt;
&lt;P&gt;option("mergeSchema", "true") is a Delta Lake feature and only applies to Lakehouse tables (which are backed by Delta). Microsoft Fabric Warehouses are SQL-based, so schema evolution must be handled with explicit T-SQL ALTER TABLE statements.&lt;/P&gt;
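&lt;P&gt;To illustrate the difference, here is a minimal sketch; the table and warehouse names (silver_my_table, GoldWH.dbo.my_table) are hypothetical, and the synapsesql write comes from the Fabric Spark connector documented in the links below:&lt;/P&gt;
&lt;P&gt;import com.microsoft.spark.fabric&amp;nbsp;&amp;nbsp;# Fabric Spark connector (adds synapsesql)&lt;BR /&gt;&lt;BR /&gt;df = spark.read.format("delta").load("/path/to/silver/table")&lt;BR /&gt;&lt;BR /&gt;# Lakehouse Delta table: mergeSchema evolves the target schema on write&lt;BR /&gt;df.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable("silver_my_table")&lt;BR /&gt;&lt;BR /&gt;# Warehouse via the connector: there is no mergeSchema option, so the target schema is fixed&lt;BR /&gt;df.write.mode("append").synapsesql("GoldWH.dbo.my_table")&lt;/P&gt;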
&lt;P&gt;Solution: Because Spark can’t execute SQL directly against a Fabric Warehouse (and cross-workspace execution is limited), you need an external orchestrator (like Fabric Pipelines or Azure Data Factory) to bridge that gap.&lt;/P&gt;
&lt;P&gt;Issue 2: Automating ALTER TABLE via Fabric Pipeline&lt;/P&gt;
&lt;P&gt;You have outlined a promising approach. Let me break it down with more detail and suggestions:&lt;/P&gt;
&lt;P&gt;Please follow the steps below.&lt;/P&gt;
&lt;P&gt;1. Compare Schemas in Notebook&lt;/P&gt;
&lt;P&gt;Inside your Fabric Spark notebook (Lakehouse workspace): read the Silver Delta table into a DataFrame, query the Warehouse's INFORMATION_SCHEMA.COLUMNS through the SQL analytics endpoint (e.g. via JDBC) or the REST API to get the current schema of the Warehouse table, then compare the two schemas to detect new columns.&lt;/P&gt;
&lt;P&gt;Example schema comparison logic:&lt;/P&gt;
&lt;P&gt;# Read the Silver Delta table and capture its schema as {name: type}&lt;BR /&gt;df_silver = spark.read.format("delta").load("/path/to/silver/table")&lt;BR /&gt;silver_schema = {field.name: field.dataType.simpleString() for field in df_silver.schema}&lt;/P&gt;
&lt;P&gt;# Assuming you pull the Warehouse schema into a DataFrame `df_gold_schema`&lt;BR /&gt;# (INFORMATION_SCHEMA.COLUMNS exposes the fields as COLUMN_NAME and DATA_TYPE)&lt;BR /&gt;warehouse_schema = {row["COLUMN_NAME"]: row["DATA_TYPE"] for row in df_gold_schema.collect()}&lt;/P&gt;
&lt;P&gt;# Identify columns present in Silver but missing from the Warehouse&lt;BR /&gt;new_columns = [&lt;BR /&gt;    (name, dtype)&lt;BR /&gt;    for name, dtype in silver_schema.items()&lt;BR /&gt;    if name not in warehouse_schema&lt;BR /&gt;]&lt;/P&gt;
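&lt;P&gt;One way to produce df_gold_schema without JDBC is to read the INFORMATION_SCHEMA.COLUMNS view with the same Spark connector. A minimal sketch, assuming the connector can read this system view (otherwise fall back to the JDBC route above) and using hypothetical names (GoldWH, dbo.MyGoldTable):&lt;/P&gt;
&lt;P&gt;import com.microsoft.spark.fabric&amp;nbsp;&amp;nbsp;# enables spark.read.synapsesql in Fabric runtimes&lt;BR /&gt;&lt;BR /&gt;# Read the catalog view and keep only the target table's columns&lt;BR /&gt;df_gold_schema = (&lt;BR /&gt;    spark.read.synapsesql("GoldWH.INFORMATION_SCHEMA.COLUMNS")&lt;BR /&gt;    .filter("TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'MyGoldTable'")&lt;BR /&gt;    .select("COLUMN_NAME", "DATA_TYPE")&lt;BR /&gt;)&lt;/P&gt;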
&lt;P&gt;2. Pass List to Pipeline&lt;/P&gt;
&lt;P&gt;The challenge here is how to get this list into a Fabric Pipeline parameter.&lt;/P&gt;
&lt;P&gt;Solution: Write the list of new columns to a Delta/Parquet/CSV file in OneLake. Use a Pipeline activity to read that file and pass its contents to a custom parameter in a stored procedure. Alternatively, write it to a SQL table in the Lakehouse that acts as a staging config.&lt;/P&gt;
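&lt;P&gt;On the "pass notebook output to a pipeline" question: a Fabric notebook can return a value with notebookutils.notebook.exit(), and downstream pipeline activities can read it from the Notebook activity's output (typically the expression @activity('Notebook1').output.result.exitValue, where Notebook1 is your activity name). A minimal sketch, continuing from new_columns above; the staging table name GoldWH.dbo.StagingSchemaChanges is hypothetical:&lt;/P&gt;
&lt;P&gt;import json&lt;BR /&gt;import com.microsoft.spark.fabric&amp;nbsp;&amp;nbsp;# for the synapsesql write&lt;BR /&gt;&lt;BR /&gt;# Persist the pending changes to a staging table the stored procedure can read&lt;BR /&gt;if new_columns:&lt;BR /&gt;    spark.createDataFrame(new_columns, ["column_name", "data_type"]) \&lt;BR /&gt;        .write.mode("overwrite").synapsesql("GoldWH.dbo.StagingSchemaChanges")&lt;BR /&gt;&lt;BR /&gt;# Surface the list as the notebook's exit value for the pipeline&lt;BR /&gt;# (notebookutils is built into Fabric notebooks)&lt;BR /&gt;notebookutils.notebook.exit(json.dumps(new_columns))&lt;/P&gt;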
&lt;P&gt;3. Execute Stored Procedure in Warehouse&lt;/P&gt;
&lt;P&gt;In the target Gold Warehouse, use a Fabric Pipeline Stored procedure (or Script) activity to read the list of new columns from the staging table and dynamically execute the ALTER TABLE statements.&lt;/P&gt;
&lt;P&gt;Example SQL script inside a stored procedure (the procedure name usp_ApplySchemaChanges is illustrative; QUOTENAME is added to guard the generated identifier):&lt;/P&gt;
&lt;P&gt;CREATE PROCEDURE dbo.usp_ApplySchemaChanges&lt;BR /&gt;AS&lt;BR /&gt;BEGIN&lt;BR /&gt;    DECLARE @columnName NVARCHAR(MAX), @columnType NVARCHAR(MAX), @sql NVARCHAR(MAX);&lt;BR /&gt;&lt;BR /&gt;    -- Walk the staged schema changes one row at a time&lt;BR /&gt;    DECLARE column_cursor CURSOR FOR&lt;BR /&gt;        SELECT column_name, data_type FROM dbo.StagingSchemaChanges;&lt;/P&gt;
&lt;P&gt;    OPEN column_cursor;&lt;BR /&gt;    FETCH NEXT FROM column_cursor INTO @columnName, @columnType;&lt;/P&gt;
&lt;P&gt;    WHILE @@FETCH_STATUS = 0&lt;BR /&gt;    BEGIN&lt;BR /&gt;        -- QUOTENAME guards the column identifier; the type string is assumed trusted&lt;BR /&gt;        SET @sql = 'ALTER TABLE dbo.MyGoldTable ADD ' + QUOTENAME(@columnName) + ' ' + @columnType + ';';&lt;BR /&gt;        EXEC sp_executesql @sql;&lt;BR /&gt;        FETCH NEXT FROM column_cursor INTO @columnName, @columnType;&lt;BR /&gt;    END&lt;/P&gt;
&lt;P&gt;    CLOSE column_cursor;&lt;BR /&gt;    DEALLOCATE column_cursor;&lt;BR /&gt;END&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please refer to the community threads and official Microsoft articles and documentation below.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Connector-for-MS-Fabric-Warehouse/m-p/4597115" target="_blank"&gt;Solved: Spark Connector for MS Fabric Warehouse - Microsoft Fabric Community&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://community.fabric.microsoft.com/t5/Data-Warehouse/Pyspark-notebook-connect-to-MS-Fabric-Warehouse-to-Insert-update/m-p/4372405" target="_blank"&gt;Solved: Re: Pyspark notebook connect to MS Fabric Warehou... - Microsoft Fabric Community&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/javascript/api/@azure/synapse-artifacts/sparkconfigurationoperations?view=azure-node-preview" target="_blank"&gt;SparkConfigurationOperations interface | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/answers/questions/1599953/azure-databricks-automatic-schema-evolution" target="_blank"&gt;Azure Databricks Automatic Schema Evolution - Microsoft Q&amp;amp;A&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/data-engineering/spark-data-warehouse-connector?tabs=pyspark" target="_blank"&gt;Spark connector for Microsoft Fabric Data Warehouse - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/real-time-intelligence/get-data-spark" target="_blank"&gt;Get data from Apache Spark - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blog.fabric.microsoft.com/en-sg/blog/spark-connector-for-fabric-data-warehouse-dw-public-preview?ft=02-2025:date" target="_blank"&gt;Spark Connector for Fabric Data Warehouse (DW) – Preview | Microsoft Fabric Blog | Microsoft Fabric&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If my response has resolved your query, please mark it as the&amp;nbsp;&lt;STRONG&gt;"Accepted Solution"&lt;/STRONG&gt;&amp;nbsp;to assist others. Additionally, a&amp;nbsp;&lt;STRONG&gt;"Kudos"&lt;/STRONG&gt;&amp;nbsp;would be appreciated if you found my response helpful.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thank you&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 May 2025 12:04:45 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Connector-for-Microsoft-Fabric-Warehouse-MergeSchema/m-p/4698420#M9460</guid>
      <dc:creator>v-dineshya</dc:creator>
      <dc:date>2025-05-19T12:04:45Z</dc:date>
    </item>
  </channel>
</rss>

