Hello everyone,
First of all, I hope you are all doing well. Microsoft recently published an article about the Spark connector for Fabric Data Warehouse (https://learn.microsoft.com/en-us/fabric/data-engineering/spark-data-warehouse-connector?tabs=pyspar...), which interests me a lot, because before that article the only workarounds were ODBC or JDBC connections.
However, I have a few questions about it.
1) Reading or writing another workspace's warehouse via Spark is really slow compared to reading or writing a Lakehouse (ABFS path). Why is that? Will it be optimized? How can I improve performance? For example, writing 10 million rows to a lakehouse (via ABFS path) takes about 3 minutes overall, while writing the same data to a warehouse (in another workspace) takes about 50 minutes.
2) Apart from the article, I have a simple Spark question. Assume you have read warehouse data. How can I delete some rows via Spark so that the deletion takes effect in the Fabric warehouse? I don't want to write to the warehouse with overwrite or append; I just want to delete rows from the warehouse via Spark.
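For reference, this is roughly what I am running in both cases (the names, IDs, and paths below are placeholders):
# Lakehouse write via the OneLake ABFS path (~3 minutes for 10M rows)
df.write.mode("overwrite").format("delta").save(
    "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/orders"
)
# Warehouse write via the Spark connector (~50 minutes for the same data)
import com.microsoft.spark.fabric  # enables the synapsesql method on DataFrameWriter
from com.microsoft.spark.fabric.Constants import Constants
df.write.option(Constants.WorkspaceId, "<other-workspace-id>") \
    .mode("overwrite").synapsesql("<warehouse>.dbo.orders")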
Thank you for your support.
Why Warehouses Are Slower
1. ACID Compliance Overhead
• Full transaction support requires locking and delayed write visibility
• Multi-table transactions add coordination costs not present in Lakehouse’s append-only writes
2. T-SQL Translation Layer
• Spark-to-Warehouse writes go through a SQL translation layer, unlike Lakehouse’s direct Parquet writes
• Adds ~40% latency compared to ABFS path writes
3. File Management
• Warehouses use smaller file sizes (avg 8MB vs 128MB in Lakehouse), increasing metadata ops
• More files per write operation exacerbate Fabric’s file count limits
No Native Spark DELETE Support
Warehouse tables require T-SQL for deletions, unlike Lakehouse tables, which support Delta Lake's `DELETE` via Spark SQL. Since Spark cannot issue the `DELETE` against a Warehouse table directly, one option is to send the T-SQL through the warehouse's SQL endpoint, for example with pyodbc (a sketch; the server, database, and auth values are placeholders):
# Sketch: delete Warehouse rows by running T-SQL over the warehouse's SQL endpoint.
# Requires write permission on the warehouse.
import pyodbc
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<warehouse-sql-endpoint>;Database=sales_db;"
    "Authentication=ActiveDirectoryInteractive"
)
conn.execute("DELETE FROM orders WHERE ProductKey = 5")
conn.commit()
Hi @fabconmvp,
Thank you for reaching out to the Microsoft Fabric Forum Community.
Here are links to Microsoft's official documentation that address your questions about the Spark connector for Fabric Warehouse and its performance:
Warehouse performance guidelines - Microsoft Fabric | Microsoft Learn
Spark connector for Microsoft Fabric Data Warehouse - Microsoft Fabric | Microsoft Learn
If this post helps, then please give it a 'Kudos' and consider accepting it as a solution to help other members find it more quickly.
Thank you.
Hi @nilendraFabric, can you share where you got the claim "Adds ~40% latency compared to ABFS path writes"? The way data is loaded from Spark into a Warehouse is that it is staged in storage first and then loaded with the fast COPY INTO process.
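For reference, that pattern can be reproduced manually: stage the DataFrame as Parquet, then ingest it with COPY INTO over the warehouse's SQL endpoint. A rough sketch, where every name, path, and credential is a placeholder and the real connector handles staging and auth for you:
# Mimicking the connector's load path: stage Parquet, then bulk-load with COPY INTO.
import pyodbc
df.write.mode("overwrite").parquet(
    "abfss://<container>@<account>.dfs.core.windows.net/staging/orders")
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=<warehouse-sql-endpoint>;"
    "Database=sales_db;Authentication=ActiveDirectoryInteractive")
conn.execute("""
COPY INTO dbo.orders
FROM 'https://<account>.blob.core.windows.net/<container>/staging/orders/*.parquet'
WITH (FILE_TYPE = 'PARQUET')
""")
conn.commit()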
Hi @fabconmvp,
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems find the answer faster.
Thank you.
Hello @nilendraFabric,
Thank you for answering. Could you please share links to resources explaining why the warehouse is slower?
Also, the code you've given doesn't work for me, whether the warehouse is in the same workspace or a different one, when run from a notebook.
Thank you for your support.
Hi @fabconmvp,
We haven’t heard back from you regarding your issue. If it has been resolved, please mark the helpful response as the solution and give a ‘Kudos’ to assist others. If you still need support, let us know.
Thank you.
Hi @fabconmvp,
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.