TNastyCodes
Frequent Visitor

Issues writing to Warehouse from PySpark Notebook

Hi all,

 

I've been having difficulties leveraging PySpark notebooks to their fullest extent. Currently I establish my bronze layer tables (around 4 tables) with optimized partitioning and Z-ordering in a notebook and save them into a lakehouse. From there, my silver layer is going to be defined in a warehouse. I'm currently attempting to do this from a notebook with difficulty, and am wondering if there is any way to work through it. I could set up the tables through SQL stored procedures and build the silver layer with full overwrites by truncating the original table, creating a new one, and then inserting data from my bronze view using T-SQL. However, I am more interested in driving this with PySpark if possible.
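For context, the bronze writes look roughly like this (the lakehouse, table, and column names here are just placeholders, not my actual ones):

# Bronze layer: partitioned Delta write into the lakehouse, followed by Z-ordering
bronze_df.write.format("delta") \
    .mode("overwrite") \
    .partitionBy("ingest_date") \
    .saveAsTable("bronze_lakehouse.table_name")

# Z-ordering is applied afterwards with Delta's OPTIMIZE command
spark.sql("OPTIMIZE bronze_lakehouse.table_name ZORDER BY (customer_id)")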

 

First, I tried:

# Attempt 1: write the DataFrame straight into the warehouse as a Delta table
table.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable("warehouse_name.dbo.table")

This fails with the following error; I assume the issue is due to the warehouse not being mounted, but from my understanding the logic for this isn't in place.
Py4JJavaError: An error occurred while calling o6124.saveAsTable. : Operation failed: "Bad Request", 400, PUT, http://onelake.dfs.fabric.microsoft.com/*/Tables/dbo/table/_delta_log?resource=directory&timeout=90, OperationNotAllowedOnThePath, "PUT call is not allowed on Tables path

 

My next attempt was as follows:

 

# Attempt 2: write through the Fabric Spark connector
table.write.mode("overwrite").synapsesql("warehouse_name.dbo.table")
 
This fails with the following error:
Py4JJavaError: An error occurred while calling o6136.synapsesql. : com.microsoft.spark.fabric.tds.write.error.FabricSparkTDSWriteError: Write orchestration failed...
WITH( FILE_TYPE = 'PARQUET', CREDENTIAL = ( IDENTITY='Shared Access Signature', SECRET='?[REDACTED]')). A retry attempt for error code - 13807 isn't expected to change outcome...
: Content of directory on path 'https://*.*.dfs.fabric.microsoft.com/*/_system/artifacts/*/user/trusted-service-user/table*/*.parquet' cannot be listed.
 
I'm unsure why this occurs.
 
 
My last attempt was a JDBC-based approach:

# Attempt 3: write through the SQL Server JDBC driver using an access token
warehouse_url = "*.datawarehouse.fabric.microsoft.com"
jdbc_url = f"jdbc:sqlserver://{warehouse_url}:1433;database={db_name}"
final_table = "[dbo].[table]"
token = mssparkutils.credentials.getToken("pbi")

table.write \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", final_table) \
    .option("accessToken", token) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save()

 

This fails with the following error:

Py4JJavaError: An error occurred while calling o6958.save. : com.microsoft.sqlserver.jdbc.SQLServerException: The data type 'nvarchar(max)' is not supported in this edition of SQL Server.

However, I checked the schema of the table I'm writing, and we do have multiple string columns with a maximum length of about 100 characters, so I implemented the following code, iterating over all string columns, to resolve the error, with no luck.

table = table.withColumn(col, F.col(col).cast(T.VarcharType(200)))
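In full, the casting step looked roughly like this (the string-column detection below is just an illustration of the loop I described):

from pyspark.sql import functions as F, types as T

# Cast every StringType column down to a bounded VARCHAR before the JDBC write
string_cols = [f.name for f in table.schema.fields if isinstance(f.dataType, T.StringType)]
for col in string_cols:
    table = table.withColumn(col, F.col(col).cast(T.VarcharType(200)))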
 
 
Any other alternatives for writing to warehouses from a notebook that I am missing? Or are any of these issues resolvable? Thank you!

 

7 REPLIES
v-tsaipranay
Community Support

Hi @TNastyCodes,

Thanks for reaching out to the Microsoft Fabric community forum.

 

@apturlov has given the recommended PySpark-to-Warehouse write method using the Fabric Spark connector and .synapsesql(). @tayloramy clearly explained the errors with your earlier approaches, highlighting permission and configuration issues, and @smeetsh suggested Data Pipelines as another solid option for loading the Warehouse.

 

Since you are still experiencing the staging error with .synapsesql(), please check that your Fabric workspace has write permissions to the target Warehouse, that Private Link is disabled for write operations, and that the Spark Runtime is version 1.3 or higher.

After confirming these settings, your notebook should be able to complete the .synapsesql() successfully.

Please let us know if you continue to experience issues after verifying the above. If you require any further assistance, feel free to let us know and we will be happy to support you.

 

Thank you.

smeetsh
Responsive Resident

I don't think you can write to a warehouse from a notebook. We also use an architecture where we load the various layers of the medallion model into a warehouse, but what we do is use a pipeline that runs either a copy activity or a script activity that connects to the SQL endpoint of the lakehouse and uses SQL to do our ETL. You don't have to truncate your gold and silver tables (though if the amount of data isn't huge, it may very well be the most efficient approach). Using scripts you could create an upsert script that checks if a row exists, updates its values if it does, and inserts new rows if it doesn't.

(I think the SQL MERGE function is in preview, but I am not sure, and if it is I haven't tried it yet; we are eagerly awaiting that one lol)

I hope this helps a bit
Cheers

Hans

tayloramy
Community Champion

Hi @TNastyCodes ,

 

What your errors mean:

  • saveAsTable(... 'warehouse_name.dbo.table') -> PUT not allowed on Tables path – you tried to write a Delta table directly into a Warehouse’s managed storage. Warehouses don’t accept Delta writes from Spark; all writes go through the SQL engine, not straight into OneLake folders. See “two-phase write via COPY INTO” in the Spark <-> Warehouse connector doc, and note that Warehouse tables are stored/managed by the SQL engine even though they live in OneLake. (Spark connector for Fabric Data Warehouse, Lakehouse & Warehouse “Better Together”)
  • df.write.synapsesql(...) -> FabricSparkTDSWriteError ... Content ... cannot be listed – this usually indicates a staging step failure in the connector’s two-phase write (Spark stages Parquet, then the engine runs COPY INTO). Common causes: Private Link enabled (write not supported) or environment/permission issues. (doc notes & restrictions)
  • JDBC -> NVARCHAR(MAX) not supported – Fabric Warehouse doesn’t support NVARCHAR types; use VARCHAR (UTF-8). VARCHAR(MAX) exists but is in preview with a 1 MB limit, so the safest path is VARCHAR(n) where possible. (Warehouse data types)

Quick solution

  1. Prefer the Fabric Spark connector (synapsesql) for PySpark → Warehouse (see the sketch after this list).
    – Use Runtime 1.3 (or newer).
    – Disable Private Link for write scenarios (tenant/workspace).
    – Pre-create the Warehouse table with VARCHAR columns (not NVARCHAR) if you need strict typing.
    (doc with examples & constraints)
  2. If you must use JDBC:
    – Pre-create the target table in the Warehouse with VARCHAR types and then TRUNCATE TABLE + mode("append") from Spark.
    – Or specify createTableColumnTypes so Spark won’t default to NVARCHAR (also covered in the sketch below).
    (data type support)
  3. Consider SQL-first ingestion for Bronze → Silver:
    – Use Warehouse T-SQL (CTAS / INSERT-SELECT) reading from the Lakehouse SQL analytics endpoint (via shortcuts/gold views), or
    – Use COPY INTO the Warehouse from external storage. OneLake as a direct COPY source has been in preview and may vary by tenant; if it isn’t enabled, stage in ADLS Gen2.
    (Lakehouse/Warehouse integration, Warehouse ingest overview)
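A minimal sketch of options 1 and 2, using placeholder column names (col_a, col_b) and the jdbc_url/token variables from your original snippet; for option 1 the target table is assumed to be pre-created in the Warehouse with VARCHAR columns:

# Option 1: Fabric Spark connector (it stages Parquet and runs COPY INTO for you)
import com.microsoft.spark.fabric  # registers synapsesql() on the DataFrame writer

df.write.mode("overwrite").synapsesql("warehouse_name.dbo.table")

# Option 2: JDBC, telling Spark which column types to create so it doesn't default to NVARCHAR(MAX)
df.write \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", "[dbo].[table]") \
    .option("accessToken", token) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("createTableColumnTypes", "col_a VARCHAR(200), col_b VARCHAR(200)") \
    .mode("overwrite") \
    .save()

Either way, keeping the column types as explicit VARCHAR(n) avoids the NVARCHAR(MAX) error you hit over JDBC.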


If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.

Hi @tayloramy,

Thanks for this breakdown, it definitely helped me see what I was doing wrong. Currently I'm using Fabric Runtime 1.3, and I reached out to an admin on my side to see if Private Link is on. I got this issue trying to create the table using Spark SQL with well-defined varchars, which might also be due to the Private Link error.

Moving forward I'll try using Warehouse T-SQL to create the tables in my pipeline first. Thank you!

(screenshot of the error attached)

 

If you are using a pipeline, you can use a copy activity and that will create the table in the warehouse for you... saves a lot of typing 🙂

apturlov
Resolver I

@TNastyCodes the challenges you describe with your approach do require some deeper investigation. However, I wanted to ask about your approach from an overall architecture perspective. Using a Warehouse in the silver layer might make sense in your case, but traditionally a Warehouse is used in the gold layer because it represents a finalized data analytics product ready for consumption through reports or direct queries, while the silver layer is usually reserved for conformed data that can be shared across other data products. The main difference between a Warehouse and a Lakehouse is that the Warehouse is primarily SQL oriented, as it's expected to host a star schema with data loaded from the silver and/or bronze layers. That said, I'd like to offer a few suggestions:

  • Using a Notebook for SQL-centric storage (a Warehouse) is possible but not the most effective approach.
  • When working with a warehouse, a two-staged approach is preferred: first create a star or snowflake schema, then load data into the warehouse as a separate activity.
  • Loading data into a warehouse on a regular basis implies that an incremental load should be used, as there is already historical data in the warehouse after the first load.
  • As the warehouse is SQL-centric, it would be easier to use T-SQL or Spark SQL in the notebook, without needing a Python-native technique to connect to a SQL endpoint.

That being said, there is a way to load data into a warehouse using PySpark in a notebook. The code sample below does that using the Fabric synapsesql connector:

# Importing the Fabric connector package registers synapsesql() on the DataFrame writer
import com.microsoft.spark.fabric

# Read from the lakehouse, then write into the warehouse through the connector
df = spark.sql("SELECT * FROM Lakehouselab.silver.sales LIMIT 1000")
df.write.mode("overwrite").synapsesql("LabDwh.dbo.sales_data")

As you can see from the screenshot, the code executes successfully:

(screenshot of the successful run attached)

Please note that tables from both the Lakehouselab lakehouse and the LabDwh warehouse are referred to by their full names.

Hope this helps. If this post helped solve your problem, please mark it as a solution. If you otherwise found it useful, please give it a kudo.

 

 

Hi @apturlov, thanks for your insight!

 

So to your points: my goal was to have the bronze layer in a lakehouse, with silver and gold in a warehouse, as a lot of our silver-layer data will still be highly useful for analytics by downstream users. We do have a star schema for the central tables and a high-level SCD process for loading data, much like the two-staged process you're describing, which it seems I'll be adapting since the notebook still appears limited for me. I tried the syntax you showed again: before, I was using synapsesql both to read from the bronze layer and to write, so this time I used spark.sql to read and synapsesql for the write, but it's giving me the same error, which may be a symptom of access issues.
