Hi community,
I really have problems implementing a medallion architecture in Fabric - I always struggle at some point with (what I thought were) the easiest things... It is really frustrating - and Copilot/ChatGPT/Claude keep giving wrong answers...
I have a landing zone in a Lakehouse, loaded via a Pipeline/Copy activity. Now I want to load everything into bronze using PySpark. I decided to use a separate Warehouse to store all my metadata (Log/RunConfig/Watermarks etc.), especially because Pipelines have easy access to SQL tables in a Warehouse.
Now I am inside a PySpark notebook. E.g. when it comes to logging, I want to insert a log entry and update the same one after finishing the job. Watermarks have to be updated as well.
First I thought I would use stored procedures, but I could not find a way to call them. Same problem when it comes to UPDATE statements. I do not want to use the verbose JDBC connector, because I need to provide driver, user, password etc.
Is there any way to update data in a Warehouse or call stored procedures where my metadata is stored? Or is it perhaps a bad idea to do something like that? Am I missing something? It is really hard to get an architecture up and running when there is so little documentation...
Thanks
Holger
Hi @holgergubbels ,
We had the same issue during our implementation. These are the things we implemented:
1. We split the logs into two types: data logs/audits and Fabric item logs/audits.
Data logs/audits were written using JDBC.
Fabric item logs/audits were done using a stored proc in a pipeline.
2. For data logs/audits we used a JDBC connection. This was easy to use and was able to run our stored procs. We were fetching our secrets from Key Vault (a rough sketch follows below).
Honestly, I don't recommend writing logs/audits (or any data) with the Synapse SQL connector, although it is best for reading data.
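For reference, here is a minimal sketch of that JDBC write pattern, assuming a service principal whose client id and secret are stored in Key Vault; the vault URL, secret names, SQL endpoint, warehouse and table names are placeholders, not from the original post. Fabric Warehouses authenticate with Entra ID, so the sketch uses the JDBC driver's ActiveDirectoryServicePrincipal option.

```python
# Hedged sketch: append one audit row to a Warehouse table over JDBC.
# All <angle-bracket> values and secret names are placeholders.
from notebookutils import mssparkutils
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

kv_url = "https://<your-keyvault>.vault.azure.net/"
client_id = mssparkutils.credentials.getSecret(kv_url, "sp-client-id")
client_secret = mssparkutils.credentials.getSecret(kv_url, "sp-client-secret")

jdbc_url = (
    "jdbc:sqlserver://<warehouse-sql-endpoint>.datawarehouse.fabric.microsoft.com:1433;"
    "database=<MetadataWarehouse>;encrypt=true;"
)

# One audit row for the current run; in practice this would carry run ids,
# timestamps, row counts, etc.
log_df = spark.createDataFrame(
    [("raw_to_bronze_customers", "Started")],
    ["job_name", "status"],
)

(log_df.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.JobLog")
    .option("authentication", "ActiveDirectoryServicePrincipal")
    .option("user", client_id)
    .option("password", client_secret)
    .mode("append")
    .save())
```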
For your case, the best option would be:
1. Create your stored proc in the Warehouse
2. Add the secrets of your warehouse to Azure Key Vault
3. Fetch those secrets at runtime in the notebook
4. Use those credentials to call the stored proc via the pyodbc module (see the sketch after this list)
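A hedged sketch of steps 2-4, assuming pyodbc and the Microsoft ODBC driver are present in the Spark environment (add them to your environment if not); the Key Vault URL, secret names, SQL endpoint, and the stored procedure name and parameters are invented for illustration.

```python
# Sketch: fetch Warehouse credentials from Key Vault and call a stored proc via pyodbc.
import pyodbc
from notebookutils import mssparkutils

kv_url = "https://<your-keyvault>.vault.azure.net/"
client_id = mssparkutils.credentials.getSecret(kv_url, "sp-client-id")
client_secret = mssparkutils.credentials.getSecret(kv_url, "sp-client-secret")

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<warehouse-sql-endpoint>.datawarehouse.fabric.microsoft.com;"
    "DATABASE=<MetadataWarehouse>;"
    "Authentication=ActiveDirectoryServicePrincipal;"
    f"UID={client_id};PWD={client_secret};Encrypt=yes;"
)

# Hypothetical proc that closes a log entry and advances the watermark.
with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute(
        "EXEC dbo.usp_FinishJobRun @RunId = ?, @NewWatermark = ?",
        (42, "2025-01-01T00:00:00"),
    )
```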
Hope this helps. Let me know if you have any doubts.
Hi @chetanhiwale ,
thanks for your answer! I decided to use a different approach:
- Load to landing: pipeline using a Copy activity; logs/watermarks via stored procedures in the Warehouse
- Raw to bronze: pipeline for logging and watermarks, calling a notebook for the load
- Same for bronze to silver
I don't think it is the best approach, because the Spark session is started and terminated for every job. But it is the best I could find...
Holger
Hi @holgergubbels,
Thank you for reaching out to the Microsoft Fabric Forum Community, and special thanks to @apturlov, @tayloramy, and @deborshi_nag for the prompt and helpful responses.
Just following up to see if the responses provided by the community members were helpful in addressing the issue. If the issue still persists, feel free to reach out if you need any further clarification or assistance.
Best regards,
Prasanna Kumar
Hi @holgergubbels, as @deborshi_nag and @tayloramy mentioned above, the confusion in this situation comes from a misunderstanding of the Fabric resources and how to use them. Here is a decision guide that may help you make the right choice between Warehouse and Lakehouse: Microsoft Fabric Decision Guide: Choose between Warehouse and Lakehouse - Microsoft Fabric | Microso....
My preference, when implementing a medallion architecture in Fabric, is to use a Lakehouse for the landing, raw (bronze), and curated (silver) medallion tiers. I would use a Warehouse for the consumption (gold) tier only, because dimensional modelling is better implemented in a Warehouse and consumption is usually better aligned with SQL. From this point of view, the Fabric tools map naturally: Fabric pipelines can orchestrate all batch operations, including but not limited to data ingestion, landing, validations and transformations. Complex data transformations are implemented with notebooks, and data is loaded into a Warehouse using a script activity or a stored procedure activity. The metadata could be stored in a Lakehouse (files or delta tables) or in a SQL database. I would advise against using a Warehouse as metadata storage.
If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.
Hi @holgergubbels,
The way that I would typically approach this is to just work with the lakehouses natively. I store my metadata in a Lakehouse as it makes it nice and easy to interact with in notebooks.
Then when updating records, I will filter on records where completed (or another similar flag) is null and update the most recent record (which in my case corresponds with the current run) for whatever table/process is currently running.
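Here is a rough sketch of that pattern, not tied to the poster's actual setup: the Lakehouse table, process name, and columns below are invented for illustration, and it assumes the metadata lives in a Delta table reachable from the notebook.

```python
# Sketch: close out the latest incomplete log row for a given process in a
# Lakehouse Delta table. Table and column names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical metadata table in a Lakehouse attached to the notebook.
log = DeltaTable.forName(spark, "metadata_lakehouse.job_log")

# Most recent run of this process that has not been marked complete yet.
latest_open = (
    log.toDF()
    .filter(
        (F.col("process_name") == "raw_to_bronze_customers")
        & F.col("completed_at").isNull()
    )
    .orderBy(F.col("started_at").desc())
    .limit(1)
    .collect()
)

if latest_open:
    run_id = latest_open[0]["run_id"]
    # Close out that run in place.
    log.update(
        condition=F.col("run_id") == F.lit(run_id),
        set={"completed_at": F.current_timestamp(), "status": F.lit("Succeeded")},
    )
```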
Hello @holgergubbels
In Microsoft Fabric, Spark and SQL are two different engines. While they share the same storage (OneLake), Spark cannot directly execute an UPDATE statement or a stored procedure inside a Fabric Data Warehouse using standard Spark commands.
The recommended approach:
1. To write/append metadata
You can use the synapsesql connector to write/append metadata.
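A brief sketch of that write path, assuming the Fabric Spark connector for Data Warehouse (which exposes synapsesql on the DataFrame writer) is available in your Spark runtime and supports writes; the warehouse, schema, and table names are placeholders.

```python
# Sketch: append audit rows to a Warehouse table via the synapsesql connector.
import com.microsoft.spark.fabric  # registers synapsesql on the reader/writer in Fabric runtimes
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

log_df = spark.createDataFrame(
    [Row(job_name="raw_to_bronze_customers", status="Succeeded")]
)

# Append to a Warehouse table in the same workspace (placeholder three-part name).
log_df.write.mode("append").synapsesql("<MetadataWarehouse>.dbo.JobLog")
```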