Hi community,
I really have problems implementing a medallion architecture in Fabric - I always struggle at some point with (what I thought were) the easiest things... It is really frustrating - and Copilot/ChatGPT/Claude keep giving wrong answers...
I have a landing zone in a Lakehouse, loaded via a Pipeline/Copy activity. Now I want to load everything into bronze using PySpark. I decided to use a separate Warehouse to store all my metadata (Log/RunConfig/Watermarks etc.), especially because Pipelines have easy access to SQL tables in a Warehouse.
Now I am inside a PySpark notebook. E.g. when it comes to logging, I want to insert a log entry and update the same one after finishing the job. Watermarks have to be updated as well.
First I thought I would use stored procedures, but I could not find a way to call them. Same problem when it comes to UPDATE statements. I do not want to use the verbose JDBC connector, because I need to provide driver, user, password etc.
Is there any way to update data in a Warehouse or call stored procedures where my metadata is stored? Or is it perhaps a bad idea to do something like that? Am I missing something? It is really hard to get an architecture up and running when there is so little documentation...
Thanks
Holger
Hi @holgergubbels ,
We had the same issue during our implementation. These are the things we implemented:
1. We split the logs into two types: data logs/audits and Fabric item logs/audits.
Data logs/audits were written using JDBC.
Fabric item logs/audits were done using a stored proc in a pipeline.
2. For data logs/audits we used a JDBC connection. This was easy to use and was able to run our stored procs. We were fetching our secrets from Key Vault (a rough sketch follows below).
Honestly, I don't recommend writing logs/audits (or any data) with the Synapse SQL connector, although it is best for reading data.
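For reference, here is a minimal sketch of that JDBC write pattern, assuming a service principal whose client id and secret are stored in Key Vault; the vault URL, secret names, SQL endpoint, warehouse and table names are placeholders, not from the original post. Fabric Warehouses authenticate with Entra ID, so the sketch uses the JDBC driver's ActiveDirectoryServicePrincipal option.

```python
# Hedged sketch: append one audit row to a Warehouse table over JDBC.
# All <angle-bracket> values and secret names are placeholders.
from notebookutils import mssparkutils
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

kv_url = "https://<your-keyvault>.vault.azure.net/"
client_id = mssparkutils.credentials.getSecret(kv_url, "sp-client-id")
client_secret = mssparkutils.credentials.getSecret(kv_url, "sp-client-secret")

jdbc_url = (
    "jdbc:sqlserver://<warehouse-sql-endpoint>.datawarehouse.fabric.microsoft.com:1433;"
    "database=<MetadataWarehouse>;encrypt=true;"
)

# One audit row for the current run; in practice this would carry run ids,
# timestamps, row counts, etc.
log_df = spark.createDataFrame(
    [("raw_to_bronze_customers", "Started")],
    ["job_name", "status"],
)

(log_df.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.JobLog")
    .option("authentication", "ActiveDirectoryServicePrincipal")
    .option("user", client_id)
    .option("password", client_secret)
    .mode("append")
    .save())
```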
For your case, the best option would be:
1. Create your stored proc in the Warehouse
2. Add the secrets of your warehouse to Azure Key Vault
3. Fetch those secrets at runtime in the notebook
4. Use those credentials to call the stored proc via the pyodbc module (see the sketch after this list)
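A hedged sketch of steps 2-4, assuming pyodbc and the Microsoft ODBC driver are present in the Spark environment (add them to your environment if not); the Key Vault URL, secret names, SQL endpoint, and the stored procedure name and parameters are invented for illustration.

```python
# Sketch: fetch Warehouse credentials from Key Vault and call a stored proc via pyodbc.
import pyodbc
from notebookutils import mssparkutils

kv_url = "https://<your-keyvault>.vault.azure.net/"
client_id = mssparkutils.credentials.getSecret(kv_url, "sp-client-id")
client_secret = mssparkutils.credentials.getSecret(kv_url, "sp-client-secret")

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<warehouse-sql-endpoint>.datawarehouse.fabric.microsoft.com;"
    "DATABASE=<MetadataWarehouse>;"
    "Authentication=ActiveDirectoryServicePrincipal;"
    f"UID={client_id};PWD={client_secret};Encrypt=yes;"
)

# Hypothetical proc that closes a log entry and advances the watermark.
with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute(
        "EXEC dbo.usp_FinishJobRun @RunId = ?, @NewWatermark = ?",
        (42, "2025-01-01T00:00:00"),
    )
```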
Hope this helps. Let me know if you have any doubts.
Hi @chetanhiwale ,
thanks for your answer! I decided to use a different approach:
- Load to landing: pipeline using a Copy activity; logs/watermarks via stored procedures in the Warehouse
- Raw to bronze: pipeline for logging and watermarks, calling a notebook for the load
- Same for bronze to silver
I don't think it is the best approach, because the Spark session is started and terminated for every job. But it is the best I could find...
Holger
Hi @holgergubbels,
Thank you for reaching out to the Microsoft Fabric Forum Community, and special thanks to @apturlov, @tayloramy, and @deborshi_nag for the prompt and helpful responses.
Just following up to see if the responses provided by the community members were helpful in addressing the issue. If the issue still persists, feel free to reach out if you need any further clarification or assistance.
Best regards,
Prasanna Kumar
Hi @holgergubbels, as @deborshi_nag and @tayloramy mentioned above, the confusion in this situation comes from a misunderstanding of the Fabric resources and how to use them. Here is a decision guide that may help you make the right choice between Warehouse and Lakehouse: Microsoft Fabric Decision Guide: Choose between Warehouse and Lakehouse - Microsoft Fabric | Microso....
My preference, when implementing a medallion architecture in Fabric, is to use a Lakehouse for the landing, raw (bronze), and curated (silver) medallion tiers. I would use a Warehouse for the consumption (gold) tier only, because dimensional modelling is better implemented in a Warehouse and consumption is usually better aligned with SQL. From this point of view, the Fabric tools map naturally: Fabric pipelines can orchestrate all batch operations, including but not limited to data ingestion, landing, validations and transformations. Complex data transformations are implemented with notebooks, and data is loaded into a Warehouse using a script activity or a stored procedure activity. The metadata could be stored in a Lakehouse (files or delta tables) or in a SQL database. I would advise against using a Warehouse as metadata storage.
If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.
Hi @holgergubbels,
The way that I would typically approach this is to just work with the lakehouses natively. I store my metadata in a Lakehouse as it makes it nice and easy to interact with in notebooks.
Then when updating records, I will filter on records where completed (or another similar flag) is null and update the most recent record (which in my case corresponds with the current run) for whatever table/process is currently running.
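Here is a rough sketch of that pattern, not tied to the poster's actual setup: the Lakehouse table, process name, and columns below are invented for illustration, and it assumes the metadata lives in a Delta table reachable from the notebook.

```python
# Sketch: close out the latest incomplete log row for a given process in a
# Lakehouse Delta table. Table and column names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical metadata table in a Lakehouse attached to the notebook.
log = DeltaTable.forName(spark, "metadata_lakehouse.job_log")

# Most recent run of this process that has not been marked complete yet.
latest_open = (
    log.toDF()
    .filter(
        (F.col("process_name") == "raw_to_bronze_customers")
        & F.col("completed_at").isNull()
    )
    .orderBy(F.col("started_at").desc())
    .limit(1)
    .collect()
)

if latest_open:
    run_id = latest_open[0]["run_id"]
    # Close out that run in place.
    log.update(
        condition=F.col("run_id") == F.lit(run_id),
        set={"completed_at": F.current_timestamp(), "status": F.lit("Succeeded")},
    )
```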
Hello @holgergubbels
In Microsoft Fabric, Spark and SQL are two different engines. While they share the same storage (OneLake), Spark cannot directly execute an UPDATE statement or a stored procedure inside a Fabric Data Warehouse using standard Spark commands.
The recommended approach:
1. To write/append metadata
You can use the synapsesql connector to write/append metadata.
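A brief sketch of that write path, assuming the Fabric Spark connector for Data Warehouse (which exposes synapsesql on the DataFrame writer) is available in your Spark runtime and supports writes; the warehouse, schema, and table names are placeholders.

```python
# Sketch: append audit rows to a Warehouse table via the synapsesql connector.
import com.microsoft.spark.fabric  # registers synapsesql on the reader/writer in Fabric runtimes
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

log_df = spark.createDataFrame(
    [Row(job_name="raw_to_bronze_customers", status="Succeeded")]
)

# Append to a Warehouse table in the same workspace (placeholder three-part name).
log_df.write.mode("append").synapsesql("<MetadataWarehouse>.dbo.JobLog")
```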