The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Our source environment is all Microsoft SQL server, and there is no unstructured or semi structured data coming in to the Fabric. The need is that we're replacing a traditional data warehouse on-prem (SQL server SSIS/OLAP/SSRS) over to Fabric.
While I understand the benefits of storing RAW data in PARQUET file format in BRONZE layer, I wonder why not many discussions or articles discuss using a Lakehouse -> Warehouse -> Warehouse type medallion. Specifically, when the teams have not enough knowledge in SPARK or PySpark, SparkSQL, and the company has no need for dealing with additional technologies or does it have any unstructured or semi structured content coming into the Fabric echo system, I wonder what are the disadvantages with the above proposed model?
It allows us to do transformations using TSQL and not having to use notebooks or other features, and the final layer can still be in Warehouse. The company might eventually add some semi structured data (files) coming in directly in to the platform but that can be handled via Data Factory transformations. Since we are primarily looking to replace on-premse echo system with Fabric (since there is no more an option for using Azure Synapse warehouse in the classical sense - we're left with Fabric if we need to stay with MS).
So, trying to keep it simple and easy for our teams who are super comfortable with good old TSQL.
Looking for some inputs on this design choice - your two cents please.
Thanks!
I don't have experience with building medallion architecture, but here are my two cents:
I think yes, you can use data warehouse for silver and gold if that is your preference and suits your skillset best.
However maybe the skillset in your organization will change in the future, and maybe there is a tendency for new employees wanting to work with PySpark instead of T-SQL. I don't know.
The T-SQL functionality in Fabric Data Warehouse also has some limitations (as the underlying file format is Delta Table also in the Data Warehouse).
If you want to do Data Science / ML on the data, maybe that's easier if the silver layer is in Lakehouse. I'm not sure about the current possibility for reading/working with Data Warehouse data in Notebooks.
Also if you want to query and join data across workspaces, I think maybe that is easier with Lakehouse and Notebooks.
However, I think you just need to try the product and see what works best for you and your needs.
Lakehouse -> Warehouse -> Warehouse is also mentioned in this blog https://www.kevinrchant.com/2024/05/03/the-great-number-of-workspaces-for-medallion-architecture-in-...
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Fabric update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
4 | |
1 | |
1 | |
1 | |
1 |