New to Fabric, so please go gently.
I currently have a data warehouse in SQL Server. We have a database for staging and a database for the data warehouse containing dimensions and facts. We use SSIS to ETL the data from the legacy system into staging and then into the DW. I am looking to move this into the cloud, preferably Fabric. I have watched a lot of YouTube videos on Fabric/Azure/DW and they all seem to do things differently, which makes me second-guess myself, so can anyone help clear the mud?
For a staging area in Fabric, would this be a lakehouse? Some videos have shown the staging tables as part of the warehouse instead.
Hi, I always use lakehouses for staging.
One advantage of lakehouses is that you can store (un)structured data files (like JSON or even images). Especially if you work with API calls that return complex JSON responses, it can be convenient to first store the output as a JSON file before transforming it into a Delta table.
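A minimal sketch of that pattern (not official guidance; the endpoint, paths, and table name are made-up placeholders, and `spark` is the session a Fabric notebook provides):

```python
import json
import os
import requests

# 1) Call the API and keep the raw response exactly as received.
resp = requests.get("https://api.example.com/orders")  # hypothetical endpoint
resp.raise_for_status()

# With a default lakehouse attached, its Files area is mounted at /lakehouse/default/Files.
raw_path = "/lakehouse/default/Files/staging/orders/orders_2024-01-01.json"
os.makedirs(os.path.dirname(raw_path), exist_ok=True)
with open(raw_path, "w") as f:
    json.dump(resp.json(), f)

# 2) Only afterwards parse the raw file with Spark and persist a Delta staging table.
df = spark.read.option("multiline", "true").json("Files/staging/orders/")
df.write.mode("overwrite").format("delta").saveAsTable("stg_orders")
```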
Another advantage of a lakehouse is that it natively works with PySpark notebooks. I prefer to use PySpark to extract data from source systems, because it gives you the flexibility to connect to almost any (complex) data source.
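For example, extracting from the existing SQL Server could look something like this (server, database, credentials, and table are placeholders; an on-premises source also needs network connectivity or a gateway that Spark can reach):

```python
# Hedged sketch: pull one table from the legacy SQL Server over JDBC
# and land it as a Delta staging table in the lakehouse.
jdbc_url = "jdbc:sqlserver://legacy-sql01:1433;databaseName=LegacyDB"  # placeholder

legacy_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Customer")
    .option("user", "etl_user")        # better: read secrets from a vault, not literals
    .option("password", "<secret>")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

legacy_df.write.mode("overwrite").format("delta").saveAsTable("stg_customer")
```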
The last advantage I will mention here is shortcuts. In a lakehouse, you can create shortcuts to other lakehouses and warehouses, but also to data sources outside Fabric. You can find the full list of shortcut sources here: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts#types-of-shortcuts
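Once a shortcut exists, it just looks like another table or folder in the lakehouse, so notebook code can read it like local data. The shortcut names below are made up; the shortcuts themselves are created in the lakehouse UI or via the OneLake shortcuts API, not in the notebook:

```python
# A shortcut created under Tables that points at a Delta table elsewhere
# behaves like any local table:
sales = spark.read.table("sales_shortcut")
sales.show(5)

# A shortcut created under Files is just a folder of files:
raw = spark.read.json("Files/landing_shortcut/2024/*.json")
```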
Warehouses in Fabric lack these three features, which is why my preference is a lakehouse for staging.
Hi @Si_7777 ,
Do you want to move completely from legacy to the cloud, or do you only want to move the data warehouse piece?
Complete Legacy to Cloud:
If you want to move completely from legacy to the cloud, you can consider the following:
Staging Area: Lakehouse Files/Tables
Data Ingestion: To get the data from the legacy source systems, you can use a data pipeline (check that a connector exists for your source) and load it into the staging area.
Data Transformation: To clean and transform your data, you can use a notebook and persist the result to the warehouse (see the sketch after this list).
Consumption Area: Warehouse
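A rough sketch of that transformation step, assuming a staged table written during ingestion and a warehouse called DW_Sales (the table, columns, and warehouse name are all invented); the final write uses the Fabric Spark connector for Data Warehouse, so check that it is available in your Spark runtime:

```python
from pyspark.sql import functions as F

# Read the staged Delta table from the lakehouse (placeholder name).
stg = spark.read.table("stg_customer")

# Basic cleaning/shaping into a dimension.
dim_customer = (
    stg.dropDuplicates(["CustomerID"])
       .withColumn("LoadDate", F.current_timestamp())
       .select("CustomerID", "CustomerName", "Country", "LoadDate")
)

# Persist to the Fabric Warehouse via the Spark connector for Data Warehouse,
# using a three-part name <warehouse>.<schema>.<table>; names are placeholders.
import com.microsoft.spark.fabric  # noqa: F401  (registers the synapsesql API)
dim_customer.write.mode("overwrite").synapsesql("DW_Sales.dbo.DimCustomer")
```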
Only from Warehouse to Cloud:
If you want to move only the warehouse to the cloud, you can consider the following:
Consumption Area: Warehouse
Data Ingestion: To get the data from your existing warehouse, you can use a data pipeline (check that a connector exists) and load it into the Fabric Warehouse. This assumes your warehouse data is already cleaned and ready to consume.
Regards,
Srisakthi
Hi Srisakthi
Yes, eventually a complete move to the cloud. Thank you for your reply. Simon
Thank you Fabian, very helpful.