Hi Team,
I have hundreds of files coming from our sources every day. I would like to process these files in a Fabric notebook and store the extracted data in a warehouse across roughly 50 to 100 tables. I would like to process and land this data in the warehouse within 5 to 10 minutes, basically as fast as possible. The staging layer can be in a lakehouse or a warehouse, whichever works. Once the data is staged, I would like to move it from the staging tables to the final tables with the insert statements wrapped in a transaction block. I am aware that transaction blocks are not possible in a Fabric notebook, so I need to rely on a stored procedure for this.
1) What is the best way to stage data? Is it faster to stage in a lakehouse or a warehouse from a notebook? We have tried staging in Lakehouse tables, but inserting data into Lakehouse tables takes around 20 minutes in a Fabric notebook, whereas writing the data to the Lakehouse as files instead of Lakehouse tables takes just 2 minutes. The challenge then is loading the staged data from Lakehouse files into the final Warehouse tables in a transactional manner.
2) How can data be transferred from Lakehouse files (Parquet format) to Warehouse tables within a transaction block?
Any suggestions?
Hi @NagaRK ,
Thank you for reaching out to the Microsoft Community Forum.
You are trying to process hundreds of files daily using Microsoft Fabric notebooks and load the data into 50–100 warehouse tables within 5–10 minutes. You have already observed that writing to Lakehouse tables is slow (around 20 minutes), while storing the data in the Lakehouse as Parquet files is much faster (around 2 minutes). You also want to transfer the staged data to the final warehouse tables using transactional logic via stored procedures.
1. Fabric notebooks excel at batch processing and complex transformations, but they do not inherently support real-time ingestion. For optimized storage and access, the general recommendation is to write data into Delta tables, which Lakehouses support natively.
Solution: Writing Parquet files to the Lakehouse is fast. Writing to Lakehouse tables is slower due to metadata sync and Spark overhead. Warehouse ingestion is faster and more suitable for transactional operations.
Note: Stage the data as Parquet files in the Lakehouse for speed. Use COPY INTO or Pipelines to load from the Lakehouse files into Warehouse staging tables, then use stored procedures for transactional inserts into the final tables (see the sketch below).
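To make this concrete, here is a minimal sketch of that staging pattern, assuming a Fabric notebook with a default Lakehouse attached. The folder paths, the staging table dbo.stg_orders and the storage URL in the COPY INTO statement are placeholders; which source locations COPY INTO can read from depends on your setup.

```python
# Minimal sketch: stage extracted data as Parquet files in the Lakehouse (fast),
# then load those files into a Warehouse staging table with COPY INTO.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the raw source files (path and format are placeholders).
df = spark.read.json("Files/landing/orders/")

# Writing plain Parquet to the Files area skips the Delta table metadata sync,
# which is why it is much faster than saveAsTable in this scenario.
df.write.mode("overwrite").parquet("Files/staging/orders/")

# T-SQL to run against the Warehouse afterwards (for example from a pipeline
# Script activity or a stored procedure); URL and table name are placeholders.
copy_into_sql = """
COPY INTO dbo.stg_orders
FROM 'https://<account>.dfs.core.windows.net/<container>/staging/orders/*.parquet'
WITH (FILE_TYPE = 'PARQUET');
"""
```

If COPY INTO cannot reach the files directly in your environment, a pipeline Copy activity with the Lakehouse Files path as source and the Warehouse staging table as sink is an alternative.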
2. Fabric notebooks don’t support transaction blocks, but stored procedures in the Warehouse do.
Solution:
Please follow the steps below.
1. Read the Parquet files in the Notebook and compare their schema with the Warehouse schema using "INFORMATION_SCHEMA.COLUMNS".
2. Write the schema differences or the data to a staging table or file in OneLake.
3. Use a Fabric Pipeline to read the staging data, pass parameters to a stored procedure, and execute the INSERT or ALTER TABLE statements in a transaction block (a sketch follows these steps).
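As a sketch of steps 1–3, assuming the staging and final tables from the earlier example (dbo.stg_orders, dbo.orders), a hypothetical procedure name usp_load_orders, and placeholder connection details for your Warehouse SQL endpoint: the procedure wraps the inserts in a single transaction so they all commit or all roll back, and the notebook (or a pipeline Stored procedure activity) simply calls it.

```python
# Sketch: create and call a Warehouse stored procedure that moves staged data
# into the final table inside one transaction. All names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<warehouse-sql-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=<warehouse-name>;"
    "Authentication=ActiveDirectoryInteractive;",  # or service-principal auth
    autocommit=True,  # let the procedure's own BEGIN/COMMIT control atomicity
)
cur = conn.cursor()

# Step 1: Warehouse side of the schema comparison (the Parquet side can come
# from spark.read.parquet(...).schema in the notebook).
warehouse_cols = cur.execute(
    "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'orders';"
).fetchall()

# Step 3: a stored procedure that performs the inserts in a transaction block.
create_proc_sql = """
CREATE OR ALTER PROCEDURE dbo.usp_load_orders
AS
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        INSERT INTO dbo.orders (order_id, customer_id, amount)
        SELECT order_id, customer_id, amount
        FROM dbo.stg_orders;

        -- repeat INSERT ... SELECT for the other final tables here

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;  -- surface the error so the notebook/pipeline run fails
    END CATCH
END;
"""
cur.execute(create_proc_sql)

# Execute the transactional load (a pipeline Stored procedure activity works too).
cur.execute("EXEC dbo.usp_load_orders;")
```

If the schema comparison shows missing columns, the corresponding ALTER TABLE statements can be issued the same way before calling the procedure.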
Please refer to the links below.
Load data into a Microsoft Fabric data warehouse - Training | Microsoft Learn
Efficiently Processing Massive Data Volumes using ... - Microsoft Fabric Community
I hope this information helps. Please do let us know if you have any further queries.
Regards,
Dinesh
Hi @NagaRK ,
We haven’t heard from you since the last response and were just checking back to see if you have found a resolution yet. If you have any further queries, do let us know.
Regards,
Dinesh