Hussain_Abedi
New Member

Handling Pipeline Failures and Audit Layer for Data Ingestion in Microsoft Fabric Lakehouse

I am ingesting data from PostgreSQL and MySQL into the Fabric Lakehouse Bronze layer. In the Silver layer, the two tables are joined, and the combined data is stored in a Delta table, which I call the temp table.

From the temp table, the data is ingested into a warehouse. After ingestion, an inner join is performed between the temp table and the warehouse to identify the records that match in both places. These matched records are marked as loaded and saved into the Silver layer of the Lakehouse.
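In PySpark, this matching step looks roughly like the sketch below. The table and column names (temp_table, warehouse_orders, record_id, silver_matched) are simplified placeholders, and the warehouse rows are assumed to be readable from the notebook:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

# Combined data from the Bronze PostgreSQL and MySQL tables.
temp_df = spark.read.table("temp_table")

# Keys of the records that have already landed in the warehouse.
warehouse_ids = spark.read.table("warehouse_orders").select("record_id")

# The inner join keeps only the records present in both places.
matched_df = temp_df.join(warehouse_ids, on="record_id", how="inner")

# Mark the matched records as loaded and save them to the Silver layer.
(matched_df
    .withColumn("status", lit("loaded"))
    .write.format("delta")
    .mode("append")
    .saveAsTable("silver_matched"))
```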

Problem:

If the pipeline breaks after the data is ingested into the warehouse but before it is marked as loaded and saved into the Silver layer, the incoming new data will overwrite the temp table, causing a loss of unprocessed records from the previous run. This happens because the Silver Delta tables in Fabric are immutable, which led me to adopt this approach.

Questions:

  1. What should I use as an audit layer? How can I ensure that the records loaded into the warehouse are properly tracked, especially in the event of a pipeline failure?

  2. How can I keep track of which records have been loaded into the warehouse and which have not?

1 ACCEPTED SOLUTION
v-aatheeque
Community Support

Hi @Hussain_Abedi ,

Thanks for reaching out to the Microsoft Fabric Community forum.
To address your data ingestion challenges and prevent data loss during pipeline failures, implement these strategies:

1. Create a dedicated audit table that logs the status of each record processed during the pipeline. This table can include the following columns: record_id, ingestion_timestamp, status, error_message (a sketch of this table follows the list below).


2. Instead of directly overwriting the temp table, consider using a staging area where you can store the new incoming data. This allows you to keep the previous run's data until you confirm that the new data has been successfully processed (a sketch of this flow also follows the list):

  • Ingest new data into the staging area.
  • Perform an inner join between the temp table and the staging area to identify matching records.
  • Update the audit table to mark records that were successfully ingested into the warehouse as "loaded."
  • Overwrite the temp table with new data only after all records are processed and marked.
  • If a pipeline failure occurs, query the audit table to identify the records that have not been marked as "loaded" and reprocess them.
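As a minimal sketch of the audit table from point 1, assuming a Fabric notebook where a Spark session is available (the table name audit_log and the status values are illustrative, not prescribed):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Audit table with the columns suggested above. One row per record
# processed by the pipeline; 'status' tracks its progress.
spark.sql("""
    CREATE TABLE IF NOT EXISTS audit_log (
        record_id           STRING,
        ingestion_timestamp TIMESTAMP,
        status              STRING,    -- e.g. 'pending', 'loaded', 'failed'
        error_message       STRING
    ) USING DELTA
""")
```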

 
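And a sketch of the staging flow from point 2 in PySpark, under the same assumptions (bronze_incoming, staging_table, and warehouse_orders are hypothetical table names; the update uses the delta-spark DeltaTable merge API):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, lit
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# 1. Land incoming data in a staging table instead of overwriting the
#    temp table, so the previous run's records survive a failure.
new_df = spark.read.table("bronze_incoming")
new_df.write.format("delta").mode("overwrite").saveAsTable("staging_table")

# 2. Register every staged record in the audit table as 'pending'.
(new_df.select("record_id")
    .withColumn("ingestion_timestamp", current_timestamp())
    .withColumn("status", lit("pending"))
    .withColumn("error_message", lit(None).cast("string"))
    .write.format("delta").mode("append").saveAsTable("audit_log"))

# 3. After the warehouse load succeeds, flip the matched records to 'loaded'.
loaded_ids = spark.read.table("warehouse_orders").select("record_id")
(DeltaTable.forName(spark, "audit_log").alias("a")
    .merge(loaded_ids.alias("w"), "a.record_id = w.record_id")
    .whenMatchedUpdate(set={"status": lit("loaded")})
    .execute())

# 4. On restart after a failure, anything still 'pending' was never
#    confirmed in the warehouse and should be reprocessed before the
#    staging table is overwritten again.
unprocessed = spark.sql(
    "SELECT record_id FROM audit_log WHERE status <> 'loaded'"
)
```

Only once the 'pending' set is empty is it safe to overwrite the temp table with the newly staged data.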

If this post was helpful, please consider marking Accept as solution to assist other members in finding it more easily.

If you continue to face issues, feel free to reach out to us for further assistance!


Hi @Hussain_Abedi ,

We haven’t heard from you on the last response and were just checking back to see if your query got resolved. If this post was helpful, please consider marking Accept as solution to assist other members in finding it more easily.

 

If you continue to face issues, feel free to reach out to us for further assistance!

Thanks for using Microsoft Fabric Community.

