Hello,
I am learning Microsoft Fabric during my data engineering training at Simplon.
I am building a pipeline that extracts data from an API every day using Fabric Data Factory.
I want to implement CDC (change data capture) so that each run processes only new inserts, updates, and deletes instead of reloading the full dataset.
What is the best approach in Fabric to implement CDC?
Should I load data into a staging table first and then detect changes, or is there a native CDC approach?
Thank you for your help!
🙂
Hello @kaouter
If your source is a REST API, then true CDC is not available in Microsoft Fabric, because CDC relies on database transaction logs. However, you can implement an industry-standard incremental ingestion pattern that achieves the same outcome.
1. Ingest incrementally from the API (source-side filtering)
If the API supports it, filter at the source: request only the records changed since your last run, typically via a timestamp or incremental-ID query parameter.
In Fabric, this can be implemented with a Data Factory pipeline or a notebook activity.
You store and reuse a watermark value (the last successful timestamp or ID) between runs to fetch only new or changed records.
2. Land data in a Bronze (staging) area in the Lakehouse
This staging step is strongly recommended in modern lakehouse architectures.
3. Apply changes using MERGE (CDC-style processing)
Once the staged data is in the Lakehouse, compare it against the target table and apply inserts, updates, and deletes in a single operation.
Delta Lake’s MERGE operation is the standard mechanism for CDC-style processing in Fabric Lakehouses.
4. Handling deletes (often missed)
Most REST APIs do not report deletions through an updated-since filter, so industry best practice is to either use the API's delete markers (tombstone records) if it exposes them, or periodically pull the full set of source keys and remove target rows whose keys no longer exist at the source.
Fabric does not automatically detect deletes for APIs—this must be handled explicitly in your logic.
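Putting steps 1–4 together, the merge logic can be sketched as below. In a Fabric notebook the real work would be done by Delta Lake's MERGE; this plain-Python sketch only illustrates the insert/update/delete semantics, and all table, column, and function names here are hypothetical.

```python
# CDC-style merge sketch: apply inserts, updates, and deletes from a
# staged (Bronze) batch to a target table, keyed on "id".
# In Fabric this would be a Delta Lake MERGE; plain dicts stand in for
# tables here so the logic is easy to follow. All names are hypothetical.

def apply_changes(target, staged, source_ids=None):
    """target/staged: dicts of id -> row. source_ids: the full set of keys
    currently present at the source, used to detect hard deletes."""
    for key, row in staged.items():
        target[key] = row          # WHEN MATCHED UPDATE / WHEN NOT MATCHED INSERT
    if source_ids is not None:     # deletes must be handled explicitly (step 4)
        for key in list(target):
            if key not in source_ids:
                del target[key]    # WHEN NOT MATCHED BY SOURCE DELETE
    return target

target = {1: {"name": "a"}, 2: {"name": "b"}}
staged = {2: {"name": "b2"}, 3: {"name": "c"}}
result = apply_changes(target, staged, source_ids={2, 3})
print(result)  # row 1 deleted, row 2 updated, row 3 inserted
```

In a notebook, the equivalent Delta SQL would be along the lines of `MERGE INTO target USING staged ON target.id = staged.id WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...`, with the delete branch handled either by a tombstone flag or a separate key-comparison step.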
The best approach really depends on what capabilities your API provides.
First, you need to check whether the API returns a watermark field, such as a lastUpdatedDateTime, modifiedAt, or something similar. Also verify whether the API allows you to filter results based on that field, for example by passing a query parameter like ?updated_since=.
If the API supports both of these, then you can implement CDC very efficiently. You simply store the timestamp of the last successful pipeline run in a metadata table, and on each new run, you call the API using that timestamp to retrieve only the records that were inserted, updated, or deleted since the last run. From there, you can load the incremental results into a staging table and perform an upsert/merge operation into your Fabric tables.
This avoids reloading the full dataset every day and lets you process only the changed data.
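The watermark pattern described above can be sketched as follows. The `updated_since` parameter, the in-memory metadata store, and the fetch function are all hypothetical stand-ins for whatever your API and metadata table actually provide.

```python
# Watermark-based incremental ingestion sketch.
# A watermark (timestamp of the last successful run) is stored between
# runs and used to request only records changed since then. The fetch
# function is a stand-in for a real API call (e.g. requests.get with a
# hypothetical ?updated_since= parameter).

from datetime import datetime, timezone

def load_watermark(store):
    # In Fabric this could be a small metadata Delta table or a file.
    return store.get("last_run", "1970-01-01T00:00:00Z")

def run_incremental(store, fetch):
    since = load_watermark(store)
    started = datetime.now(timezone.utc).isoformat()
    records = fetch(updated_since=since)   # source-side filtering
    # ... land `records` in the Bronze/staging area, then MERGE ...
    store["last_run"] = started            # advance watermark only on success
    return records

# Tiny in-memory stand-ins to show the flow:
data = [{"id": 1, "modifiedAt": "2024-05-01T00:00:00Z"},
        {"id": 2, "modifiedAt": "2024-06-01T00:00:00Z"}]
fake_fetch = lambda updated_since: [r for r in data
                                    if r["modifiedAt"] > updated_since]

store = {"last_run": "2024-05-15T00:00:00Z"}
changed = run_incremental(store, fake_fetch)
print([r["id"] for r in changed])  # only records modified after the watermark
```

Note that the watermark is captured before the fetch and persisted only after the run succeeds, so a failed run is retried from the old watermark rather than silently skipping records.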
Hi @kaouter , Hope you are doing well. Kindly let us know if the issue has been resolved or if further assistance is needed. Your input could be helpful to others in the community.
Hi @kaouter , Thank you for reaching out to the Microsoft Community Forum.
We find the answer shared by @deborshi_nag appropriate. Could you please confirm whether the solution worked for you? It will help others with similar issues find the answer easily.
Thank you @deborshi_nag for your valuable response.
Hi @kaouter,
This all depends on the capabilities of your API.
Does the API return a watermark column like a last updated datetime?
Does the API allow you to filter results on that column?
If yes, you can store the timestamp of your last pipeline run in a metadata table, then on each run pull only the data that has changed since the previous run started. From there you can upsert or merge into your tables in Fabric.