topic Re: CDC ( incremental load ) in fabric pipline - help in Data Engineering

CDC ( incremental load ) in fabric pipline - help

kaouter — Sat, 14 Mar 2026 20:42:51 GMT

Hello

I am learning Microsoft Fabric during my data engineering training at Simplon.
I am building a pipeline that extracts data from an API every day using Fabric Data Factory.

I want to implement CDC to process only new inserts, updates, and deletes instead of loading the full dataset each day.

What is the best approach in Fabric to implement CDC?
Should I load data into a staging table first and then detect changes, or is there a native CDC approach?

Thank you for your help!
🙂

Re: CDC ( incremental load ) in fabric pipline - help

tayloramy — Mon, 16 Mar 2026 13:35:13 GMT

Hi @kaouter,

This all depends on the capabilities of your API.

Does the API return a watermark column like a last updated datetime?
Does the API allow you to filter results on that column?

If yes, you can record the start time of each pipeline run in a metadata table, and on the next run pull only the data that has changed since the previous run started. From there you can upsert or merge into your tables in Fabric.
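
To make the watermark bookkeeping concrete, here is a minimal Python sketch. The `watermarks` dict stands in for the metadata table, and `fetch_changed_since` and `apply_changes` are hypothetical callables for the API call and the upsert; none of these names come from Fabric itself.

```python
from datetime import datetime, timezone

# Stand-in for the metadata table (assumption: one watermark per source).
watermarks = {}

def run_incremental(source_name, fetch_changed_since, apply_changes, now=None):
    """Pull rows changed since the stored watermark, then advance it."""
    # Capture the start time before calling the API, so changes that arrive
    # while the run is in progress are picked up by the next run.
    run_started = now or datetime.now(timezone.utc)
    last_run = watermarks.get(source_name)  # None on the first run: full load
    rows = fetch_changed_since(last_run)
    apply_changes(rows)
    # Advance only after the load succeeded; a failed run is retried
    # from the old watermark instead of skipping changes.
    watermarks[source_name] = run_started
    return len(rows)
```

Because the watermark is the run start time, a row changed during a run may be pulled twice, which is harmless as long as the downstream step is an idempotent upsert or merge.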

Re: CDC ( incremental load ) in fabric pipline - help

deborshi_nag — Mon, 16 Mar 2026 14:42:50 GMT

Hello @kaouter

If your source is a REST API, then true log-based CDC is not available in Microsoft Fabric, because CDC relies on reading database transaction logs, which an API does not expose. However, you can implement an industry-standard incremental ingestion pattern that achieves the same outcome.

1. Ingest incrementally from the API (source-side filtering)

If the API supports it, use:

- A lastModified, updatedAt, or similar timestamp
- Or a monotonically increasing ID

In Fabric, this can be implemented using:

- Copy Data activity with a REST connector, or
- Notebook-based ingestion (for complex pagination or auth)

You store and reuse a watermark value (last successful timestamp or ID) between runs to fetch only new or changed records.
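
As a rough illustration of the ID-based variant, the sketch below keeps the highest ID seen so far as the watermark. The field name `id` and both helper names are assumptions for the example. Note that an ID watermark only catches new rows; updates to existing rows need a timestamp field.

```python
def select_new(records, previous_id, id_field="id"):
    """Client-side equivalent of an API filter such as 'ID greater than N'."""
    return [r for r in records if r[id_field] > previous_id]

def next_id_watermark(records, previous_id, id_field="id"):
    """Return the highest ID seen so far; unchanged if the batch is empty."""
    return max([previous_id] + [r[id_field] for r in records])

batch = [{"id": 5}, {"id": 7}, {"id": 3}]
new_rows = select_new(batch, previous_id=4)        # rows with id 5 and 7
watermark = next_id_watermark(new_rows, 4)         # 7, stored for the next run
```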

2. Land data in a Bronze (staging) area in the Lakehouse

Store the raw API responses as JSON or Delta.
This provides:
- Replayability
- Schema evolution handling
- Auditability (industry best practice)

This staging step is strongly recommended in modern lakehouse architectures.
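
One possible way to land raw responses, sketched in plain Python: each batch is written unchanged as a JSON Lines file under a folder partitioned by ingestion date. The folder layout, the `_ingested_at` field, and the `land_raw` helper are assumptions for illustration; in a Fabric notebook `bronze_root` would point at a folder in the Lakehouse Files area.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def land_raw(records, bronze_root, source_name, ingested_at=None):
    """Write one batch as a JSON Lines file under a date-partitioned folder."""
    ingested_at = ingested_at or datetime.now(timezone.utc)
    folder = Path(bronze_root) / source_name / f"ingest_date={ingested_at:%Y-%m-%d}"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"batch_{ingested_at:%Y%m%dT%H%M%SZ}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # Keep the payload untouched; add only ingestion metadata.
            row = {"_ingested_at": ingested_at.isoformat(), "payload": record}
            f.write(json.dumps(row) + "\n")
    return path
```

Keeping the payload untouched is what makes replay possible: the downstream merge can be rerun from these files without calling the API again.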

3. Apply changes using MERGE (CDC-style processing)

Once data is in the Lakehouse:

Use Spark / SQL MERGE INTO on Delta tables to:
- Insert new records
- Update changed records
- Handle deletes (if the API provides delete indicators)

Delta Lake’s MERGE operation is the standard mechanism for CDC-style processing in Fabric Lakehouses.
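
A hedged sketch of what such a MERGE could look like, built as a string in Python so it can be passed to `spark.sql(...)` in a notebook. The table names, the `order_id` key, and the `is_deleted` flag are hypothetical. It assumes the staging and target tables share the same columns and that staging holds at most one row per key.

```python
def build_merge_sql(target, staging, key, deleted_flag=None):
    """Build a Delta Lake MERGE statement for an upsert, with optional deletes."""
    lines = [
        f"MERGE INTO {target} AS t",
        f"USING {staging} AS s",
        f"ON t.{key} = s.{key}",
    ]
    if deleted_flag:
        # Rows flagged as deleted at the source are removed from the target.
        lines.append(f"WHEN MATCHED AND s.{deleted_flag} = true THEN DELETE")
    lines.append("WHEN MATCHED THEN UPDATE SET *")
    if deleted_flag:
        # Do not insert rows that were already deleted at the source.
        lines.append(f"WHEN NOT MATCHED AND s.{deleted_flag} = false THEN INSERT *")
    else:
        lines.append("WHEN NOT MATCHED THEN INSERT *")
    return "\n".join(lines)

sql = build_merge_sql("silver_orders", "bronze_orders_staging", "order_id", "is_deleted")
# In a Fabric notebook this could then be run with: spark.sql(sql)
```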

4. Handling deletes (often missed)

Industry best practice for APIs:

If the API provides:
- A deleted flag: apply a soft delete
- Or delete events: propagate deletes via MERGE
If not:
- Periodic reconciliation or snapshot comparison may be required

Fabric does not automatically detect deletes for API sources; this must be handled explicitly in your logic.
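
When the API exposes no delete signal, snapshot comparison boils down to a set difference between the keys in the target table and the keys in a full source snapshot. The sketch below is illustrative only; the `max_fraction` safety threshold is an added assumption, meant to stop a partial or failed snapshot from wiping the target.

```python
def find_deleted_keys(target_keys, snapshot_keys, max_fraction=0.5):
    """Keys present in the target table but absent from a full source snapshot."""
    target, snapshot = set(target_keys), set(snapshot_keys)
    missing = target - snapshot
    # Guard: a suspiciously large share of missing keys more likely means
    # an incomplete snapshot than a real mass delete.
    if target and len(missing) / len(target) > max_fraction:
        raise ValueError("too many missing keys; snapshot may be incomplete")
    return missing

deleted = find_deleted_keys([1, 2, 3, 4], [1, 2, 4, 5])
```

Here key 3 is reported as deleted, while key 5 is simply a new row that the regular upsert would handle.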

Re: CDC ( incremental load ) in fabric pipline - help

v-hashadapu — Fri, 20 Mar 2026 08:41:36 GMT

Hi @kaouter, thank you for reaching out to the Microsoft Community Forum.

We find the answer shared by @deborshi_nag appropriate. Can you please confirm whether the solution worked for you? It will help others with similar issues find the answer easily.
Thank you @deborshi_nag for your valuable response.

Re: CDC ( incremental load ) in fabric pipline - help

v-hashadapu — Mon, 23 Mar 2026 09:01:12 GMT

Hi @kaouter, hope you are doing well. Kindly let us know if the issue has been resolved or if further assistance is needed. Your input could be helpful to others in the community.

Re: CDC ( incremental load ) in fabric pipline - help

NaveenUpadhye — Mon, 30 Mar 2026 16:39:30 GMT

The best approach really depends on what capabilities your API provides.

First, you need to check whether the API returns a watermark field, such as a lastUpdatedDateTime, modifiedAt, or something similar. Also verify whether the API allows you to filter results based on that field, for example by passing a query parameter like ?updated_since=.
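
As a small illustration of passing the watermark as a query parameter, the sketch below builds the request URL with the standard library. The base URL is a placeholder, the parameter name `updated_since` and the ISO 8601 UTC format are assumptions; the real API's documentation decides both.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

def build_incremental_url(base_url, since, param="updated_since", extra=None):
    """Append the watermark filter; with no watermark, request the full set."""
    params = dict(extra or {})
    if since is not None:
        # Expects a timezone-aware datetime; it is normalized to UTC.
        utc = since.astimezone(timezone.utc)
        params[param] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{base_url}?{urlencode(params)}" if params else base_url

url = build_incremental_url(
    "https://api.example.com/orders",
    datetime(2026, 3, 14, 20, 42, 51, tzinfo=timezone.utc),
)
```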

If the API supports both of these, then you can implement an incremental load very efficiently. You simply store the timestamp of the last successful pipeline run in a metadata table, and on each new run, you call the API using that timestamp to retrieve only the records that were inserted or updated since the last run (deleted records appear only if the API exposes them, for example as soft-deleted rows). From there, you can load the incremental results into a staging table and perform an upsert/merge operation into your Fabric tables.

This avoids reloading the full dataset every day and lets you process only the changed data.