kaouter
Frequent Visitor

CDC (incremental load) in Fabric pipeline - help

Hello

I am learning Microsoft Fabric during my data engineering training at Simplon.
I am building a pipeline that extracts data from an API every day using Fabric Data Factory.

I want to implement CDC to process only new inserts, updates, and deletes instead of loading the full dataset each day.

What is the best approach in Fabric to implement CDC?
Should I load data into a staging table first and then detect changes, or is there a native CDC approach?

Thank you for your help! 🙂

1 ACCEPTED SOLUTION
deborshi_nag
Resident Rockstar

Hello @kaouter 

 

If your source is a REST API, then true CDC is not available in Microsoft Fabric, because CDC relies on database transaction logs. However, you can implement an industry-standard incremental ingestion pattern that achieves the same outcome.

 

1. Ingest incrementally from the API (source-side filtering)

If the API supports it, use:

  • A lastModified, updatedAt, or similar timestamp
  • Or a monotonically increasing ID

In Fabric, this can be implemented using:

  • Copy Data activity with a REST connector, or
  • Notebook-based ingestion (for complex pagination or auth)

You store and reuse a watermark value (last successful timestamp or ID) between runs to fetch only new or changed records.
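For illustration, here is a minimal notebook sketch of that watermark pattern. The endpoint, the updated_since parameter, the pagination link, and the response shape are all assumptions about your API, not real values; adapt them to what your API actually exposes:

```python
# Minimal watermark-based incremental fetch (illustrative only: the endpoint,
# the "updated_since" filter, and the response shape are hypothetical).
import requests

API_URL = "https://api.example.com/v1/orders"  # placeholder endpoint

def fetch_changes(last_watermark: str) -> list[dict]:
    """Return only records created or changed after the stored watermark."""
    records = []
    url, params = API_URL, {"updated_since": last_watermark}
    while url:
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        records.extend(payload["items"])  # assumed payload key
        url = payload.get("next")         # assumed pagination link
        params = None                     # cursor is embedded in "next"
    return records

changes = fetch_changes("2026-03-01T00:00:00Z")  # watermark from the last run
```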

 

2. Land data in a Bronze (staging) area in the Lakehouse

  • Store the raw API responses as JSON or Delta
  • This provides:
    • Replayability
    • Schema evolution handling
    • Auditability (industry best practice)

This staging step is strongly recommended in modern lakehouse architectures. 
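As a sketch, assuming a notebook attached to a default Lakehouse (folder, table, and column names below are placeholders), the landing step could look like this. The `spark` session is pre-created in Fabric notebooks:

```python
# Land the raw payload in Bronze: raw JSON for replay/audit, plus a Delta copy.
# "changes" is the list of dicts fetched in step 1; paths/names are placeholders.
import json
import os
from datetime import datetime, timezone
from pyspark.sql import functions as F

run_ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")

# 1) Untouched JSON in the Lakehouse Files area (replayability, auditability).
raw_path = f"/lakehouse/default/Files/bronze/orders/{run_ts}.json"
os.makedirs(os.path.dirname(raw_path), exist_ok=True)
with open(raw_path, "w") as f:
    json.dump(changes, f)

# 2) Structured copy appended to a Bronze Delta table, tagged with the run.
bronze_df = spark.createDataFrame(changes).withColumn("ingest_ts", F.lit(run_ts))
bronze_df.write.mode("append").saveAsTable("bronze_orders")
```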

 

3. Apply changes using MERGE (CDC-style processing)

Once data is in the Lakehouse:

  • Use Spark / SQL MERGE INTO on Delta tables to:
    • Insert new records
    • Update changed records
    • Handle deletes (if the API provides delete indicators)

Delta Lake’s MERGE operation is the standard mechanism for CDC-style processing in Fabric Lakehouses. 
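For example, a MERGE sketch against the Bronze table from step 2. The silver_orders table and the order_id / last_modified columns are placeholder names, and the batch is assumed to contain at most one row per business key (deduplicate first if it does not, as MERGE rejects multiple source matches per target row):

```python
# CDC-style upsert of the latest Bronze batch into a curated (Silver) table.
# Table/column names are illustrative; "run_ts" is the batch tag from step 2.
spark.sql("""
    CREATE TABLE IF NOT EXISTS silver_orders
    (order_id BIGINT, customer_id BIGINT, amount DOUBLE, last_modified STRING)
""")

spark.sql(f"""
    MERGE INTO silver_orders AS t
    USING (
        SELECT order_id, customer_id, amount, last_modified
        FROM bronze_orders
        WHERE ingest_ts = '{run_ts}'        -- only this run's batch
    ) AS s
    ON t.order_id = s.order_id
    WHEN MATCHED AND s.last_modified > t.last_modified THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```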

 

4. Handling deletes (often missed)

Industry best practice for APIs:

  • If the API provides:
    • A deleted flag → soft delete
    • Or delete events → propagate deletes via MERGE
  • If not:
    • Periodic reconciliation or snapshot comparison may be required

Fabric does not automatically detect deletes for APIs—this must be handled explicitly in your logic.
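If the API does expose a delete indicator, the same MERGE from step 3 can propagate it. A sketch, where is_deleted is a hypothetical flag column on the incoming batch (explicit column lists are used because the flag itself is not stored in the target):

```python
# Propagate deletes when the source rows carry a delete indicator.
# "is_deleted" and all table/column names are hypothetical.
spark.sql(f"""
    MERGE INTO silver_orders AS t
    USING (
        SELECT order_id, customer_id, amount, last_modified, is_deleted
        FROM bronze_orders
        WHERE ingest_ts = '{run_ts}'
    ) AS s
    ON t.order_id = s.order_id
    WHEN MATCHED AND s.is_deleted THEN DELETE
    WHEN MATCHED AND s.last_modified > t.last_modified THEN UPDATE SET
        customer_id = s.customer_id,
        amount = s.amount,
        last_modified = s.last_modified
    WHEN NOT MATCHED AND NOT s.is_deleted THEN INSERT
        (order_id, customer_id, amount, last_modified)
        VALUES (s.order_id, s.customer_id, s.amount, s.last_modified)
""")
```

If the API exposes no indicator at all, recent Delta Lake versions also support WHEN NOT MATCHED BY SOURCE THEN DELETE in MERGE, which pairs naturally with periodic full-snapshot reconciliation.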

 

I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge with a Kudos or by marking it as a Solution.


5 REPLIES
NaveenUpadhye
Advocate I

The best approach really depends on what capabilities your API provides.

First, you need to check whether the API returns a watermark field, such as a lastUpdatedDateTime, modifiedAt, or something similar. Also verify whether the API allows you to filter results based on that field, for example by passing a query parameter like ?updated_since=.

If the API supports both of these, then you can implement CDC very efficiently. You simply store the timestamp of the last successful pipeline run in a metadata table, and on each new run, you call the API using that timestamp to retrieve only the records that were inserted, updated, or deleted since the last run. From there, you can load the incremental results into a staging table and perform an upsert/merge operation into your Fabric tables.

This avoids reloading the full dataset every day and lets you process only the changed data.
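As an illustration of that metadata table, a small Delta table keyed by source works well. All names here are placeholders, and the f-string SQL is kept simple for sketch purposes:

```python
# A tiny watermark/metadata table persisting the last successful run timestamp.
spark.sql("""
    CREATE TABLE IF NOT EXISTS etl_watermarks
    (source STRING, last_success_ts STRING)
""")

def get_watermark(source: str, default: str = "1900-01-01T00:00:00Z") -> str:
    """Read the stored watermark for a source, or a default on the first run."""
    rows = spark.sql(
        f"SELECT last_success_ts FROM etl_watermarks WHERE source = '{source}'"
    ).collect()
    return rows[0]["last_success_ts"] if rows else default

def set_watermark(source: str, ts: str) -> None:
    """Upsert the watermark; call this only after the whole run succeeds."""
    spark.sql(f"""
        MERGE INTO etl_watermarks AS t
        USING (SELECT '{source}' AS source, '{ts}' AS last_success_ts) AS s
        ON t.source = s.source
        WHEN MATCHED THEN UPDATE SET last_success_ts = s.last_success_ts
        WHEN NOT MATCHED THEN INSERT *
    """)

# Read the watermark before calling the API; advance it only after
# the fetch and merge have both completed successfully.
since = get_watermark("orders_api")
```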

v-hashadapu
Community Support

Hi @kaouter, hope you are doing well. Kindly let us know if the issue has been resolved or if further assistance is needed. Your input could be helpful to others in the community.

v-hashadapu
Community Support

Hi @kaouter, thank you for reaching out to the Microsoft Community Forum.

 

We find the answer shared by @deborshi_nag appropriate. Could you please confirm whether the solution worked for you? It will help others with similar issues find the answer easily.
Thank you @deborshi_nag for your valuable response.

tayloramy
Super User

Hi @kaouter

 

This all depends on the capabilities of your API. 

Does the API return a watermark column like a last updated datetime? 
Does the API allow you to filter results on that column? 

If yes, then you can record the last successful pipeline run time in a metadata table, and on each run pull only the data that has changed since then. From there you can upsert or merge into your tables in Fabric.




