Om_Bhartiya
Frequent Visitor

Need Guidance: Solid Structure for Incremental REST API Pipeline with Pagination in Fabric

Hello everyone,

I’m currently building an ingestion pipeline in Microsoft Fabric and I’d like to know if anyone has already implemented something similar successfully.

What I’m trying to achieve

I need a solid, production-ready structure for:

  • Incremental loading from a REST API (using epoch timestamps)

  • Handling dynamic pagination (nextPage from API response)

  • Loading raw data into Lakehouse

  • Automatically handling schema changes

  • Performing MERGE (upsert) into a Warehouse

  • Updating a watermark table after success

The API:

  • Requires startDate and endDate for incremental filtering

  • Uses curPage and returns nextPage

  • Returns nested JSON that must be flattened

I currently:

  • Use a Web activity to read nextPage

  • Use an Until loop for pagination

  • Copy data into Lakehouse with OverwriteSchema

  • Run schema sync script before MERGE

  • Then update watermark

It works, but I want to confirm:

Has anyone built a stable architecture like this in Fabric?
Is this the recommended approach?
Is there a cleaner way to handle pagination and incremental logic?

I’m looking for a proven, solid structure before moving this into production.

Any guidance, examples, or best practices would be greatly appreciated.

Thank you.

1 ACCEPTED SOLUTION

Hey @Om_Bhartiya  , 

 

What you’re building is absolutely valid, and yes, this pattern is commonly implemented in Microsoft Fabric. Your current approach is technically correct, but for production it can be made cleaner, more scalable, and easier to maintain.

 

Below is a solid production-ready architecture pattern that many teams use successfully.

 

Recommended Production Architecture (Medallion Style)

 

Control Layer (Watermark + Metadata Driven)

Instead of hardcoding logic inside activities:

  • Maintain a Watermark table in Warehouse with columns:
    • source_name
    • last_success_epoch
    • last_run_time
    • status
  • Read the watermark at pipeline start
  • Calculate:
    • startDate = last_success_epoch
    • endDate = current_epoch

This keeps everything restartable and idempotent.
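
As a sketch, the watermark-to-window calculation from the bullets above might look like this in a notebook (the helper name `compute_load_window` is illustrative, not a Fabric API; reading `last_success_epoch` from the Warehouse table is assumed to happen separately):

```python
import time

def compute_load_window(last_success_epoch, now_epoch=None):
    """Derive the incremental window from the watermark.

    startDate = last successful load's epoch, endDate = current epoch.
    Returns (start, end) as integer epoch seconds.
    """
    end = int(now_epoch if now_epoch is not None else time.time())
    if last_success_epoch is None:
        # First run: no watermark yet, fall back to a full load from epoch 0.
        return 0, end
    return int(last_success_epoch), end
```

Because the window always starts exactly where the last successful run ended, a failed run simply replays the same window on the next attempt.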

 

Ingestion Layer (Bronze – Raw Landing in Lakehouse)

For REST pagination + incremental:

Recommended structure (instead of a separate Web activity + Until orchestration):

 

  • One Until loop
  • Inside loop:
    • Copy Activity (REST source)
    • Pass curPage
    • Set variable = nextPage

Stop condition:

@equals(variables('nextPage'), null)

 

✔ Keep pagination logic entirely inside the loop
✔ Do NOT mix schema handling here
✔ Always land raw JSON unchanged

 

Load into:

  • Bronze table (append only)
  • Partition by ingestion_date

This gives you replay capability.
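
The Until-loop logic above can be sketched as a plain function (hypothetical callables: `fetch_page` stands in for the REST Copy Activity source, `land_raw` for the Bronze append):

```python
def ingest_all_pages(fetch_page, land_raw):
    """Drive pagination until the API stops returning nextPage.

    fetch_page(cur_page) -> dict with keys 'data' and 'nextPage'
    land_raw(records)    -> appends raw records to the Bronze table
    Mirrors the loop pattern: pass curPage, read nextPage,
    stop when nextPage is null/absent.
    """
    cur_page = 1
    pages = 0
    while cur_page is not None:
        resp = fetch_page(cur_page)
        land_raw(resp.get("data", []))
        cur_page = resp.get("nextPage")  # None terminates the loop
        pages += 1
    return pages
```

Note that the loop only reads `nextPage` and appends raw data; no schema logic leaks in, matching the checklist above.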

 

 

Schema Handling (Do NOT rely only on OverwriteSchema)

OverwriteSchema works, but in production it can break downstream models.

Better pattern:

  • Land raw JSON as string (Variant column or JSON column)
  • Flatten in Notebook (Spark)
  • Use: df = spark.read.json(...)
  • Enable: spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

This allows controlled schema evolution in Delta tables.

 

Silver Layer (Flatten + Normalize)

Use a Notebook or Dataflow Gen2 to:

  • Flatten nested JSON
  • Standardize column names
  • Cast data types
  • Remove duplicates (based on business key + timestamp)

Store cleaned table in Lakehouse (Silver).
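
In a Fabric notebook this step would typically be PySpark, but the core flatten-and-dedupe logic can be sketched in plain Python (the business key `id` and timestamp `updated_at` are assumptions; substitute your own):

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into underscore-separated columns."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

def dedupe_latest(records, key="id", ts="updated_at"):
    """Keep only the newest record per business key."""
    latest = {}
    for r in records:
        k = r[key]
        if k not in latest or r[ts] > latest[k][ts]:
            latest[k] = r
    return list(latest.values())
```

Deduplicating on business key + timestamp in Silver is what makes the later MERGE safe to re-run.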

 

 

Warehouse MERGE (Upsert Pattern)

Now perform MERGE from Silver → Warehouse.

Use pattern:

MERGE INTO target AS t
USING source AS s
    ON t.id = s.id
WHEN MATCHED AND s.updated_at > t.updated_at THEN
    UPDATE SET t.payload = s.payload, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
    INSERT (id, payload, updated_at) VALUES (s.id, s.payload, s.updated_at);

(Here id, payload, and updated_at are placeholders; substitute your business key and columns.)

Key best practices:

  • Always compare timestamps
  • Never blindly overwrite
  • Keep it idempotent

 

Watermark Update (ONLY After Success)

Very important:

Update watermark only if:

  • Pagination completed
  • Merge completed
  • No failure in pipeline

Wrap merge + watermark update in:

  • Stored Procedure
    OR
  • Sequential Script activities

If failure occurs → watermark must NOT update.
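
The success-only rule can be sketched as follows (`merge_to_warehouse` and `update_watermark` are hypothetical stand-ins for the Script / Stored Procedure activities):

```python
def run_load(merge_to_warehouse, update_watermark, new_epoch):
    """Commit the watermark only after the merge succeeds.

    If merge_to_warehouse raises, the watermark is left untouched,
    so the next run replays the same window (idempotent by design).
    """
    try:
        merge_to_warehouse()
    except Exception:
        return False          # failure: watermark NOT updated
    update_watermark(new_epoch)
    return True
```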

 

Final Production-Ready Structure

Pipeline Flow:

  1. Get Watermark
  2. Set start/end epoch
  3. Until Loop (REST pagination → Bronze append)
  4. Notebook (Flatten + Deduplicate → Silver)
  5. MERGE Silver → Warehouse
  6. Update Watermark
  7. Log success/failure

This is a very common enterprise pattern in Fabric.

 

 

If this explanation helped, please mark it as the solution so others can find it easily.

If it helped, a quick Kudos is always appreciated; it highlights useful answers for the community.

Thanks for being part of the discussion!


5 REPLIES
Om_Bhartiya
Frequent Visitor

Thank you so much for the detailed explanation.

The way you broke down the production‑ready architecture, especially the separation into watermark control, Bronze/Silver layers, and the clean MERGE pattern, really clarified the best practices for me. I also appreciate the emphasis on keeping pagination logic isolated and handling schema evolution safely through notebooks instead of relying only on OverwriteSchema.

This gives me a much clearer and more reliable structure to move forward with before taking the pipeline to production. I truly appreciate the time and depth you put into the explanation — it helped a lot!

Thanks again!

v-sgandrathi
Community Support

Hi @Om_Bhartiya,

We haven't heard back from you regarding our last response and wanted to check if your issue has been resolved.

Should you have any further questions, feel free to reach out.
Thank you for being a part of the Microsoft Fabric Community Forum!

v-sgandrathi
Community Support

Hi @Om_Bhartiya,

 

Your current approach aligns well with established best practices for building REST API ingestion pipelines in Microsoft Fabric. In production settings, it's common to use a pipeline that reads the watermark (last successful load timestamp) from a control table, constructs API requests with startDate and endDate parameters for incremental extraction, and handles pagination with a Web activity and an Until loop. This loop continues until the API no longer provides a nextPage value, and each page is ingested into the Lakehouse as the raw/bronze layer, often using schema drift or overwrite options to manage schema changes.

For nested JSON, it's best to land the raw data first and then use notebooks or Dataflows Gen2 for flattening and transformations before moving it to curated layers. Incremental updates are typically managed with a MERGE (upsert) operation into the Warehouse or Delta tables, and the watermark table is updated after successful pipeline completion to support ongoing incremental loads.

This architecture (watermark control → incremental API call → pagination loop → raw Lakehouse ingestion → schema handling → MERGE into Warehouse → watermark update) is widely used and effective in Fabric. Separating the pipeline into Bronze, Silver, and Gold layers can enhance scalability and maintainability. Adding retry policies, logging, and error handling can also improve production readiness.

 

Thank you.

If it's okay, could I share my current pipeline structure here?
Since I’m a fresher and still learning, I’d love to get your feedback on whether I’m implementing things correctly or if there are improvements I should make before taking it to production.

