Om_Bhartiya
Frequent Visitor

Need Guidance: Solid Structure for Incremental REST API Pipeline with Pagination in Fabric

Hello everyone,

I’m currently building an ingestion pipeline in Microsoft Fabric and I’d like to know if anyone has already implemented something similar successfully.

What I’m trying to achieve

I need a solid, production-ready structure for:

  • Incremental loading from a REST API (using epoch timestamps)

  • Handling dynamic pagination (nextPage from API response)

  • Loading raw data into Lakehouse

  • Automatically handling schema changes

  • Performing MERGE (upsert) into a Warehouse

  • Updating a watermark table after success

The API:

  • Requires startDate and endDate for incremental filtering

  • Uses curPage and returns nextPage

  • Returns nested JSON that must be flattened

I currently:

  • Use a Web activity to read nextPage

  • Use an Until loop for pagination

  • Copy data into Lakehouse with OverwriteSchema

  • Run schema sync script before MERGE

  • Then update watermark

It works, but I want to confirm:

Has anyone built a stable architecture like this in Fabric?
Is this the recommended approach?
Is there a cleaner way to handle pagination and incremental logic?

I’m looking for a proven, solid structure before moving this into production.

Any guidance, examples, or best practices would be greatly appreciated.

Thank you.

1 ACCEPTED SOLUTION

Hey @Om_Bhartiya  , 

 

What you’re building is absolutely valid, and yes, this pattern is commonly implemented in Microsoft Fabric. Your current approach is technically correct, but for production it can be made cleaner, more scalable, and easier to maintain.

 

Below is a solid production-ready architecture pattern that many teams use successfully.

 

Recommended Production Architecture (Medallion Style)

 

Control Layer (Watermark + Metadata Driven)

Instead of hardcoding logic inside activities:

  • Maintain a Watermark table in Warehouse with columns:
    • source_name
    • last_success_epoch
    • last_run_time
    • status
  • Read the watermark at pipeline start
  • Calculate:
    • startDate = last_success_epoch
    • endDate = current_epoch

This keeps everything restartable and idempotent.
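
As a sketch, the watermark-to-window calculation from the bullets above might look like this in a notebook (the helper name `compute_load_window` is illustrative, not a Fabric API; reading `last_success_epoch` from the Warehouse table is assumed to happen separately):

```python
import time

def compute_load_window(last_success_epoch, now_epoch=None):
    """Derive the incremental window from the watermark.

    startDate = last successful load's epoch, endDate = current epoch.
    Returns (start, end) as integer epoch seconds.
    """
    end = int(now_epoch if now_epoch is not None else time.time())
    if last_success_epoch is None:
        # First run: no watermark yet, fall back to a full load from epoch 0.
        return 0, end
    return int(last_success_epoch), end
```

Because the window always starts exactly where the last successful run ended, a failed run simply replays the same window on the next attempt.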

 

Ingestion Layer (Bronze – Raw Landing in Lakehouse)

For REST pagination + incremental:

Recommended structure (instead of a separate Web activity + Until orchestration):

 

  • One Until loop
  • Inside loop:
    • Copy Activity (REST source)
    • Pass curPage
    • Set variable = nextPage

Stop condition:

@equals(variables('nextPage'), null)

 

✔ Keep pagination logic entirely inside the loop
✔ Do NOT mix schema handling here
✔ Always land raw JSON unchanged

 

Load into:

  • Bronze table (append only)
  • Partition by ingestion_date

This gives you replay capability.
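
The Until-loop logic above can be sketched as a plain function (hypothetical callables: `fetch_page` stands in for the REST Copy Activity source, `land_raw` for the Bronze append):

```python
def ingest_all_pages(fetch_page, land_raw):
    """Drive pagination until the API stops returning nextPage.

    fetch_page(cur_page) -> dict with keys 'data' and 'nextPage'
    land_raw(records)    -> appends raw records to the Bronze table
    Mirrors the loop pattern: pass curPage, read nextPage,
    stop when nextPage is null/absent.
    """
    cur_page = 1
    pages = 0
    while cur_page is not None:
        resp = fetch_page(cur_page)
        land_raw(resp.get("data", []))
        cur_page = resp.get("nextPage")  # None terminates the loop
        pages += 1
    return pages
```

Note that the loop only reads `nextPage` and appends raw data; no schema logic leaks in, matching the checklist above.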

 

 

Schema Handling (Do NOT rely only on OverwriteSchema)

OverwriteSchema works, but in production it can break downstream models.

Better pattern:

  • Land raw JSON as string (Variant column or JSON column)
  • Flatten in Notebook (Spark)
  • Use: df = spark.read.json(...)
  • Enable: spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

This allows controlled schema evolution in Delta tables.

 

Silver Layer (Flatten + Normalize)

Use a Notebook or Dataflow Gen2 to:

  • Flatten nested JSON
  • Standardize column names
  • Cast data types
  • Remove duplicates (based on business key + timestamp)

Store cleaned table in Lakehouse (Silver).
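
In a Fabric notebook this step would typically be PySpark, but the core flatten-and-dedupe logic can be sketched in plain Python (the business key `id` and timestamp `updated_at` are assumptions; substitute your own):

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into underscore-separated columns."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

def dedupe_latest(records, key="id", ts="updated_at"):
    """Keep only the newest record per business key."""
    latest = {}
    for r in records:
        k = r[key]
        if k not in latest or r[ts] > latest[k][ts]:
            latest[k] = r
    return list(latest.values())
```

Deduplicating on business key + timestamp in Silver is what makes the later MERGE safe to re-run.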

 

 

Warehouse MERGE (Upsert Pattern)

Now perform MERGE from Silver → Warehouse.

Use pattern:

MERGE INTO target AS t
USING source AS s
    ON t.id = s.id
WHEN MATCHED AND s.updated_at > t.updated_at THEN
    UPDATE SET t.payload = s.payload, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
    INSERT (id, payload, updated_at) VALUES (s.id, s.payload, s.updated_at);

(Here id, payload, and updated_at are placeholders; substitute your business key and columns.)

Key best practices:

  • Always compare timestamps
  • Never blindly overwrite
  • Keep it idempotent

 

Watermark Update (ONLY After Success)

Very important:

Update watermark only if:

  • Pagination completed
  • Merge completed
  • No failure in pipeline

Wrap merge + watermark update in:

  • Stored Procedure
    OR
  • Sequential Script activities

If failure occurs → watermark must NOT update.
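
The success-only rule can be sketched as follows (`merge_to_warehouse` and `update_watermark` are hypothetical stand-ins for the Script / Stored Procedure activities):

```python
def run_load(merge_to_warehouse, update_watermark, new_epoch):
    """Commit the watermark only after the merge succeeds.

    If merge_to_warehouse raises, the watermark is left untouched,
    so the next run replays the same window (idempotent by design).
    """
    try:
        merge_to_warehouse()
    except Exception:
        return False          # failure: watermark NOT updated
    update_watermark(new_epoch)
    return True
```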

 

Final Production-Ready Structure

Pipeline Flow:

  1. Get Watermark
  2. Set start/end epoch
  3. Until Loop (REST pagination → Bronze append)
  4. Notebook (Flatten + Deduplicate → Silver)
  5. MERGE Silver → Warehouse
  6. Update Watermark
  7. Log success/failure

This is a very common enterprise pattern in Fabric.

 

 

If this explanation helped, please mark it as the solution so others can find it easily.

If it helped, a quick Kudos is always appreciated; it highlights useful answers for the community.

Thanks for being part of the discussion!


5 REPLIES
Om_Bhartiya
Frequent Visitor

Thank you so much for the detailed explanation.

The way you broke down the production‑ready architecture, especially the separation into watermark control, Bronze/Silver layers, and the clean MERGE pattern, really clarified the best practices for me. I also appreciate the emphasis on keeping pagination logic isolated and handling schema evolution safely through notebooks instead of relying only on OverwriteSchema.

This gives me a much clearer and more reliable structure to move forward with before taking the pipeline to production. I truly appreciate the time and depth you put into the explanation — it helped a lot!

Thanks again!

v-sgandrathi
Community Support

Hi @Om_Bhartiya,

We haven't heard back from you regarding our last response and wanted to check if your issue has been resolved.

Should you have any further questions, feel free to reach out.
Thank you for being a part of the Microsoft Fabric Community Forum!

v-sgandrathi
Community Support

Hi @Om_Bhartiya,

 

Your current approach aligns well with established best practices for building REST API ingestion pipelines in Microsoft Fabric. In production settings, it's common to use a pipeline that reads the watermark (last successful load timestamp) from a control table, constructs API requests with startDate and endDate parameters for incremental extraction, and handles pagination with a Web activity and an Until loop. This loop continues until the API no longer provides a nextPage value, and each page is ingested into the Lakehouse as the raw/bronze layer, often using schema drift or overwrite options to manage schema changes.

For nested JSON, it's best to land the raw data first and then use notebooks or Dataflows Gen2 for flattening and transformations before moving it to curated layers. Incremental updates are typically managed with a MERGE (upsert) operation into the Warehouse or Delta tables, and the watermark table is updated after successful pipeline completion to support ongoing incremental loads.

This architecture (watermark control → incremental API call → pagination loop → raw Lakehouse ingestion → schema handling → MERGE into Warehouse → watermark update) is widely used and effective in Fabric. Separating the pipeline into Bronze, Silver, and Gold layers can enhance scalability and maintainability. Adding retry policies, logging, and error handling can also improve production readiness.

 

Thank you.

If it's okay, could I share my current pipeline structure here?
Since I’m a fresher and still learning, I’d love to get your feedback on whether I’m implementing things correctly or if there are improvements I should make before taking it to production.

