Alaahady
Helper II

Best Incremental Refresh Strategy in Fabric for On-Prem SQL Server Source with Updateable Records

Hi Fabric Community,

I'm currently working on optimizing an incremental refresh strategy in Microsoft Fabric and would appreciate your insights on the best combination of tools and destinations.

📌 Scenario Overview:

  • Source: On-premises SQL Server
  • Table Structure: Contains ID and last_update_date columns
  • Refresh Logic:
    • New records have unique IDs
    • Updated records have last_update_date within the last 2 months
    • I want to refresh only records where last_update_date > MAX(last_update_date) in the destination

Questions:

  1. Which ingestion method is best suited for this use case?

    • Copy Activity in Pipeline
    • Dataflow Gen2
    • Copy Job
    • Notebook with MERGE logic
  2. Which destination is most appropriate for incremental refresh and reporting?

    • Lakehouse (currently not supported) 
    • Warehouse
    • SQL Database in Fabric
  3. Do I need to stage the data in a Bronze layer first and then merge into a Gold layer?

    • If so, should this be done via Notebooks or Dataflows?
  4. Is there a recommended pattern for handling updates efficiently (e.g., using watermark tables, CDC, or timestamp filters)?

🎯 Goal:

Achieve a reliable and scalable incremental refresh setup that supports:

  • Efficient upserts (insert/update)
  • Minimal compute usage
  • Compatibility with reporting tools (Power BI, SQL endpoints)

Any best practices, architecture suggestions, or lessons learned from similar implementations would be greatly appreciated.

Thanks in advance!

Alaa Abdulhady


5 REPLIES
Alaahady
Helper II

Hi,

Thank you all for your support. I was able to resolve the issue by following these steps:

  1. Created a Dataflow Gen2 to build the gold fact table in the Lakehouse.
  2. Created another Dataflow Gen2 to generate the silver table, filtered to include only the last 7 days of data.
  3. Developed a notebook to merge the silver data into the gold table.
  4. Built a pipeline that includes steps 2 and 3, and configured a schedule to automate the process.

 

Here is the notebook code:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Start Spark session
spark = SparkSession.builder.getOrCreate()

# Load source and target tables (single-part names only)
source_df = spark.read.table("fact_survey_detail_silver")
target_table = DeltaTable.forName(spark, "fact_survey_detail")

# Merge: update matching records and insert new ones
target_table.alias("gold").merge(
    source=source_df.alias("silver"),
    condition="gold.id = silver.id"
).whenMatchedUpdateAll() \
 .whenNotMatchedInsertAll() \
 .execute()
 
I hope that can help others.
v-kpoloju-msft
Community Support

Hi @Alaahady,

Thank you for reaching out to the Microsoft Fabric Community Forum. Also, thanks to @tayloramy for his inputs on this thread.

This is a common pattern when working with on-prem SQL Server sources and incremental refresh in Fabric. Here’s a practical approach you can consider:

Ingest only changed rows: Use a Copy Activity or Copy Job through the on-premises data gateway to pull only rows where last_update_date is greater than the latest value in your destination (the watermark). This keeps your ingestion lightweight and avoids unnecessary full-table copies.
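
For illustration, here is a minimal notebook sketch of that watermark lookup; it assumes the destination is a Lakehouse Delta table and that the resulting query string is handed to the copy step (table and column names are placeholders, not from the original post):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Current high-water mark in the destination ("fact_survey_detail" is a placeholder name)
wm = (
    spark.read.table("fact_survey_detail")
         .agg(F.max("last_update_date").alias("wm"))
         .first()["wm"]
)

# Source query for the copy step ("dbo.SurveyDetail" is a placeholder for the on-prem table)
if wm is None:
    source_query = "SELECT * FROM dbo.SurveyDetail"
else:
    source_query = f"SELECT * FROM dbo.SurveyDetail WHERE last_update_date > '{wm}'"

print(source_query)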

If you can enable Change Data Capture (CDC) on SQL Server, it's even more robust for inserts/updates/deletes.

Landing zone (Bronze layer): Store the delta rows in a Bronze Delta Lakehouse table. Keep this append-only; each pipeline run writes only the new/updated rows.
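
A minimal sketch of that append-only landing, assuming the changed rows have already been copied into a staging table (all names are placeholders):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Rows landed by the copy step ("stg_survey_detail" is a placeholder staging table)
changed_rows = spark.read.table("stg_survey_detail")

# Append only: each run adds the new/updated rows plus a load timestamp for lineage
(
    changed_rows
    .withColumn("_loaded_at", F.current_timestamp())
    .write.format("delta")
    .mode("append")
    .saveAsTable("bronze_survey_detail")  # placeholder Bronze table name
)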

Upsert into Gold layer: Use a Notebook (Spark) or Dataflow Gen2 to perform a MERGE from Bronze into your Gold Delta table. Match on ID and compare last_update_date to handle inserts/updates atomically.
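
A minimal Spark sketch of that MERGE, assuming Bronze and Gold are Delta tables (names are placeholders). Because Bronze is append-only, the source is first reduced to the latest row per id:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze_df = spark.read.table("bronze_survey_detail")  # placeholder names throughout

# Keep only the latest version of each id (MERGE needs at most one source row per target row)
latest = (
    bronze_df
    .withColumn("_rn", F.row_number().over(
        Window.partitionBy("id").orderBy(F.col("last_update_date").desc())))
    .filter("_rn = 1")
    .drop("_rn")
)

gold = DeltaTable.forName(spark, "gold_survey_detail")

(
    gold.alias("gold")
        .merge(latest.alias("bronze"), "gold.id = bronze.id")
        # Only overwrite rows whose timestamp has actually advanced
        .whenMatchedUpdateAll(condition="bronze.last_update_date > gold.last_update_date")
        .whenNotMatchedInsertAll()
        .execute()
)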

Expose Gold for reporting: Use the Lakehouse SQL analytics endpoint for lower-concurrency reporting. Use a Warehouse if you have high-concurrency or interactive Power BI users.

Refer to these links:
1. https://learn.microsoft.com/en-us/fabric/data-factory/dataflow-gen2-incremental-refresh 
2. https://learn.microsoft.com/en-us/fabric/fundamentals/decision-guide-lakehouse-warehouse 
3. https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview 
4. https://learn.microsoft.com/en-us/azure/databricks/delta/merge 

Hope this clears it up. Let us know if you have any doubts regarding this. We will be happy to help.

Thank you for using the Microsoft Fabric Community Forum.

Hi @Alaahady,

Just checking in to see if the issue has been resolved on your end. If the earlier suggestions helped, that's great to hear! And if you're still facing challenges, feel free to share more details; we're happy to assist further.

Thank you.

Hi @Alaahady,

Hope you had a chance to try out the solution shared earlier. Let us know if anything needs further clarification or if there's an update from your side; we're always here to help.

Thank you.

tayloramy
Community Champion

Hi @Alaahady,

You’re dealing with the classic “insert + update” problem from an on-prem SQL Server. The trick is to land only what changed, then upsert efficiently, and serve for reporting. Here’s what I’ve found works consistently.

Quick solution

  • Ingestion method:
    Use Dataflow Gen2 (incremental refresh on last_update_date) or Pipeline Copy Activity (incremental mode) through the on-premises data gateway to land just the last N days/weeks of changes into a staging table.
    Docs: Dataflow Gen2 Incremental Refresh, On-premises data (gateway), overview pattern from ADF: Incremental copy (watermark).
  • Destination for serving/reporting:
    Prefer Warehouse (good T-SQL surface, easy PBI connectivity). MERGE is supported (preview), which enables straight T-SQL upserts. See: T-SQL surface area (Warehouse).
  • Bronze > Gold recommended:
    Yes. Land delta rows into Bronze (staging), then MERGE into Gold (reporting) with keys + last_update_date. The MERGE can be a Warehouse stored proc or a Spark notebook MERGE (if you choose Lakehouse Delta tables).
  • Efficient update handling:
    Use a watermark table that stores the last successful last_update_date. Your ingestion filters on last_update_date > watermark at the source. After a successful upsert, bump the watermark (see the sketch after this list). For sources that support it, SQL Server Change Tracking also works well; pattern reference (ADF): Change Tracking incremental pattern.
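
A minimal sketch of the watermark-table pattern in a notebook, assuming a small Delta table with one row per source (all names are placeholders): read the stored value before ingestion, then bump it only after the upsert has succeeded:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

WATERMARK_TABLE = "etl_watermark"      # placeholder: one row per source table
SOURCE_NAME = "fact_survey_detail"

# 1) Read the last successful watermark for this source
row = (
    spark.read.table(WATERMARK_TABLE)
         .filter(F.col("source_name") == SOURCE_NAME)
         .select("last_update_date")
         .first()
)
last_watermark = row["last_update_date"] if row else None
# ... filter the ingestion with last_watermark, then run the MERGE into Gold ...

# 2) After a successful upsert, bump the watermark to the max value just loaded
new_wm = (
    spark.read.table("fact_survey_detail")  # placeholder Gold table
         .agg(F.max("last_update_date").alias("wm"))
         .first()["wm"]
)
spark.createDataFrame([(SOURCE_NAME, new_wm)],
                      ["source_name", "last_update_date"]) \
     .createOrReplaceTempView("new_watermark")

spark.sql(f"""
    MERGE INTO {WATERMARK_TABLE} AS t
    USING new_watermark AS s
    ON t.source_name = s.source_name
    WHEN MATCHED THEN UPDATE SET t.last_update_date = s.last_update_date
    WHEN NOT MATCHED THEN INSERT *
""")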


If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.

