Alaahady
Helper II

Best Incremental Refresh Strategy in Fabric for On-Prem SQL Server Source with Updateable Records

Hi Fabric Community,

I'm currently working on optimizing an incremental refresh strategy in Microsoft Fabric and would appreciate your insights on the best combination of tools and destinations.

📌 Scenario Overview:

  • Source: On-premises SQL Server
  • Table Structure: Contains ID and last_update_date columns
  • Refresh Logic:
    • New records have unique IDs
    • Updated records have last_update_date within the last 2 months
    • I want to refresh only records where last_update_date > MAX(last_update_date) in the destination

Questions:

  1. Which ingestion method is best suited for this use case?

    • Copy Activity in Pipeline
    • Dataflow Gen2
    • Copy Job
    • Notebook with MERGE logic
  2. Which destination is most appropriate for incremental refresh and reporting?

    • Lakehouse (currently not supported) 
    • Warehouse
    • SQL Database in Fabric
  3. Do I need to stage the data in a Bronze layer first and then merge into a Gold layer?

    • If so, should this be done via Notebooks or Dataflows?
  4. Is there a recommended pattern for handling updates efficiently (e.g., using watermark tables, CDC, or timestamp filters)?

🎯 Goal:

Achieve a reliable and scalable incremental refresh setup that supports:

  • Efficient upserts (insert/update)
  • Minimal compute usage
  • Compatibility with reporting tools (Power BI, SQL endpoints)

Any best practices, architecture suggestions, or lessons learned from similar implementations would be greatly appreciated.

Thanks in advance!

Alaa Abdulhady


5 REPLIES
Alaahady
Helper II

Hi,

Thank you all for your support. I was able to resolve the issue by following these steps:

  1. Created a Dataflow Gen2 to build the gold fact table in the Lakehouse.
  2. Created another Dataflow Gen2 to generate the silver table, filtered to include only the last 7 days of data.
  3. Developed a notebook to merge the silver data into the gold table.
  4. Built a pipeline that includes steps 2 and 3, and configured a schedule to automate the process.

 

Here is the notebook code:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Start Spark session
spark = SparkSession.builder.getOrCreate()

# Load source and target tables (single-part names only)
source_df = spark.read.table("fact_survey_detail_silver")
target_table = DeltaTable.forName(spark, "fact_survey_detail")

# Merge: update matching records and insert new ones
target_table.alias("gold").merge(
    source=source_df.alias("silver"),
    condition="gold.id = silver.id"
).whenMatchedUpdateAll() \
 .whenNotMatchedInsertAll() \
 .execute()
 
I hope that can help others.
v-kpoloju-msft
Community Support

Hi @Alaahady,

Thank you for reaching out to the Microsoft Fabric Community Forum. Also, thanks to @tayloramy for his inputs on this thread.

This is a common pattern when working with on-prem SQL Server sources and incremental refresh in Fabric. Here’s a practical approach you can consider:

Ingest only changed rows: Use a Copy Activity or Copy Job through the on-premises data gateway to pull only rows where last_update_date is greater than the latest value in your destination (the watermark). This keeps your ingestion lightweight and avoids unnecessary full-table copies.
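
For illustration, here is a minimal notebook sketch of that watermark lookup; it assumes the destination is a Lakehouse Delta table and that the resulting query string is handed to the copy step (table and column names are placeholders, not from the original post):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Current high-water mark in the destination ("fact_survey_detail" is a placeholder name)
wm = (
    spark.read.table("fact_survey_detail")
         .agg(F.max("last_update_date").alias("wm"))
         .first()["wm"]
)

# Source query for the copy step ("dbo.SurveyDetail" is a placeholder for the on-prem table)
if wm is None:
    source_query = "SELECT * FROM dbo.SurveyDetail"
else:
    source_query = f"SELECT * FROM dbo.SurveyDetail WHERE last_update_date > '{wm}'"

print(source_query)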

If you can enable Change Data Capture (CDC) on SQL Server, it's even more robust for inserts/updates/deletes.

Landing zone (Bronze layer): Store the delta rows in a Bronze Delta Lakehouse table. Keep this append-only; each pipeline run writes only the new/updated rows.
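
A minimal sketch of that append-only landing, assuming the changed rows have already been copied into a staging table (all names are placeholders):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Rows landed by the copy step ("stg_survey_detail" is a placeholder staging table)
changed_rows = spark.read.table("stg_survey_detail")

# Append only: each run adds the new/updated rows plus a load timestamp for lineage
(
    changed_rows
    .withColumn("_loaded_at", F.current_timestamp())
    .write.format("delta")
    .mode("append")
    .saveAsTable("bronze_survey_detail")  # placeholder Bronze table name
)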

Upsert into Gold layer: Use a Notebook (Spark) or Dataflow Gen2 to perform a MERGE from Bronze into your Gold Delta table. Match on ID and compare last_update_date to handle inserts/updates atomically.
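
A minimal Spark sketch of that MERGE, assuming Bronze and Gold are Delta tables (names are placeholders). Because Bronze is append-only, the source is first reduced to the latest row per id:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze_df = spark.read.table("bronze_survey_detail")  # placeholder names throughout

# Keep only the latest version of each id (MERGE needs at most one source row per target row)
latest = (
    bronze_df
    .withColumn("_rn", F.row_number().over(
        Window.partitionBy("id").orderBy(F.col("last_update_date").desc())))
    .filter("_rn = 1")
    .drop("_rn")
)

gold = DeltaTable.forName(spark, "gold_survey_detail")

(
    gold.alias("gold")
        .merge(latest.alias("bronze"), "gold.id = bronze.id")
        # Only overwrite rows whose timestamp has actually advanced
        .whenMatchedUpdateAll(condition="bronze.last_update_date > gold.last_update_date")
        .whenNotMatchedInsertAll()
        .execute()
)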

Expose Gold for reporting: Use the Lakehouse SQL analytics endpoint for lower-concurrency reporting. Use a Warehouse if you have high-concurrency or interactive Power BI users.

Refer to these links:
1. https://learn.microsoft.com/en-us/fabric/data-factory/dataflow-gen2-incremental-refresh 
2. https://learn.microsoft.com/en-us/fabric/fundamentals/decision-guide-lakehouse-warehouse 
3. https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview 
4. https://learn.microsoft.com/en-us/azure/databricks/delta/merge 

Hope this clears it up. Let us know if you have any doubts regarding this. We will be happy to help.

Thank you for using the Microsoft Fabric Community Forum.

Hi @Alaahady,

Just checking in to see if the issue has been resolved on your end. If the earlier suggestions helped, that's great to hear! And if you're still facing challenges, feel free to share more details; we're happy to assist further.

Thank you.

Hi @Alaahady,

Hope you had a chance to try out the solution shared earlier. Let us know if anything needs further clarification or if there's an update from your side; we're always here to help.

Thank you.

tayloramy
Community Champion

Hi @Alaahady,

You’re dealing with the classic “insert + update” problem from an on-prem SQL Server. The trick is to land only what changed, then upsert efficiently, and serve for reporting. Here’s what I’ve found works consistently.

Quick solution

  • Ingestion method:
    Use Dataflow Gen2 (incremental refresh on last_update_date) or Pipeline Copy Activity (incremental mode) through the on-premises data gateway to land just the last N days/weeks of changes into a staging table.
    Docs: Dataflow Gen2 Incremental Refresh, On-premises data (gateway), overview pattern from ADF: Incremental copy (watermark).
  • Destination for serving/reporting:
    Prefer Warehouse (good T-SQL surface, easy PBI connectivity). MERGE is supported (preview), which enables straight T-SQL upserts. See: T-SQL surface area (Warehouse).
  • Bronze > Gold recommended:
    Yes. Land delta rows into Bronze (staging), then MERGE into Gold (reporting) with keys + last_update_date. The MERGE can be a Warehouse stored proc or a Spark notebook MERGE (if you choose Lakehouse Delta tables).
  • Efficient update handling:
    Use a watermark table that stores the last successful last_update_date. Your ingestion filters on last_update_date > watermark at the source. After a successful upsert, bump the watermark (see the sketch after this list). For sources that support it, SQL Server Change Tracking also works well; pattern reference (ADF): Change Tracking incremental pattern.
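
A minimal sketch of the watermark-table pattern in a notebook, assuming a small Delta table with one row per source (all names are placeholders): read the stored value before ingestion, then bump it only after the upsert has succeeded:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

WATERMARK_TABLE = "etl_watermark"      # placeholder: one row per source table
SOURCE_NAME = "fact_survey_detail"

# 1) Read the last successful watermark for this source
row = (
    spark.read.table(WATERMARK_TABLE)
         .filter(F.col("source_name") == SOURCE_NAME)
         .select("last_update_date")
         .first()
)
last_watermark = row["last_update_date"] if row else None
# ... filter the ingestion with last_watermark, then run the MERGE into Gold ...

# 2) After a successful upsert, bump the watermark to the max value just loaded
new_wm = (
    spark.read.table("fact_survey_detail")  # placeholder Gold table
         .agg(F.max("last_update_date").alias("wm"))
         .first()["wm"]
)
spark.createDataFrame([(SOURCE_NAME, new_wm)],
                      ["source_name", "last_update_date"]) \
     .createOrReplaceTempView("new_watermark")

spark.sql(f"""
    MERGE INTO {WATERMARK_TABLE} AS t
    USING new_watermark AS s
    ON t.source_name = s.source_name
    WHEN MATCHED THEN UPDATE SET t.last_update_date = s.last_update_date
    WHEN NOT MATCHED THEN INSERT *
""")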


If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.

