Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

View all the Fabric Data Days sessions on demand. View schedule

Shubham_rai955

Incremental Loading in Fabric Pipelines: Warehouse to Lakehouse

Why Incremental Loading?

 

Incremental loading moves only new or changed records instead of reprocessing entire datasets. Benefits include:

 

  • Reduced execution time for large fact tables
  • Lower compute costs
  • Minimal data transfer overhead
  • Efficient synchronization between systems

The watermark pattern uses a timestamp column to identify new records since the last pipeline execution.

 

Prerequisites

 

  • Warehouse and Lakehouse created in Fabric workspace
  • Source table FactImport exists in Warehouse with OrderDate column

 

Step 1: Create the Watermark Table

 

Create a tracking table in your Warehouse:

 
CREATE TABLE tblWaterMark (
    Watermark DATETIME2(6)
);

INSERT INTO tblWaterMark VALUES ('2024-01-01 00:00:00');

Note: Set the initial watermark to a date before your earliest data or when the last full load occurred.

 

1.png

 

Step 2: Verify Your Tables

 

Your Warehouse now contains:

  • FactImport - Source fact table
  • tblWaterMark - Tracking table

 

Step 3: Create the Update Procedure

 

Create a stored procedure to update the watermark after each successful load:

 
CREATE OR ALTER PROCEDURE updateWatermark @WaterMark DATETIME2(6) AS UPDATE tblWaterMark SET Watermark = @WaterMark

This procedure accepts a datetime parameter and persists the new high-water mark.

 

2.png

 

Step 4: Build the Pipeline

 

Configure four activities in your Fabric pipeline:

 

Activity 1: Lookup (Watermark table lookup)

 

  • Query tblWaterMark in the Warehouse
  • Returns the timestamp of the last successful load
  • Output: Watermark value

 

Activity 2: Lookup (Maximum Watermark)

 

  • Query FactImport to find the most recent OrderDate
  • Query: Select MAX(OrderDate) as LatestWatermark from FactImport
  • Determines upper boundary for current load
  • Output: LatestWatermark value

 

Activity 3: Copy Data

 

  • Source: Warehouse table FactImport
  • Destination: Lakehouse table
  • Source query:
 
Select * from FactImport WHERE OrderDate > '@{activity('Watermark table lookup').output.firstRow.Watermark}' and OrderDate <= '@{activity('Maximum Watermark').output.firstRow.LatestWatermark}'
  • Uses dynamic expressions to reference Lookup activity outputs
  • Copies only delta records between last watermark and current maximum

 

Activity 4: Stored Procedure

 

  • Execute updateWatermark procedure
  • Pass LatestWatermark value from Activity 2 as @WaterMark parameter
  • Updates tblWaterMark for next pipeline run

3.png

 

How It Works

 

Each pipeline execution:

  1. Reads the last processed timestamp
  2. Identifies the current maximum timestamp
  3. Copies records between these two points
  4. Updates the watermark for the next run

Subsequent runs load only new records added since the previous execution, making data movement efficient and scalable.

 
Comments