Native Data Branching and Merging in OneLake via Git Integration

The Problem:

Microsoft Fabric currently supports Git integration for workspace items (metadata), but developers cannot "branch" the underlying data in OneLake. To test new ETL logic, a developer must manually copy entire datasets to a "Dev" lakehouse, which is slow and consumes extra capacity.

The Idea:

Introduce Zero-Copy Data Branching directly within the Fabric UI. Building on Delta Lake’s "Time Travel," Fabric should allow users to create a "Data Branch" of a Lakehouse within a workspace.

1. Branching: A developer creates a "Dev Branch" of a Lakehouse. This creates a virtual pointer to the existing Parquet files without duplicating them.

2. Isolated Testing: The developer runs new Spark jobs or Pipelines on this branch. Only changed data is written to new files.

3. Merging: Once the code is verified, the user "merges" the data branch back to the "Main" Lakehouse, similar to a Git Pull Request for code.
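
To make the three steps concrete, here is a minimal toy sketch in plain Python. It is not a Fabric or Delta Lake API: the names `ToyLakehouse`, `Snapshot`, and `Branch` are hypothetical, and a table version is modeled simply as a set of file paths. The merge rule shown (accept only if "Main" has not changed since the branch was created) is one assumed policy; a real implementation would need proper conflict handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """An immutable table version: a version number plus a set of file paths."""
    version: int
    files: frozenset


@dataclass
class Branch:
    name: str
    base_version: int  # version of main the branch was created from
    head: Snapshot     # current state of the branch


class ToyLakehouse:
    """Hypothetical model of a lakehouse table with zero-copy branches."""

    def __init__(self, files):
        self.main = Snapshot(version=0, files=frozenset(files))

    def create_branch(self, name):
        # Step 1 (Branching): the branch points at main's snapshot.
        # No file is copied; both share the same file references.
        return Branch(name=name, base_version=self.main.version, head=self.main)

    def write(self, branch, added=(), removed=()):
        # Step 2 (Isolated Testing): only the branch head moves.
        # New data lands in new files; main's snapshot is untouched.
        files = (branch.head.files - frozenset(removed)) | frozenset(added)
        branch.head = Snapshot(branch.head.version + 1, files)

    def merge(self, branch):
        # Step 3 (Merging): assumed policy, accept only if main is unchanged
        # since the branch was created; otherwise ask for a review.
        if self.main.version != branch.base_version:
            raise RuntimeError("main changed since the branch was created")
        self.main = Snapshot(self.main.version + 1, branch.head.files)
        return self.main


lake = ToyLakehouse(["part-0001.parquet", "part-0002.parquet"])
dev = lake.create_branch("dev")
lake.write(dev, added=["part-0003.parquet"], removed=["part-0002.parquet"])
lake.merge(dev)
```

Until `merge` is called, `lake.main` still lists the original two files, which is the isolation property the idea asks for.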

Why this makes Fabric better:

DevOps Excellence: It brings true "Data-as-Code" to Fabric.

Cost Efficiency: Reduces the need to store multiple physical copies of data for development and testing.

Safety: Prevents accidental corruption of production data by providing a full "Sandbox" environment that mirrors production data exactly.

Status: New