going_grey
Frequent Visitor

Stream On-Prem SQL Server Data

Hi

 

We want to stream data from an on-premises SQL Server database (CDC-enabled). We initially looked at using an Eventstream for this, but there is no on-premises data source connector available.

 

We're currently exploring the mirrored database item, where we've attempted to tap into the CDC functionality by using a Spark job to stream changes from the mirrored database further downstream. However, this doesn't seem to be supported:

"[DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example Merge (null)) in the source table at version xx. This is currently not supported"

I've found this idea that appears to be closest to what we're currently experiencing:

https://community.fabric.microsoft.com/t5/Fabric-Ideas/Enable-Change-Data-Feed-CDF-on-a-Mirror-Datab...

 

Is there any way to reliably stream data from a mirrored database currently? It feels like we're hitting a roadblock at every turn on this one.

1 ACCEPTED SOLUTION

Hi @going_grey,

 

You are correct, and apologies for the confusion. Let me clarify.

A Lakehouse cannot be used as a source for Eventstream.

Eventstream is designed for event-driven, push-based sources such as:

  • Azure Event Hubs

  • Kafka

  • IoT Hub

A Lakehouse, on the other hand, is:

  • Delta-based

  • Pull-oriented

  • Supports updates and merges

So there is no supported pattern like:
Lakehouse → Eventstream

Where the confusion comes from

In Option 1, the flow should be interpreted like this:

SQL Server (CDC) → Ingestion → Lakehouse → Processing (Spark / downstream)

Not:

Lakehouse → Eventstream

Once data lands in the Lakehouse, you are already past the streaming ingestion layer.

 

You mentioned using Copy Job with MERGE for CDC tables.

This introduces a constraint:

  • MERGE creates updates in Delta tables

  • Spark Structured Streaming expects append-only sources

  • This leads to errors like
    DELTA_SOURCE_TABLE_IGNORE_CHANGES

So even without mirroring, MERGE prevents true streaming.
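The failure mode can be illustrated with a toy model of how a streaming reader consumes a commit log (plain Python for illustration only; this is not the actual Delta Lake implementation, and the names are made up):

```python
# Toy model of why MERGE breaks a streaming read over a Delta-style
# commit log. Each commit is (version, operation, rows); a streaming
# reader only tolerates APPEND commits, so an update written by MERGE
# fails, much like DELTA_SOURCE_TABLE_IGNORE_CHANGES does in Fabric.

def read_stream(commits, last_version=-1):
    """Consume commits newer than last_version; reject non-appends."""
    new_rows = []
    for version, op, rows in commits:
        if version <= last_version:
            continue  # already processed in an earlier micro-batch
        if op != "APPEND":
            raise RuntimeError(
                f"Detected a data update ({op}) in the source table "
                f"at version {version}. This is currently not supported."
            )
        new_rows.extend(rows)
        last_version = version
    return new_rows, last_version

log = [
    (0, "APPEND", [{"id": 1, "val": "a"}]),
    (1, "APPEND", [{"id": 2, "val": "b"}]),
]
rows, version = read_stream(log)  # works: the log is append-only so far

# A Copy Job MERGE rewrites row 1 in place, producing an update commit:
log.append((2, "MERGE", [{"id": 1, "val": "a2"}]))
# read_stream(log, last_version=version) now raises RuntimeError
```

This is why the same error appears whether the updates come from mirroring or from a MERGE-based Copy Job: the streaming reader sees a non-append commit either way.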

Option A. Micro-batch processing (most stable)

  • Keep your current ingestion (CDC → Lakehouse via Copy Job)

  • Run a scheduled Spark notebook

  • Process incrementally using:

    • timestamp

    • LSN

    • watermark logic

This is the most reliable pattern in Fabric today.
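The incremental logic in Option A can be sketched in plain Python (column and variable names here are illustrative; in a Fabric notebook the same filter would be expressed on a Spark DataFrame, with the watermark persisted between runs):

```python
# Watermark-driven micro-batch: each scheduled run processes only rows
# whose LSN (or commit timestamp) exceeds the stored watermark, then
# advances the watermark for the next run.

def process_increment(rows, watermark):
    """Return the rows newer than `watermark` and the advanced watermark."""
    batch = [r for r in rows if r["lsn"] > watermark]
    new_watermark = max((r["lsn"] for r in batch), default=watermark)
    return batch, new_watermark

table = [
    {"id": 1, "lsn": 100},
    {"id": 2, "lsn": 101},
    {"id": 3, "lsn": 102},
]

batch1, wm = process_increment(table, watermark=0)    # first run: all rows
table.append({"id": 4, "lsn": 103})                   # a new CDC row lands
batch2, wm = process_increment(table, watermark=wm)   # next run: only id 4
```

Because each run reads the current table state rather than the commit history, in-place MERGE updates never trip a streaming error.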

Option B. Append-only landing (stream-compatible)

If streaming is important:

  • Land CDC data as append-only

  • Include operation type (Insert / Update / Delete)

  • Avoid MERGE in the landing layer

Then:

  • Spark readStream works correctly

  • No Delta change conflicts

You handle deduplication downstream.
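The downstream deduplication step might look like this (plain Python for illustration; in Spark the equivalent is typically a `row_number()` over a window partitioned by key and ordered by LSN descending, and the field names here are assumptions):

```python
# Dedup over an append-only CDC landing table: keep only the latest
# change per key (highest LSN), then drop keys whose latest change
# is a delete.

def latest_state(cdc_rows):
    """Collapse append-only CDC rows into the current state per key."""
    latest = {}
    for row in cdc_rows:
        key = row["key"]
        if key not in latest or row["lsn"] > latest[key]["lsn"]:
            latest[key] = row
    # A delete wins if it is the most recent change for its key.
    return {k: r for k, r in latest.items() if r["op"] != "delete"}

landing = [
    {"key": 1, "op": "insert", "lsn": 10, "val": "a"},
    {"key": 2, "op": "insert", "lsn": 11, "val": "b"},
    {"key": 1, "op": "update", "lsn": 12, "val": "a2"},
    {"key": 2, "op": "delete", "lsn": 13, "val": None},
]

state = latest_state(landing)  # only key 1 survives, with its update
```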

Option C. True real-time streaming (Eventstream)

If you need near real-time:

  • SQL Server CDC → Event Hub (Debezium or custom capture)

  • Event Hub → Eventstream

  • Eventstream → Lakehouse

This pattern:

  • Preserves event semantics

  • Supports continuous streaming

  • Avoids Delta update limitations

Final takeaway

  • A mirrored database is not a streaming source

  • Lakehouse is not an Eventstream source

  • MERGE-based CDC ingestion blocks streaming scenarios

Let me know how it goes. I hope this helps resolve your question.


Thank you!
Proud to be a Super User!


5 REPLIES
v-veshwara-msft
Community Support

Hi @going_grey ,
We wanted to kindly follow up regarding your query. If you need any further assistance, please reach out.
Thank you.

v-veshwara-msft
Community Support

Hi @going_grey ,
Thanks for reaching out to Microsoft Fabric Community.

Just wanted to check if the response provided by @MJParikh was helpful. If further assistance is needed, please reach out.
Thank you.

MJParikh
Super User

Hi @going_grey,

 

You are running into a real limitation, not a misconfiguration.

Fabric currently does not support your intended workflow: CDC → mirrored database → Spark streaming.

Mirrored databases land data as Delta tables. Those tables do not expose a true change feed in a way Spark Structured Streaming expects.

When you see:

DELTA_SOURCE_TABLE_IGNORE_CHANGES

It means the engine detected updates or merges in the source Delta table. Streaming reads in Fabric today expect append-only behavior unless change data feed is explicitly supported, which mirrored databases do not provide yet.

So your current approach will continue to fail.

 

Below are some options you can try.

Option 1. Go direct, skip mirroring
This is the most stable pattern today.

  • Use Azure Data Factory or Fabric Data Pipeline

  • Use SQL Server CDC as the source

  • Land data into Lakehouse (Delta) in append mode

  • Then use Eventstream or Spark on top of that

This avoids the Delta “change” limitation entirely.

Option 2. Use staging with append-only pattern
If you must stay close to Fabric:

  • Pull CDC changes into a staging table (append-only)

  • Do not use MERGE in that layer

  • Stream from that staging Delta table

You trade some modeling elegance for streaming compatibility.
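The staging pattern can be sketched as follows (plain Python; in practice the landing write would be a Delta append from a Copy Job or pipeline, and the field names are illustrative):

```python
# Append-only staging: every CDC change becomes a NEW row carrying its
# operation type and LSN, instead of being MERGE-d into place. The
# table only ever grows, which keeps it a valid streaming source.

staging = []  # stands in for an append-only Delta staging table

def land_change(op, key, val, lsn):
    """Append one CDC change; never update or delete existing rows."""
    staging.append({"op": op, "key": key, "val": val, "lsn": lsn})

land_change("insert", 1, "a", 100)
land_change("update", 1, "a2", 101)  # an update is a new row, not a rewrite
land_change("delete", 1, None, 102)  # a delete is also just a new row

# staging now holds three change records and zero in-place rewrites
```

Reconstructing the current state per key (applying updates and deletes) then happens in the downstream layer, where streaming semantics no longer matter.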

Option 3. Event-based approach (closer to real-time)
For near real-time:

  • Use SQL Server CDC → Azure Event Hubs (via Debezium or custom capture)

  • Ingest into Fabric Eventstream

  • Land into Lakehouse

This is the only pattern that behaves like true streaming today.

Option 4. Mirroring, but batch, not streaming
If you want to keep the mirrored database:

  • Treat it as batch or micro-batch

  • Use scheduled Spark jobs, not streaming

  • Read the latest snapshot, or implement incremental logic manually

No continuous streaming.

 

Key takeaway
Mirrored database ≠ streaming source. It is optimized for replication and analytics, not CDC streaming semantics.

If your goal is real-time pipelines, move upstream of mirroring. If simplicity is your main objective, keep mirroring and accept batch processing.



Thank you!
Proud to be a Super User!

Hi @MJParikh 

Thank you for providing all of these options. I'm looking into Option 1:

I'm using a Copy Job to ingest CDC tables into a Lakehouse using the Merge write method (the only available write method for CDC tables). I guess I'll have to configure the most frequent schedule possible here (without spiking CU consumption too much). I haven't tested this yet.


You've mentioned this under Option 1:

"Then use Eventstream or Spark on top of that"

 

How can I use a Lakehouse as an Eventstream data source? Unless I'm misunderstanding?

