XhevahirMehalla
Helper II

Public Preview Consideration - Data Mirroring

I have a solution built on Synapse for a client which includes:

 

  1. A Data Lake Gen2 account as the storage layer, where we store CSV files extracted from an Oracle OCI DB.
  2. Synapse Analytics pipelines which copy and transform the data within the Data Lake (raw/staging directories).
  3. Synapse pipelines which transform data from the staging container on the Data Lake and store it in an Azure SQL Database.
  4. Power BI Desktop and Service, using Pro/PPU licenses.
  5. An Angular application which loads some CSV files into the Data Lake Gen2, which Synapse then transforms.

 

I now want to deploy the same solution on Fabric for another client.

We have full control of the code and client subscription.

The new client wants some reports in real time, or is happy with near-real time.

For that I am considering Data Mirroring, which is in Public Preview, and I want to understand the implications of choosing it as the solution.

 

All code is on Azure DevOps.

 

Please can someone point me to how I can proceed?

 

Happy to give more details on this.

 

Thanks

Xhev

13 REPLIES
v-tejrama
Community Support

Hi @XhevahirMehalla ,
Thanks for reaching out to the Microsoft Fabric community forum.

 

Thank you for providing the detailed scenario. As you consider Fabric, I’d like to offer some clarification. Data Mirroring in Fabric, including support for Oracle as a source, is currently in Public Preview. This status means it is fully supported for evaluation, pilots, and proof-of-concept projects, but it does not yet include SLA guarantees and may undergo changes before general availability. If your client is comfortable with preview features, you can explore Oracle mirroring into a Fabric Warehouse or Lakehouse, which enables near real-time replication without ETL and can streamline your current Synapse configuration. However, for business-critical workloads that require production-grade stability, it is advisable to maintain a traditional approach, such as using Azure Data Factory or Event Hubs/Kafka for CDC ingestion from Oracle OCI into Azure SQL Database, followed by DirectQuery with Power BI for real-time reporting.

Within Fabric, Synapse pipelines would be replaced by Dataflows Gen2 or Notebooks for transformations, utilizing OneLake/Lakehouse for central storage. Since you are already using Azure DevOps, you can leverage Fabric Git integration for version control and deployment management. Power BI’s native integration in Fabric will also keep your reporting processes straightforward.

To summarize, Fabric Mirroring is a forward-looking solution worth exploring for near real-time scenarios. For immediate production needs, maintaining the Oracle to Azure SQL setup via CDC/Event Hub is recommended, allowing you to experiment with Fabric while ensuring ongoing stability for your client.

Best Regards,

Tejaswi.


 

Thanks for all these insights. Now I understand the implications of using Fabric Data Mirroring.

It looks like we need to continue using Azure Synapse, Data Lake Gen2, Azure SQL DB and Power BI.

 

I am not clear how to expand the existing solution, without being on Fabric, to fulfil:

  1. Near-real-time data from the Oracle OCI database, or
  2. The perfect solution would be real time (getting data in real time from the Oracle OCI DB).

I was considering using Event Hubs to see if this will do the job, although I don't know how complex it is to bring real-time data into an Azure SQL DB.

Please can you give me some guidance on this or point me to the right team?

 

 

Thanks

Xhev 

Hi @XhevahirMehalla ,

 

Thank you for providing additional information. Exploring Event Hubs for near real-time data movement from Oracle OCI to Azure SQL Database is a solid approach. However, it is important to note that Event Hubs does not directly capture database changes. You will need to implement a CDC (Change Data Capture) mechanism or a log-based streaming tool on the Oracle side to send data to Event Hubs. Once the data is in Event Hubs, you can process it with Azure Stream Analytics or a similar service before writing to Azure SQL Database.

This solution offers low-latency or near real-time updates but requires extra configuration and ongoing monitoring. If you prefer a more managed CDC solution, consider options such as Oracle GoldenGate, Debezium, or Azure Data Factory’s CDC feature to capture changes and transfer them to Azure SQL.

For most reporting scenarios, near real-time refresh using CDC or frequent micro-batch loads is typically sufficient and easier to maintain than a fully event-driven pipeline. The Microsoft Learn documentation on Oracle CDC and Event Hubs provides comprehensive guidance for implementation.

Thank you,

Tejaswi.

It's the CDC bits I am not clear about:

 

The request is simple:

  1. Our client is a digital bank.
  2. They want to create online statements which will be built on Power BI/Builder maybe and sent to the digital app for consumption.
  3. We have three main tables which are updated regularly in Oracle OCI, and we want access to these tables through a data warehouse solution, not direct access to the Oracle DB.
  4. Using Oracle GoldenGate, Kafka and Event Hubs is one way, but GoldenGate is very expensive.

Please can you see if any of your architects would have any other ideas.

 

We are looking at:

  1. Cost
  2. Less complexity
  3. Easy management.

At the moment it is hard to suggest a solution as I don't understand the whole thing.

 

I might consider: 

  1. Leaving the existing solution as is in Synapse - don't migrate to Fabric yet.
  2. Having just a Fabric license to do the Data Mirroring (Oracle) into a Lakehouse and produce the statements using that data.

A question is:

I will have data in Azure SQL DB and some data in a Fabric Lakehouse.

I guess I can do cross-querying between these sources if I need to, and at some point in the future we can migrate from Synapse to Fabric.

 

Let me know your thinking, and if I can use Data Mirroring with Oracle I need a set of instructions on what I need to set up on the Oracle OCI DB.

We have full control of the DB, although for the licensing we need to ask the client.

 

Please help!

 

Thank you

 

 

Hi @XhevahirMehalla ,

 

Thank you for providing such a clear overview of your requirements. To recap, your main goal is to integrate Oracle (OCI) data into Fabric/Power BI for online statement generation, which necessitates a CDC or replication solution. While GoldenGate is one option, it may be cost-prohibitive. Below are several practical alternatives:

  • Fabric Data Mirroring (Preview): Microsoft’s mirroring feature enables direct replication of Oracle tables to a Fabric Lakehouse, simplifying the process. Please refer to the relevant tutorials for setup instructions.
  • Open-source approach (Debezium + Event Hubs): Debezium’s Oracle connector streams redo log changes to Kafka, and Azure Event Hubs supports Kafka protocols, allowing data transfer to ADLS Gen2/OneLake for use in Fabric Lakehouse. This method is cost-effective but requires DBA involvement for log configuration.
  • Commercial CDC tools: Solutions like Qlik Replicate, BryteFlow, and Striim offer robust interfaces and support for replicating Oracle data to ADLS Gen2, Azure SQL DB, or Delta tables, but involve licensing costs.
  • Azure Data Factory (batch mode): If real-time updates are not essential, ADF pipelines can periodically copy Oracle data, offering a straightforward and economical option.

Oracle prerequisites include:

  • Enabling supplemental logging.
  • Access to redo/archived logs.
  • Provisioning a user with SELECT and log mining privileges.
  • Establishing secure OCI-to-Azure connectivity.
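
For illustration only, here is a rough sketch of the first three prerequisites using the python-oracledb driver. The user name, password, DSN and grant list are placeholders and the grants are abridged from the Debezium Oracle connector documentation, so please have your DBA verify them; switching the database to ARCHIVELOG mode is a separate DBA task and is not shown.

# Hedged sketch: prepare an Oracle DB for Debezium log mining.
# Assumes a SYSDBA connection; all names and values are placeholders, and
# the full privilege list is in the Debezium Oracle connector docs.
import oracledb

conn = oracledb.connect(
    user="sys", password="<sys-password>",
    dsn="<oci-db-host>:1521/<service-name>",
    mode=oracledb.AUTH_MODE_SYSDBA,
)
cur = conn.cursor()

# 1. Minimal database-level supplemental logging (captures change details in redo).
cur.execute("ALTER DATABASE ADD SUPPLEMENTAL LOG DATA")

# 2. Dedicated CDC/log-mining user (name and grants are illustrative, not exhaustive).
cur.execute("CREATE USER dbzuser IDENTIFIED BY <strong-password>")
for grant in (
    "GRANT CREATE SESSION TO dbzuser",
    "GRANT SELECT ANY TRANSACTION TO dbzuser",
    "GRANT LOGMINING TO dbzuser",           # Oracle 12c and later
    "GRANT SELECT ON V_$DATABASE TO dbzuser",
):
    cur.execute(grant)

# 3. ARCHIVELOG mode requires a mount/restart sequence performed by the DBA - not shown.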

Power BI and Fabric allow cross-querying between Azure SQL DB and Fabric Lakehouse, but consolidating reporting data in Fabric Lakehouse is recommended for efficiency.

To select the most suitable approach:

  • Use Debezium or a commercial CDC tool for real-time needs.
  • Debezium with Event Hubs is effective for slight delays.
  • ADF is ideal for hourly or daily refreshes.
  • Consider Oracle Data Mirroring in Fabric (Preview) for a managed, Fabric-native solution.

Since you have a Fabric license, starting with the mirroring option is recommended. The referenced Microsoft Learn tutorials can guide your DBA through the process.

Thank you,

Tejaswi.

Hi V-Tejrama,

 

Out of all these solutions, the one that stands out for me is the one below which you suggested:

  • Open-source approach (Debezium + Event Hubs): Debezium’s Oracle connector streams redo log changes to Kafka, and Azure Event Hubs supports Kafka protocols, allowing data transfer to ADLS Gen2/OneLake for use in Fabric Lakehouse. This method is cost-effective but requires DBA involvement for log configuration.

Just so I understand the solution better, please can you confirm:

 

  1. On OCI I deploy Debezium - is this an Oracle product or an open-source product? How much will this cost - again we only need a few tables, 3-5 at most. How easy is it to configure? We have access to the DB, so DBA access is not an issue.
  2. From Debezium the data goes to Kafka, which Azure Event Hubs subscribes to?
  3. Do I need to set up Kafka as well, and how complex is it and roughly how much will it cost?
  4. Then data goes from Kafka and Event Hubs picks up the changes? Would I see the changes (inserts, updates and deletes)?
  5. Once Event Hubs is picking up the data, do I store the changes in the Data Lake as CSV files, process those files and insert them into the final table on Azure SQL DB or the Fabric Lakehouse?

Please can you review and let me know if I have understood this solution correctly.

 

After this reply from you I will accept the solution, propose it to my CTO and we will move forward.

 

Thanks

Xhev

Hi @XhevahirMehalla ,

 

Thank you for outlining your understanding of the Debezium and Event Hubs integration and for your follow-up questions. I would like to clarify several important aspects to assist in your evaluation of this solution.

Debezium is an open-source project, not an Oracle product, so there are no associated license costs. You will need to provision compute resources for Debezium (as a Kafka Connect worker) and set up an Event Hubs namespace in Azure. For your scenario with 3–5 tables, a small VM or container deployment should be sufficient. On the Oracle side, some DBA tasks are necessary: enabling supplemental logging, ensuring the database is in ARCHIVELOG mode, and creating a user with appropriate privileges to access redo logs. These steps are detailed in the Debezium Oracle connector documentation.
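
For orientation, registering the connector with the Kafka Connect worker is done through its REST API (port 8083 by default). The sketch below is only a hedged illustration: the connector name, host names and table list are placeholders and the properties are abridged, so please treat the Debezium Oracle connector documentation as the authoritative reference. The Event Hubs SASL settings themselves belong in the Connect worker configuration.

# Hedged sketch: register a Debezium Oracle connector via the Kafka Connect
# REST API. Names, hosts and the table list are placeholders; see the
# Debezium Oracle connector docs for the full set of required properties.
import json
import requests

connector = {
    "name": "oci-oracle-statements",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        "database.hostname": "<oci-db-host>",
        "database.port": "1521",
        "database.user": "dbzuser",                  # the log-mining user
        "database.password": "<secret>",
        "database.dbname": "<service-name>",
        "topic.prefix": "ocibank",                   # prefix for the change topics
        "table.include.list": "BANK.STATEMENTS,BANK.TRANSACTIONS,BANK.ACCOUNTS",
        # Schema history is stored in another Event Hub via the Kafka-compatible
        # endpoint; its SASL settings mirror the worker's own Event Hubs settings.
        "schema.history.internal.kafka.bootstrap.servers": "<namespace>.servicebus.windows.net:9093",
        "schema.history.internal.kafka.topic": "ocibank-schema-history",
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",              # Kafka Connect worker on the VM
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()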

 

Once connected, Debezium reads changes from Oracle redo logs and streams them as JSON events, indicating inserts, updates, or deletes with before and after values. These events are published to topics, and since Azure Event Hubs provides a Kafka-compatible endpoint, Debezium can publish directly to Event Hubs without the need for a separate Kafka cluster.
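
To make the before/after/operation wording concrete, here is a simplified example of what one change-event value looks like; the field values are invented and the envelope is shown without the optional schema wrapper.

# Illustrative Debezium change-event value for an UPDATE on a statements table.
# Values are made up; real events also carry additional source/schema metadata.
change_event = {
    "before": {"ID": 1001, "AMOUNT": 200.00, "STATUS": "PENDING"},
    "after":  {"ID": 1001, "AMOUNT": 250.00, "STATUS": "POSTED"},
    "source": {"connector": "oracle", "table": "STATEMENTS", "scn": "448771"},
    "op": "u",        # "c" = insert, "u" = update, "d" = delete, "r" = snapshot read
    "ts_ms": 1732550401234,
}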

From Event Hubs, you have flexibility in consuming the data, such as landing it in a Lakehouse or OneLake (in Parquet or Delta format) for use in Fabric, or pushing it to Azure SQL Database via a consumer or Stream Analytics job. For digital banking scenarios like online statements, landing data in a Lakehouse with Delta tables is often preferred for scalability and ease of handling various data changes.
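
As one example of the consumer side, here is a minimal sketch with the azure-eventhub Python SDK that reads the change events. The connection string, hub name and the handle_change() helper are placeholders, and a production consumer would add a checkpoint store and proper error handling.

# Hedged sketch: read Debezium change events from Event Hubs.
# Connection string, hub name and handle_change() are placeholders.
import json
from azure.eventhub import EventHubConsumerClient

def handle_change(change: dict) -> None:
    # Placeholder: route the event to OneLake or Azure SQL in a real pipeline.
    print(change.get("op"), change.get("after"))

client = EventHubConsumerClient.from_connection_string(
    conn_str="<event-hubs-namespace-connection-string>",
    consumer_group="$Default",
    eventhub_name="ocibank.BANK.STATEMENTS",   # topic/hub created by the connector
)

def on_event(partition_context, event):
    change = json.loads(event.body_as_str())
    handle_change(change)
    partition_context.update_checkpoint(event)  # persists only with a checkpoint store

with client:
    client.receive(on_event=on_event, starting_position="-1")  # "-1" = read from the start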

 

This approach is generally more cost-effective and less complex than GoldenGate, as you only pay for Event Hubs throughput, storage, and minimal compute for Debezium. However, it is important to plan for monitoring and managing the Debezium runtime and to design downstream processes to ensure idempotency and correct event ordering.

  • Debezium is open-source; costs are limited to Azure resources.
  • No separate Kafka cluster is necessary—Event Hubs serves as the broker.
  • Full CDC events are available (insert, update, delete).
  • Data can be directed to SQL Database or Lakehouse as required.
  • Oracle configuration includes supplemental logging, ARCHIVELOG mode, and a user for log mining.

This provides a clear and cost-effective path forward. You may consider starting with a proof of concept using a single table and one Event Hub throughput unit to validate the setup.

Thank you,

Tejaswi.

Hi Tejaswi,

 

This is much clearer now. 

I understand this better but still have a few questions as I am putting this into a diagram:

  1. No Kafka setup/support is needed? Can you confirm?
  2. Do we need an Azure VM where we will install Debezium? If so, what connection is needed to access the OCI Oracle DB? Please send me some high-level steps and what I should look for.
  3. Referring to this: "or pushing it to Azure SQL Database via a consumer or Stream Analytics job. For digital banking scenarios like online statements, landing data in a Lakehouse with Delta tables is often preferred for scalability and ease of handling various data changes."
  4. What would be the consumer (Fabric or Azure SQL DB)?
  5. What does Event Hubs publish the changes to? If it is Fabric, what do I need to set up? (In Fabric I know what to set up - workspace, Lakehouse and all that - but what else do I need to do for the data to reach Fabric? Can Event Hubs land data in a Lakehouse?)
  6. If I use Azure SQL DB but not Fabric, which is the most likely scenario - you mention I need to deploy Azure Stream Analytics. I only want to get the changes into a table on Azure, that's all, and then I can access them however I want. Please confirm my understanding.
  7. Referring to this: "However, it is important to plan for monitoring and managing the Debezium runtime and to design downstream processes to ensure idempotency and correct event ordering."
  8. What do I need to look for here, as it looks a bit worrying that I have to:
    1. Keep the Azure VM up all the time - that's fine, Azure Monitor can do that.
    2. Ensure correct event ordering - what do you mean? Doesn't Debezium handle that? Please can you elaborate a bit more on this?

Will the solution be robust enough in your view, and has anyone used this setup?

 

Thanks

Xhev

Hi @XhevahirMehalla ,

 

Thanks for your detailed follow-up. Let me clarify and summarize the solution so you have a clear picture for your diagram and next steps.

No separate Kafka setup is required: Azure Event Hubs acts as a Kafka-compatible broker, so Debezium can publish directly to Event Hubs without maintaining your own Kafka cluster. You will need a small Azure VM or container to run the Debezium Oracle connector. This VM must have secure network connectivity to your OCI Oracle database, either via VPN, ExpressRoute, or a secure public endpoint, and the connector will use the CDC/log-mining user to access redo logs. On the Oracle side, ensure supplemental logging is enabled, the database is in ARCHIVELOG mode, and a dedicated user with proper privileges exists for Debezium.
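
To illustrate the Kafka-compatibility point: clients reach Event Hubs over the Kafka protocol on port 9093 using SASL PLAIN, with the literal username $ConnectionString and the namespace connection string as the password. In Debezium's case these settings go into the Kafka Connect worker configuration; the hedged snippet below only demonstrates the endpoint with a plain Kafka producer and placeholder values.

# Hedged sketch: the Kafka-compatible Event Hubs endpoint, shown with a plain
# Kafka (confluent-kafka) producer. The same SASL settings are used by the
# Kafka Connect worker that hosts Debezium; all values are placeholders.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "<namespace>.servicebus.windows.net:9093",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "$ConnectionString",
    "sasl.password": "<event-hubs-namespace-connection-string>",
})
producer.produce("ocibank.BANK.STATEMENTS", value=b'{"op": "c"}')  # test message
producer.flush()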

Once Debezium is running, it reads changes from Oracle redo logs and streams them as JSON events (inserts, updates, deletes) to Event Hubs.

From Event Hubs, you have flexibility depending on your target. If you want to land data in Fabric Lakehouse, a lightweight consumer process (e.g., Azure Function, Synapse notebook, or Databricks job) can read events and write them into Delta tables in OneLake.
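
If that consumer is a Fabric or Synapse Spark notebook, appending a micro-batch to a Delta table can be as simple as the sketch below; the schema, values and table name are assumptions, and spark is the session the notebook provides when it is attached to a Lakehouse.

# Hedged sketch for a Fabric/Synapse notebook cell: append a micro-batch of
# flattened change events to a Delta table in the attached Lakehouse.
# Column names, values and the table name are placeholders.
rows = [
    (1001, 250.00, "POSTED", "u", 1732550401234),
]
df = spark.createDataFrame(
    rows, "id INT, amount DOUBLE, status STRING, op STRING, ts_ms LONG"
)
df.write.format("delta").mode("append").saveAsTable("statements_changes")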

Once in the Lakehouse, your Fabric workspace and Power BI reports can directly query the data. If your goal is Azure SQL Database without Fabric, a small Stream Analytics job or consumer app can read the Event Hubs events and apply the changes to your tables, enabling Power BI or other reporting layers to access them as usual.
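
For the Azure SQL path, a consumer can apply each change idempotently with a MERGE keyed on the primary key (deletes handled separately), which also addresses the idempotency point below. This is a hedged sketch with pyodbc; the table and column names are invented.

# Hedged sketch: apply one Debezium change event to Azure SQL idempotently.
# Table/column names are placeholders; a MERGE keyed on the primary key means
# replaying the same event twice leaves the row in the same state.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<server>.database.windows.net;Database=<db>;Uid=<user>;Pwd=<password>"
)

def apply_change(change: dict) -> None:
    cur = conn.cursor()
    if change["op"] == "d":
        cur.execute("DELETE FROM dbo.statements WHERE id = ?", change["before"]["ID"])
    else:  # "c", "u" and "r" all carry the full row image in "after"
        row = change["after"]
        cur.execute(
            """
            MERGE dbo.statements AS t
            USING (SELECT ? AS id, ? AS amount, ? AS status) AS s
            ON t.id = s.id
            WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.status = s.status
            WHEN NOT MATCHED THEN INSERT (id, amount, status)
                 VALUES (s.id, s.amount, s.status);
            """,
            row["ID"], row["AMOUNT"], row["STATUS"],
        )
    conn.commit()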

Regarding monitoring and correctness: the Azure VM running Debezium should be continuously available (Azure Monitor can help), and Debezium preserves event ordering per table and primary key. Downstream processes must handle idempotency to avoid applying the same change twice and ensure correct order when processing events from multiple tables or partitions. Planning for retries, restarts, and error handling is essential, but these are standard practices in any CDC pipeline.

Overall, this approach is cost-effective, avoids expensive GoldenGate licensing, and is robust enough for production workloads like your digital banking statements scenario. Starting with a proof-of-concept on one table and one Event Hub throughput unit is recommended to validate connectivity, latency, and processing before scaling to all three tables. This setup also allows a hybrid approach: you can keep your existing Synapse/Azure SQL solution for production stability while experimenting with Fabric Lakehouse for a future migration, and cross-querying between these sources is fully supported if needed.

Thank you,

Tejaswi.

Hi  @XhevahirMehalla ,

 

I hope the information provided has been useful. Please let me know if you need further clarification or would like to continue the discussion.

Thank you.

 

Yes, the information has been useful indeed and I am taking this as the solution, but I have some further questions.

 

Please can you look into my email from Friday and let me know.

 

 

If you think that I must accept this as the solution now then I can do that, but please can you send me a reply to my questions?

 

 

Thanks

Xhev

suparnababu8
Super User

Hi @XhevahirMehalla 

 

 

Microsoft Fabric REST APIs for automation and embedded analytics - Microsoft Fabric REST APIs | Micr...

Overview of Fabric Git integration - Microsoft Fabric | Microsoft Learn

 

Hope it helps you.

 

Thank you!

 


 

Thanks, very useful, however I am more interested in the Oracle database on OCI (cloud) as the source of the data.

At the Vienna conference, Data Mirroring for Oracle was announced too.

How safe is it to use a product in Public Preview?

 

Another thing is:

 

If I can't migrate to Fabric, how can I do something similar - produce real-time reports using Azure SQL Database as the target DB and again the Oracle DB on OCI as the source?

 

Can Event Hubs help, and how would this be used with Kafka? How much work do I need to do to set this up?

 

 

Do I need to speak to someone else within MSFT?

 

Thanks
