Hello Community,
I have a design question. We are building an enterprise data warehouse using Fabric Warehouse, and I have to stage the data from our Oracle database (the source). The way I currently designed it: I have .sql files sitting in a OneLake Lakehouse that contain the extraction scripts for Oracle. I then run a notebook in a high-concurrency session to read the .sql file and assign its contents to a variable. That variable is later passed into a ForEach Copy activity, where the source is Oracle, the source query is the variable returned from the notebook, and the results land in a Lakehouse table. This is all metadata driven, and the config comes from a table.
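To make the flow concrete, here is roughly what the notebook step looks like (the path, file name, and query below are simplified placeholders, not our actual config):

```python
# Notebook cell running in the high-concurrency session.
# Reads the extraction script from the Lakehouse Files area and hands the query
# text back to the pipeline, which uses it as the Oracle source query.
# The path would normally come from the metadata/config table; this one is a placeholder,
# and it assumes a default lakehouse is attached to the notebook.
sql_path = "/lakehouse/default/Files/extract_scripts/customers.sql"

with open(sql_path, "r") as f:
    oracle_query = f.read()

# mssparkutils is available by default in Fabric notebooks; the exit value is
# what the ForEach Copy activity picks up as the source query.
mssparkutils.notebook.exit(oracle_query)
```

The pipeline then passes that exit value into the ForEach Copy activity for each table in the config.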
The question I have: is this the best way to do this? What other alternatives have people used?
Challenges:
1. Since HC notebooks can only have 5 connections at the same time, I can run only 5 pulls in parallel.
2. I tried to use OPENROWSET from Fabric Warehouse, but it failed reading the content as a SINGLE_BLOB.
Please advise.
Appreciate your responses. However, this is a third-party system that uses Oracle, and in order to use Mirroring they would need to enable GoldenGate, which is not an option at the moment; hence the approach we are taking.
Given that mirroring is not an option, would love to get additional feedback on this approach.
Hi @Ira_27,
All right.
So that means you’re not able to configure anything on the Oracle server itself?
In that case, your only real options are the Copy Activity or the Copy Job.
You can, of course, work with watermarks to ensure that only new or previously uncopied data is loaded. CDC would be an alternative, but that would require configuration either on the server or within the database.
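Just to sketch the watermark idea (the schema, table, column, and timestamp below are made up for illustration): the pipeline, or a small helper notebook, builds the Oracle query from the last stored watermark and, after a successful copy, writes the new maximum value back to the control table.

```python
# Illustrative only: build an incremental Oracle query from a stored watermark.
# The schema/table/column names and the timestamp format are placeholders; the
# last watermark would come from your metadata/control table in Fabric.
def build_incremental_query(schema: str, table: str, watermark_col: str, last_watermark: str) -> str:
    """Return an Oracle SELECT that only pulls rows newer than the stored watermark."""
    return (
        f"SELECT * FROM {schema}.{table} "
        f"WHERE {watermark_col} > TO_TIMESTAMP('{last_watermark}', 'YYYY-MM-DD HH24:MI:SS')"
    )

# Example: value read from the control table after the previous successful run
query = build_incremental_query("SALES", "ORDERS", "LAST_MODIFIED_DATE", "2024-05-01 00:00:00")
print(query)
# The Copy activity then uses this string as its Oracle source query; after the
# load succeeds, write MAX(LAST_MODIFIED_DATE) back to the control table.
```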
Another advantage of using a Copy Activity or Copy Job is that it avoids the overhead associated with notebooks.
I can imagine the following approach, though it would need to be tested in practice: drive the Copy Activity (or a Copy Job) directly from your metadata table and filter each Oracle source query with a watermark, so the extraction itself no longer needs a notebook.
Later, you can use Notebooks to further process the data.
Would that be an option for you?
Best regards
René
Hello @Ira_27
You should default to Oracle Mirroring rather than a High Concurrency notebook approach when you’re staging Oracle data for an enterprise data warehouse. Mirroring keeps your Oracle tables continuously replicated into OneLake using database logs, so you avoid notebook execution limits, Spark session contention, and the operational overhead that comes with orchestrating ingestion through HC notebooks.
Once the data is mirrored, Delta Change Data Feed (CDF) gives you a clean and reliable way to pick up inserts, updates, and deletes and push those changes downstream. You can incrementally build your Bronze and Silver layers and keep your Fabric Warehouse in sync without relying on watermark queries, SQL files, or loop‑based copy logic.
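As a rough sketch of what consuming the change feed could look like in PySpark (all table names, the key column, and the starting version are placeholders, and CDF has to be enabled on the source table):

```python
# Sketch only: pick up inserts/updates/deletes from a Delta table's Change Data
# Feed and merge them into a Silver table. Table names, the key column, and the
# starting version are placeholders; CDF must be enabled on the source table.
from delta.tables import DeltaTable

changes = (
    spark.read.format("delta")                 # `spark` is provided by the notebook session
    .option("readChangeFeed", "true")
    .option("startingVersion", 10)             # e.g. the last version you already processed
    .table("bronze_orders")                    # placeholder source table
    .filter("_change_type IN ('insert', 'update_postimage', 'delete')")
)

silver = DeltaTable.forName(spark, "silver_orders")   # placeholder target table

(
    silver.alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")            # placeholder key
    .whenMatchedDelete(condition="s._change_type = 'delete'")
    .whenMatchedUpdateAll(condition="s._change_type = 'update_postimage'")
    .whenNotMatchedInsertAll(condition="s._change_type != 'delete'")
    .execute()
)
```

In practice you would also deduplicate to the latest change per key before merging, and persist the last processed version so the next run knows where to start.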
Compared to using HC notebooks for extraction, this takes Spark out of the ingestion path entirely. You’re no longer constrained by shared session limits, capped parallelism, or capacity throttling issues that tend to show up as volumes and table counts increase.
From a day‑to‑day operations perspective, this is also much easier to live with. You’re not managing notebooks that prepare SQL, tuning parallel copy settings, or handling retries and partial failures. Instead, you rely on a managed replication mechanism and a standard change feed that’s designed exactly for incremental processing.
It’s also a safer approach than query‑based or notebook‑driven Copy patterns, because changes are captured directly from Oracle redo logs rather than inferred from timestamps or custom logic. That reduces the risk of missed updates, race conditions, and messy reprocessing scenarios as your platform grows.
Overall, Oracle Mirroring combined with CDF gives you a much cleaner separation of concerns: ingestion runs continuously and independently, change tracking is consistent and reusable, and EDW modelling stays downstream where the business logic belongs. The result feels simpler, more robust, and far more future‑proof than using High Concurrency notebooks as an ingestion and orchestration engine.
Hi @Ira_27,
That’s an interesting approach you’re taking.
We run a few Oracle databases ourselves, and we use Oracle Mirroring to get the data into Fabric in the first place. In my opinion, this works very well, and mirroring storage is free up to a limit based on your capacity size, so you don't pay anything extra for the replicated data.
You can read more about it here:
https://learn.microsoft.com/en-us/fabric/mirroring/overview
Once you have the data in Fabric, you can easily process it using notebooks and populate your lakehouse.
Here’s an overview of how we do it.
1. Transfer Oracle data to Fabric using mirroring
2. Process the data further using notebooks (we use Bronze, Silver, and Gold; see the sketch after this list)
3. Use the data in reports.
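Here is a rough illustration of step 2 (the table and column names are placeholders, and it assumes the mirrored tables are reachable from Spark, e.g. through a OneLake shortcut in the lakehouse):

```python
# Rough illustration of step 2: read a mirrored Oracle table and write a cleaned
# Silver table in the lakehouse. All names are placeholders; it assumes the
# mirrored tables are reachable from Spark, e.g. via a OneLake shortcut.
from pyspark.sql import functions as F

bronze_df = spark.read.table("customers_mirrored")          # placeholder source

silver_df = (
    bronze_df
    .filter(F.col("customer_id").isNotNull())               # placeholder key column
    .dropDuplicates(["customer_id"])
    .withColumn("load_ts", F.current_timestamp())           # simple audit column
)

# Overwrite (or merge, if you track changes) into the Silver layer
silver_df.write.mode("overwrite").saveAsTable("silver_customers")
```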
I hope that helps
Many greetz
Rene