Hello Community,
I have a design question. We are building an enterprise data warehouse using Fabric Warehouse, and I have to stage the data from our Oracle database (the source). The way I currently designed it: I have .sql files sitting in a OneLake Lakehouse that contain the extraction scripts for Oracle. I then run a notebook in a high-concurrency session to read the .sql file and assign its contents to a variable. That variable is later passed into a ForEach Copy activity, where the source is Oracle, the source query is the variable returned from the notebook, and the results land in a Lakehouse table. This is all metadata driven, and the config comes from a table.
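To make the flow concrete, here is roughly what the notebook step looks like (the path, file name, and query below are simplified placeholders, not our actual config):

```python
# Notebook cell running in the high-concurrency session.
# Reads the extraction script from the Lakehouse Files area and hands the query
# text back to the pipeline, which uses it as the Oracle source query.
# The path would normally come from the metadata/config table; this one is a placeholder,
# and it assumes a default lakehouse is attached to the notebook.
sql_path = "/lakehouse/default/Files/extract_scripts/customers.sql"

with open(sql_path, "r") as f:
    oracle_query = f.read()

# mssparkutils is available by default in Fabric notebooks; the exit value is
# what the ForEach Copy activity picks up as the source query.
mssparkutils.notebook.exit(oracle_query)
```

The pipeline then passes that exit value into the ForEach Copy activity for each table in the config.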
The question I have: is this the best way to do this? What other alternatives have people used?
Challenges:
1. Since HC notebooks can only have 5 connections at the same time, I can run only 5 pulls in parallel.
2. I tried to use OPENROWSET from Fabric Warehouse, but it failed reading the content as a SINGLE_BLOB.
Please advise.
Appreciate your responses. However, this is a third-party system that uses Oracle, and in order to use Mirroring they would need to enable GoldenGate, which is not an option at the moment; hence the approach we are taking.
Given that mirroring is not an option, would love to get additional feedback on this approach.
Hi @Ira_27,
All right.
So that means you’re not able to configure anything on the Oracle server itself?
In that case, your only real options are the Copy Activity or the Copy Job.
You can, of course, work with watermarks to ensure that only new or previously uncopied data is loaded. CDC would be an alternative, but that would require configuration either on the server or within the database.
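Just to sketch the watermark idea (the schema, table, column, and timestamp below are made up for illustration): the pipeline, or a small helper notebook, builds the Oracle query from the last stored watermark and, after a successful copy, writes the new maximum value back to the control table.

```python
# Illustrative only: build an incremental Oracle query from a stored watermark.
# The schema/table/column names and the timestamp format are placeholders; the
# last watermark would come from your metadata/control table in Fabric.
def build_incremental_query(schema: str, table: str, watermark_col: str, last_watermark: str) -> str:
    """Return an Oracle SELECT that only pulls rows newer than the stored watermark."""
    return (
        f"SELECT * FROM {schema}.{table} "
        f"WHERE {watermark_col} > TO_TIMESTAMP('{last_watermark}', 'YYYY-MM-DD HH24:MI:SS')"
    )

# Example: value read from the control table after the previous successful run
query = build_incremental_query("SALES", "ORDERS", "LAST_MODIFIED_DATE", "2024-05-01 00:00:00")
print(query)
# The Copy activity then uses this string as its Oracle source query; after the
# load succeeds, write MAX(LAST_MODIFIED_DATE) back to the control table.
```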
Another advantage of using a Copy Activity or Copy Job is that it avoids the overhead associated with notebooks.
I can imagine the following approach, though it would need to be tested in practice: drive the Copy Activity (or a Copy Job) directly from your metadata table and filter each Oracle source query with a watermark, so the extraction itself no longer needs a notebook.
Later, you can use Notebooks to further process the data.
Would that be an option for you?
Best regards
René
Hello @Ira_27
You should default to Oracle Mirroring rather than a High Concurrency notebook approach when you’re staging Oracle data for an enterprise data warehouse. Mirroring keeps your Oracle tables continuously replicated into OneLake using database logs, so you avoid notebook execution limits, Spark session contention, and the operational overhead that comes with orchestrating ingestion through HC notebooks.
Once the data is mirrored, Delta Change Data Feed (CDF) gives you a clean and reliable way to pick up inserts, updates, and deletes and push those changes downstream. You can incrementally build your Bronze and Silver layers and keep your Fabric Warehouse in sync without relying on watermark queries, SQL files, or loop‑based copy logic.
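As a rough sketch of what consuming the change feed could look like in PySpark (all table names, the key column, and the starting version are placeholders, and CDF has to be enabled on the source table):

```python
# Sketch only: pick up inserts/updates/deletes from a Delta table's Change Data
# Feed and merge them into a Silver table. Table names, the key column, and the
# starting version are placeholders; CDF must be enabled on the source table.
from delta.tables import DeltaTable

changes = (
    spark.read.format("delta")                 # `spark` is provided by the notebook session
    .option("readChangeFeed", "true")
    .option("startingVersion", 10)             # e.g. the last version you already processed
    .table("bronze_orders")                    # placeholder source table
    .filter("_change_type IN ('insert', 'update_postimage', 'delete')")
)

silver = DeltaTable.forName(spark, "silver_orders")   # placeholder target table

(
    silver.alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")            # placeholder key
    .whenMatchedDelete(condition="s._change_type = 'delete'")
    .whenMatchedUpdateAll(condition="s._change_type = 'update_postimage'")
    .whenNotMatchedInsertAll(condition="s._change_type != 'delete'")
    .execute()
)
```

In practice you would also deduplicate to the latest change per key before merging, and persist the last processed version so the next run knows where to start.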
Compared to using HC notebooks for extraction, this takes Spark out of the ingestion path entirely. You’re no longer constrained by shared session limits, capped parallelism, or capacity throttling issues that tend to show up as volumes and table counts increase.
From a day‑to‑day operations perspective, this is also much easier to live with. You’re not managing notebooks that prepare SQL, tuning parallel copy settings, or handling retries and partial failures. Instead, you rely on a managed replication mechanism and a standard change feed that’s designed exactly for incremental processing.
It’s also a safer approach than query‑based or notebook‑driven Copy patterns, because changes are captured directly from Oracle redo logs rather than inferred from timestamps or custom logic. That reduces the risk of missed updates, race conditions, and messy reprocessing scenarios as your platform grows.
Overall, Oracle Mirroring combined with CDF gives you a much cleaner separation of concerns: ingestion runs continuously and independently, change tracking is consistent and reusable, and EDW modelling stays downstream where the business logic belongs. The result feels simpler, more robust, and far more future‑proof than using High Concurrency notebooks as an ingestion and orchestration engine.
Hi @Ira_27,
That’s an interesting approach you’re taking.
We run a few Oracle databases ourselves, and we use Oracle Mirroring to get the data into Fabric in the first place. In my opinion, this works very well, and mirroring storage is free up to a limit based on your capacity size, so you don't pay anything extra for the replicated data.
You can read more about it here:
https://learn.microsoft.com/en-us/fabric/mirroring/overview
Once you have the data in Fabric, you can easily process it using notebooks and populate your lakehouse.
Here’s an overview of how we do it.
1. Transfer Oracle data to Fabric using mirroring
2. Process the data further using notebooks (we use Bronze, Silver, and Gold; see the sketch after this list)
3. Use the data in reports.
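Here is a rough illustration of step 2 (the table and column names are placeholders, and it assumes the mirrored tables are reachable from Spark, e.g. through a OneLake shortcut in the lakehouse):

```python
# Rough illustration of step 2: read a mirrored Oracle table and write a cleaned
# Silver table in the lakehouse. All names are placeholders; it assumes the
# mirrored tables are reachable from Spark, e.g. via a OneLake shortcut.
from pyspark.sql import functions as F

bronze_df = spark.read.table("customers_mirrored")          # placeholder source

silver_df = (
    bronze_df
    .filter(F.col("customer_id").isNotNull())               # placeholder key column
    .dropDuplicates(["customer_id"])
    .withColumn("load_ts", F.current_timestamp())           # simple audit column
)

# Overwrite (or merge, if you track changes) into the Silver layer
silver_df.write.mode("overwrite").saveAsTable("silver_customers")
```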
I hope that helps
Many greetz
Rene