What is the best way to load a huge amount of data from an Oracle source to a Lakehouse? We are trying to load our FACT tables, which have millions of rows, into a Fabric Lakehouse. Typically, these tables are terabytes in size. We just want to know the best way to approach this with a balance of speed and efficiency. Should we go the Dataflow Gen2 route or the Data Pipeline route?
We are currently testing the pipeline route, but it's taking too long to copy the data. We had to cancel the current run of this pipeline since it was still running after 8 hours.
Hi @trebllaw,
Perhaps you can try the shortcut feature to get the data. A shortcut lets you read data from a remote data source without data movement.
Since Fabric does not currently support creating a shortcut directly to Oracle, you may need to land the Oracle data in a supported data source (e.g. Azure, Dataverse, …), and then use a shortcut to read that data from Azure.
Unify data sources with OneLake shortcuts - Microsoft Fabric | Microsoft Learn
Regards,
Xiaoxin Sheng
As much as possible, we'd like to avoid additional costs from using Azure, since we are already paying for 3 P3 capacities at the moment. That is why we didn't go the staging route, as that would consume Data Lake Gen2 storage, which is an additional cost.
Hi @trebllaw,
Perhaps you can consider using semantic link (SemPy), which can be used to get data from a semantic model.
What is semantic link? - Microsoft Fabric | Microsoft Learn
You can design a Power BI report in Desktop with a model that connects to the Oracle data source, and publish it to a Power BI service workspace that is assigned to a Fabric capacity.
After you assign a gateway to handle its refresh, you can use a notebook to get data from that model and load it into the Lakehouse.
Semantic link and Power BI connectivity - Microsoft Fabric | Microsoft Learn
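If that route is of interest, here is a minimal notebook sketch of the "get data from the model and load it to the Lakehouse" step, assuming a published semantic model named "OracleWarehouse" with a table "FactSales" (both names are placeholders for your own):

```python
# Minimal sketch: pull a table from a published semantic model into the Lakehouse.
# "OracleWarehouse" and "FactSales" are placeholder names for illustration only.
import sempy.fabric as fabric

# Read the table from the semantic model into a FabricDataFrame (pandas-based)
fact_df = fabric.read_table("OracleWarehouse", "FactSales")

# Convert to a Spark DataFrame and persist it as a Delta table in the Lakehouse
spark.createDataFrame(fact_df) \
    .write.format("delta").mode("overwrite").saveAsTable("FactSales")
```

Keep in mind that read_table pulls the whole table through the semantic model into memory, so it suits moderate volumes better than multi-terabyte FACT tables.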
Regards,
Xiaoxin Sheng
Unfortunately, this wouldn't work for us since we are going to be loading a huge amount of data from multiple FACT tables.
The main reason we are trying to use pipelines is that they are advertised as the ideal option for moving large amounts of data, able to load petabytes pretty quickly. Based on our tests, though, we are getting only about 5 KB/s of throughput. In 3.5 hours we only managed to process and transfer about 1.5M rows. We may be doing something wrong here; we are not sure.
Here are some of the things that might be of note:
1. The Oracle source isn't partitioned.
2. We've set the pipeline to dynamic range partitioning and set one of the key fields as the index (a notebook-based comparison is sketched after this list).
3. Parallel copies are automatically set to 3.
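As a cross-check on where the bottleneck is (source, gateway, or copy settings), it can help to compare against a parallel JDBC read in a Fabric notebook. The sketch below is only an illustration under assumptions, not the pipeline copy activity itself: the connection string, credentials, table name SALES_FACT, key column SALES_ID, and its bounds are all placeholders, and the Oracle JDBC driver must be available to the Spark session.

```python
# Sketch: partitioned JDBC read from Oracle into a Lakehouse Delta table.
# All connection details, names, and bounds below are placeholders.
jdbc_url = "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB"

fact_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "SALES_FACT")
    .option("user", "etl_user")
    .option("password", "********")
    .option("driver", "oracle.jdbc.OracleDriver")
    # Range partitioning: Spark issues one range query per partition, in parallel
    .option("partitionColumn", "SALES_ID")   # numeric key used to split the read
    .option("lowerBound", "1")
    .option("upperBound", "2000000000")
    .option("numPartitions", "32")
    .option("fetchsize", "10000")            # larger fetch size reduces round trips
    .load()
)

# Land the data as a Delta table in the Lakehouse
fact_df.write.format("delta").mode("overwrite").saveAsTable("sales_fact")
```

If the notebook read is just as slow, the limit is more likely the source or gateway bandwidth than the copy activity settings.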
Hi,
I'm encountering the same performance issues in my testing. Have you had any success? I'm also loading very large, non-partitioned on-prem Oracle tables into Microsoft Fabric Lakehouses.
Hi @trebllaw,
What type of SKU are you working with? The endpoint and data pipelines also have data read/write limitations. You can check the following links to learn more about them and see if they help with your scenario:
What is Power BI Premium? - Power BI | Microsoft Learn
Data Factory limitations overview - Microsoft Fabric | Microsoft Learn
Regards,
Xiaoxin Sheng
Currently, with our testing, it would be a one-time thing, since we are trying to replicate what we have in the on-prem Oracle warehouse in the Fabric Lakehouse.
In the future, we would also like to update the Lakehouse whenever the warehouse changes; maybe this would be scheduled once a day or so.
It's a direct copy, no transformation whatsoever. Well, does changing the detected data type count? Sometimes Fabric detects the data type of a column as float when it's really an integer, so we change that during the copy.
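If the load ends up going through a notebook, that type correction is a one-line cast at write time. A small sketch, with QUANTITY as a made-up example of a column detected as float that is really an integer:

```python
from pyspark.sql.functions import col

# Hypothetical column: QUANTITY was detected as float but is really an integer
df = spark.read.table("sales_fact")
df = df.withColumn("QUANTITY", col("QUANTITY").cast("int"))
df.write.format("delta").mode("overwrite").saveAsTable("sales_fact_typed")
```

For the daily refresh later on, the same write can be switched to an append or merge filtered on a last-modified column, if the source tables have one.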
Hi @trebllaw,
Some questions to clarify what you're looking for: