egons11
Helper I

Using open source ETL tools for Fabric Lakehouse/DWH

Hi, is there currently no way to use open source tools to load data into a Fabric lakehouse?
For example, can we use Airbyte and Airflow, or do we have to use only Data Factory to load and transform data?

Thanks in advance.

 

 

v-nikhilan-msft
Community Support

Hi @egons11 
Thanks for using Fabric Community.
There are several ways to load data into a Fabric Lakehouse:
Data Factory: Pipelines can copy data to and from a Microsoft Fabric Lakehouse and transform data in the Lakehouse. You can also use a pipeline to load sample data into the Lakehouse with a high-performance copy and then transform the data with a dataflow.

Apache Spark: You can use Apache Spark libraries in notebook code to connect to a data source directly, load data to a data frame, and then save it in a lakehouse.

While Airbyte can connect to various sources and transform data, it currently doesn't have native support for writing directly to a Fabric Lakehouse. We would appreciate it if you could share this feedback on our feedback channel, where the user community can upvote and comment. This allows our product teams to prioritize your request against our existing feature backlog and gives insight into the potential impact of implementing the suggested feature.

Feedback Link:  https://ideas.fabric.microsoft.com/


I hope this information is helpful. Please let me know if you have any other questions.

 

You didn't reply specifically to the question itself. Spark notebooks and ADF are inside Fabric.

Is there currently a way to use outside ETL tools, like Airflow or Airbyte, to load data into Fabric DWH? I am not asking about native connections, but whether it is even possible to load data into Fabric DWH with anything outside Fabric itself.
Thanks in advance.

We can use Fivetran (if that's helpful) to ingest data into the lakehouse, but I'm afraid it's not open source like Airflow or Airbyte.

Hi @egons11 
Apologies from my side. Currently Fabric does not support external ETL tools to load data. But if you create an idea, our team will consider the request.
We would appreciate it if you could share this feedback on our feedback channel, where the user community can upvote and comment. This allows our product teams to prioritize your request against our existing feature backlog and gives insight into the potential impact of implementing the suggested feature.

Feedback Link:  https://ideas.fabric.microsoft.com/


I hope this information is helpful. Please let me know if you have any other questions.



Hi, can you please reconfirm this? Is there really no way to use external tools, for example through an API? It's really hard for me to believe that the only way to get data into the Fabric DWH is via ADF or other Fabric tools.

Hi @egons11 
At this time, we are reaching out to the internal team to get some help on this. We will update you once we hear back from them.
Thanks

Hi @egons11 
One option is the OneLake SDKs, which third-party (3P) tools can use to perform read and write operations on OneLake.
How do I connect to OneLake? - Microsoft Fabric | Microsoft Learn

Since OneLake is built on top of ADLS Gen2, in principle any 3P tool that supports ADLS Gen2 should also work with OneLake.
Connecting to OneLake  | Microsoft Fabric Blog | Microsoft Fabric
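
To make the ADLS Gen2 point concrete, here is a minimal Python sketch, not an official sample. It assumes the `azure-identity` and `azure-storage-file-datalake` packages are installed, treats the Fabric workspace as the ADLS "file system", and uses hypothetical workspace, lakehouse and file names. The helper names (`onelake_file_path`, `onelake_url`, `upload_to_onelake`) are made up for illustration.

```python
from urllib.parse import quote

# OneLake exposes an ADLS Gen2-compatible endpoint at this address.
ONELAKE_ACCOUNT_URL = "https://onelake.dfs.fabric.microsoft.com"


def onelake_file_path(lakehouse: str, relative_path: str) -> str:
    """Path of a file inside the workspace, under the lakehouse's Files area."""
    return f"{lakehouse}.Lakehouse/Files/{relative_path.lstrip('/')}"


def onelake_url(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Full HTTPS URL of the file, handy for logging or for URL-based tools."""
    path = onelake_file_path(lakehouse, relative_path)
    return f"{ONELAKE_ACCOUNT_URL}/{quote(workspace)}/{quote(path)}"


def upload_to_onelake(workspace: str, lakehouse: str,
                      relative_path: str, data: bytes) -> None:
    """Upload bytes to OneLake through the ADLS Gen2 SDK (needs network access)."""
    # Imported lazily so the path helpers above work without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=ONELAKE_ACCOUNT_URL,
        credential=DefaultAzureCredential(),
    )
    # Assumption: the workspace name plays the role of the container.
    file_system = service.get_file_system_client(workspace)
    file_client = file_system.get_file_client(
        onelake_file_path(lakehouse, relative_path)
    )
    file_client.upload_data(data, overwrite=True)
```

An orchestrator such as Airflow could call `upload_to_onelake("SalesWS", "Bronze", "raw/orders.csv", payload)` from a task, assuming the identity it runs under has been granted access to the workspace.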

Hope this helps. Please let me know if you have any further questions.

Wouldn't this be quite limited compared to what a practical ETL should look like, with all its transformations? Is it meant to be used for this purpose, with transformations, medallion architecture, etc.?

Hi @egons11 
1) For Fabric Lakehouse, all we care about is that the data is either in OneLake or accessible via a shortcut, so that it can be published as a table in the Fabric lakehouse. You can implement a medallion architecture on Fabric in your own way.

2) We also allow 3P/open source tools such as dbt, Informatica, etc. to load data into Fabric Warehouse. Any tool that is capable of executing SQL can also write to Fabric Warehouse.

3) With Fabric, we store data in an open format and ship APIs that enable external apps/services to write and transform data. Not quite sure how that is limiting. Are you looking for specific functionality or support for a specific tool? If that's the case, please mention it specifically.

4) The reality of today's analytics landscape is that there are several tools and several ways of building an analytics ecosystem. There's no silver bullet. Fabric has plenty of features to build an entire ecosystem using its native capabilities, and it has interfaces (connectors, APIs, SDKs, etc.) that allow external services/apps to interact with it.
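
As a rough illustration of point 2, the sketch below loads rows through any Python DB-API connection using plain parameterized SQL. It is only a sketch under assumptions: `load_rows` is a hypothetical helper, the table and column names are invented, and the in-memory `sqlite3` database is a stand-in so the example runs anywhere. Against Fabric Warehouse you would pass a connection from a SQL driver that uses `?` placeholders (for example `pyodbc`), which is not shown here.

```python
import sqlite3
from typing import Iterable, Sequence


def load_rows(conn, table: str, columns: Sequence[str],
              rows: Iterable[Sequence]) -> int:
    """Insert rows with a parameterized INSERT and return how many were sent."""
    # Table and column names are interpolated, so they must come from
    # trusted configuration, never from user input.
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"

    batch = list(rows)
    cursor = conn.cursor()
    cursor.executemany(sql, batch)
    conn.commit()
    return len(batch)


def demo() -> int:
    """Run the loader against an in-memory SQLite stand-in database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    load_rows(conn, "orders", ["id", "amount"], [(1, 9.5), (2, 20.0)])
    count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return count
```

Row-by-row inserts like this are probably only sensible for small batches. For larger volumes, a bulk statement such as `COPY INTO` from staged files may be a better fit.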

Hope this helps.

I am worried about API throttling here, and about whether this method would incur any additional costs beyond the Fabric capacity. Also, I think this will not allow me to create incremental pipelines, only rewrite the full file.

As I understand it, the write limit would be only 1200 writes per hour. So only 1200 file writes per hour?

Hi @egons11 
Where did you come across this? Did you find it in any documentation?
Please provide the details.
Thanks.

Hi @egons11 
This document relates to Microsoft Azure, so we cannot draw conclusions about Fabric from it.

Hi @egons11 

We haven't heard back from you on the last response and are just checking whether you got any insights regarding your query. If not, please respond with more details and we will try to help.
Thanks

