Hi all,
I have to move data from Azure Blob Storage / Azure Data Lake Storage Gen2 (ADLS Gen2) to Fabric.
It concerns parquet files which are uploaded to ADLS Gen2 every day, into a folder for that day.
One hard requirement: if moving the data for a certain day fails, the process must be able to recognise this and catch up by also moving the data for the missing day. This way, ADLS Gen2 and Fabric are always in sync.
As you all know, Fabric has multiple methods to do the same thing, so I want to ask what the best practice is in this scenario. Also, should I copy the parquet files to Fabric as-is, or should I load the data directly into Delta tables? We have people on our team who have experience with all of the methods mentioned below. I see the following possibilities:
1. Shortcut: best solution on paper as you do not have to copy data. However, this one is not possible on our tenant.
2. Copy data via Data Factory
3. Python Notebook (scheduled in Data Factory)
4. Dataflow (scheduled in Data Factory)
Which one is considered best practice and is the most robust/scalable?
Hi @eurenergy ,
the circumstances you describe are a perfect example of where a metadata-driven framework can be implemented to satisfy your requirements.
In your case, my suggestions would be as follows.
Set up a SQL database item inside Fabric which holds your metadata tables. For ingestion, you can create a table (e.g. called "ImportConfig"), in which you save information about your data source. This information can be the data source name, the path in ADLS Gen2 and, most importantly, a column which contains a watermark value. The watermark value tells you when you last loaded the data successfully, most often as a timestamp.
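If you prefer to create and seed the metadata table from a notebook rather than by hand, a minimal sketch could look like the following. The server name, database name, column names and the seed row are all placeholders I made up for illustration, not anything prescribed by Fabric:

```python
# Hypothetical sketch: create and seed the ImportConfig metadata table in the
# Fabric SQL database item. Connection details, column names and the seed row
# are placeholders -- adjust them to your own naming conventions.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-fabric-sql-endpoint>;"
    "Database=<your-metadata-db>;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# One row per source; WatermarkValue records the last day that was loaded successfully.
cursor.execute("""
IF OBJECT_ID('dbo.ImportConfig') IS NULL
CREATE TABLE dbo.ImportConfig (
    SourceName      NVARCHAR(100) NOT NULL PRIMARY KEY,
    SourcePath      NVARCHAR(400) NOT NULL,   -- folder pattern in ADLS Gen2
    WatermarkValue  DATETIME2     NOT NULL    -- last successfully loaded day
);
""")
cursor.execute("""
IF NOT EXISTS (SELECT 1 FROM dbo.ImportConfig WHERE SourceName = 'DailyParquet')
INSERT INTO dbo.ImportConfig (SourceName, SourcePath, WatermarkValue)
VALUES ('DailyParquet', '<container>/daily/', '2024-01-01');
""")
conn.commit()
conn.close()
```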
Next, in a data pipeline, create three activities: a Lookup activity that reads the current watermark from ImportConfig, a Copy activity that copies every daily folder newer than the watermark from ADLS Gen2 into Fabric, and a Script (or Stored procedure) activity that updates the watermark once the copy has succeeded.
While the copying can also be done with a notebook, I think it makes sense to first try it with a copy activity, since the copy activity already provides the connector you need for your source.
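If you do end up going the notebook route, a minimal sketch of the catch-up logic could look like this. The abfss path, the folder naming (one folder per day, yyyy-MM-dd), the target table name and the hardcoded watermark are assumptions for illustration only; in a real run the watermark would come from ImportConfig:

```python
# Hedged sketch of the notebook variant: catch up on every daily folder that is
# newer than the stored watermark and append it to a Delta table in the Lakehouse.
from datetime import date, timedelta
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

SOURCE_ROOT = "abfss://<container>@<storageaccount>.dfs.core.windows.net/daily"
TARGET_TABLE = "bronze_daily_parquet"        # Delta table in the Lakehouse

last_loaded = date(2024, 1, 1)               # in practice: read from ImportConfig
today = date.today()

day = last_loaded + timedelta(days=1)
while day <= today:
    folder = f"{SOURCE_ROOT}/{day:%Y-%m-%d}"
    try:
        df = spark.read.parquet(folder)
    except Exception:
        # Folder missing or not yet uploaded: stop here so the watermark
        # is never advanced past a day that has not been loaded.
        break
    (df.withColumn("load_date", F.lit(str(day)))
       .write.mode("append")
       .saveAsTable(TARGET_TABLE))
    # In practice: update WatermarkValue in ImportConfig here,
    # only after the write for this day has succeeded.
    day += timedelta(days=1)
```

Because the watermark is only advanced after a successful load, a failed day is automatically picked up again on the next run, which covers your hard requirement.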
I hope this helps! Let me know if you have additional questions.
If this solves your request, make sure to accept it as the solution, so other people can find it quickly. 🙂
Kind regards,
Niels
Hi @eurenergy
Thank you @ObungiNiels for providing a helpful solution. Please follow the steps provided by ObungiNiels, which should help you resolve the issue. If the response has addressed your query, please accept it as a solution so other members can easily find it.
Regards,
Menaka.