Hi all,
I have to move data from Azure Blob Storage / Azure Data Lake Storage Gen2 (ADLS Gen2) to Fabric.
It concerns parquet files which are uploaded to ADLS Gen2 every day, into a folder for that day.
One hard requirement: if moving the data for a certain day fails, the process must be able to recognise this and catch up by also moving the data for the missing day. This way, ADLS Gen2 and Fabric are always in sync.
As you all know, Fabric has multiple methods to do the same thing, so I want to ask what the best practice is in this scenario. Also, should I copy the parquet files to Fabric as-is, or should I load the data directly into Delta tables? We have people on our team who have experience with all of the methods mentioned below. I see the following possibilities:
1. Shortcut: best solution on paper as you do not have to copy data. However, this one is not possible on our tenant.
2. Copy data via Data Factory
3. Python Notebook (scheduled in Data Factory)
4. Dataflow (scheduled in Data Factory)
Which one is considered best practice and is the most robust/scalable?
Hi @eurenergy ,
the circumstances you describe are a perfect example of where a metadata-driven framework can be implemented to satisfy your requirements.
In your case, my suggestions would be as follows.
Set up a SQL database item inside Fabric which holds your metadata tables. For ingestion, you can create a table (e.g. called "ImportConfig"), in which you save information about your data source. This information can be the data source name, the path in ADLS Gen2 and, most importantly, a column which contains a watermark value. The watermark value tells you when you last loaded the data successfully, most often as a timestamp.
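If you prefer to create and seed the metadata table from a notebook rather than by hand, a minimal sketch could look like the following. The server name, database name, column names and the seed row are all placeholders I made up for illustration, not anything prescribed by Fabric:

```python
# Hypothetical sketch: create and seed the ImportConfig metadata table in the
# Fabric SQL database item. Connection details, column names and the seed row
# are placeholders -- adjust them to your own naming conventions.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-fabric-sql-endpoint>;"
    "Database=<your-metadata-db>;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# One row per source; WatermarkValue records the last day that was loaded successfully.
cursor.execute("""
IF OBJECT_ID('dbo.ImportConfig') IS NULL
CREATE TABLE dbo.ImportConfig (
    SourceName      NVARCHAR(100) NOT NULL PRIMARY KEY,
    SourcePath      NVARCHAR(400) NOT NULL,   -- folder pattern in ADLS Gen2
    WatermarkValue  DATETIME2     NOT NULL    -- last successfully loaded day
);
""")
cursor.execute("""
IF NOT EXISTS (SELECT 1 FROM dbo.ImportConfig WHERE SourceName = 'DailyParquet')
INSERT INTO dbo.ImportConfig (SourceName, SourcePath, WatermarkValue)
VALUES ('DailyParquet', '<container>/daily/', '2024-01-01');
""")
conn.commit()
conn.close()
```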
Next, in a data pipeline, create three activities: a Lookup activity that reads the current watermark from ImportConfig, a Copy activity that copies every daily folder newer than the watermark from ADLS Gen2 into Fabric, and a Script (or Stored procedure) activity that updates the watermark once the copy has succeeded.
While the copying can also be done with a notebook, I think it makes sense to first try it with a copy activity, since the copy activity already provides the connector you need for your source.
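If you do end up going the notebook route, a minimal sketch of the catch-up logic could look like this. The abfss path, the folder naming (one folder per day, yyyy-MM-dd), the target table name and the hardcoded watermark are assumptions for illustration only; in a real run the watermark would come from ImportConfig:

```python
# Hedged sketch of the notebook variant: catch up on every daily folder that is
# newer than the stored watermark and append it to a Delta table in the Lakehouse.
from datetime import date, timedelta
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

SOURCE_ROOT = "abfss://<container>@<storageaccount>.dfs.core.windows.net/daily"
TARGET_TABLE = "bronze_daily_parquet"        # Delta table in the Lakehouse

last_loaded = date(2024, 1, 1)               # in practice: read from ImportConfig
today = date.today()

day = last_loaded + timedelta(days=1)
while day <= today:
    folder = f"{SOURCE_ROOT}/{day:%Y-%m-%d}"
    try:
        df = spark.read.parquet(folder)
    except Exception:
        # Folder missing or not yet uploaded: stop here so the watermark
        # is never advanced past a day that has not been loaded.
        break
    (df.withColumn("load_date", F.lit(str(day)))
       .write.mode("append")
       .saveAsTable(TARGET_TABLE))
    # In practice: update WatermarkValue in ImportConfig here,
    # only after the write for this day has succeeded.
    day += timedelta(days=1)
```

Because the watermark is only advanced after a successful load, a failed day is automatically picked up again on the next run, which covers your hard requirement.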
I hope this helps! Let me know if you have additional questions.
If this solves your request, make sure to accept it as the solution, so other people can find it quickly. 🙂
Kind regards,
Niels
Hi @eurenergy
Thank you @ObungiNiels for providing a helpful solution. Please follow the steps provided by ObungiNiels, which should help you resolve the issue. If the response has addressed your query, please accept it as a solution so other members can easily find it.
Regards,
Menaka.