Hi all,
I set up a weekly scheduled data pipeline which copies data from Azure Data Lake Storage Gen2 (ADLS Gen2) to the Fabric Lakehouse.
It works perfectly: it takes all files added last week and moves them to the Lakehouse. If a file is already in the Lakehouse, it is simply overwritten, which is fine.
However, I am struggling to find the best practice for moving the data from these files into a delta table, as this can be done in several ways (a notebook, dataflows, copy activity). It is important that data is only appended if it is not already in the delta table.
What is the best practice way to do this? It is strange that I am not able to find much information about this, as it must be one of the most common scenarios in Fabric.
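For reference, the behaviour I'm after in a notebook would look roughly like the sketch below, where the path, table name, and key column are just placeholders:

```python
from delta.tables import DeltaTable

# Read the files the pipeline copied into the Lakehouse Files area this week
# (path, table name, and key column "id" are placeholders)
new_df = spark.read.parquet("Files/landing/weekly/")

target = DeltaTable.forName(spark, "my_table")

# Insert only the rows whose key is not already in the delta table;
# existing rows are left untouched
(
    target.alias("t")
    .merge(new_df.alias("s"), "t.id = s.id")
    .whenNotMatchedInsertAll()
    .execute()
)
```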
We do this in three ways:
1) if we have local copies of the files, we have a folder structure that has an unprocessed folder and a processed folder. A pipeline with a Get Metadata activity on the unprocessed folder feeding a For Each activity. Inside that is a Copy Data activity and some file copy/delete steps. (No Move File activity 😞 )
2) if all we have is a shortcut and we don't want to copy the files, we maintain a list of processed files in a table (using a notebook). In a second notebook, we do a left anti join of the files in the shortcut against the processed list and output the list of unprocessed files. This then feeds a For Each activity to do the Copy Data, and afterwards the first notebook appends the newly processed file names to the table (see the sketch after this list).
3) If we use a shortcut *and* keep a local copy of the processed files, then we can do something like 2), just substituting a directory listing for the processed-file table.
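A rough sketch of the second notebook in 2), with placeholder paths and table/column names:

```python
from notebookutils import mssparkutils

# List the files currently visible through the shortcut (path is a placeholder)
files_df = spark.createDataFrame(
    [(f.name,) for f in mssparkutils.fs.ls("Files/adls_shortcut")],
    "file_name string",
)

# Tracking table of already-processed files, maintained by the first notebook
processed_df = spark.read.table("processed_files")

# Left anti join keeps only the file names not in the processed list
unprocessed_df = files_df.join(processed_df, on="file_name", how="left_anti")

# Return the list to the pipeline so a For Each activity can drive the Copy Data
mssparkutils.notebook.exit(",".join(r.file_name for r in unprocessed_df.collect()))
```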
If this helps, please consider Accepting as a solution to help other people find it more easily.
When you move files into the Lakehouse, are they not automatically written into Parquet/Delta format?