arpost
Post Prodigy

Replicating Move behavior in Fabric pipelines

Greetings, community. Mainly posting this thread to create awareness around this idea (Add Move behavior for Copy activity in Fabric pipe... - Microsoft Fabric Community) asking for a metadata-only Move operation that can be performed on Azure Storage, Lakehouse, and so on. I know these capabilities exist; they just aren't available in pipelines, which is unfortunate.

 

I'll share the scenario we're facing. We deal with large volumes of files each day. Each file may have 2-4+ location changes over the lifetime of the file:

  1. Copy #1: Copy from Azure Storage to Lakehouse
  2. Move #1: Archive SFTP file in archive folder
  3. Move #2: Archive Lakehouse file after processing
  4. Move #3: Archive exported file after processing

Currently, all of these steps consist of Copy + Delete operations, which means we incur significant capacity cost at scale just because we have to literally duplicate and then delete files. If there were a simple metadata-only Move option, that would significantly help, but there isn't without doing a decent amount of custom coding in a Notebook to tap into file-system utilities.

 

Has anyone else faced this issue? If so, how have you approached a Move-only behavior in your pipelines? Also, be sure to upvote so this idea gets on the Fabric team's radar.

1 ACCEPTED SOLUTION
4iurchenko
Resolver I

Hi @arpost 

When we implemented this sort of behaviour, I typically used the mv operation in a Spark notebook.

 

Link to the library documentation:

https://learn.microsoft.com/en-us/fabric/data-engineering/microsoft-spark-utilities

 

I can't guarantee how it works under the hood, but the "mv" command seems to do exactly what you want.

 

We also found notebooks very convenient for this sort of operation. It is a slightly different approach from the Data Pipeline you mention, but we didn't have any problems with it. Because a notebook is code, we also found it much easier to configure exception handling around these operations.
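A minimal sketch of the notebook approach with per-file exception handling. The helper name `archive_files` and its `mv` parameter are hypothetical, introduced only for illustration. In a Fabric notebook you would pass in a small wrapper around the `mv` utility described on the page linked above; the exact call shown in the comment is an assumption to check against that documentation, and nothing here confirms whether the move is metadata-only on your storage.

```python
from typing import Callable, Iterable, List, Tuple


def archive_files(
    paths: Iterable[str],
    archive_dir: str,
    mv: Callable[[str, str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Move each path into archive_dir with the supplied mv callable.

    Returns (moved_destinations, failures), where each failure is
    (source_path, error_message). One bad file does not stop the batch.
    """
    moved: List[str] = []
    failed: List[Tuple[str, str]] = []
    for src in paths:
        # Keep the file name and swap only the folder.
        file_name = src.rsplit("/", 1)[-1]
        dest = f"{archive_dir.rstrip('/')}/{file_name}"
        try:
            mv(src, dest)
            moved.append(dest)
        except Exception as exc:
            # Record the failure so the caller can retry or alert.
            failed.append((src, str(exc)))
    return moved, failed


# Hypothetical use inside a Fabric notebook (assumption: notebookutils.fs.mv
# accepts source, destination, and a create-parent-folder flag):
#
#   moved, failed = archive_files(
#       ["Files/landing/orders.csv"],
#       "Files/archive",
#       lambda src, dest: notebookutils.fs.mv(src, dest, True),
#   )
```

Passing the move function in as an argument keeps the error handling separate from the Fabric-specific call, so the same loop can be exercised outside a notebook with a stand-in function.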

 

In conclusion: if the notebook route works for you, I strongly suggest trying it.

 

Thanks for the good question and the proactive stance. I hope this helps. Giving kudos (a like) and marking the answer as a solution helps me and others contribute and use those contributions more effectively.

 

BR, Yurri


4 REPLIES
v-karpurapud
Community Support

Hi @arpost 

We have not received a response from you regarding the query and were following up to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions.

 

Thank You.

 

v-karpurapud
Community Support

Hi @arpost 

Thank you for reaching out to the Microsoft Fabric Community Forum. Also, thanks to @4iurchenko  and @deborshi_nag for those inputs on this thread.

Could you let us know if the suggested solution resolved your issue? If not, please share any additional details so we can assist further.

Best regards,
Community Support Team.


deborshi_nag
Community Champion

Hello @arpost 

 

I’d recommend moving each file only once into a Lakehouse landing folder (for example via SFTP/FTP) and then treating that landing zone as immutable. After this initial move, we avoid any further physical file movement.

 

Incremental ingestion using Spark
From the landing area, I’d use Spark Structured Streaming with checkpointing (Auto Loader pattern) to ingest files into the Bronze layer. The checkpoint becomes the source of truth for which files have already been processed.

 

How this avoids reprocessing
Rather than moving or deleting files to signal progress, Spark uses the checkpoint to automatically skip files it has already ingested. This allows landing files to remain in place without any risk of duplicate processing.
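To illustrate the idea of tracking progress logically instead of by moving files, here is a simplified plain-Python analogue. It is not Spark's checkpoint implementation, only a sketch of the same principle: a small state file records which landing files were already handled, so later runs skip them. The names `ingest_new_files`, `landing_dir`, `checkpoint_file`, and `process` are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, List


def ingest_new_files(
    landing_dir: Path,
    checkpoint_file: Path,
    process: Callable[[Path], None],
) -> List[str]:
    """Process only files not yet recorded in the checkpoint.

    Returns the names processed in this run. The checkpoint file should
    live outside landing_dir so it is not picked up as input.
    """
    if checkpoint_file.exists():
        seen = set(json.loads(checkpoint_file.read_text()))
    else:
        seen = set()

    processed: List[str] = []
    for path in sorted(landing_dir.iterdir()):
        if not path.is_file() or path.name in seen:
            continue  # already ingested in an earlier run, or not a file
        process(path)
        seen.add(path.name)
        processed.append(path.name)
        # Persist after every file so an interrupted run keeps its progress.
        checkpoint_file.write_text(json.dumps(sorted(seen)))
    return processed
```

The landing files never move; only the checkpoint changes. In Spark Structured Streaming the equivalent state lives in the checkpoint location configured on the streaming write.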

 

Removing old files with retention policies
To control storage growth, I’d apply retention policies on the landing folders to automatically delete files after a defined number of days. This provides clean-up and compliance without introducing archive moves or extra compute cost.

 

Why this is better than a move-based approach
Compared to copy‑and‑delete “moves”, this significantly reduces Fabric capacity and IO cost by eliminating repeated data duplication. State is tracked logically via checkpoints, not physically by moving files around.

 

Operational benefits
Overall, this keeps the architecture simpler, more reliable, and easier to scale. It reduces custom filesystem code, lowers operational risk, and aligns well with modern Lakehouse ingestion best practices.

 

I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge with a Kudos or by marking it as a Solution.