Solved: Copy data deduplication

Elisa112 · 07-23-2024

Hello, I am very new to Synapse and have been tasked with getting data from an Azure cloud database into a dedicated pool. So far I have created a copy task to bring the data in, and it is now populating the tables (around 20 of them). However, the issue now is duplication and ever-increasing data. What is the best option for deduplicating the tables? Since a full data refresh is needed daily, should I create pipeline tasks to drop the tables each day, and if so, how should this be done in the simplest and most efficient way?

All assistance gratefully received.

Thank you

Anonymous · 07-23-2024

Hi @Elisa112,

Perhaps you can try invoking the query editor for further data cleanup in the data pipeline, if it is suitable for your requirements:

Use a dataflow in a pipeline - Microsoft Fabric | Microsoft Learn

Regards,

Xiaoxin Sheng


Element115 · 04-04-2025

@Elisa112 You need to implement an incremental copy algorithm so you only ingest new data every day instead of ingesting everything from scratch again, day after day.
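
One common way to do this is a "high-watermark" copy: remember the latest change timestamp you have already loaded, pull only rows newer than that, and upsert them by key so nothing gets duplicated. Below is a minimal sketch of the idea, not a Synapse pipeline. It uses in-memory SQLite so it runs anywhere, and the `orders` table, the `modified_at` column, the `copy_watermark` control table, and both function names are hypothetical. It also assumes every source table has a reliable last-modified column and a primary key. `INSERT OR REPLACE` is SQLite syntax; in a dedicated pool the upsert step would need to be written differently (for example a delete followed by an insert, or a merge).

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical table and a watermark table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT)"
    )
    # Control table: one row per copied table, holding the last timestamp loaded.
    db.execute(
        "CREATE TABLE copy_watermark (table_name TEXT PRIMARY KEY, last_value TEXT)"
    )
    return db


def incremental_copy(source, target):
    """Copy rows changed since the stored watermark; return how many were copied."""
    row = target.execute(
        "SELECT last_value FROM copy_watermark WHERE table_name = 'orders'"
    ).fetchone()
    # An empty string sorts before any ISO-8601 timestamp, so the first run copies everything.
    last_value = row[0] if row else ""

    changed = source.execute(
        "SELECT id, amount, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_value,),
    ).fetchall()
    if not changed:
        return 0

    # Upsert by primary key so an updated row replaces the old copy instead of duplicating it.
    target.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, modified_at) VALUES (?, ?, ?)",
        changed,
    )
    # Rows are ordered by modified_at, so the last one carries the new watermark.
    target.execute(
        "INSERT OR REPLACE INTO copy_watermark (table_name, last_value) "
        "VALUES ('orders', ?)",
        (changed[-1][2],),
    )
    target.commit()
    return len(changed)


# Example run with made-up data: the first call loads both rows, the second finds nothing new.
source_db, target_db = make_db(), make_db()
source_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-07-22T08:00:00"), (2, 20.0, "2024-07-22T09:00:00")],
)
print(incremental_copy(source_db, target_db))
print(incremental_copy(source_db, target_db))
```

Note that a watermark like this does not notice rows that were deleted in the source. If deletions matter, or the tables have no last-modified column, emptying and fully reloading the tables each day may still be the simpler option.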