March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Hello,
I would like to understand how a partition column works.
For context, I have a pipeline that copies from a MySQL database to a Lakehouse. What I would like to do is an upsert (with a query that only gets the rows updated since a given update date).
Back to my pipeline: I chose Overwrite with a partition column on my ID. My thought was that it would just "upsert" the data, i.e. insert the new rows and update the old ones.
What it actually does is overwrite everything, so I only end up with the result of my query.
Is this a misconfiguration?
What is the point of choosing a partition column?
Also, Append with a partition column is not an option either: if my understanding is correct, it does not update the old rows.
Thank you for your time,
Sincerely,
Greg
@gregbortolotti - Overwrite with a partition column specified will overwrite the entire partition; it does not perform a merge. Likewise, Append will add new rows to the partition but will not update the old ones. You can, however, do a merge in Notebooks - we do a lot of that with both partitioned and non-partitioned data. Here is a link to the Databricks documentation (it still applies to Fabric; I just like that documentation better).
Partitioning is for very large tables, to help speed up data access based on the partition field. Think of it as grouping data/transactions together. If you have transactional data that doesn't change, you might partition it by Transaction Date; that way, when looking for transactions in a given time period, the engine only queries that partition instead of scanning the whole dataset.
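The "grouping" idea above can be illustrated in plain Python (a toy sketch, not the Delta engine itself): records are bucketed by their date, so a date-filtered lookup only touches one bucket rather than every row.

```python
from collections import defaultdict

# Toy illustration of partition pruning: transactions grouped
# ("partitioned") by date, so a date-filtered query scans one
# bucket instead of the whole dataset.
transactions = [
    {"id": 1, "date": "2024-01-01", "amount": 10.0},
    {"id": 2, "date": "2024-01-01", "amount": 25.0},
    {"id": 3, "date": "2024-01-02", "amount": 7.5},
]

partitions = defaultdict(list)
for t in transactions:
    partitions[t["date"]].append(t)  # one "partition" per date value

# A query for 2024-01-01 touches only that partition's rows.
hit = partitions["2024-01-01"]
assert len(hit) == 2
```

Delta does the same thing at the storage level: each distinct partition value gets its own folder of files, and queries that filter on the partition column skip the other folders.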
Some more options for how to update or upsert by using the Delta Lake Python API in a Notebook:
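As a minimal sketch of such a merge in a Fabric Notebook (assuming a Lakehouse table at `Tables/my_table` and an `updates_df` DataFrame holding the changed rows pulled from MySQL - both names are placeholders for your own setup):

```python
from delta.tables import DeltaTable

# Placeholder path and DataFrame - adjust to your Lakehouse and source query.
target = DeltaTable.forPath(spark, "Tables/my_table")

(target.alias("t")
    .merge(updates_df.alias("s"), "t.ID = s.ID")  # match on the key column
    .whenMatchedUpdateAll()       # rows whose ID already exists get updated
    .whenNotMatchedInsertAll()    # rows with new IDs get inserted
    .execute())
```

This gives you the upsert behavior the copy activity's Overwrite/Append modes don't: matched rows are updated in place and unmatched rows are appended, and it works the same on partitioned and non-partitioned tables. It only runs inside a Spark session with Delta Lake available (as in a Fabric or Databricks notebook).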
Thank you very much for the explanation. It is much clearer to me now.
Best regards
There is also a Fabric Idea to get upsert functionality natively in Data Pipelines.
Consider voting to highlight this need:
Support UPSERTs and DELETEs when copying data into Lakehouse Tables from Pipeline copy activity, as opposed to Appending new rows
https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=87f3d168-6022-ee11-a81c-6045bdc01ce4