March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Hello,
I would like to understand how a partition column works.
For context, I have a pipeline that copies from a MySQL database to a Lakehouse. What I would like to do is an upsert (with a query that only gets the rows updated since a given update date).
Back to my pipeline: I chose Overwrite with a partition column on my ID. My thought was that it would just "upsert" the data, i.e. insert the new rows and update the old ones.
What it actually does is overwrite everything, so I only end up with the result of my query.
Is this a misconfiguration?
What is the point of choosing a partition column?
Also, Append with a partition column is not an option either: if my understanding is correct, it does not update the old rows.
Thank you for your time,
Sincerely,
Greg
@gregbortolotti - Overwrite with a partition column specified will overwrite the entire partition; it does not perform a merge. Likewise, Append will add new rows to the partition but will not update the old ones. You can, however, do a merge in Notebooks - we do a lot of that with both partitioned and non-partitioned data. Here is a link to the Databricks documentation (it still applies to Fabric; I just like that documentation better).
Partitioning is for very large tables, to help speed up data access based on the partition field. Think of it as grouping data/transactions together. If you have transactional data that doesn't change, you might partition it by Transaction Date; that way, when looking for transactions in a given time period, the engine only queries that partition instead of scanning the whole dataset.
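The "grouping" idea above can be illustrated in plain Python (a toy sketch, not the Delta engine itself): records are bucketed by their date, so a date-filtered lookup only touches one bucket rather than every row.

```python
from collections import defaultdict

# Toy illustration of partition pruning: transactions grouped
# ("partitioned") by date, so a date-filtered query scans one
# bucket instead of the whole dataset.
transactions = [
    {"id": 1, "date": "2024-01-01", "amount": 10.0},
    {"id": 2, "date": "2024-01-01", "amount": 25.0},
    {"id": 3, "date": "2024-01-02", "amount": 7.5},
]

partitions = defaultdict(list)
for t in transactions:
    partitions[t["date"]].append(t)  # one "partition" per date value

# A query for 2024-01-01 touches only that partition's rows.
hit = partitions["2024-01-01"]
assert len(hit) == 2
```

Delta does the same thing at the storage level: each distinct partition value gets its own folder of files, and queries that filter on the partition column skip the other folders.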
Some more options for how to update or upsert by using the Delta Lake Python API in a Notebook:
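As a minimal sketch of such a merge in a Fabric Notebook (assuming a Lakehouse table at `Tables/my_table` and an `updates_df` DataFrame holding the changed rows pulled from MySQL - both names are placeholders for your own setup):

```python
from delta.tables import DeltaTable

# Placeholder path and DataFrame - adjust to your Lakehouse and source query.
target = DeltaTable.forPath(spark, "Tables/my_table")

(target.alias("t")
    .merge(updates_df.alias("s"), "t.ID = s.ID")  # match on the key column
    .whenMatchedUpdateAll()       # rows whose ID already exists get updated
    .whenNotMatchedInsertAll()    # rows with new IDs get inserted
    .execute())
```

This gives you the upsert behavior the copy activity's Overwrite/Append modes don't: matched rows are updated in place and unmatched rows are appended, and it works the same on partitioned and non-partitioned tables. It only runs inside a Spark session with Delta Lake available (as in a Fabric or Databricks notebook).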
Thank you very much for the explanation. It is much clearer to me now.
Best regards
There is also a Fabric Idea to get upsert functionality natively in Data Pipelines.
Consider voting to highlight this need:
Support UPSERTs and DELETEs when copying data into Lakehouse Tables from Pipeline copy activity, as opposed to Appending new rows
https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=87f3d168-6022-ee11-a81c-6045bdc01ce4