Is it possible to put constraints on a Delta parquet table? I'm experiencing scenarios where data is duplicating. In SQL Server I would have a primary key that handles duplication by raising an error. I have adapted my code to try to prevent duplication, but in some cases the duplicates are caused by incorrect data in the source system. I don't want to hide this; I want the error to surface so it can be fixed in the source system. So, can I apply a constraint to a parquet file, or do I have to manage this with code?
Thanks
Hi @MisterSmith ,
You can use Delta Lake's merge operation to de-duplicate data. It lets you merge new data into an existing Delta table and specify conditions that control how duplicate rows are handled.
You can try SQL like the following:
-- Insert only the source rows whose uniqueId does not already exist in the target table.
MERGE INTO logs
USING newDedupedLogs
ON logs.uniqueId = newDedupedLogs.uniqueId
WHEN NOT MATCHED
  THEN INSERT *
For more details, you can refer to the following document:
Upsert into a Delta Lake table using merge - Azure Databricks | Microsoft Learn
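If you are loading the data from a notebook, the same merge can be expressed with the Delta Lake Python API. Here is a minimal sketch, assuming the target table lives at "Tables/logs" and that new_logs holds the incoming rows; both names, the path, and the sample data are placeholders for illustration:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming data; in practice this comes from your source read (placeholder rows).
new_logs = spark.createDataFrame([(1, "a"), (2, "b")], ["uniqueId", "message"])

# Load the existing Delta table (path is a placeholder).
logs = DeltaTable.forPath(spark, "Tables/logs")

# Insert only rows whose uniqueId is not already present in the target.
(logs.alias("logs")
     .merge(new_logs.alias("updates"), "logs.uniqueId = updates.uniqueId")
     .whenNotMatchedInsertAll()
     .execute())

The ON / merge condition is what defines a duplicate here, so it plays the role the primary key would in SQL Server.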
Best Regards,
Adamk Kong
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.