March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hello,
Currently I have two tables in a lakehouse and I am trying to complete some updates to the main table. Let's call them metadata and d_metadata.
Metadata contains all data to date, and is the main table our teams use for reporting. d_metadata contains data loaded from source, where the records were updated in the previous day.
Both tables are currently in a lakehouse, and right now, we are using a Dataflow Gen2 to load the metadata table where rows do not match rows in the d_metadata table (joined based on a unique id).
This is returning the old records from metadata, and then I am appending updated records from d_metadata and pushing the final results the metadata table.
I would like to appraoch this differently by simply deleting the updated records in metadata, and then appending d_metadata to avoid doing a full overwrite of what is a very large table.
I cannot find a way to achieve this within a Fabric lakehouse, and other Spark SQL suggestions have not worked.
Can someone help guide me in the right direction? Thanks in advance 🙂
Solved! Go to Solution.
Dataflows only offers two data destination operations:
In that sense, it doesn't support any sort of delete operations or "Upserts" to update existing records.
If your destination was a SQL Database, you could perhaps leverage a combination of a Data Pipeline, a Dataflow Gen2 and a SQL Stored procedure to perhaps load the data inside of a "staging table" in your SQL Database and then run a stored procedure that will take that table and figure out, based on your own logic, what records to delete, update or add to your main table.
I'm not sure how feasible that is with a Lakehouse as a destination, but you could try asking more questions specific to Lakehouse in the Lakehouse community forum (link below):
Hi @elavery95 ,
Thanks for using Fabric Community.
Can you please check this - Pattern to incrementally amass data with Dataflow Gen2 - Microsoft Fabric | Microsoft Learn
I suggest you incremental refresh rather than maintaining 2 tables.
FYI: We will have more features in upcoming releases.
Docs to refer - What's new and planned for Data Factory in Microsoft Fabric - Microsoft Fabric | Microsoft Learn
Hope this is helpful. Please let me know incase of further queries.
Hi @elavery95 ,
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet.
In case if you have any resolution please do share that same with the community as it can be helpful to others.
Otherwise, will respond back with the more details and we will try to help .
Hi @elavery95 ,
We haven’t heard from you on the last response and was just checking back to see if your query was answered.
Otherwise, will respond back with the more details and we will try to help .
Thanks
Dataflows only offers two data destination operations:
In that sense, it doesn't support any sort of delete operations or "Upserts" to update existing records.
If your destination was a SQL Database, you could perhaps leverage a combination of a Data Pipeline, a Dataflow Gen2 and a SQL Stored procedure to perhaps load the data inside of a "staging table" in your SQL Database and then run a stored procedure that will take that table and figure out, based on your own logic, what records to delete, update or add to your main table.
I'm not sure how feasible that is with a Lakehouse as a destination, but you could try asking more questions specific to Lakehouse in the Lakehouse community forum (link below):
User | Count |
---|---|
3 | |
2 | |
2 | |
1 | |
1 |