Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
minseok_kr
Regular Visitor

I want to know how to update and insert data without changing or deleting the Guid values.

There is customer data in Azure SQL DB. Each customer stored here has a Guid value. I want to incrementally load this customer data into a Lakehouse.
If the customer data is modified, I want to keep the Guid value and only update the changed information. If new customer data is created, I want to simply insert it.
I want to configure this using all the resources of Fabric without using Query statements.
Is there a method that can be used universally in various projects?

1 ACCEPTED SOLUTION
smpa01
Super User
Super User

If you are building a managed delta table in a lakehouse to bring the data from Sql, you can use Delta Table API in notebook to perform a fully qualified UPSERT using merge between source and target delta table.

from delta.tables import *

deltaTablePeople = DeltaTable.forPath(spark, '/tmp/delta/people-10m')
deltaTablePeopleUpdates = DeltaTable.forPath(spark, '/tmp/delta/people-10m-updates')

dfUpdates = deltaTablePeopleUpdates.toDF()

deltaTablePeople.alias('people') \
  .merge(
    dfUpdates.alias('updates'),
    'people.id = updates.id'
  ) \
  .whenMatchedUpdate(set =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .whenNotMatchedInsert(values =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .execute()

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

View solution in original post

2 REPLIES 2
smpa01
Super User
Super User

If you are building a managed delta table in a lakehouse to bring the data from Sql, you can use Delta Table API in notebook to perform a fully qualified UPSERT using merge between source and target delta table.

from delta.tables import *

deltaTablePeople = DeltaTable.forPath(spark, '/tmp/delta/people-10m')
deltaTablePeopleUpdates = DeltaTable.forPath(spark, '/tmp/delta/people-10m-updates')

dfUpdates = deltaTablePeopleUpdates.toDF()

deltaTablePeople.alias('people') \
  .merge(
    dfUpdates.alias('updates'),
    'people.id = updates.id'
  ) \
  .whenMatchedUpdate(set =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .whenNotMatchedInsert(values =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .execute()

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
NandanHegde
Super User
Super User

unfortunately UPSERT option is not available in Copy / Dataflow gen2 with lakehouse as a sink.

you would have write some custom logic to handle the same , Below blog provides some details :

https://www.linkedin.com/pulse/implementing-incremental-load-lakehouse-table-insert-update-malshan-b...




----------------------------------------------------------------------------------------------
Nandan Hegde (MSFT Data MVP)
LinkedIn Profile : www.linkedin.com/in/nandan-hegde-4a195a66
GitHUB Profile : https://github.com/NandanHegde15
Twitter Profile : @nandan_hegde15
MSFT MVP Profile : https://mvp.microsoft.com/en-US/MVP/profile/8977819f-95fb-ed11-8f6d-000d3a560942
Topmate : https://topmate.io/nandan_hegde
Blog :https://datasharkx.wordpress.com

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.