Solved: I want to know how to update and insert data witho...

minseok_kr · ‎07-30-2024

There is customer data in Azure SQL DB. Each customer stored here has a Guid value. I want to incrementally load this customer data into a Lakehouse.
If the customer data is modified, I want to keep the Guid value and only update the changed information. If new customer data is created, I want to simply insert it.
I want to configure this using all the resources of Fabric without using Query statements.
Is there a method that can be used universally in various projects?

smpa01 · ‎08-02-2024

If you are building a managed delta table in a lakehouse to bring the data from Sql, you can use Delta Table API in notebook to perform a fully qualified UPSERT using merge between source and target delta table.

from delta.tables import *

deltaTablePeople = DeltaTable.forPath(spark, '/tmp/delta/people-10m')
deltaTablePeopleUpdates = DeltaTable.forPath(spark, '/tmp/delta/people-10m-updates')

dfUpdates = deltaTablePeopleUpdates.toDF()

deltaTablePeople.alias('people') \
  .merge(
    dfUpdates.alias('updates'),
    'people.id = updates.id'
  ) \
  .whenMatchedUpdate(set =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .whenNotMatchedInsert(values =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .execute()

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

My custom visualization projects

Plotting Live Sound: Viz1

Beautiful News:Viz1, Viz2, Viz3

Visual Capitalist: Working Hrs

Others:Easing Graph, Animated Calendar

View solution in original post

smpa01 · ‎08-02-2024

If you are building a managed delta table in a lakehouse to bring the data from Sql, you can use Delta Table API in notebook to perform a fully qualified UPSERT using merge between source and target delta table.

from delta.tables import *

deltaTablePeople = DeltaTable.forPath(spark, '/tmp/delta/people-10m')
deltaTablePeopleUpdates = DeltaTable.forPath(spark, '/tmp/delta/people-10m-updates')

dfUpdates = deltaTablePeopleUpdates.toDF()

deltaTablePeople.alias('people') \
  .merge(
    dfUpdates.alias('updates'),
    'people.id = updates.id'
  ) \
  .whenMatchedUpdate(set =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .whenNotMatchedInsert(values =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .execute()

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

My custom visualization projects

Plotting Live Sound: Viz1

Beautiful News:Viz1, Viz2, Viz3

Visual Capitalist: Working Hrs

Others:Easing Graph, Animated Calendar

NandanHegde · ‎07-30-2024

unfortunately UPSERT option is not available in Copy / Dataflow gen2 with lakehouse as a sink.

you would have write some custom logic to handle the same , Below blog provides some details :

https://www.linkedin.com/pulse/implementing-incremental-load-lakehouse-table-insert-update-malshan-b...

----------------------------------------------------------------------------------------------
Nandan Hegde (MSFT Data MVP)
LinkedIn Profile : www.linkedin.com/in/nandan-hegde-4a195a66
GitHUB Profile : https://github.com/NandanHegde15
Twitter Profile : @nandan_hegde15
MSFT MVP Profile : https://mvp.microsoft.com/en-US/MVP/profile/8977819f-95fb-ed11-8f6d-000d3a560942
Topmate : https://topmate.io/nandan_hegde
Blog :https://datasharkx.wordpress.com

I want to know how to update and insert data without changing or deleting the Guid values.

Helpful resources

Join us at the Microsoft Fabric Community Conference

Fabric Monthly Update - February 2025

Fabric Community Update - February 2025

New Offer! Become a Certified Fabric Data Engineer

I want to know how to update and insert data without changing or deleting the Guid values.

Helpful resources

Join us at the Microsoft Fabric Community Conference

Fabric Monthly Update - February 2025

Fabric Community Update - February 2025