adamlob
Advocate I

Hash Function for Row Compare

Hi,

 

I'm working on a Data Pipeline that loads data into a Dataverse table. I do a row compare to detect changes between loads, so I am only loading rows that have changed.

 

Is there any way to hash the concatenated column values of each row? At the moment it seems I can only build the plain-text concatenation and then convert it to Binary. Hashing would help save space.

1 ACCEPTED SOLUTION
AntoineW
Super User

Hi @adamlob,

 

In Dataflow Gen2 / Fabric Data Pipeline (Data Factory), there’s no native “Hash” transformation yet.

You can use a Notebook to do that:

 

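from pyspark.sql.functions import coalesce, col, concat_ws, lit, sha2

# df is the DataFrame already loaded in the notebook.
# concat_ws silently skips NULLs, so replace them with a marker first;
# otherwise ("a", NULL) and (NULL, "a") would produce the same hash.
cols = [coalesce(col(c).cast("string"), lit("<NULL>")) for c in df.columns]

df_hashed = df.withColumn(
    "row_hash",
    sha2(concat_ws("|", *cols), 256)
)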

 

Then save it back to your Lakehouse table and use that hash for change-detection.
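
The comparison step can be sketched roughly as follows. This is a minimal illustration in plain Python (`hashlib` instead of Spark, so it runs anywhere) and not the exact notebook code: the `id` key column, the sample rows, the `<NULL>` marker, and the helper names `row_hash` and `changed_rows` are all assumptions made for the example. Its hashes are not guaranteed to match what Spark's `sha2` produces for the same data, because the string formatting of values can differ.

```python
import hashlib

NULL_MARKER = "<NULL>"  # assumption: a marker that never occurs in real data


def row_hash(row, columns):
    """SHA-256 hex digest of the row's values joined with '|'."""
    parts = [NULL_MARKER if row.get(c) is None else str(row[c]) for c in columns]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def changed_rows(previous_hashes, new_rows, key, columns):
    """Return rows that are new or whose hash differs from the previous load."""
    changed = []
    for row in new_rows:
        if previous_hashes.get(row[key]) != row_hash(row, columns):
            changed.append(row)
    return changed


columns = ["id", "name", "city"]
previous_load = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus", "city": None},
]
new_load = [
    {"id": 1, "name": "Ada", "city": "London"},      # unchanged
    {"id": 2, "name": "Linus", "city": "Helsinki"},  # changed
    {"id": 3, "name": "Grace", "city": "New York"},  # new
]

# Only the key and the hash of the previous load need to be kept around.
previous_hashes = {r["id"]: row_hash(r, columns) for r in previous_load}
to_load = changed_rows(previous_hashes, new_load, "id", columns)
print([r["id"] for r in to_load])  # [2, 3]
```

Only rows 2 and 3 would be sent to Dataverse here. In Spark the same idea is usually expressed as a join between the new data and the stored hashes on the key column, keeping rows where the hash is missing or different.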

Benefits:

  • Very fast and scalable,

  • Produces fixed-length SHA-256 strings (always 64 hexadecimal characters),

  • Easy to use as a comparison key.
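
Since the question was about saving space, it may help to see the two forms a SHA-256 digest can take. The snippet below is a small plain-Python illustration (the input string is made up): the hex string is 64 characters, while the same digest stored as raw bytes is 32 bytes.

```python
import hashlib

# Hypothetical concatenated row value, for illustration only.
digest = hashlib.sha256("1|Ada|London".encode("utf-8"))

hex_form = digest.hexdigest()  # text form: hexadecimal characters
raw_form = digest.digest()     # binary form: raw bytes

print(len(hex_form))  # 64
print(len(raw_form))  # 32
```

In Spark, `sha2` returns the hex string; if the target column is Binary, wrapping the result in `unhex` from `pyspark.sql.functions` should give the 32-byte form.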

Doc:

https://spark.apache.org/docs/latest/api/sql/index.html#sha2

 

 

Hope this helps!

Best regards,

Antoine
