adamlob
Frequent Visitor

Hash Function for Row Compare

Hi,

 

I'm working on a Data Pipeline that loads data into a Dataverse table. I do a row compare to detect changes between loads, so I am only loading rows that have changed.

 

Is there any way to hash the concatenated values of each row? At the moment it seems I can only build the concatenation as plain text and then convert it to Binary. Hashing would help save space.

1 ACCEPTED SOLUTION
AntoineW
Solution Sage

Hi @adamlob,

 

In Dataflow Gen2 / Fabric Data Pipeline (Data Factory), there’s no native “Hash” transformation yet.

You can use a Notebook to do that:

 

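from pyspark.sql.functions import coalesce, col, concat_ws, lit, sha2

# concat_ws skips NULLs, so cast each column to string and substitute a marker;
# otherwise ("a", NULL) and (NULL, "a") would produce the same hash.
cols = [coalesce(col(c).cast("string"), lit("<NULL>")) for c in df.columns]

df_hashed = df.withColumn(
    "row_hash",
    sha2(concat_ws("|", *cols), 256)
)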

 

Then save it back to your Lakehouse table and use that hash for change-detection.
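To make the change-detection step concrete, here is a minimal sketch in plain Python (no Spark needed) of the same idea: hash each row, compare against the hashes stored from the previous load, and keep only new or changed rows. The column names, the `id` key, the `<NULL>` marker and the helper names `row_hash` and `changed_rows` are all made up for illustration; in a Notebook you would more likely join the two DataFrames on the key and filter where the hashes differ.

```python
import hashlib

NULL_MARKER = "<NULL>"  # hypothetical placeholder for missing values


def row_hash(row, columns):
    """SHA-256 hex digest of the row's values joined with '|'."""
    parts = [NULL_MARKER if row.get(c) is None else str(row[c]) for c in columns]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def changed_rows(new_rows, previous_hashes, key, columns):
    """Return rows that are new or whose hash differs from the previous load."""
    return [
        r for r in new_rows
        if previous_hashes.get(r[key]) != row_hash(r, columns)
    ]


columns = ["id", "name", "city"]

# Hashes kept from the previous load, indexed by the business key.
previous = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Bob", "city": None},
]
previous_hashes = {r["id"]: row_hash(r, columns) for r in previous}

incoming = [
    {"id": 1, "name": "Ada", "city": "London"},  # unchanged
    {"id": 2, "name": "Bob", "city": "Paris"},   # changed
    {"id": 3, "name": "Cy", "city": "Oslo"},     # new
]

to_load = changed_rows(incoming, previous_hashes, "id", columns)
print([r["id"] for r in to_load])  # [2, 3]
```

Note that a value containing the separator itself (here `|`) can still make two different rows concatenate to the same string, so pick a separator that cannot appear in your data.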

Benefits:

  • Very fast and scalable,

  • Produces fixed-length SHA-256 hex strings (exactly 64 characters),

  • Easy to use as a comparison key.
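On the space question, a quick check with Python's standard `hashlib` (used here only as a stand-in for illustration, not as part of the pipeline) shows that a SHA-256 digest has the same size whatever the input length: 64 characters as hex text, or 32 bytes as raw binary. The sample inputs are made up.

```python
import hashlib

short_input = hashlib.sha256(b"1|Ada|London")
long_input = hashlib.sha256(("x" * 10_000).encode("utf-8"))

# Hex form: always 64 characters.
print(len(short_input.hexdigest()), len(long_input.hexdigest()))  # 64 64

# Raw binary form: always 32 bytes, half the size of the hex string.
print(len(short_input.digest()), len(long_input.digest()))  # 32 32
```

If the target column is Binary, storing the raw 32 bytes instead of the hex string halves the size; in Spark, wrapping the `sha2` result in `unhex` should give that binary form, though it is worth verifying on your runtime.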

Doc:

https://spark.apache.org/docs/latest/api/sql/index.html#sha2

 

 

Hope this helps!

Best regards,

Antoine
