Hi,
I'm working on a Data Pipeline that loads data into a Dataverse table. I do a row compare to detect changes between loads, so I am only loading rows that have changed.
Is there any way to hash the concatenated row values? At the moment it seems I can only build the concatenation as plain text and then convert it to Binary. Hashing would help save space.
Hi @adamlob,
In Dataflow Gen2 / Fabric Data Pipeline (Data Factory), there’s no native “Hash” transformation yet.
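You can use a notebook to do that:
from pyspark.sql.functions import coalesce, col, concat_ws, lit, sha2

# concat_ws skips NULLs, so ("a", NULL, "b") and ("a", "b", NULL) would hash the same.
# Cast every column to string and swap NULLs for a placeholder to keep positions distinct.
cols = [coalesce(col(c).cast("string"), lit("<NULL>")) for c in df.columns]
df_hashed = df.withColumn("row_hash", sha2(concat_ws("|", *cols), 256))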
Then save it back to your Lakehouse table and use that hash for change-detection.
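To make the change-detection step concrete, here is a minimal pure-Python sketch of the comparison logic, independent of Spark. The helper names (`row_hash`, `changed_rows`), the `NULL_TOKEN` placeholder, and the sample rows are all hypothetical and only illustrate the idea: hash each row, keep the hashes from the previous load keyed by the row's ID, and load only rows whose hash is new or different.

```python
import hashlib

NULL_TOKEN = "<NULL>"  # hypothetical placeholder so missing values keep their position

def row_hash(row, columns):
    """Return the SHA-256 hex digest of the row's values joined with '|'."""
    parts = [NULL_TOKEN if row[c] is None else str(row[c]) for c in columns]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def changed_rows(current_rows, previous_hashes, key, columns):
    """Return rows that are new or whose hash differs from the previous load."""
    changed = []
    for row in current_rows:
        if previous_hashes.get(row[key]) != row_hash(row, columns):
            changed.append(row)
    return changed

columns = ["id", "name", "city"]
previous = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus", "city": None},
]
current = [
    {"id": 1, "name": "Ada", "city": "London"},      # unchanged
    {"id": 2, "name": "Linus", "city": "Helsinki"},  # changed
    {"id": 3, "name": "Grace", "city": "New York"},  # new
]

# Hashes stored from the previous load, keyed by the business key.
previous_hashes = {r["id"]: row_hash(r, columns) for r in previous}
to_load = changed_rows(current, previous_hashes, "id", columns)
print([r["id"] for r in to_load])  # prints [2, 3]
```

In a notebook the same comparison would more likely be a join between the new DataFrame and the stored hashes on the key column; the dictionary lookup here just stands in for that join.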
✅ Benefits:
Runs distributed in Spark, so it scales to large tables,
Produces fixed-length SHA-256 strings (always 64 hex characters),
Easy to use as a comparison key.
Doc:
- https://spark.apache.org/docs/latest/api/sql/index.html#sha2
Hope this helps!
Best regards,
Antoine