Hi,
I'm working on a Data Pipeline that loads data into a Dataverse table. I do a row compare between loads so that I only load rows that have changed.
Is there any way to hash the concatenation of a row's columns? At the moment it seems I can only build the plain-text concatenation and then convert it to Binary. Hashing would help save on space.
Hi @adamlob,
In Dataflow Gen2 / Fabric Data Pipeline (Data Factory), there's no native "Hash" transformation yet.
You can use a Notebook to do it:
from pyspark.sql.functions import sha2, concat_ws

# Concatenate every column with "|" as a separator, then hash the result
# with SHA-256 (returned as a 64-character hex string).
# Caveat: concat_ws skips NULL values entirely, so rows that differ only
# in which column is NULL can produce the same hash.
df_hashed = df.withColumn(
    "row_hash",
    sha2(concat_ws("|", *df.columns), 256)
)
Then save it back to your Lakehouse table and use that hash for change-detection.
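If you want to sanity-check a hash value outside Spark, or prototype the change-detection logic, the same idea can be sketched in plain Python with hashlib. This is a minimal sketch, assuming the same "|" separator and UTF-8 encoding; the key and column values below are made up, and unlike Spark's concat_ws it does not skip NULLs:

```python
import hashlib

def row_hash(values, sep="|"):
    """SHA-256 hex digest of the pipe-joined column values.

    Matches sha2(concat_ws("|", ...), 256) for non-null string columns.
    (Spark's concat_ws drops NULLs entirely; here None becomes "".)
    """
    joined = sep.join("" if v is None else str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Change detection: compare per-key hashes between the previous and
# current load, and keep only the keys whose hash changed.
previous = {"key1": row_hash(["Alice", "London"]),
            "key2": row_hash(["Bob", "Berlin"])}
current  = {"key1": row_hash(["Alice", "Paris"]),   # changed row
            "key2": row_hash(["Bob", "Berlin"])}    # unchanged row

changed = [k for k in current if current.get(k) != previous.get(k)]
```

The same comparison can be done in the Notebook itself with a join on the key column, filtering rows where the stored row_hash differs from the newly computed one.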
✅ Benefits:
Fast and scalable
Produces a fixed-length SHA-256 hex string (exactly 64 characters)
Easy to use as a comparison key
Doc:
- https://spark.apache.org/docs/latest/api/sql/index.html#sha2
Hope it helps!
Best regards,
Antoine