
joanfatz
Frequent Visitor

Using GUIDs

I have a dataframe to which I assign a GUID as an ID column. I then have to cache the dataframe so that the ID retains its value.

This had been working since May, but in mid-November 2026 it started regenerating the GUID values, causing all sorts of issues.

Is there a limit on the number of records in a dataframe beyond which a GUID will no longer retain its value?

I know that it is regenerating because, after the GUID is added to the dataframe, I write that dataframe out to a table, and then the same dataframe is used to send the records to an API. The response from the API contains a different GUID in the message.

Since this was not working for me, I changed my process to stop using the GUID, but I would like to know what caused it to stop working the way it had for the last 7 months.

Accepted solution
Vinodh247
Super User

There is no record-count limit beyond which a GUID suddenly stops being stable. What likely changed is execution behaviour, not data size. In distributed engines such as Spark (including environments like Databricks and Microsoft Fabric), functions that generate GUIDs or UUIDs are non-deterministic. If the dataframe is recomputed (due to cache eviction, lineage re-evaluation, optimisation changes, a cluster restart, or an engine version update), the GUID column is recalculated and new values appear. That would explain why it worked earlier and then changed in November, after a platform/runtime update or a shift in caching behaviour. The fix is to materialise the dataframe immediately after generating the GUIDs (write it to storage or checkpoint it) instead of relying on the cache to freeze non-deterministic columns.
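
The mechanism above can be illustrated with a small plain-Python toy model. This is not Spark code: `ToyFrame`, `add_guid` and the sample records are hypothetical, and `evict()` simply stands in for whatever makes the engine rebuild the dataframe from its lineage (eviction, re-evaluation, restart). It is a sketch of the idea, not of Spark's internals.

```python
import uuid
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


class ToyFrame:
    """Hypothetical toy model of a lazily evaluated dataframe with an evictable cache (not Spark)."""

    def __init__(self, lineage: Callable[[], List[Row]]) -> None:
        self._lineage = lineage                  # how to rebuild the rows from scratch
        self._cached: Optional[List[Row]] = None

    def collect(self) -> List[Row]:
        # Serve from the cache when present; otherwise re-run the lineage.
        if self._cached is None:
            self._cached = self._lineage()
        return self._cached

    def evict(self) -> None:
        # Stands in for cache eviction, a cluster restart, or a re-planned query.
        self._cached = None


def add_guid(source: List[str]) -> List[Row]:
    # uuid4() is non-deterministic: every call returns a new value.
    return [{"name": name, "guid": str(uuid.uuid4())} for name in source]


source = ["a", "b", "c"]  # hypothetical input records

# Problem: relying on the cache to freeze the GUIDs.
frame = ToyFrame(lambda: add_guid(source))
written_to_table = [row["guid"] for row in frame.collect()]
frame.evict()                                    # cache lost between the two actions
sent_to_api = [row["guid"] for row in frame.collect()]
print(written_to_table == sent_to_api)           # False: the lineage ran again

# Fix: materialise once, then build every later step from the stored rows.
stored = add_guid(source)                        # stands in for "write to storage"
frozen = ToyFrame(lambda: [dict(row) for row in stored])
first = [row["guid"] for row in frozen.collect()]
frozen.evict()
second = [row["guid"] for row in frozen.collect()]
print(first == second)                           # True: a recompute only re-reads stored values
```

In Spark terms, the "fix" half corresponds to writing the dataframe out (for example with `df.write.saveAsTable(...)`) and reading that table back before sending records to the API, or to `df.checkpoint(eager=True)` after setting a directory with `spark.sparkContext.setCheckpointDir(...)`. `df.cache()` on its own is a performance hint: cached data can be evicted and recomputed from lineage, so it does not guarantee that a non-deterministic column keeps its values.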

Please 'Kudos' and 'Accept as Solution' if this answered your query.

Regards,
Vinodh
Microsoft MVP [Fabric]
LI: https://www.linkedin.com/in/vinodh-kumar-173582132
Blog: vinsdata.in/blog
