Solved: How to replace string keys with integers in Direct...

jaryszek · ‎11-05-2025

Hi everyone,

I’m working with DirectLake over OneLake, and I’m trying to optimize relationships in my model by replacing string keys with integer keys — so the VertiPaq engine can store them more efficiently in its dictionary.

Right now, my key columns look like this (example, not real data):

2025-04-01_/subscriptions/xxxxxx/resourcegroups/demo-storage/providers/microsoft.storage/storageaccounts/abc123

These are long text values coming from a data source (e.g., Azure resource IDs).

I’d like to understand:

1. What’s the recommended integer size/type (INT32, INT64, etc.) for relationship keys in DirectLake / VertiPaq

2. What’s the best way to convert such long string IDs into integers — for example, should I use a hash function or generate a numeric surrogate key?

3. Is there any Microsoft documentation that explains what kind of string values can (or should) be converted to integer keys for VertiPaq dictionary encoding?

My goal is to reduce memory usage and improve performance, but I want to make sure I’m doing it in a supported and efficient way.

Any best practices or doc references would be really helpful — especially for DirectLake models.

Thanks!

Best,
Jacek

DataVitalizer · ‎11-05-2025

Hi @jaryszek

If you want your model to run leaner and faster, swap those long text IDs for simple numbers, I suggest using a big integer (INT64) so you never run into size limitsm don’t bother hashing the strings that just adds complexity without saving much, Instead create a surrogate key in your data prep (like an auto‑number or mapping table) and use that for relationships. Keep the original long string only as a descriptive column if you still need to display it, this way, VertiPaq/DirectLake stores less, compresses better, and queries run quicker.

Did it work? 👍 A kudos would be appreciated
🟨 Mark it as a solution to help spread knowledge 💡

🟩 Follow me on LinkedIn

View solution in original post

v-tejrama · ‎11-06-2025

Hi @jaryszek ,

You are on the right path with your approach. Introducing an integer surrogate key to the Resources table is an effective and efficient solution, as both DirectLake and VertiPaq process integer relationships much better than long text strings. This allows your fact tables to join using a compact numeric value, optimizing model size, memory usage, and query performance. Using a 64-bit integer (BIGINT) is recommended to avoid size constraints as your data grows. Using a hash function is unnecessary in this scenario, as it adds complexity without improving compression. If modifying the main Resources table is not possible, creating a lookup table to map string IDs to integer keys is a good alternative, though adding the surrogate key directly is typically more straightforward. Retain the original string ID as a descriptive column if needed, but ensure relationships are based on the numeric key. This method aligns with Microsoft’s modeling recommendations for VertiPaq and DirectLake and will deliver optimal performance.

Best Regards,
Tejaswi.
Community Support

jaryszek · ‎11-06-2025

thank you,

1. "This method aligns with Microsoft’s modeling recommendations for VertiPaq and DirectLake and will deliver optimal performance."

do you have docs for it?

2. If modifying the main Resources table is not possible, creating a lookup table to map string IDs to integer keys is a good alternative

Ok this is important. what does it mean is not possible? It where to create look up table if i will keep my Resources table also in a model?

Best,
Jacek

v-tejrama · ‎11-06-2025

Hi @jaryszek ,

That’s a great follow-up, and you’re asking exactly the right questions.

When I mentioned that this approach aligns with Microsoft’s recommendations, the guidance comes from the broader documentation around model design and performance in Power BI and Fabric. The key idea is that Direct Lake relies on the VertiPaq engine under the hood, and VertiPaq performs best when relationships are based on numeric surrogate keys rather than long text strings. This is part of the general modeling best practices described in Microsoft’s official guidance on star schema design and VertiPaq optimization. You can find more details in these Microsoft articles:
https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-overview?
https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-understand-storage
https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-analyze-query-processing?
These exp...

Regarding the lookup table, if modifying the main Resources table is not possible simply means situations where you don’t have control over that table, for example when it’s sourced from another system or managed by another team and you can’t add new columns. In that case, you can create a small lookup or mapping table in your data preparation layer, such as in your Lakehouse or Dataflow. This table would store the integer key and the original resource ID, and your fact tables would join to it using the numeric key. You can still keep your original Resources table in the model for descriptive fields or metadata, but make sure that the relationships for filtering and joins use the numeric key from the lookup. This gives you the same performance benefit while preserving the original string values for reporting or reference.

Best Regards,
Tejaswi.
Community Support

DataVitalizer · ‎11-05-2025

Hi @jaryszek

If you want your model to run leaner and faster, swap those long text IDs for simple numbers, I suggest using a big integer (INT64) so you never run into size limitsm don’t bother hashing the strings that just adds complexity without saving much, Instead create a surrogate key in your data prep (like an auto‑number or mapping table) and use that for relationships. Keep the original long string only as a descriptive column if you still need to display it, this way, VertiPaq/DirectLake stores less, compresses better, and queries run quicker.

Did it work? 👍 A kudos would be appreciated
🟨 Mark it as a solution to help spread knowledge 💡

🟩 Follow me on LinkedIn

jaryszek · ‎11-05-2025

Thank you,

ok so would you add just a INT surrogate key to my Resources table where NormalizedResourceId is should solve the issue with displaying string?

Or You would create a small lookuptable (Int Key, String value for NormalizedResourceId) and connected this into Fct Table?

Best,
Jacek

How to replace string keys with integers in DirectLake (VertiPaq dictionary)?

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026

How to Get Your Question Answered Quickly

FabCon is coming to Atlanta