Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
jaryszek
Memorable Member
Memorable Member

How to replace string keys with integers in DirectLake (VertiPaq dictionary)?

Hi everyone,

I’m working with DirectLake over OneLake, and I’m trying to optimize relationships in my model by replacing string keys with integer keys — so the VertiPaq engine can store them more efficiently in its dictionary.

Right now, my key columns look like this (example, not real data):

 

2025-04-01_/subscriptions/xxxxxx/resourcegroups/demo-storage/providers/microsoft.storage/storageaccounts/abc123

 

These are long text values coming from a data source (e.g., Azure resource IDs).

I’d like to understand:

1. What’s the recommended integer size/type (INT32, INT64, etc.) for relationship keys in DirectLake / VertiPaq

2. What’s the best way to convert such long string IDs into integers — for example, should I use a hash function or generate a numeric surrogate key?

3. Is there any Microsoft documentation that explains what kind of string values can (or should) be converted to integer keys for VertiPaq dictionary encoding?

My goal is to reduce memory usage and improve performance, but I want to make sure I’m doing it in a supported and efficient way.

Any best practices or doc references would be really helpful — especially for DirectLake models.

Thanks!

Best,
Jacek

1 ACCEPTED SOLUTION
DataVitalizer
Solution Sage
Solution Sage

Hi @jaryszek 

 

If you want your model to run leaner and faster, swap those long text IDs for simple numbers, I suggest using a big integer (INT64) so you never run into size limitsm don’t bother hashing the strings that just adds complexity without saving much, Instead create a surrogate key in your data prep (like an auto‑number or mapping table) and use that for relationships. Keep the original long string only as a descriptive column if you still need to display it, this way, VertiPaq/DirectLake stores less, compresses better, and queries run quicker.


Did it work? 👍 A kudos would be appreciated
🟨 Mark it as a solution to help spread knowledge 💡

 

🟩 Follow me on LinkedIn

View solution in original post

5 REPLIES 5
v-tejrama
Community Support
Community Support

Hi @jaryszek ,

 

You are on the right path with your approach. Introducing an integer surrogate key to the Resources table is an effective and efficient solution, as both DirectLake and VertiPaq process integer relationships much better than long text strings. This allows your fact tables to join using a compact numeric value, optimizing model size, memory usage, and query performance. Using a 64-bit integer (BIGINT) is recommended to avoid size constraints as your data grows. Using a hash function is unnecessary in this scenario, as it adds complexity without improving compression. If modifying the main Resources table is not possible, creating a lookup table to map string IDs to integer keys is a good alternative, though adding the surrogate key directly is typically more straightforward. Retain the original string ID as a descriptive column if needed, but ensure relationships are based on the numeric key. This method aligns with Microsoft’s modeling recommendations for VertiPaq and DirectLake and will deliver optimal performance.

Best Regards,
Tejaswi.
Community Support

thank you, 

1. "This method aligns with Microsoft’s modeling recommendations for VertiPaq and DirectLake and will deliver optimal performance."

do you have docs for it? 

2. If modifying the main Resources table is not possible, creating a lookup table to map string IDs to integer keys is a good alternative

Ok this is important. what does it mean is not possible? It where to create look up table if i will keep my Resources table also in a model? 

Best,
Jacek

Hi @jaryszek ,

 

That’s a great follow-up, and you’re asking exactly the right questions.

When I mentioned that this approach aligns with Microsoft’s recommendations, the guidance comes from the broader documentation around model design and performance in Power BI and Fabric. The key idea is that Direct Lake relies on the VertiPaq engine under the hood, and VertiPaq performs best when relationships are based on numeric surrogate keys rather than long text strings. This is part of the general modeling best practices described in Microsoft’s official guidance on star schema design and VertiPaq optimization. You can find more details in these Microsoft articles: 
https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-overview?
https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-understand-storage
https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-analyze-query-processing?
These exp...

 

Regarding the lookup table, if modifying the main Resources table is not possible simply means situations where you don’t have control over that table, for example when it’s sourced from another system or managed by another team and you can’t add new columns. In that case, you can create a small lookup or mapping table in your data preparation layer, such as in your Lakehouse or Dataflow. This table would store the integer key and the original resource ID, and your fact tables would join to it using the numeric key. You can still keep your original Resources table in the model for descriptive fields or metadata, but make sure that the relationships for filtering and joins use the numeric key from the lookup. This gives you the same performance benefit while preserving the original string values for reporting or reference.

 

Best Regards,
Tejaswi.
Community Support

 

DataVitalizer
Solution Sage
Solution Sage

Hi @jaryszek 

 

If you want your model to run leaner and faster, swap those long text IDs for simple numbers, I suggest using a big integer (INT64) so you never run into size limitsm don’t bother hashing the strings that just adds complexity without saving much, Instead create a surrogate key in your data prep (like an auto‑number or mapping table) and use that for relationships. Keep the original long string only as a descriptive column if you still need to display it, this way, VertiPaq/DirectLake stores less, compresses better, and queries run quicker.


Did it work? 👍 A kudos would be appreciated
🟨 Mark it as a solution to help spread knowledge 💡

 

🟩 Follow me on LinkedIn

Thank you,

ok so would you add just a INT surrogate key to my Resources table where NormalizedResourceId is should solve the issue with displaying string? 

jaryszek_0-1762354373518.png



Or You would create a small lookuptable (Int Key, String value for NormalizedResourceId) and connected this into Fct Table?

Best,
Jacek

Helpful resources

Announcements
November Power BI Update Carousel

Power BI Monthly Update - November 2025

Check out the November 2025 Power BI update to learn about new features.

Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors
Top Kudoed Authors