Solved: Hashed Primary Keys for Idempotence

info-assets · 06-02-2022

One of the major trends in data warehousing right now is the avoidance of integer surrogate keys, because they break idempotence (i.e. you have to reload the tables to regenerate the keys). As an alternative, especially in Snowflake, Redshift, and BigQuery, people are increasingly using a hash of the natural keys to join data warehouse tables.
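To illustrate the idempotence point, here is a minimal Python sketch. Integer surrogate keys assigned in load order change when the same rows arrive in a different order, while a hash of the natural key does not. The `||` delimiter, the trim/upper-case normalization, and the choice of MD5 are assumptions for illustration; in practice this is usually done in the warehouse's SQL.

```python
import hashlib
from typing import Iterable


def hashed_key(*parts: object, sep: str = "||") -> str:
    """Deterministic key: MD5 hex digest of the delimited, normalized natural-key parts."""
    # Assumed normalization: None -> "", then trim whitespace and upper-case.
    normalized = sep.join("" if p is None else str(p).strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def surrogate_keys(rows: Iterable[tuple]) -> dict:
    """Integer surrogate keys assigned in load order, like an IDENTITY column."""
    return {row: i for i, row in enumerate(rows, start=1)}


customers = [("ACME", "US"), ("Globex", "DE"), ("Initech", "US")]

# Reloading the same rows in a different order reassigns the integer keys...
first_load = surrogate_keys(customers)
second_load = surrogate_keys(reversed(customers))
print(first_load[("ACME", "US")], second_load[("ACME", "US")])  # 1 3

# ...while the hashed key depends only on the (normalized) natural-key values.
print(hashed_key("ACME", "US") == hashed_key("acme ", "us"))  # True
```

Because the hashed key is a pure function of the natural key, a fact table can compute it independently of when or in what order the dimension was loaded.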

Does anyone know how hashed keys perform on large tables (e.g. a 1-billion-row fact table and a 1-million-row dimension) in Power BI?

I know that Power BI builds its own hash-to-integer lookup table, which takes up memory. I'm wondering how much memory it takes up, and what effect this has on load performance and on query performance for users.
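For a rough sense of scale at the sizes in the question, here is a back-of-envelope sketch in Python. It assumes a simplified model of VertiPaq storage, not Power BI's documented internals: each fact-table key is bit-packed with just enough bits to address the distinct keys (roughly what happens to hash-encoded IDs, and to dense integer keys under value encoding), run-length encoding is ignored, and each hashed key is stored as a 32-character MD5 hex string with no per-entry dictionary overhead.

```python
import math


def bits_per_id(distinct_values: int) -> int:
    """Bits needed to bit-pack one ID when there are `distinct_values` distinct keys."""
    return max(1, math.ceil(math.log2(distinct_values)))


def fact_column_bytes(rows: int, distinct_values: int) -> int:
    """Bit-packed size of a fact-table key column (simplified: no run-length encoding)."""
    return math.ceil(rows * bits_per_id(distinct_values) / 8)


def dictionary_bytes(distinct_values: int, bytes_per_value: int) -> int:
    """Raw payload of the value dictionary (simplified: no per-entry overhead)."""
    return distinct_values * bytes_per_value


FACT_ROWS = 1_000_000_000
DIM_ROWS = 1_000_000
MD5_HEX_LEN = 32  # assumption: MD5 digest stored as a hex string

print(bits_per_id(DIM_ROWS))                          # 20
print(fact_column_bytes(FACT_ROWS, DIM_ROWS) / 1e9)   # 2.5 (GB)
print(dictionary_bytes(DIM_ROWS, MD5_HEX_LEN) / 1e6)  # 32.0 (MB)
```

Under these assumptions the fact column costs about 20 bits per row whether the key is a dense integer or a hashed string; the hashed key mainly adds a dictionary whose size scales with the dimension's row count rather than the fact table's.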

Thanks!

lbendlin · 06-03-2022

You may want to read some of the articles that discuss VertiPaq compression techniques and their performance, like this one:

Inside VertiPaq - Compress for success - Data Mozart (data-mozart.com)
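Two of the compression ideas discussed in articles like this one, dictionary (hash) encoding and run-length encoding, can be illustrated with a toy Python sketch. This is a simplified model, not VertiPaq's implementation, and the short key strings are made-up stand-ins for real hash values.

```python
from collections.abc import Hashable, Sequence
from itertools import groupby


def dictionary_encode(values: Sequence[Hashable]) -> tuple[list, list[int]]:
    """Map each distinct value to a small integer ID, in first-seen order."""
    dictionary: dict = {}
    ids: list[int] = []
    for v in values:
        # len(dictionary) is evaluated before insertion, so a new value gets the next ID.
        ids.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), ids


def run_length_encode(ids: Sequence[int]) -> list[tuple[int, int]]:
    """Collapse consecutive repeats into (id, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(ids)]


keys = ["9e10", "9e10", "9e10", "41f2", "41f2", "9e10"]  # made-up, shortened hash keys
dictionary, ids = dictionary_encode(keys)
print(dictionary, ids)         # ['9e10', '41f2'] [0, 0, 0, 1, 1, 0]
print(run_length_encode(ids))  # [(0, 3), (1, 2), (0, 1)]
print(run_length_encode(dictionary_encode(sorted(keys))[1]))  # [(0, 2), (1, 4)]
```

Once values are replaced by dictionary IDs, the column stores small integers whatever the original key type was; in this toy model, sorting the column first turns three runs into two.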


info-assets · 06-06-2022

What a helpful article, thanks! I spoke to a friend who has tried this, and he found that using hashed keys instead of integer keys doesn't make a huge difference. That fits with what the article describes. Thanks!