Solved: Model Design - Using a Surrogate Key

sgeheeb · ‎03-15-2023

Hi,

I am trying to figure out which way of designen my data model is better.

Background:

I am building a data model based on Business Central (SaaS) data, so I cannot perform any SQL operations to prepare the data outside of Power Query.

I have a fact table (Sales) and a dimension table (Item).

A simplified version of the fact table:

Document No_ | Item No_ | Variant Code | Quantity

---------------------------------------------------------

D1 I1 V1 2

D2 I2 V1 3

A simplified version of the dimension table:

Item No_ | Variant Code | Description

----------------------------------------------

I1 V1 Item 1 Variant 1

I2 V1 Item 2 Variant 1

An Item is identified by its [Item No_] and [Variant Code]. I need to create a relationship the fact and the dimension table.

Now, two possiblities come to mind:

Version 1: Surrogate Key

---------------------------

This would be the approach I would use if I had SQL at my disposal. Create a surrogate key for every ([Item No_]; [Variant Code]) in the dimension table and set the correct foreign key in the fact table when merging from my staging tables into my DWH tables.

I have replicated this approach in Power Query:

Step 1: Build an Index in the [Item] table (called [Item Key])

Step 2: Merge [Item] to [Sales] with a left join

Step 3: Expand [Item].[Item Key]

Step 4: Delete [Item No_] and [Variant Code] from the [Sales] table

Step 5: Create the relationship in the Model view

This was the intuitive approach I took. As soon as a get to the Merge-part, everything becomes terribly slow. Preview refreshes take forever and so does refreshing the data in Power BI. The performance of the reports themselves seems fine so far.

Version 2: Composite Business Key

--------------------------------------

I am not sure if this is the correct term. But basically, I would not do any Merges. Instead, I would concentenate the [Item No_] and [Variant Code] fields in both the fact and the dimension table and then use these new colums to create the relationship.

Step 1: Concatenate [Item No_] and [Variant Code] in the [Sales] table

Step 2: Concatenate [Item No_] and [Variant Code] in the [Item] table

Step 3: Delete [Item No_] and [Variant Code] from the [Sales] table

Step 4: Create the relationship in the Model view

This provides much better performance when refreshing previews and data. Now, is there a downside to using Solution 2 over Solution 1? Initially, Solution 2 is much faster, but I am worried that having the concatenated relationship-columns (Strings) instead of the Surrogate Keys (Integers) might have a negative performance impact in the long run (probalby more on the report performance then on Power Query / Data refresh).

Any input / experience on this topic is much appreciated. If you have a third solution which I did not consider, please let me know.

Cheers

johnt75 · ‎03-15-2023

You should be fine using the concatenated values as strings, they will be compressed and stored the same way as integers. There's a really good article with benchmark tests at https://www.maxwikstrom.se/performance/power-bi-data-types-in-relationships-does-it-matter/

View solution in original post

johnt75 · ‎03-15-2023

You should be fine using the concatenated values as strings, they will be compressed and stored the same way as integers. There's a really good article with benchmark tests at https://www.maxwikstrom.se/performance/power-bi-data-types-in-relationships-does-it-matter/

sgeheeb · ‎03-15-2023

Thanks! That's a great artilcle indeed! I think I'll go with the concatenated string, it will also make the transformation process easier to understand

Model Design - Using a Surrogate Key

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026

How to Get Your Question Answered Quickly

Fabric Data Days starts November 4th!