The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hi,
I am trying to figure out which way of designen my data model is better.
Background:
I am building a data model based on Business Central (SaaS) data, so I cannot perform any SQL operations to prepare the data outside of Power Query.
I have a fact table (Sales) and a dimension table (Item).
A simplified version of the fact table:
Document No_ | Item No_ | Variant Code | Quantity
---------------------------------------------------------
D1 I1 V1 2
D2 I2 V1 3
A simplified version of the dimension table:
Item No_ | Variant Code | Description
----------------------------------------------
I1 V1 Item 1 Variant 1
I2 V1 Item 2 Variant 1
An Item is identified by its [Item No_] and [Variant Code]. I need to create a relationship the fact and the dimension table.
Now, two possiblities come to mind:
Version 1: Surrogate Key
---------------------------
This would be the approach I would use if I had SQL at my disposal. Create a surrogate key for every ([Item No_]; [Variant Code]) in the dimension table and set the correct foreign key in the fact table when merging from my staging tables into my DWH tables.
I have replicated this approach in Power Query:
Step 1: Build an Index in the [Item] table (called [Item Key])
Step 2: Merge [Item] to [Sales] with a left join
Step 3: Expand [Item].[Item Key]
Step 4: Delete [Item No_] and [Variant Code] from the [Sales] table
Step 5: Create the relationship in the Model view
This was the intuitive approach I took. As soon as a get to the Merge-part, everything becomes terribly slow. Preview refreshes take forever and so does refreshing the data in Power BI. The performance of the reports themselves seems fine so far.
Version 2: Composite Business Key
--------------------------------------
I am not sure if this is the correct term. But basically, I would not do any Merges. Instead, I would concentenate the [Item No_] and [Variant Code] fields in both the fact and the dimension table and then use these new colums to create the relationship.
Step 1: Concatenate [Item No_] and [Variant Code] in the [Sales] table
Step 2: Concatenate [Item No_] and [Variant Code] in the [Item] table
Step 3: Delete [Item No_] and [Variant Code] from the [Sales] table
Step 4: Create the relationship in the Model view
This provides much better performance when refreshing previews and data. Now, is there a downside to using Solution 2 over Solution 1? Initially, Solution 2 is much faster, but I am worried that having the concatenated relationship-columns (Strings) instead of the Surrogate Keys (Integers) might have a negative performance impact in the long run (probalby more on the report performance then on Power Query / Data refresh).
Any input / experience on this topic is much appreciated. If you have a third solution which I did not consider, please let me know.
Cheers
Solved! Go to Solution.
You should be fine using the concatenated values as strings, they will be compressed and stored the same way as integers. There's a really good article with benchmark tests at https://www.maxwikstrom.se/performance/power-bi-data-types-in-relationships-does-it-matter/
You should be fine using the concatenated values as strings, they will be compressed and stored the same way as integers. There's a really good article with benchmark tests at https://www.maxwikstrom.se/performance/power-bi-data-types-in-relationships-does-it-matter/
Thanks! That's a great artilcle indeed! I think I'll go with the concatenated string, it will also make the transformation process easier to understand