Hi Experts
Looking for general thoughts on the best practice for modelling my data.
Scenario
I have about 6 main fact tables and, amongst other things, both a Team dimension and a Person dimension. My fact tables are about 70 million rows of data, and the dimensions in question are around 70,000 rows (Person) and 1,000 rows (Team).
Both Person and Team are key dimensions and are used for multiple purposes. Each fact row carries a number of person keys to support the different roles played within the business (e.g. Timekeeper, Supervisor), both for the current view and for the view at the time of the transaction.
My current set-up (an on-premises Tabular cube) has all the necessary foreign keys stamped on the fact table, so that each can join to the relevant dimension table. The DAX measures activate an inactive relationship as required, using the correct key to join to Person, which allows visuals/tables in Power BI to report metrics from either profile when needed. This means I end up with more complex DAX, using the USERELATIONSHIP function in each metric for both Person and Team.
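For illustration (table, column, and measure names here are simplified), one of these role-based measures looks roughly like:

Supervisor Time Worked :=
CALCULATE (
    SUM ( FactTime[TimeWorked] ),
    // activate the otherwise-inactive relationship on the supervisor key
    USERELATIONSHIP ( FactTime[SupervisorPersonKey], DimPerson[PersonKey] )
)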
I'm considering changing this layout as part of the Fabric development: firstly, combining Person and Team into one larger dimension table, and also bringing both the current and the historical view of the details into that single dimension. That gives me one join to the table while still providing the data needed for both the current and the historical view. This won't remove the need for inactive relationships entirely, but it will reduce it, and it also reduces the number of joins if both Team and Person sit in one table.
Another option is to create a bridge table that sits between the fact and the dimension, holding the primary key from the fact, the role, and the key needed to join to the Person/Team dimension table. For each fact, though, this would contain many rows (the number of rows in the fact * the number of profiles, sometimes 8), resulting in a table of more than 500 million rows per fact. I would then code the metrics to filter based on the role.
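As a rough sketch of how such a metric might be coded (the FactBridge table, the Role column, and the key names are made up for illustration), the role filter plus a cross-filter modifier would take the place of USERELATIONSHIP:

Supervisor Time Worked (Bridge) :=
CALCULATE (
    SUM ( FactTime[TimeWorked] ),
    // keep only bridge rows for the role in question
    FactBridge[Role] = "Supervisor",
    // assumes FactBridge relates to FactTime on the fact key and to DimPerson on the person key;
    // the bridge-to-fact relationship needs to filter in both directions
    CROSSFILTER ( FactBridge[FactKey], FactTime[FactKey], BOTH )
)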
Wondering if anyone has experience of this, and what might be best practice in this situation. I'm keen to make the querying of this data as optimal as possible for the end user.
Any thoughts are welcome.
Thanks
Hello @NEJO
First of all, thanks for sharing this detailed question.
Community users will benefit greatly from this type of discussion.
Here are some of the best practices I would follow in your case:
1. Keep separate dimensions for Person and Team.
2. Use multiple foreign keys in the Fact for each person‐role or team‐role. Although it means some measures will require inactive‐relationship logic, it avoids blowing up the fact table and keeps the schema simpler.
3. Use Type 2 SCD within Person or Team if you need both “current” and “historical” attributes in the same dimension. That technique is standard in dimensional modeling and does not require bridging.
4. Only use a Bridge if you truly need to associate one fact row with a variable‐length set of persons or teams at once. Otherwise, the row explosion and complexity typically outweigh any gains.
In short, most Kimball‐style models would maintain a standard star schema, keep Person and Team dimensions separate, and stamp each relevant foreign key onto the fact table (TimekeeperKey, SupervisorKey, etc.). Although this necessitates some role‐based DAX, it typically remains the most straightforward and performant solution.
Hope this helps.
Thanks
Hi @NEJO ,
We haven't heard from you since the last response and just wanted to check whether the solutions provided met your needs. If they did, please accept the reply as the solution to help others benefit. If not, please reach out.
Thank you.
Hi @NEJO ,
Thank you for sharing your detailed scenario. In addition to your existing approach of maintaining multiple foreign keys in the fact table for the different roles, you can try the following steps to optimize performance in Fabric while keeping the model flexible.
To improve performance by reducing joins and minimizing the need for USERELATIONSHIP, here are a few suggestions:
Use Pre-Aggregated Tables for Common Queries:
Instead of running calculations dynamically for each role, pre-aggregated summary tables can store commonly used metrics like TimeWorked for different roles. This reduces the need for complex DAX and improves query efficiency. Fabric's Direct Lake mode can further enhance performance by allowing Power BI to query the data lake without import delays.
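For instance, a summary table could be sketched like this (names are illustrative; in Fabric it may be preferable to build it upstream in the lakehouse or warehouse rather than as a DAX calculated table):

TimeWorkedSummary =
SUMMARIZECOLUMNS (
    FactTime[TimekeeperPersonKey],
    FactTime[DateKey],
    // one row per timekeeper per date instead of one per transaction
    "TotalTimeWorked", SUM ( FactTime[TimeWorked] )
)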
Combine Data for Frequently Used Reports:
If certain reports frequently query both historical and current role-based data, creating a denormalized table that includes both sets of details can reduce joins. This approach avoids excessive relationship switching and speeds up reporting.
Optimize Role-Based Metrics with Dynamic Selection:
Instead of duplicating measures for each role (e.g., TKWorkedAmount, CurrentTKWorkedAmount), a parameter table can be introduced to let users dynamically select which role-based metric to analyze. A SWITCH() based DAX measure can then return the appropriate result, simplifying the model.
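A minimal sketch of that idea, reusing the measure names mentioned in this thread (the RoleSelector table is an assumed disconnected parameter table):

RoleSelector =
DATATABLE ( "Role", STRING, { { "Timekeeper" }, { "Current Timekeeper" } } )

Time Worked (Selected Role) :=
SWITCH (
    SELECTEDVALUE ( RoleSelector[Role] ),
    "Timekeeper", [TKWorkedAmount],
    "Current Timekeeper", [CurrentTKWorkedAmount],
    BLANK ()
)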
Your current approach remains valid, and these optimizations can help enhance performance while keeping the model flexible.
Please let us know if you'd like more details or assistance in implementing these strategies in Fabric.
If this post helps, please consider accepting it as the solution to help other members find it more quickly; kudos would also be appreciated.
Best regards,
Vinay.
Thanks for the quick response @nilendraFabric
I think what you're suggesting is pretty much how I initially modelled it in the SSAS Tabular model. As I look to migrate to Fabric, I was hoping there was a more optimal way.
If I look at my scenario for 3 of the profiles (I think we have 6) - Timekeeper, Supervisor, RelationshipManager - and one metric, I'll end up with something like this.
The fact table would need to contain:
One metric - say TimeWorked
I'll need the ability to report the total TimeWorked for any given profile, both against the person and team details at the point of the transaction and grouped up to the current person and team details, depending on the requirement for the output - so the join to Person and Team requires a different key for the current view.
This seems to create the need to have the TimeWorked metric duplicated for all profiles
TKWorkedAmount
CurrentTKWorkedAmount
etc.
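In DAX terms that duplication looks roughly like the pair below (column names are illustrative, assuming the transaction-time timekeeper key carries the active relationship to Person and the current key an inactive one):

TKWorkedAmount :=
SUM ( FactTime[TimeWorked] )    // uses the active relationship on FactTime[TimekeeperKey]

CurrentTKWorkedAmount :=
CALCULATE (
    SUM ( FactTime[TimeWorked] ),
    USERELATIONSHIP ( FactTime[CurrentTimekeeperKey], DimPerson[PersonKey] )
)

... and then the same pair again for Supervisor, RelationshipManager, and so on.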
Needs must, and I'm happy with that approach if it really is the most optimal. I suppose I'm trying to maximise performance for the end user here, reducing joins and reducing the need for USERELATIONSHIP (which I don't think is the most optimal when it comes to performance), whilst also trying to cater for as many scenarios as possible.
Thanks