Fabric is Generally Available. Browse Fabric Presentations. Work towards your Fabric certification with the Cloud Skills Challenge.
I am trying to build a solid data model to support the budget vs sales analysis.
Sales is project based, it has to be possible to slice by project. Sales facts do not store customer information, this comes only through project. Budget exists with two different granularities: all the customer categories have a budget, but also some customers have their own budget. Sum of customer budgets <= customer category budget.
It has to be possible to slice both sales and budget by customer and customer category.
The data model currently looks like this. I have drawn up all the possible relations (this would not be possible in PBI as some of them create ambiguity).
So, how to turn this into a perfect star schema? If perfect can't be done then what could be the preferred way to go?
Anything can be changed in this data model as long as the existing data is used.
Problem 1. Sales, Project and Customer are currently in snowflake, not a star schema.
I could copy the CustomerKey to Sales table. Is this a good or bad idea?
+ Project and Customer would be independent dimensions.
- It would not be obvious (without documentation) that the current business logic dictates that sales can't have customers other than the project customer.
- There would be a redundant column in Sales table.
+ Sales, Project and Customer would form a star schema.
+ One less join in PBI, better performance in PBI.
Problem 2. How to slice the sales and budget both by Customer and by Customer Category?
Option 1. Make Customer and Customer Category independent dimensions (similar to Problem 1)
+ Star schema
- Not obvious that Customer Category is an attribute of Customer
Option 2. Create an extra table that somehow combines Customers and CustomerCategories, give it a Key column and calculate suitable Keys into the budget table (rephrasing, create dummy rows in to Customer table for each Classifier row so that the dummy rows could be connected to the rows in Budget table that do not have specific customer).
- Messy, extra work
+ Star schema
- There will be blank customer values in vDimCustomerAndClassifier
- Customer and Classifier relation not obvious without documentation
Any other options? Any other considerations?
Code to recreate the original data model in dbdiagram.io
// Creating tables
Table vSales as SALE {
ProjectKey int
TransactionDate date
ProjectRevenueEurAmt numeric
}
Table vBudget as BUD {
BudgetDate date
CustomerKey int
ClassifierCustomerKey int
PlannedRevenueEurAmt numeric
}
Table vDimProject as DP {
Key int [pk]
ProjectNumber varchar
ProjectName varchar
CustomerKey int
}
Table vDimCustomer as DC {
Key int [pk]
Name varchar
Category varchar
Industry varchar
ClassifierCustomerKey int
}
Table vClassifierCustomer as CC {
Key int [pk]
Industry varchar
Category varchar
}
Table vDimDate as DD {
Key int [pk]
Date date
}
// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key
Ref: DP.CustomerKey > DC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerKey > DC.Key
Ref: BUD.ClassifierCustomerKey > CC.Key
Ref: DC.ClassifierCustomerKey > CC.Key
Option 1
// Creating tables
Table vSales as SALE {
ProjectKey int
CustomerKey int
ClassifierCustomerKey int
TransactionDate date
ProjectRevenueEurAmt numeric
}
Table vBudget as BUD {
BudgetDate date
CustomerKey int
ClassifierCustomerKey int
PlannedRevenueEurAmt numeric
}
Table vDimProject as DP {
Key int [pk]
ProjectNumber varchar
ProjectName varchar
CustomerKey int
}
Table vDimCustomer as DC {
Key int [pk]
Name varchar
Category varchar
Industry varchar
ClassifierCustomerKey int
}
Table vClassifierCustomer as CC {
Key int [pk]
Industry varchar
Category varchar
}
Table vDimDate as DD {
Key int [pk]
Date date
}
// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key
Ref: SALE.CustomerKey > DC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerKey > DC.Key
Ref: BUD.ClassifierCustomerKey > CC.Key
Ref: SALE.ClassifierCustomerKey > CC.Key
Option 2
Table vSales as SALE {
ProjectKey int
CustomerAndClassifierKey int
TransactionDate date
ProjectRevenueEurAmt numeric
}
Table vBudget as BUD {
BudgetDate date
CustomerAndClassifierKey int
PlannedRevenueEurAmt numeric
}
Table vDimProject as DP {
Key int [pk]
ProjectNumber varchar
ProjectName varchar
CustomerKey int
}
Table vDimCustomerAndClassifier as CAC {
Key int [pk]
Name varchar
Category varchar
Industry varchar
CustomerAndClassifierKey int
}
Table vDimDate as DD {
Key int [pk]
Date date
}
// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key
Ref: SALE.CustomerAndClassifierKey > CAC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerAndClassifierKey > CAC.Key
Hello,
I assume in PowerBI it is best to keep the Sales & Budget Data in separate tables as opposed to union the data together?
Thanks,
Michael
Most of the time you want to keep your fact data separate, especially if it serves a very different purpose. If you had Sales and Backlog it would be a slightly different story as you could argue that they supplement each other. But in the case of Sales and Budget there is no conceptual overlap.
I would go with your very first version and deal with the ambiguities via inactive relationships and USERELATIONSHIP
And what would be the reasoning?
It's the closest you can get to a snowflake schema, and it seems to match your business process best.
Check out the November 2023 Power BI update to learn about new features.
Read the latest Fabric Community announcements, including updates on Power BI, Synapse, Data Factory and Data Activator.