The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hello everyone,
I'm new to this forum and to Power BI and trying to understand how to properly model my data to analyze economic inequalities in a country. I have several datasets, and the structure is a bit unusual, so I'm not sure how to connect everything effectively.
Here are the tables I'm working with:
This table mixes salary data by job type and by age category, which is confusing. There is no age data in job-type rows and no job-type data in age rows:
Town_Code | Sexe | Job Type | Age in 3 category | Salary |
abcde | Female | Executives | 19.1 | |
abcde | Female | Mid_Managers | 13.2 | |
abcde | Female | 18to25 | 9.7 | |
abcde | Male | 50plus | 18.6 | |
(etc.) |
Shows number of enterprises by size, per town:
Town_Code | Total_Enterprises | Enterprises_1_to_5 | ... | Enterprises_500_plus |
abcde | 996 | 272 | ... | 0 |
This one contains population by age group, sex, and cohabitation type:
Town_Code | Sexe | Age in 3 Category | Cohabitation Type (Code) | # Peoples |
abcde | Male | 15 to 25 | 11 | 340 |
abcde | Female | 25 to 50 | 12 | 16 |
Contains the town names and geographic hierarchy (Region > Department > Commune, etc.)
I'd like to analyze and compare:
I'm grateful for any guidance, schema suggestions, or resources you can share. I'm a beginner, so simple explanations are really appreciated!
Thanks in advance 🙏
Solved! Go to Solution.
Hi @DanieleR , to overcome this, you need to do some modeling.
At first, let's rebuild SalaryData with these columns
Town_Code | GenderKey | CategoryKey | Salary |
abcde | 2 | 1 | 19.1 |
abcde | 2 | 2 | 13.2 |
abcde | 2 | 4 | 9.7 |
abcde | 1 | 6 | 18.6 |
Now, we need to create a bridge table that handles both job type and age group
DimCategory
CategoryKey | CategoryType | CategoryName | SortOrder |
1 | JobType | Executives | 1 |
2 | JobType | Mid_Managers | 2 |
3 | JobType | Employees | 3 |
4 | AgeGroup | 18to25 | 10 |
5 | AgeGroup | 26to49 | 11 |
6 | AgeGroup | 50plus | 12 |
We have location informations so create another table for that
DimTown
Town_Code | TownName | Department | Region |
abcde | Paris | Paris | Île-de-France |
fghij | Lyon | Rhône | Auvergne-Rhône |
klmno | Marseille | Bouches-du-Rhône | Provence-Alpes-Côte |
and To get the gender Create another table
DimGender
GenderKey | Gender |
1 | Male |
2 | Female |
And, finally create this relationship with your other old tables
SalaryData → DimCategory (CategoryKey)
SalaryData → DimTown (Town_Code)
SalaryData → DimGender (Sexe)
Population → DimTown (Town_Code)
Enterprises → DimTown (Town_Code)
Now, you have an organized model to answer your all queries.
If I answered your question please mark my post as the solution, it helps others with the same challenge find the answer!
Thanks for your help! Just to clarify: my dataset doesn't contain individual-level data : it's already aggregated. What I showed you, it was my data already partially modified. The original format includes several structured tables like the ones below:
Population
Entreprise
Geo_Info
Salary
Hi @DanieleR
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions.
Thank you.
Hi @DanieleR , to overcome this, you need to do some modeling.
At first, let's rebuild SalaryData with these columns
Town_Code | GenderKey | CategoryKey | Salary |
abcde | 2 | 1 | 19.1 |
abcde | 2 | 2 | 13.2 |
abcde | 2 | 4 | 9.7 |
abcde | 1 | 6 | 18.6 |
Now, we need to create a bridge table that handles both job type and age group
DimCategory
CategoryKey | CategoryType | CategoryName | SortOrder |
1 | JobType | Executives | 1 |
2 | JobType | Mid_Managers | 2 |
3 | JobType | Employees | 3 |
4 | AgeGroup | 18to25 | 10 |
5 | AgeGroup | 26to49 | 11 |
6 | AgeGroup | 50plus | 12 |
We have location informations so create another table for that
DimTown
Town_Code | TownName | Department | Region |
abcde | Paris | Paris | Île-de-France |
fghij | Lyon | Rhône | Auvergne-Rhône |
klmno | Marseille | Bouches-du-Rhône | Provence-Alpes-Côte |
and To get the gender Create another table
DimGender
GenderKey | Gender |
1 | Male |
2 | Female |
And, finally create this relationship with your other old tables
SalaryData → DimCategory (CategoryKey)
SalaryData → DimTown (Town_Code)
SalaryData → DimGender (Sexe)
Population → DimTown (Town_Code)
Enterprises → DimTown (Town_Code)
Now, you have an organized model to answer your all queries.
If I answered your question please mark my post as the solution, it helps others with the same challenge find the answer!
Thank you! I think the solution you proposed is very clever and addresses my problem well. I just have one more question: would it make sense to apply the same approach you used for the Salary table to the Population table (because also in this case I have multiple variable combinations I'd like to keep in my data model), using a bridge table to connect and manage relationships?
The example tables you have provide are "reports" withj totals.
They contaion no underlying details to build relationships.
If you have access to underlying data then consider dumping it into one simple table with one record per person.
User | Count |
---|---|
86 | |
84 | |
36 | |
34 | |
34 |
User | Count |
---|---|
93 | |
79 | |
66 | |
55 | |
52 |