Power BI is turning 10! Tune in for a special live episode on July 24 with behind-the-scenes stories, product evolution highlights, and a sneak peek at what’s in store for the future.
Save the dateEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hey,
I want to store data on customers and some calculated, categoric behavioural labels, e.g. X and Y customers are millenials and male. I have too many labels to have columns in a customer table (about 100 labels).
Therefore there's a many to many relationship between labels and customers. Customers can have many labels and labels can be assigned to many customers.
There is a similar story with holidays and holiday labels. Customers are then linked to what holidays they go on.
Is the below with multiple linking tables efficient in PowerBI or would it be more efficient to have a central (but very long) fact table that links all the keys? Or is this approach completely wrong.
Any help massively appreciated!
(Example is a dummy example so apologies if there are flaws)
hi @badger123
I would suggest you use star schema in power bi.
For more details, please refer to these:
https://docs.microsoft.com/en-us/power-bi/guidance/star-schema
https://radacad.com/power-bi-basics-of-modeling-star-schema-and-how-to-build-it
Regards,
Lin
Having a bit of trouble visualizing the data, can you post example data (not all 100 columns!!)
Hi @Greg_Deckler, Sorry I can't, it's sensitive data. Does the diagram not give you enough to understand? Let me know if you have specific questions. Thank you!!
Not looking for the data, just some representation of the data. For example, you talk about customers and lables, well is that (see below). But, no a picture of your data model is not enough to describe what "all the keys" means. The only information that can be provided with just a picture of a data model is that you generall want to use a star schema.
Customer | Label | Value |
Customer 1 | Label 1 | blue |
Customer 1 | Label 2 | green |
Customer 1 | Label 3 | plate |
Customer 2 | Label 1 | red |
Customer 2 | Label 2 | blue |
Customer 2 | Label 3 | cup |
or
Customer | Label 1 | Label 2 | Label 3 |
Customer 1 | blue | green | plate |
Customer 2 | red | blue | cup |
Hello @Greg_Deckler
Thanks for offering your help with this. I am struggling to come up with a simple and efficient solution after a few attempts, so I thought it would be helpful to take a step back and show you what I'm tyring to do.
I've included a sample of what data I have below. Here are a few examples of what I questions I would like to answer with this data:
What is the the total value of a category across countries or for a country for a given date or time period?
What is the the total value of an item across countries or for a country for a given date or time period?
What is the total valoue of negative sentiment vs positive sentiment for an item for a given date or time period?
What is the total value of positive sentiment across all fruits for a given date or time period?
Category | Item | String | Sentiment | Country | Date | Value |
Fruit | Orange | I like oranges and apples | positive | UK | Jan-20 | 10 |
Fruit | Apple | I like oranges and apples | positive | UK | Jan-20 | 10 |
Fruit | Orange | I like oranges and apples | positive | US | Jan-20 | 30 |
Fruit | Apple | I like oranges and apples | positive | US | Jan-20 | 30 |
Fruit | Orange | I like oranges and apples | positive | UK | Feb-20 | 60 |
Fruit | Apple | I like oranges and apples | positive | UK | Feb-20 | 60 |
Fruit | Orange | I like oranges and apples | positive | US | Feb-20 | 40 |
Fruit | Apple | I like oranges and apples | positive | US | Feb-20 | 40 |
Fruit | Orange | I like oranges and apples | positive | UK | Mar-20 | 100 |
Fruit | Apple | I like oranges and apples | positive | UK | Mar-20 | 100 |
Fruit | Orange | I like oranges and apples | positive | US | Mar-20 | 120 |
Fruit | Apple | I like oranges and apples | positive | US | Mar-20 | 120 |
Fruit | Orange | I don't like oranges | negative | UK | Jan-20 | 30 |
Fruit | Orange | I don't like oranges | negative | UK | Feb-20 | 20 |
Fruit | Orange | I don't like oranges | negative | UK | Mar-20 | 50 |
Vegetable | Carrot | Carrots are nice | positive | UK | Jan-20 | 200 |
Vegetable | Carrot | Carrots are nice | positive | UK | Feb-20 | 300 |
Vegetable | Carrot | Carrots are nice | positive | UK | Mar-20 | 400 |
Vegetable | Leek | I don't enjoy leeks | negative | US | Jan-20 | 5 |
Vegetable | Leek | I don't enjoy leeks | negative | US | Feb-20 | 20 |
Vegetable | Leek | I don't enjoy leeks | negative | US | Mar-20 | 15 |
Trying to figure out the best way to split this into separate tables. I'm finding it challenging because value is based on each string at a country level for each date. The same string might exist in more than one country. A string can be mapped to multiple items and an item can have multiple strings. So how can I best build the model so that it recognises that Orange is the same item across countries but it has different strings (and hence values) for each country.
Any ideas would be much appreciated!
Hi,
Could you kinldy show the exact results you are expecting?
User | Count |
---|---|
76 | |
75 | |
46 | |
31 | |
27 |
User | Count |
---|---|
99 | |
91 | |
51 | |
49 | |
45 |