Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
Hello,
I made a table to demonstrate my problem, original_table:
date | client | box_id | product_id |
16-11-2023 | james | fruit_box_01 | orange |
16-11-2023 | james | fruit_box_01 | pineapple |
16-11-2023 | james | fruit_box_01 | pear |
16-11-2023 | james | fruit_box_01 | banana |
16-11-2023 | james | fruit_box_01 | avocado |
16-11-2023 | james | fruit_box_02 | apple |
16-11-2023 | james | fruit_box_02 | orange |
16-11-2023 | james | fruit_box_02 | banana |
16-11-2023 | james | mango | |
16-11-2023 | james | grapes | |
16-11-2023 | peter | fruit_box_02 | apple |
16-11-2023 | peter | fruit_box_02 | orange |
16-11-2023 | peter | fruit_box_02 | banana |
16-11-2023 | peter | mango | |
16-11-2023 | peter | avocado | |
16-11-2023 | peter | banana | |
16-11-2023 | robert | pineapple | |
16-11-2023 | robert | pear | |
16-11-2023 | robert | banana | |
16-11-2023 | robert | mango | |
16-11-2023 | john | fruit_box_01 | orange |
16-11-2023 | john | fruit_box_01 | pineapple |
16-11-2023 | john | fruit_box_01 | pear |
16-11-2023 | john | fruit_box_01 | banana |
16-11-2023 | john | fruit_box_01 | avocado |
16-11-2023 | john | mango | |
16-11-2023 | frank | banana | |
16-11-2023 | frank | orange |
The main problem with this data, is that it becomes very large due to the fact that a lot of people can have a box_id assigned, which can contain a lot of products. So let's say 2000 clients get assigned a box that contains 200 products, I will end up with 400000 rows and that's just for one box and I have a lot of boxes/clients. So in my opinion a lot of unnecessary repetition.
So to solve this, I wanted to split this table into multiple tables, so I can just assign a client his boxes and his separate products. I tried doing this by splitting the table into two separate tables, tablesplit1:
date | client | collection_id |
16-11-2023 | james | fruit_box_01 |
16-11-2023 | james | fruit_box_02 |
16-11-2023 | james | mango |
16-11-2023 | james | grapes |
16-11-2023 | peter | fruit_box_02 |
16-11-2023 | peter | mango |
16-11-2023 | peter | avocado |
16-11-2023 | peter | banana |
16-11-2023 | robert | pineapple |
16-11-2023 | robert | pear |
16-11-2023 | robert | banana |
16-11-2023 | robert | mango |
16-11-2023 | john | fruit_box_01 |
16-11-2023 | john | mango |
16-11-2023 | frank | banana |
16-11-2023 | frank | orange |
and tablesplit2:
date | collection_id | box_id | product_id |
16-11-2023 | fruit_box_01 | fruit_box_01 | orange |
16-11-2023 | fruit_box_01 | fruit_box_01 | pineapple |
16-11-2023 | fruit_box_01 | fruit_box_01 | pear |
16-11-2023 | fruit_box_01 | fruit_box_01 | banana |
16-11-2023 | fruit_box_01 | fruit_box_01 | avocado |
16-11-2023 | fruit_box_02 | fruit_box_02 | apple |
16-11-2023 | fruit_box_02 | fruit_box_02 | orange |
16-11-2023 | fruit_box_02 | fruit_box_02 | banana |
16-11-2023 | mango | mango | |
16-11-2023 | grapes | grapes | |
16-11-2023 | avocado | avocado | |
16-11-2023 | banana | banana | |
16-11-2023 | pineapple | pineapple | |
16-11-2023 | pear | pear | |
16-11-2023 | orange | orange |
The logic behind collection_id is that when there is a box_id then it will be equal to box_id, else it will be equal to the product_id. This logic get's applied in both part 1 and 2, part 1 assigns the client to the collection_id and part 2 assigns the collection_id to the box_id and the product_id.
This however, creates a new problem, because this will create a many-to-many relationship between these tables. Because multiple clients can have multiple collection_id's and vice versa.
So I tried solving this with a bridge table, following this example:
Joining Many to Many with a Bridge in Power BI | Seer Interactive
But that didn't allow me to recreate the original_table in Power BI. I want to create a table in Power BI like this that shows per client the products and the boxes (if applicable). Since using a bridge only allows you to connect to one side and the other side can only be summarizations.
So then it's back to the original many-to-many construction, but combining this with a date table confuses me even more. So in general would like some guidance on how to proceed from here with the following in mind:
Any help is greatly appreciated, and if I need to explain anything else, please let me know.
It seems like you're trying to design a database schema for your data to avoid redundancy and improve efficiency. Based on your description, you've already attempted to split the data into two tables but are facing challenges due to the many-to-many relationship.
Here's a suggested approach based on your requirements:
Tables:
Relationships:
Fact Table:
Queries:
Considerations:
Remember that normalization is about efficiently organizing data and minimizing redundancy. The schema above allows you to represent the relationships between clients, boxes, products, and dates while maintaining the ability to analyze data over time and by different dimensions. Adjust the schema based on specific business requirements and the nature of your data. Astro luna hoy is a best game.
Please read about Normalization. From your sample data you can create the following dimensions:
- Calendar
- Clients
- Products
- Boxes
What's missing is a fact column - do you only want to count the rows?
User | Count |
---|---|
82 | |
72 | |
67 | |
47 | |
36 |
User | Count |
---|---|
111 | |
56 | |
50 | |
42 | |
40 |