Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.

Reply
vPCvP
New Member

How to improve data-model from 1 large table to multiple smaller ones correctly

Hello,

I made a table to demonstrate my problem, original_table:

dateclientbox_idproduct_id
16-11-2023jamesfruit_box_01orange
16-11-2023jamesfruit_box_01pineapple
16-11-2023jamesfruit_box_01pear
16-11-2023jamesfruit_box_01banana
16-11-2023jamesfruit_box_01avocado
16-11-2023jamesfruit_box_02apple
16-11-2023jamesfruit_box_02orange 
16-11-2023jamesfruit_box_02banana
16-11-2023james mango
16-11-2023james grapes
16-11-2023peterfruit_box_02apple
16-11-2023peterfruit_box_02orange 
16-11-2023peterfruit_box_02banana
16-11-2023peter mango
16-11-2023peter avocado
16-11-2023peter banana
16-11-2023robert pineapple
16-11-2023robert pear
16-11-2023robert banana
16-11-2023robert mango
16-11-2023johnfruit_box_01orange
16-11-2023johnfruit_box_01pineapple
16-11-2023johnfruit_box_01pear
16-11-2023johnfruit_box_01banana
16-11-2023johnfruit_box_01avocado
16-11-2023john mango
16-11-2023frank banana
16-11-2023frank orange 


The main problem with this data, is that it becomes very large due to the fact that a lot of people can have a box_id assigned, which can contain a lot of products. So let's say 2000 clients get assigned a box that contains 200 products, I will end up with 400000 rows and that's just for one box and I have a lot of boxes/clients. So in my opinion a lot of unnecessary repetition.

So to solve this, I wanted to split this table into multiple tables, so I can just assign a client his boxes and his separate products. I tried doing this by splitting the table into two separate tables, tablesplit1:

dateclientcollection_id
16-11-2023jamesfruit_box_01
16-11-2023jamesfruit_box_02
16-11-2023jamesmango
16-11-2023jamesgrapes
16-11-2023peterfruit_box_02
16-11-2023petermango
16-11-2023peteravocado
16-11-2023peterbanana
16-11-2023robertpineapple
16-11-2023robertpear
16-11-2023robertbanana
16-11-2023robertmango
16-11-2023johnfruit_box_01
16-11-2023johnmango
16-11-2023frankbanana
16-11-2023frankorange 


and tablesplit2:

datecollection_idbox_idproduct_id
16-11-2023fruit_box_01fruit_box_01orange
16-11-2023fruit_box_01fruit_box_01pineapple
16-11-2023fruit_box_01fruit_box_01pear
16-11-2023fruit_box_01fruit_box_01banana
16-11-2023fruit_box_01fruit_box_01avocado
16-11-2023fruit_box_02fruit_box_02apple
16-11-2023fruit_box_02fruit_box_02orange 
16-11-2023fruit_box_02fruit_box_02banana
16-11-2023mango mango
16-11-2023grapes grapes
16-11-2023avocado avocado
16-11-2023banana banana
16-11-2023pineapple pineapple
16-11-2023pear pear
16-11-2023orange  orange 


The logic behind collection_id is that when there is a box_id then it will be equal to box_id, else it will be equal to the product_id. This logic get's applied in both part 1 and 2, part 1 assigns the client to the collection_id and part 2 assigns the collection_id to the box_id and the product_id.

This however, creates a new problem, because this will create a many-to-many relationship between these tables. Because multiple clients can have multiple collection_id's and vice versa.

So I tried solving this with a bridge table, following this example:
Joining Many to Many with a Bridge in Power BI | Seer Interactive

But that didn't allow me to recreate the original_table in Power BI. I want to create a table in Power BI like this that shows per client the products and the boxes (if applicable). Since using a bridge only allows you to connect to one side and the other side can only be summarizations.

So then it's back to the original many-to-many construction, but combining this with a date table confuses me even more. So in general would like some guidance on how to proceed from here with the following in mind:

  • I will get a weekly export and need to be able to show changes per week. 
  • I will need to be able to filter on client, box_id and product_id and it needs to filter both ways.
  • I must be able to recreate the original_table in a table in Power BI

Any help is greatly appreciated, and if I need to explain anything else, please let me know.

2 REPLIES 2
SmartSmith
Advocate I
Advocate I

It seems like you're trying to design a database schema for your data to avoid redundancy and improve efficiency. Based on your description, you've already attempted to split the data into two tables but are facing challenges due to the many-to-many relationship.

Here's a suggested approach based on your requirements:

Tables:

  • Calendar Table:
    • Date
    • Week
    • Month
    • Year
    • ...
  • Clients Table:
    • ClientID (Primary Key)
    • ClientName
  • Products Table:
    • ProductID (Primary Key)
    • ProductName
  • Boxes Table:
    • BoxID (Primary Key)
    • BoxName
  • Collection Table:
    • CollectionID (Primary Key)
    • BoxID (Foreign Key referencing Boxes Table)
    • ProductID (Foreign Key referencing Products Table)
    • DateID (Foreign Key referencing Calendar Table)
    • ClientID (Foreign Key referencing Clients Table)

Relationships:

  • Clients to Collection: One-to-Many (A client can have multiple collections over time)
  • Products to Collection: One-to-Many (A product can be part of multiple collections)
  • Boxes to Collection: One-to-Many (A box can contain multiple products in a collection)
  • Calendar to Collection: One-to-Many (A collection can belong to a specific date)

Fact Table:

  • Collection Fact Table:
    • This table will contain any metrics or measures you want to track for each collection, such as the quantity of products.

Queries:

  • To recreate the original table in Power BI, you can use relationships to create visualizations that display data per client, box, and product.
  • To show changes per week, use the Calendar table and filter based on the date.

Considerations:

  • Ensure that you have unique primary keys for each table.
  • Use surrogate keys (auto-incremented integers) for primary keys to improve efficiency.
  • The fact table can include measures like product quantity, allowing you to aggregate data.

Remember that normalization is about efficiently organizing data and minimizing redundancy. The schema above allows you to represent the relationships between clients, boxes, products, and dates while maintaining the ability to analyze data over time and by different dimensions. Adjust the schema based on specific business requirements and the nature of your data. Astro luna hoy is a best game.

lbendlin
Super User
Super User

Please read about Normalization.  From your sample data you can create the following dimensions:

 

- Calendar

- Clients

- Products

- Boxes

 

What's missing is a fact column - do you only want to count the rows?

Helpful resources

Announcements
June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.