Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply

General advice on best way to handle large dataset

I have a file that is around 5.5mb.  The main table in it is 285k rows, containing products by location and date.   There is a second table of around 18K rows containing customer data per product. 

 

Product codes are not unique in either table, so I joined them in M to avoid issues with many to many relationships

 

When I combine the two together in powerquery the table increases to 1.4 million rows and the file size increases to 45mb.

 

I am concerned about refresh times and the overall size of the file.  The dashboard itself is still quite responsive.  It's more the refresh I'm trying to speed up.  What can I do to reduce the file size and keep it quick?

 

Or is there a better way to combine the tables that gets around the many to many relationship issue?

 

1 ACCEPTED SOLUTION
BA_Pete
Super User
Super User

Hi @dapperscavenger ,

 

By the sounds of it, you don't need to join these tables at all, you need a dimension table.

 

In Power Query, create a products dimension as follows:

1) New Source > Blank Query

2) In the formula bar, enter:

= Table.Distinct(Table.SelectColumns(yourTableWithAllProductsIn, {"ProductCode", "ProductName"}))

Where "yourTableWithAllProducts" is the name of one of your queries that contains all products, and "ProductCode" and "ProductName" are the names of the columns from this query that you want to retain in your dimension table.

 

Then Apply this to the data model.

Once your two original (unjoined/unmerged) tables are in the model with your new products dimension table, make relationships between the dimension table and each fact table as follows:

 

dimension[ProductCode] to fact[ProductCode] - One to many respectively.

 

You can now use the ProductName field from your dimension table in any visuals and this will correctly filter the fact values in your two fact tables.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

1 REPLY 1
BA_Pete
Super User
Super User

Hi @dapperscavenger ,

 

By the sounds of it, you don't need to join these tables at all, you need a dimension table.

 

In Power Query, create a products dimension as follows:

1) New Source > Blank Query

2) In the formula bar, enter:

= Table.Distinct(Table.SelectColumns(yourTableWithAllProductsIn, {"ProductCode", "ProductName"}))

Where "yourTableWithAllProducts" is the name of one of your queries that contains all products, and "ProductCode" and "ProductName" are the names of the columns from this query that you want to retain in your dimension table.

 

Then Apply this to the data model.

Once your two original (unjoined/unmerged) tables are in the model with your new products dimension table, make relationships between the dimension table and each fact table as follows:

 

dimension[ProductCode] to fact[ProductCode] - One to many respectively.

 

You can now use the ProductName field from your dimension table in any visuals and this will correctly filter the fact values in your two fact tables.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Jan25PBI_Carousel

Power BI Monthly Update - January 2025

Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.