newpbiuser01
Helper V

Referencing vs Duplicating to Create Dimension Tables

Hello,

 

I have a question about referencing vs. duplicating data tables. I am working with large data tables (>20 million rows), and to make the reports more efficient I break the master data table down into dimension tables (for dates, locations, etc.). To create these dimension tables, I have two options: either duplicate the master data table or reference it, then remove all inapplicable columns, remove duplicates, and use the result as the dimension table.
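To make the "keep the relevant columns, then remove duplicates" step concrete, here is a minimal sketch in Python. It is not Power Query code, only a model of the same idea; the function name `build_dimension` and the sample rows are made up for illustration.

```python
def build_dimension(rows, columns):
    """Keep only `columns` from each row and drop duplicate combinations,
    preserving the order in which each combination is first seen."""
    seen = set()
    dimension = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            dimension.append(dict(zip(columns, key)))
    return dimension


# Hypothetical master (fact) table; the real one has >20 million rows.
master = [
    {"date": "2023-12-01", "location": "Austin", "sales": 100},
    {"date": "2023-12-01", "location": "Boston", "sales": 250},
    {"date": "2023-12-02", "location": "Austin", "sales": 75},
]

dim_date = build_dimension(master, ["date"])
dim_location = build_dimension(master, ["location"])
```

Each dimension ends up with one row per distinct value, which is what removing the other columns and then removing duplicates should produce.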

 

When I duplicated the table and ran a data refresh on the report in the Power BI Service, I got time-out errors because the refresh was taking too long. I then changed my queries to create the dimension tables by referencing the master data table instead.

[Screenshot: newpbiuser01_0-1701461247318.png]

 

Although the refresh time has gone down considerably now that I reference the master data table to create the dimension tables, it appears that Power BI doesn't hit the main data source just once and then refresh all the dimension tables from that result. Instead, it hits the main data source, gets the master table, and updates dimension table 1; then it hits the source again, gets the master table again, and updates dimension table 2.
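The behaviour I am describing can be modelled with a small Python sketch. This is a toy model of what I observed, not how Power BI is actually implemented; `CountingSource`, `master_query`, and `dim_query` are hypothetical names, and the rows are invented sample data.

```python
class CountingSource:
    """Stand-in for the main data source that counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.hits = 0

    def load(self):
        self.hits += 1
        return list(self._rows)


def master_query(source):
    # The master table query: one full read of the source per evaluation.
    return source.load()


def dim_query(source, column):
    # A "referencing" query in this model: it re-runs the master query
    # instead of reusing a result that was already loaded.
    rows = master_query(source)
    return sorted({row[column] for row in rows})


source = CountingSource([
    {"date": "2023-12-01", "location": "Austin"},
    {"date": "2023-12-02", "location": "Austin"},
    {"date": "2023-12-02", "location": "Boston"},
])

dim_dates = dim_query(source, "date")
dim_locations = dim_query(source, "location")
```

In this model `source.hits` ends up equal to the number of dimension queries, which matches what I see during refresh: one hit on the source per dimension table rather than a single shared load.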

 

What I'm struggling to understand is why the refresh time is so much lower (it went down from over 5 hours to 45 minutes) when the data is still loaded from the main data source the same number of times as it would be if I had duplicated the table. Also, which approach should we use for creating dimension tables?

 

Any help or insight would be greatly appreciated. 

 

Thank you!

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @newpbiuser01 ,

When working with large data tables, it is important to optimize the data model to improve performance. One way to do this is by breaking down the master data table into dimension tables, as you have done.


In your case, it sounds like referencing the master data table to create the dimension tables is the better approach, as it has resulted in a significant reduction in data refresh time. Although Power BI may hit the main data source multiple times when refreshing the dimension tables, the overall performance is still better than when duplicating the data table.

Solved: Difference between Reference and Duplicate - Microsoft Fabric Community

Reference vs Duplicate in Power BI; Power Query Back to Basics - RADACAD

 

 

How to Get Your Question Answered Quickly 

 

If I have misunderstood your meaning, please provide more details, including your desired output and a pbix file without private information (or some sample data).

 

Best Regards
Community Support Team _ Rongtie

If this post helps, please consider accepting it as the solution to help other members find it more quickly.

 

 

 

 


2 REPLIES 2

Hi. I am running into the same dilemma. Since you attempted this a year ago, I am wondering whether you found a winner between the two approaches for large master tables?
