
newpbiuser01
Helper IV

Data Refresh Times and Duplicate/Referenced Tables

Hello,

 

I have a question about referencing vs. duplicating data tables. I am working with large data tables (>20 million rows), and to make the reports more efficient, I break the master data table down into dimension tables (for dates, locations, etc.). To create these dimension tables, I have two options: either duplicate the master data table, or reference it, remove all inapplicable columns, remove duplicates, and use the result as the dimension table.

 

When I duplicated the table and tried the data refresh on the report in the Power BI Service, I got time-out errors because the refresh was taking too long. I changed my queries to instead create the dimension tables by referencing the master data table.
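For reference, the dimension queries now look roughly like this (the query name "MasterData" and the column names are made up for illustration; substitute your own):

let
    // "MasterData" is the master query being referenced
    Source = MasterData,
    // keep only the columns relevant to the Location dimension
    KeptColumns = Table.SelectColumns(Source, {"LocationID", "LocationName", "Region"}),
    // remove duplicate rows so each location appears exactly once
    DimLocation = Table.Distinct(KeptColumns)
in
    DimLocation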


 

Although the data refresh time has gone down considerably since I started referencing the master data table to create the dimension tables, it appears that when refreshing the dimension tables, Power BI doesn't just hit the main data source once and then refresh all the dimension tables from it. Instead, it hits the source, gets the master table, and updates dimension table 1; then it hits the source again, updates the master data table, and updates dimension table 2.

 

What I'm struggling to understand is why the data refresh time in the Service is considerably less (it went down from over 5 hours to 45 minutes) when the data is being loaded from the main data source the same number of times as it would have been if I had duplicated the table. Also, which approach should we use for creating dimension tables?

 

Any help or insight would be greatly appreciated. 

 

Thank you!

1 ACCEPTED SOLUTION
lbendlin
Super User

20M rows is not considered large (unless you have gazillions of columns)

 

Consider using Table.Buffer if you keep re-using a source table in your query. (Doesn't help across queries.)
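A minimal sketch of that pattern, with a hypothetical SQL source and made-up table/column names:

let
    // hypothetical source; replace with your actual connection
    Source = Sql.Database("myserver", "mydb"),
    Master = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Table.Buffer loads the table into memory once; later steps in this
    // same query reuse the buffered copy instead of re-querying the source.
    // It does not help across separate queries, and it stops query folding
    // from this step onward.
    Buffered = Table.Buffer(Master),
    DateDim = Table.Distinct(Table.SelectColumns(Buffered, {"OrderDate"})),
    LocationDim = Table.Distinct(Table.SelectColumns(Buffered, {"LocationID", "LocationName"}))
in
    LocationDim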

 

Use Query Diagnostics to know for sure where your mashup is spending its time, or use the SQL Server Profiler option:

Chris Webb's BI Blog: Analysing Dataset Refresh In Power BI Premium Using SQL Server Profiler (cross...

 

