JoyceM
New Member

Data flow incremental refresh with backfill

Good day everyone 

I need to import a big table (billions of rows).

It lives in a Cloudera Impala database and I'm using the native Cloudera ODBC driver to access the data. 

 

The data is "partitioned" daily and consists of five years of telecoms data. It remains fairly static except for the last 15 days or so.

 

I have created a view on this big table to join in enrichment information.

 

For performance and re-use, I read it's best to create a data flow which is then incrementally refreshed every x days. On top of this, one or more semantic models can then be created and incrementally refreshed as needed. And on top of that, thin reports can be created.

 

This all makes sense.

 

To test whether the concept would work, I limited the view to pulling only 5 days of data on the database side.

My problem now is that when I widen the view to more days and set up incremental refresh (say, store 1 year / refresh the last 15 days / detect data changes on a last-updated column), the refresh fails with out-of-memory errors.

 

I played around with fewer and more days. It fails every time.

 

I checked the job log, and it seems it's trying to pull the whole year's data in one single query, not day by day as I would have expected.

 

 

Any ideas on how I can pull my big dataset into my data flow one day at a time for the past 5 years, and then only do daily incremental refreshes for the last 15 days thereafter?

 

Thanks a lot

1 ACCEPTED SOLUTION
lbendlin
Super User

 

I read it's best to create a data flow which is then incrementally refreshed every x days. On top of this, one or more semantic models can then be created and incrementally refreshed as needed

 

Where did you read that?  

 

Dataflow incremental refresh is a black box.  You cannot bootstrap anything and you cannot refresh individual partitions.

 

Having both the dataflow and the semantic model do incremental refresh is a waste of time and storage.

 

Drop the dataflow and load the data directly into the semantic model. Use bootstrapping and XMLA to fill the partitions. Each partition needs to be less than 10 GB and needs to refresh within 5 hours, so aim for a safe maximum of about 5 GB or 2.5 hours of refresh per partition, whichever limit hits first.
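
In the model, that means parameterizing the source query with the RangeStart/RangeEnd DateTime parameters, and the filter has to fold back to Impala; otherwise every partition refresh scans the full table, which matches the single big query you saw in the log. A minimal M sketch, assuming a hypothetical DSN, database, view, and timestamp column:

```
let
    // Connect through the Cloudera Impala ODBC driver (DSN name is a placeholder).
    Source = Odbc.DataSource("dsn=ClouderaImpala", [HierarchicalNavigation = true]),
    // Navigation steps vary by driver; these object names are placeholders too.
    telecom = Source{[Name = "telecom", Kind = "Database"]}[Data],
    v_cdr = telecom{[Name = "v_cdr_enriched", Kind = "Table"]}[Data],
    // RangeStart/RangeEnd are the DateTime parameters the service substitutes
    // for each partition. Keep the filter simple so it folds into the Impala SQL.
    Filtered = Table.SelectRows(v_cdr, each [event_ts] >= RangeStart and [event_ts] < RangeEnd)
in
    Filtered
```

Check the queries arriving at Impala to confirm a per-partition WHERE clause is actually being generated.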

 

Advanced incremental refresh and real-time data with the XMLA endpoint in Power BI - Power BI | Micr...
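
As an example of filling partitions one at a time: after bootstrapping has created the partition layout (see the link above), each partition can be refreshed individually with a TMSL script against the workspace XMLA endpoint, for example from the SSMS XMLA query window or Invoke-ASCmd. Database, table, and partition names below are placeholders; real partition names follow your policy's granularity:

```
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "TelecomUsage",
        "table": "CallDetail",
        "partition": "2021Q1"
      }
    ]
  }
}
```

Loop that over the historical partitions to backfill the 5 years while keeping each run under the size and duration limits above.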


