Good day everyone
I need to import a big table (billions of rows).
It lives in a Cloudera Impala database and I'm using the native Cloudera ODBC driver to access the data.
The data is "partitioned" daily. Consists of telecoms data for 5 years. The data remains fairly static except for the last 15 days or so.
I have created a view on this big table to join in enrichment information.
For performance and re-use, I read it's best to create a dataflow that is incrementally refreshed every x days. On top of that, one or more semantic models can be created and incrementally refreshed as needed, and thin reports can then be built on top of those.
This all makes sense.
To test whether the concept would work, I limited the view to 5 days of data on the database side.
My problem is that when I widen the view to more days and set up incremental refresh for, say, 1 year / 15 days / detect data changes on a last-updated column, the refresh fails with out-of-memory errors.
I played around with fewer days and more days; it fails every time.
I checked the job log, and it seems it's trying to pull the whole year's data in one single query, not day by day as I would have expected.
Any ideas on how I can pull my big dataset into my dataflow one day at a time for the past 5 years, and then only do daily incremental refreshes for the last 15 days thereafter?
Thanks a lot
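For context on the day-by-day behaviour: incremental refresh only generates one query per period when the date filter on the reserved RangeStart and RangeEnd parameters folds back to the source. A minimal sketch of such a source query in Power Query M follows; the DSN, database, view, and column names are placeholders (not taken from this thread), and the navigation steps vary by ODBC driver.

```
// Sketch only: all names below are placeholders.
let
    // RangeStart and RangeEnd are the reserved datetime parameters that
    // incremental refresh substitutes for each partition it loads.
    Source = Odbc.DataSource("dsn=Cloudera Impala DSN", [HierarchicalNavigation = true]),
    Database = Source{[Name = "telecom_db"]}[Data],
    EnrichedView = Database{[Name = "v_telecom_enriched"]}[Data],
    // This filter must fold back to Impala; if it does not, the service
    // pulls the full view and filters in memory, which is when memory errors appear.
    Filtered = Table.SelectRows(
        EnrichedView,
        each [event_datetime] >= RangeStart and [event_datetime] < RangeEnd
    )
in
    Filtered
```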
I read it's best to create a data flow which is then incrementally refreshed every x days. On top of this 1 or more semantic models can then be created and incrementally refreshed as needed
Where did you read that?
Dataflow incremental refresh is a black box. You cannot bootstrap anything and you cannot refresh individual partitions.
Having both the dataflow and the semantic model do incremental refresh is a waste of time and storage.
Drop the dataflow and load the data directly into the semantic model. Use bootstrapping and XMLA to fill the partitions. Each partition needs to be smaller than 10 GB and needs to refresh within 5 hours, so aim for a safe maximum of about 5 GB or 2.5 hours of refresh time per partition (whichever limit is hit first).
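As a rough illustration of the bootstrapping step above: once the incremental refresh policy has created the partitions, each historical partition can be processed one at a time over the XMLA endpoint with a TMSL refresh command (for example from SSMS or a scripted client). The database, table, and partition names below are placeholders; the partition names Power BI generates depend on the policy's granularity.

```
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "TelecomModel",
        "table": "CallDetailRecords",
        "partition": "2021Q1"
      }
    ]
  }
}
```

Repeating this command partition by partition keeps each refresh well inside the size and duration limits, instead of asking the service to load all 5 years in one go.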