nottyheadedboss
Frequent Visitor

Data Refresh in a Power BI Dataflow

My client has a 135 GB dataset in a Delta table that was set up for a PoC, and the data needs to be displayed in a Power BI report.

 

These are the row counts for the PoC I am working on:

month_num - row count
202307 - 956,709,036
202308 - 934,470,054
202309 - 937,174,626
202310 - 978,703,128
202311 - 2,320,280,532

 

For this approach I would first extract the data into a dataflow and then load it from the dataflow into a dataset. Both the dataflow and the dataset would use incremental refresh.
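For reference, incremental refresh only applies when the query filters on the reserved RangeStart and RangeEnd datetime parameters. Below is a minimal Power Query (M) sketch of that filter; the Databricks connection details, table name, and the load_date column are placeholders, not the client's actual schema.

let
  // Connection to the Delta table; host, HTTP path, and table name are placeholders
  Source = Databricks.Catalogs("<workspace-host>", "<http-path>", null),
  // Navigation to the actual table is collapsed into one step here
  FactTable = Source{[Name = "pbi_poc"]}[Data],
  // Incremental refresh requires a filter on the reserved datetime parameters
  // RangeStart (inclusive) and RangeEnd (exclusive); load_date is an assumed column
  Filtered = Table.SelectRows(FactTable, each [load_date] >= RangeStart and [load_date] < RangeEnd)
in
  Filtered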

 

So when I am refreshing the dataflow for the first time: should the initial run be a full refresh, followed by incremental refreshes from the second run onward, or can I start the very first refresh as an incremental refresh?

 

What I have tried so far

Full refresh (6.87 billion records) caused the error below.

Error I received after 12 hours of processing:

DataSource.Error: AzureBlobs failed to get contents from 'https://***********.blob.core.windows.net/**/pbi_poc/part-2023Q410.csv'.

Status code: 400

description: 'The block list may not contain more than 50 000 blocks.'.

DataSourceKind = AzureBlobs

DataSourcePath = https://***********.blob.core.windows.net/**/...

Url = https://***********.blob.core.windows.net/**/...
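For context on this error: an Azure block blob can contain at most 50,000 committed blocks, so the largest single file the dataflow can write is 50,000 times the block size the writer uses. Assuming a 4 MiB block size (a common SDK default; the block size Power BI actually uses is not documented in this thread), that works out to:

50,000 blocks × 4 MiB/block ≈ 195 GiB per output file

A full refresh that writes billions of rows into one CSV can exceed that cap, whereas incremental refresh splits the output into smaller per-partition files.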

 

Incremental dataflow refresh (the incremental policy stored 15 months of data, refreshing the last 30 days daily)

When I did an incremental refresh of the dataflow, it completed successfully, but the data was not visible in the Power BI report. I validated this by creating another dataflow in the service that used the incrementally refreshed dataflow as its source; I could not see the data in that newly created dataflow either.

Approach (DataSize = 500000)

Delta table → Dataflow (incremental) → Dataflow (no data) / Power BI report (no data)

However, when I removed the incremental refresh on the first dataflow and ran a complete refresh instead, I was able to view the data in the secondary dataflow created in the service.

Delta table → Dataflow (full refresh) → Dataflow / Power BI report (data was visible)
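One thing worth double-checking when an incremental refresh completes but yields no rows is that the RangeStart/RangeEnd filter actually matches the data. With an integer yyyyMM key like month_num, the datetime parameters have to be converted to matching integer bounds; a sketch of that conversion, with the source and step names assumed:

let
  // Placeholder for the Delta table query (see the earlier sketch)
  Source = Databricks.Catalogs("<workspace-host>", "<http-path>", null){[Name = "pbi_poc"]}[Data],
  // RangeStart/RangeEnd are datetime parameters, but month_num is an integer
  // yyyyMM key, so convert the parameters to integer bounds before filtering
  MonthStart = Int64.From(DateTime.ToText(RangeStart, [Format = "yyyyMM"])),
  MonthEnd = Int64.From(DateTime.ToText(RangeEnd, [Format = "yyyyMM"])),
  // If this filter and the key do not line up, the refresh succeeds but returns nothing
  Filtered = Table.SelectRows(Source, each [month_num] >= MonthStart and [month_num] < MonthEnd)
in
  Filtered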

I would like to know which is the right approach.

 

Help would be greatly appreciated

1 ACCEPTED SOLUTION

Fair enough. Just be aware that data flows are flat files too, so you are shuffling data from one delta table to another. If there is no cost to querying your delta tables multiple times I would cut out the extra step.


3 REPLIES
nottyheadedboss
Frequent Visitor

No, the data source is fast enough. We are planning to have multiple datasets depending on the granularity, so the idea was to implement a dataflow, pull all the data into it, and then have each dataset use the dataflow as its source instead of hitting the Azure Databricks Delta tables directly.
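For reference, a dataset that sources from a dataflow uses the Power Platform Dataflows connector; the navigation it generates looks roughly like the sketch below, with the workspace and dataflow IDs and the entity name as placeholders.

let
  // Connect to Power BI dataflows; the IDs below are placeholders
  Source = PowerPlatform.Dataflows(null),
  Workspaces = Source{[Id = "Workspaces"]}[Data],
  Workspace = Workspaces{[workspaceId = "<workspace-guid>"]}[Data],
  Dataflow = Workspace{[dataflowId = "<dataflow-guid>"]}[Data],
  // The entity is the table defined inside the dataflow
  FactTable = Dataflow{[entity = "pbi_poc", version = ""]}[Data]
in
  FactTable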


lbendlin
Super User

 

I would first extract the data into a dataflow

 

What made you choose that? Is your data source slow?

 

 

Should the initial run be a full refresh followed by incremental refreshes thereafter, or can I start the first refresh as an incremental refresh?

 

I am not aware of a way to bootstrap the partitions of a dataflow. You can do that with semantic models (formerly datasets), though, and then refresh individual partitions independently-ish (assuming they have no joins to auto date/time tables, etc.)

Advanced incremental refresh and real-time data with the XMLA endpoint in Power BI - Power BI | Micr...
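A sketch of what such a per-partition refresh looks like as a TMSL script sent over the XMLA endpoint; the database, table, and partition names below are placeholders, and the actual partition names depend on the incremental refresh policy.

{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "SalesPoC",
        "table": "FactSales",
        "partition": "202311"
      }
    ]
  }
}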

 

If you are on Fabric, you need to beware of long-running dataflows. When they complete they will hit your CU limit, and hit it hard, immediately blocking your capacity until the burndown has completed. Unfortunately, the 5-hour refresh limit does not apply to dataflows.
