Anonymous
Not applicable

Scalable reading of files from datalake in Power BI dataflows

I'm using dataflows in our Premium embedded workspace to pull various files and tables into Power BI and make them available across multiple datasets. This works well for all the dataflows I've built so far: refreshes take 5-10 minutes for dataflows in the 500 MB to 2 GB range, with sources being sales data in CSV and TXT files in our ADLS Gen2 data lake, along with Spark tables in our Azure Synapse workspace.

However, I've tried to create a dataflow based on an "Archive" structure of hourly files in our data lake. Dataflows based on one file per day are running fine (about 15 minutes to refresh). The Archive structure has 24 files per day, many of which are empty, and this dataflow takes over 2 hours to refresh. I should note that the daily single files and the archive files are VERY similar in content (sales-type data with 5-10 string columns and 5-10 numeric columns).

 

The lake file path structure looks like this:

Customer  / marketplace / Archive or snapshot / Year/month/day (depending on archive or snapshot)

 

I've looked at our tenant metrics in the Azure portal, and the 2-hour refresh finishes with the same memory and QPU load whether we're on tier A2 or A4. I'm at a loss, and assume it's something to do with the backend of Power Query and how it reads files. I'm also worried that as our data lake grows by 24 new files every day for each customer, the dataflow's refresh time will skyrocket and make it unusable. Any ideas?
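To put rough numbers on that growth concern, here is a minimal Python sketch that counts how many files a refresh would have to list and open under each layout. The figures of 24 files per day (Archive) and 1 file per day (snapshot) come from the post; the function name, the date range, and the customer count are hypothetical, and the sketch assumes every hour produces a file, empty or not.

```python
from datetime import date

FILES_PER_DAY_ARCHIVE = 24   # hourly files, as described in the post
FILES_PER_DAY_SNAPSHOT = 1   # one file per day, as described in the post


def files_to_enumerate(start: date, end: date, files_per_day: int,
                       customers: int = 1) -> int:
    """Count the files in an inclusive date range (hypothetical helper)."""
    days = (end - start).days + 1
    return days * files_per_day * customers


# Hypothetical example: one customer, one full year of history.
start, end = date(2021, 1, 1), date(2021, 12, 31)
snapshot_files = files_to_enumerate(start, end, FILES_PER_DAY_SNAPSHOT)
archive_files = files_to_enumerate(start, end, FILES_PER_DAY_ARCHIVE)
print(snapshot_files, archive_files)  # 365 8760
```

Under these assumptions the file count in the Archive layout grows 24 times faster than in the daily layout, even when the total data volume is similar.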

2 REPLIES
GilbertQ
Super User

Hi @Anonymous 

 

Have you had a look at the updated dataflows connector in the August 2021 version of Power BI Desktop, together with the enhanced compute engine for dataflows?

 

Here is a blog post which I think will help: Chris Webb's BI Blog: How Query Folding And The New Power BI Dataflows Connector Can Help Dataset Refresh Performance (crossjoin.co.uk)






Anonymous
Not applicable

I have the enhanced compute engine turned on for this dataflow, but this is all in the service, with no Desktop involved. Everything runs within a dataflow in my tenant on app.powerbi.com.
