NenadV
Helper III

Filtering csv files from Azure Blob Storage

I have a question that might sound stupid, but I could not find the answer anywhere, so here it is. When we connect to a data source, is the whole dataset downloaded as soon as we click the Transform button, or are we only working with a data sample in Power Query, with the data downloaded when we click Close & Apply? I am asking because I am creating a report from data stored in Azure Blob Storage. There are a lot of CSV files, 100+, and I only need 10-15 of them. I filter the files I need, combine their content, and then load the result to the model. So I am wondering: are all these 100+ files downloaded to my machine and then filtered locally, or is the filter somehow performed in Azure Blob Storage so that only the necessary files are downloaded?

edhans
Super User

All filtering is done locally. Azure Data Lake and Blob Storage do nothing on their end.

You can, however, filter by file name before you begin the Combine operation. No file contents are downloaded until you click Combine.

So in this article, you could filter the files before you combine them, assuming you can filter by file date or name. If you need to filter based on data inside the files, then you have no choice but to filter after the combine.

As a rule, Power Query shows only the first 1,000 records in the preview to speed up development, but with CSV files it often gets bogged down anyway. I've seen it take 4-5 minutes per step while working with them.
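
To illustrate the filter-before-combine ordering, here is a minimal Python sketch. It is a simulation only, not Power Query and not a real Azure SDK: `FakeBlobContainer`, `list_names`, `download`, and `combine_csv` are hypothetical names, and the file names and contents are made up. It assumes the listing returns names only and that content is fetched one file at a time.

```python
from dataclasses import dataclass


@dataclass
class FakeBlobContainer:
    """In-memory stand-in for a storage container (hypothetical, not a real SDK)."""
    blobs: dict[str, bytes]
    bytes_downloaded: int = 0

    def list_names(self) -> list[str]:
        # Listing returns names only; no content bytes are counted here.
        return sorted(self.blobs)

    def download(self, name: str) -> bytes:
        # Fetching content is the only operation that counts as a download.
        data = self.blobs[name]
        self.bytes_downloaded += len(data)
        return data


def combine_csv(container: FakeBlobContainer, keep) -> list[str]:
    """Filter the listing by name first, then download and combine only the matches."""
    rows: list[str] = []
    header = None
    for name in container.list_names():
        if not keep(name):
            continue  # skipped files are never downloaded
        lines = container.download(name).decode("utf-8").splitlines()
        if not lines:
            continue
        if header is None:
            header = lines[0]
            rows.append(header)
        rows.extend(lines[1:])  # drop the repeated header of each file
    return rows


container = FakeBlobContainer({
    "sales_2020-01.csv": b"id,amount\n1,10\n2,20\n",
    "sales_2020-02.csv": b"id,amount\n3,30\n",
    "inventory_2020-01.csv": b"id,amount\n9,99\n",
})
combined = combine_csv(container, lambda n: n.startswith("sales_"))
print(combined)
print(container.bytes_downloaded)
```

In this toy setup the name filter itself still runs on the client, but because it is applied to the listing before any content is requested, only the two matching files count toward the download total.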



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

I am filtering the files by name before the Combine Files step. I tried many combinations just to test how it works. When I filter nothing, combine the files, and load everything, Power BI downloads over 300 MB. When I filter nothing and do not combine the files (I load only the file information, such as name, type, and date created), Power BI again downloads over 300 MB, just as in the previous case, even when I remove the binary column. So I am not sure that Power BI does not download content before the Combine Files step. And when I filter the files before the Combine Files step and load the data, Power BI downloads about 30 MB (I keep just 10-15 files). So it seems to me that the filtering by name is done in Azure Blob Storage, and that the whole content of the files is downloaded when you load the data, regardless of whether there is a Combine Files step. I do not have much experience with this, so I am not sure how all of that is done.
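
As a rough sanity check on the numbers above, the arithmetic can be sketched in a few lines of Python. The figure of exactly 100 files is an assumption (the thread only says "100+"), the files are assumed to be of similar size, and `expected_download_mb` is a hypothetical helper name.

```python
def expected_download_mb(total_mb: float, total_files: int, kept_files: int) -> float:
    """Estimate the download size if only the kept files are fetched,
    assuming all files are roughly the same size."""
    return total_mb * kept_files / total_files


TOTAL_MB = 300      # observed download with no filter applied
TOTAL_FILES = 100   # assumption: the thread only says "100+"

low = expected_download_mb(TOTAL_MB, TOTAL_FILES, 10)
high = expected_download_mb(TOTAL_MB, TOTAL_FILES, 15)
print(low, high)
```

Under those assumptions the estimate is 30 to 45 MB, so the observed download of about 30 MB is at least consistent with only the filtered files being fetched. It does not show where the filtering happens.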

You are saying that when you go to the storage blob and get a listing of the files, Power BI is downloading 300MB just to show you the folder listing?




Yeah, over 300 MB just for listing the files.

Then that is a function of Azure Blob Storage. I am very surprised to hear that.

I use Azure Data Lake for CSV/Excel files, and I can list hundreds of files in a few seconds in the Power Query editor. I suggest you consider moving the files to that storage medium. Even SharePoint doesn't download the files until you click Combine or click on a binary itself.

I use Blob Storage for images, but those are accessed from the report via HTTPS URLs. Power Query and Power BI's data model don't actually bring the images in. The service simply shows them once the report loads.




This data is generated on a daily basis, so its volume will increase quickly. I will consider moving it somewhere else for sure.

See this article, published just yesterday, on using Azure Data Lake Gen2. It seems perfect for your scenario. I know ADL does not download file contents until it begins the Combine operation.



