mp390988
Helper IV

Dealing with a huge CSV file

I recently saw a post where someone claimed they had difficulties loading a 12 MB CSV file into Power BI Desktop.

Their solution was to use dataflows, which I don't understand.

I don't quite understand why a dataflow would be able to load the whole dataset when the Power Query Editor in Power BI Desktop cannot.

Another solution could be to split the CSV into smaller chunks, but that increases the administrative effort.

How would you go about dealing with a large CSV file in Power BI Desktop?

Thank you 


14 REPLIES
HarishKM
Solution Sage

@mp390988 Hey,
kindly follow the steps below.
I suggest you upload your CSV file to a SharePoint folder in your team's common area.
Then you can use the SharePoint Folder connector.

You have to remove the "/:x:/r/" share-link segment from your SharePoint URL so that only the site address remains - https://yourcompanyname.sharepoint.com/yourfoldername/

Then you can select or filter down to your CSV file, and you are all set.
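For illustration, a minimal Power Query (M) sketch of the SharePoint Folder approach described above. The site URL and file name are placeholders, not taken from this thread:

let
    // Connect to the SharePoint site root (not the full path to the file)
    Source = SharePoint.Files("https://yourcompanyname.sharepoint.com/sites/yourteam", [ApiVersion = 15]),
    // Keep only the CSV file you are interested in
    MyFile = Table.SelectRows(Source, each [Extension] = ".csv" and [Name] = "MyLargeFile.csv"),
    // Parse the file content as CSV and promote the header row
    Csv = Csv.Document(MyFile{0}[Content], [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Csv, [PromoteAllScalars = true])
in
    Promoted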

 

Did I answer your question? Mark my post as a solution and give Kudos as well!


Thanks
Harish M

 

mp390988
Helper IV

The 800 MB CSV files you append to the semantic model - does that take a lot of time? Are you able to do it without any issues using Power BI Desktop and the Text/CSV connector? Do you use any special method other than the bog-standard import one would usually do in Power BI Desktop?

 

So when or under what scenario would dataflows be useful?

Doesn't take long at all. You can assume that CSV and Parquet are the best performing formats for ingesting data from Import Mode sources.

 

We use the SharePoint Folder connector exclusively as we want to combine all the CSVs (in this case 50 CSVs at 800 MB each) into the semantic model.
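As an illustration of that pattern, here is a hedged M sketch that lists a SharePoint folder, keeps only the CSV files, parses each one, and appends them into a single table. The site URL, folder path, and parsing options are assumptions:

let
    Source = SharePoint.Files("https://yourcompanyname.sharepoint.com/sites/yourteam", [ApiVersion = 15]),
    // Keep only the CSVs in the export folder (folder name is a placeholder)
    CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv" and Text.Contains([Folder Path], "/Exports/")),
    // Parse each file; the file name stays alongside for lineage
    Parsed = Table.AddColumn(CsvFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 65001]), [PromoteAllScalars = true])),
    // Append everything into one table for the semantic model
    Combined = Table.Combine(Parsed[Data])
in
    Combined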

 

So when or under what scenario would dataflows be useful?

1. dataflows can shield the developer (NOT the report user) from a slow data source.

2. dataflows can be re-used in multiple semantic models.

When you append additional CSV files to the semantic model as you mentioned, how does it know which CSV files are the new ones to append? Otherwise it would re-append the ones that were already appended in the last refresh. Basically, how does Power BI identify the new CSV files?

Hi @mp390988 ,

I hope the information is helpful. Please let me know if you have any further questions or if you'd like to discuss this further. If this answers your question, please Accept it as a solution and give it a 'Kudos' so others can find it easily.

Thank you.

Hi @mp390988 ,

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.

Thank you.

Hi @mp390988 ,

I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If the response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.

Thank you.

Power BI has no memory. You need to do that partition management yourself.
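One hedged way to approximate that yourself in M is to filter the folder listing on the "Date modified" column before parsing, so only files newer than a watermark are processed. The cutoff below is a placeholder; note that a plain full refresh still rebuilds the whole table, so true appending needs incremental refresh or XMLA partition management on top of this:

let
    Source = SharePoint.Files("https://yourcompanyname.sharepoint.com/sites/yourteam", [ApiVersion = 15]),
    // Watermark: only process files changed since this point (placeholder value)
    Cutoff = #datetime(2025, 7, 1, 0, 0, 0),
    NewFiles = Table.SelectRows(Source, each [Extension] = ".csv" and [Date modified] >= Cutoff),
    Parsed = Table.AddColumn(NewFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content]), [PromoteAllScalars = true])),
    Combined = Table.Combine(Parsed[Data])
in
    Combined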

When you say a dataflow shields the developer (NOT the report user) from a slow data source - doesn't a slow data source also affect a report user? If so, why is a dataflow only applicable to the developer and not the report user?

doesn't a slow data source also affect a report user as well? 

No, because dataflows are not even accessible to report users. The developer has to import the dataflow into a semantic model. The report users interact with the semantic model. The semantic model is stored in SSAS Tabular, in memory, with compression etc. That is usually faster than Direct Lake or Direct Query.

I heard dataflows are good for situations where one user does not have permissions to access the database but another user does, so that user can create a dataflow and allow the user who doesn't have access to the database to use the dataflow instead?

Also, I guess a dataflow can be used as a tool to reduce the semantic model size by pushing much of the transformation work away from the semantic model into the dataflow, right?

Also, can a dataflow be used by a report in a different workspace? Or do the report and the dataflow have to be in the same workspace? Let's say I am a report developer and a dataflow lives in workspace X to which I don't have access; that means I cannot create a report using this dataflow unless I have been granted access to workspace X, right?


I heard dataflows are good for situations where one user does not have permissions to access the database but another user does

also called "breaking the security chain of custody" - I cannot endorse that.

 

a dataflow can be used as a tool to reduce the semantic model size by pushing much of the transformation work away from the semantic model into the dataflow

Dataflows are glorified CSV files. Do these transforms at your own risk.

 

can a dataflow be used by a report in a different workspace?

Yes. Reusability is one of the positive features of dataflows. You will still need to have access to that "other" workspace.
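For reference, consuming a dataflow from Power BI Desktop looks roughly like the M below, which the Dataflows connector generates when you navigate to an entity. The workspace ID, dataflow ID, and entity name are placeholders, and the exact navigation steps the connector produces may differ slightly in your tenant:

let
    Source = PowerPlatform.Dataflows(null),
    // Navigate: workspaces you can access -> one workspace -> one dataflow -> one entity
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Dataflow = Workspace{[dataflowId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Entity = Dataflow{[entity = "MyTable", version = ""]}[Data]
in
    Entity

As the reply above notes, this only works if you have been granted access to the workspace that hosts the dataflow.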

lbendlin
Super User

12 MB is not large for a CSV file. We casually ingest 800 MB CSV files from SharePoint (and lots of them) and append these into the semantic model.

 

Keep in mind that dataflows are basically glorified CSV files in Azure Blob storage.  So ingesting a CSV file in a dataflow is pointless 95% of the time.

 

The real question is about the performance of the data source. That's where dataflows can help - dataflows can shield the developer (NOT the report user) from a slow data source.

 

Consider using Binary.Buffer to prevent the data source from chunking.
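A hedged illustration of that tip, assuming a local file path as the source: Binary.Buffer reads the raw file into memory once, so Csv.Document works on a single buffered copy instead of repeatedly re-reading the source in chunks.

let
    // Buffer the whole binary once before parsing
    Raw = Binary.Buffer(File.Contents("C:\Data\MyLargeFile.csv")),
    Csv = Csv.Document(Raw, [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Csv, [PromoteAllScalars = true])
in
    Promoted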
