mp390988
Helper IV

Dealing with a huge CSV file

I recently saw a post where someone claimed they had difficulties loading a 12 MB CSV file into Power BI Desktop.

Their solution was to use dataflows, which I don't understand.

I don't quite understand why a dataflow would be able to load the whole dataset when the Power Query Editor in Power BI Desktop cannot.

Another solution could be to split the CSV into smaller chunks, but that increases the administrative effort.

How would you go about dealing with a large CSV file in Power BI Desktop?

Thank you 


14 REPLIES
HarishKM
Solution Sage

@mp390988 Hey,
kindly follow the steps below.
I suggest you upload your CSV file to a SharePoint folder in your team's common area.
Then you can use the SharePoint Folder connector.

You have to remove the "/:x:/r/" share-link segment from your SharePoint URL so that only the site address remains - https://yourcompanyname.sharepoint.com/yourfoldername/

Then you can select or filter down to your CSV file, and you are all set.
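For illustration, a minimal Power Query (M) sketch of the SharePoint Folder approach described above. The site URL and file name are placeholders, not taken from this thread:

let
    // Connect to the SharePoint site root (not the full path to the file)
    Source = SharePoint.Files("https://yourcompanyname.sharepoint.com/sites/yourteam", [ApiVersion = 15]),
    // Keep only the CSV file you are interested in
    MyFile = Table.SelectRows(Source, each [Extension] = ".csv" and [Name] = "MyLargeFile.csv"),
    // Parse the file content as CSV and promote the header row
    Csv = Csv.Document(MyFile{0}[Content], [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Csv, [PromoteAllScalars = true])
in
    Promoted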

 

Did I answer your question? Mark my post as a solution and give Kudos as well!


Thanks
Harish M

 

mp390988
Helper IV

The 800 MB CSV files you append to the semantic model - does that take a lot of time? Are you able to do it without any issues using Power BI Desktop and the Text/CSV connector? Do you use any special method other than the bog-standard import one would usually do in Power BI Desktop?

 

So when or under what scenario would dataflows be useful?

Doesn't take long at all. You can assume that CSV and Parquet are the best performing formats for ingesting data from Import Mode sources.

 

We use the SharePoint Folder connector exclusively as we want to combine all the CSVs (in this case 50 CSVs at 800 MB each) into the semantic model.
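As an illustration of that pattern, here is a hedged M sketch that lists a SharePoint folder, keeps only the CSV files, parses each one, and appends them into a single table. The site URL, folder path, and parsing options are assumptions:

let
    Source = SharePoint.Files("https://yourcompanyname.sharepoint.com/sites/yourteam", [ApiVersion = 15]),
    // Keep only the CSVs in the export folder (folder name is a placeholder)
    CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv" and Text.Contains([Folder Path], "/Exports/")),
    // Parse each file; the file name stays alongside for lineage
    Parsed = Table.AddColumn(CsvFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 65001]), [PromoteAllScalars = true])),
    // Append everything into one table for the semantic model
    Combined = Table.Combine(Parsed[Data])
in
    Combined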

 

So when or under what scenario would dataflows be useful?

1. dataflows can shield the developer (NOT the report user) from a slow data source.

2. dataflows can be re-used in multiple semantic models.

When you append additional CSV files to the semantic model as you mentioned, how does it know which CSV files are the new ones to append? Otherwise it would re-append the ones that were already appended in the last refresh. Basically, how does Power BI identify the new CSV files?

Hi @mp390988 ,

I hope the information is helpful. Please let me know if you have any further questions or if you'd like to discuss this further. If this answers your question, please Accept it as a solution and give it a 'Kudos' so others can find it easily.

Thank you.

Hi @mp390988 ,

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.

Thank you.

Hi @mp390988 ,

I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If the response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.

Thank you.

Power BI has no memory. You need to do that partition management yourself.
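One hedged way to approximate that yourself in M is to filter the folder listing on the "Date modified" column before parsing, so only files newer than a watermark are processed. The cutoff below is a placeholder; note that a plain full refresh still rebuilds the whole table, so true appending needs incremental refresh or XMLA partition management on top of this:

let
    Source = SharePoint.Files("https://yourcompanyname.sharepoint.com/sites/yourteam", [ApiVersion = 15]),
    // Watermark: only process files changed since this point (placeholder value)
    Cutoff = #datetime(2025, 7, 1, 0, 0, 0),
    NewFiles = Table.SelectRows(Source, each [Extension] = ".csv" and [Date modified] >= Cutoff),
    Parsed = Table.AddColumn(NewFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content]), [PromoteAllScalars = true])),
    Combined = Table.Combine(Parsed[Data])
in
    Combined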

When you say a dataflow shields the developer (NOT the report user) from a slow data source - doesn't a slow data source also affect a report user? If so, why is a dataflow only applicable to the developer and not the report user?

doesn't a slow data source also affect a report user as well? 

No, because dataflows are not even accessible to report users. The developer has to import the dataflow into a semantic model. The report users interact with the semantic model. The semantic model is stored in SSAS Tabular, in memory, with compression etc. That is usually faster than Direct Lake or Direct Query.

I heard dataflows are good for situations where one user does not have permissions to access the database but another user does, so that user can create a dataflow and allow the user who doesn't have access to the database to use the dataflow instead?

Also, I guess a dataflow can be used as a tool to reduce the semantic model size by pushing much of the transformation work away from the semantic model into the dataflow, right?

Also, can a dataflow be used by a report in a different workspace? Or do the report and the dataflow have to be in the same workspace? Let's say I am a report developer and a dataflow lives in workspace X to which I don't have access; that means I cannot create a report using this dataflow unless I have been granted access to workspace X, right?


I heard dataflows are good for situations where one user does not have permissions to access the database but another user does

also called "breaking the security chain of custody" - I cannot endorse that.

 

a dataflow can be used as a tool to reduce the semantic model size by pushing much of the transformation work away from the semantic model into the dataflow

Dataflows are glorified CSV files. Do these transforms at your own risk.

 

can a dataflow be used by a report in a different workspace?

Yes. Reusability is one of the positive features of dataflows. You will still need to have access to that "other" workspace.
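For reference, consuming a dataflow from Power BI Desktop looks roughly like the M below, which the Dataflows connector generates when you navigate to an entity. The workspace ID, dataflow ID, and entity name are placeholders, and the exact navigation steps the connector produces may differ slightly in your tenant:

let
    Source = PowerPlatform.Dataflows(null),
    // Navigate: workspaces you can access -> one workspace -> one dataflow -> one entity
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Dataflow = Workspace{[dataflowId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Entity = Dataflow{[entity = "MyTable", version = ""]}[Data]
in
    Entity

As the reply above notes, this only works if you have been granted access to the workspace that hosts the dataflow.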

lbendlin
Super User

12 MB is not large for a CSV file. We casually ingest 800 MB CSV files from SharePoint (and lots of them) and append these into the semantic model.

 

Keep in mind that dataflows are basically glorified CSV files in Azure Blob storage.  So ingesting a CSV file in a dataflow is pointless 95% of the time.

 

The real question is about the performance of the data source. That's where dataflows can help - dataflows can shield the developer (NOT the report user) from a slow data source.

 

Consider using Binary.Buffer to prevent the data source from chunking.
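A hedged illustration of that tip, assuming a local file path as the source: Binary.Buffer reads the raw file into memory once, so Csv.Document works on a single buffered copy instead of repeatedly re-reading the source in chunks.

let
    // Buffer the whole binary once before parsing
    Raw = Binary.Buffer(File.Contents("C:\Data\MyLargeFile.csv")),
    Csv = Csv.Document(Raw, [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Csv, [PromoteAllScalars = true])
in
    Promoted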
