Anonymous
Not applicable

Thoughts on best practices for loading a large dataset?

Could someone share their best practices for loading and converting large CSV files into Power BI to reduce refresh times?

 

I have a 20 MB monthly CSV file and would like to append each monthly file into one table to perform time-intelligence calculations. First, I used Power Query to append each CSV into one large table. The refresh time is very long with only 2 months of data (20+ minutes).
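
For reference, a minimal Power Query (M) sketch of that append pattern, assuming all the monthly files sit in a single folder (the folder path and parsing steps here are illustrative, not the poster's actual query):

    // Combine every monthly CSV in one folder into a single table
    let
        Source = Folder.Files("C:\Data\Monthly"),                    // hypothetical path
        CsvOnly = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
        Parsed = Table.AddColumn(CsvOnly, "Data",
            each Table.PromoteHeaders(Csv.Document([Content]))),
        Combined = Table.Combine(Parsed[Data])
    in
        Combined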

 

On the second approach, I used Python to merge the 2 CSV files outside of Power Query (in a Jupyter notebook); the merged file with 2 months of data is 40 MB. The refresh time in Power BI is still very long (20+ minutes).

 

What approach should I use to improve refresh time? Should this data sit in a database (e.g., Dremio, SQL Server, or MS Access)?

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Thank you for everyone's suggestions. Your responses taught me when to use dataflows in the future!

 

I solved my problem: the CSV file was stored on a network drive, which was throttling the refresh. The solution is to store the file on OneDrive (or a local drive) so the refresh runs locally!

 

The change reduced refresh time from 20+ minutes to 10 seconds! 🙂


3 REPLIES

Greg_Deckler
Super User

@Anonymous Are you doing a lot of heavy transformation? I have loaded 700 MB CSV files in 2 minutes. Try taking out your default Changed Type step if you have one.
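
For illustration, a hedged M sketch of that tip: the auto-generated Changed Type step types every column in the file, which is costly on a large CSV, so an explicit step that types only the columns the model needs can cut refresh time (the path and column names below are hypothetical):

    let
        Source = Csv.Document(File.Contents("C:\Data\Sales.csv")),   // hypothetical path
        Promoted = Table.PromoteHeaders(Source),
        // Replace the auto-generated "Changed Type" step (which types
        // every column) with one that types only what you need
        Typed = Table.TransformColumnTypes(Promoted,
            {{"Date", type date}, {"Amount", type number}})          // hypothetical columns
    in
        Typed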

 

And yes, it's always better with an actual database backend like SQL.


FrankPreusker
Advocate III

I have had good experience using Power BI Dataflows to ingest and append several hundred (chunky) Excel files. This decouples the first step (merging) from the second (loading the data into Power BI).

Even loading big dataflows into Power BI is a seamless experience in the Power BI Service.

speedramps
Super User

Hi donaldm

 

If you must use CSV in Power Query, then remove any unwanted rows or columns as early as possible in your query, before you do any other transformations.
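
A minimal sketch of that ordering in M, with the trimming steps placed immediately after the source (the path, column names, and filter are hypothetical):

    let
        Source = Table.PromoteHeaders(
            Csv.Document(File.Contents("C:\Data\Monthly.csv"))),     // hypothetical path
        // Trim first, transform later: drop unneeded columns and rows up front
        KeptColumns = Table.SelectColumns(Source, {"Date", "Product", "Amount"}),
        KeptRows = Table.SelectRows(KeptColumns, each [Amount] <> "")
    in
        KeptRows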

 

SQL will be much quicker than CSV because of "query folding".

 

Power BI is not fast, and with CSV it has to do all the heavy lifting and shifting itself.

 

Whereas if you use SQL, Power BI will "delegate" the heavy lifting and shifting back to the SQL server, which is super fast.
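
As a sketch of what query folding buys you (the server, database, table, and filter below are hypothetical): with a SQL source, a step like the row filter here folds back into the statement the server executes, so only the matching rows ever travel to Power BI:

    let
        Source = Sql.Database("myserver", "SalesDb"),                // hypothetical server/db
        Sales = Source{[Schema = "dbo", Item = "MonthlySales"]}[Data],
        // This step folds: SQL Server runs it as a WHERE clause
        Recent = Table.SelectRows(Sales, each [SaleDate] >= #date(2024, 1, 1))
    in
        Recent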

 

Remember, we are BI community volunteers, so please click the thumbs-up for me taking the trouble to help you, and then accept the solution if it works. Thank you!
