Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
aseagull
Helper III
Helper III

Why does Power BI "Load" much more data (volume) than what's in my source file?

I'm trying to pin down some performance issues in Power Query. One of the quirks I've noticed is that even though my largest file is only 60k records and 10 columns (mostly text in a csv), and sits at under 5MB on disk, when either PQ or PBI is referencing it, the system reports loading MORE THAN 500 MB of data from the file. I see this whether it's in the lower right corner of PQ, or when PBI is loading the model, or even in PQ when I open a filter and click "Load More".

 

Fortunately, I see in Dax Studio / Vertipaq that the loaded data is not that large in memory.

 

Is this something to worry about, or to ignore? 

 

Thanks,

Amon

1 ACCEPTED SOLUTION

Hey @aseagull ,

 

I wouldn't call it a mistake, it's the price you have to pay for sourcing files, and not a relational database.

 

On the one hand you can easily load and transform (implementing your business logic) your data, on the other hand sourcing files comes with the price that this datasource type does not support query folding.

 

Chris Webb has written a lot about optimizing query performance using files as a data source. Consider the following link as a start: Chris Webb's BI Blog: Comparing The Performance Of CSV And Excel Data Sources In Power Query Chris W...

 

You might also considering using the M function Table.Buffer Table.Buffer - PowerQuery M | Microsoft Docs This can reduce the number of re-reading the base file.

 

Next to that, you also have to consider where your files are located, either on a file server or inside a SharePoint library. A file server requires an on-premises gateway if you want to automate the process. If the files are located inside a SharePoint online library a gateway is not necessary.

 

Hopefully, this provides some new ideas for tackling your challenge.

 

Regards,

Tom 



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

View solution in original post

5 REPLIES 5
TomMartens
Super User
Super User

Perfect!



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany
HotChilli
Super User
Super User

It will depend on the number of times a source file (query) is duplicated or referenced and also the number of query transformations.  The same file can be loaded multiple times.  It happens all again on a refresh (unless incremental refresh is used)

Is it a problem? Probably not unless it's taking too long or crashing the PC.

---

Once the data is compressed and saved in a pbix , it is sometimes amazing how small the file can be.

------

When visualisations are created with measures is when the powerbi engine starts to work and use memory / processor on the PC

Wow, so if query A loads a file, then query B references A, it will load the file *twice*? That behavior would explain it, for sure. I've got a query that's 8-deep from the source query, plus it feeds from other queries that go back to that source query. I think I've counted 24 distinct paths from my "last query" back to the source file, and I'm not done with my logic yet.

 

So is my mistake thinking that I should use PQ to implement all this business logic? I'm working off a simplistic reporting tool at the moment, which doesn't give me much space to do the logic. 

 

Thanks!
Amon

Hey @aseagull ,

 

I wouldn't call it a mistake, it's the price you have to pay for sourcing files, and not a relational database.

 

On the one hand you can easily load and transform (implementing your business logic) your data, on the other hand sourcing files comes with the price that this datasource type does not support query folding.

 

Chris Webb has written a lot about optimizing query performance using files as a data source. Consider the following link as a start: Chris Webb's BI Blog: Comparing The Performance Of CSV And Excel Data Sources In Power Query Chris W...

 

You might also considering using the M function Table.Buffer Table.Buffer - PowerQuery M | Microsoft Docs This can reduce the number of re-reading the base file.

 

Next to that, you also have to consider where your files are located, either on a file server or inside a SharePoint library. A file server requires an on-premises gateway if you want to automate the process. If the files are located inside a SharePoint online library a gateway is not necessary.

 

Hopefully, this provides some new ideas for tackling your challenge.

 

Regards,

Tom 



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Tom, indeed Table.Buffer() was it!!! Takes a 1.5h refresh down to 2m. So glad to learn about it!

Helpful resources

Announcements
Sept PBI Carousel

Power BI Monthly Update - September 2024

Check out the September 2024 Power BI update to learn about new features.

September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

Sept NL Carousel

Fabric Community Update - September 2024

Find out what's new and trending in the Fabric Community.