I'm trying to pin down some performance issues in Power Query. One of the quirks I've noticed is that even though my largest file is only 60k records and 10 columns (mostly text, in a CSV) and sits at under 5 MB on disk, the system reports loading MORE THAN 500 MB of data from the file whenever PQ or PBI references it. I see this in the lower-right corner of PQ, when PBI is loading the model, and even in PQ when I open a filter and click "Load More".
Fortunately, DAX Studio / VertiPaq Analyzer shows that the loaded data is nowhere near that large in memory.
Is this something to worry about, or to ignore?
Thanks,
Amon
It depends on how many times the source file (query) is duplicated or referenced, and also on the number of query transformations. The same file can be loaded multiple times, and it all happens again on every refresh (unless incremental refresh is used); see the sketch at the end of this reply.
Is it a problem? Probably not, unless the refresh is taking too long or crashing the PC.
---
Once the data is compressed and saved in a .pbix, it is sometimes amazing how small the file can be.
---
It is only when visualisations are built on measures that the Power BI engine starts to work and use memory / processor on the PC.
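As a rough sketch of what "referencing" means here (the query names, the Region column, and the file path are made up for illustration): one query reads the CSV, and a second query references it. A reference is not guaranteed to act as a cache, so each query that loads to the model can re-evaluate the source query and read the file from disk again.

```
// Query: SalesSource — reads the CSV from disk each time it is evaluated
let
    Source   = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    Promoted

// Query: SalesByRegion — references SalesSource; the reference is not a cache,
// so evaluating this query can trigger another read of Sales.csv
let
    Source  = SalesSource,
    Grouped = Table.Group(Source, {"Region"}, {{"Rows", Table.RowCount, Int64.Type}})
in
    Grouped
```

With several referencing queries, or several distinct paths leading back to the source, the same file can end up being read many times during a single refresh.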
Wow, so if query A loads a file, then query B references A, it will load the file *twice*? That behavior would explain it, for sure. I've got a query that's 8-deep from the source query, plus it feeds from other queries that go back to that source query. I think I've counted 24 distinct paths from my "last query" back to the source file, and I'm not done with my logic yet.
So is my mistake thinking that I should use PQ to implement all this business logic? I'm working off a simplistic reporting tool at the moment, which doesn't give me much space to do the logic.
Thanks!
Amon
Hey @aseagull ,
I wouldn't call it a mistake; it's the price you pay for sourcing files rather than a relational database.
On the one hand, you can easily load and transform your data (implementing your business logic); on the other hand, files come at the price that this data source type does not support query folding.
Chris Webb has written a lot about optimizing query performance when using files as a data source. Consider the following link as a start: Chris Webb's BI Blog: Comparing The Performance Of CSV And Excel Data Sources In Power Query
You might also consider using the M function Table.Buffer (Table.Buffer - PowerQuery M | Microsoft Docs). This can reduce the number of times the base file is re-read.
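As a rough illustration (the query name and file path are placeholders, not taken from your model), buffering the source query could look something like this:

```
// Query: SalesSource — buffered so downstream steps work from memory
let
    Raw      = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Raw, [PromoteAllScalars = true]),
    // Table.Buffer loads the result of the previous step into memory, so the
    // steps that follow use this in-memory copy instead of re-reading the CSV;
    // depending on how your queries reference each other, this can also cut
    // down on repeated reads of the file during a refresh
    Buffered = Table.Buffer(Promoted)
in
    Buffered
```

Buffering is not free (the whole table has to fit in memory), so it is usually worth applying it to the query that is referenced most often and then measuring the effect on refresh time.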
Next to that, you also have to consider where your files are located: on a file server or in a SharePoint library. A file server requires an on-premises data gateway if you want to automate the refresh; if the files are in a SharePoint Online library, a gateway is not necessary.
Hopefully, this provides some new ideas for tackling your challenge.
Regards,
Tom
Tom, indeed Table.Buffer() was it!!! Takes a 1.5h refresh down to 2m. So glad to learn about it!