
karen578
Helper I

PQ Memory and Processing Question - Right Tool for Updating Multiple Fact Tables?

The vague question:

I'm new to Power BI and don't know if it's the right tool for the task at hand. I can't quite figure out how to provide sanitized data for my exact problem - but I'm not looking for a solution with code either. I'm starting to get bogged down by slow performance in Power Query. I'm getting the results I want, but I don't want to go off the deep end learning about memory management and execution either. Each step takes a long time, and I know this is impacted by having the Preview option on - I'm mostly wondering whether I can expect it to take this long every time I load the results into the Data Model. How do load times into the Data Model compare to the Data Preview in Power Query?

 

My project highlights:

  • I'm responsible for replatforming my company's budget model that's currently in Excel.
  • We are doing most of the work in Python. Generated projected data will be output as flat files.
  • Output data mimics our existing enterprise data star schema quite well, though it is far simpler. About 5 fact tables and 5 dimension tables of interest.
  • Using Power BI for 2 main tasks:
    • Generating reports to facilitate conversations to set the initial budget (once per year)
    • Refresh reports monthly to facilitate actual vs budget reporting

The problematic task:

  • In the budget-setting process, we need the ability to run different scenarios. We need the ability to take in another "Override" data input that takes in alternative assumptions. We need to refresh our projected fact tables (think replaced values) based on these assumptions.
  • This input file is in Excel. I am open to considering Power Query to do this work - the final product will be in Power BI, I'm thinking it could make sense and make things easier if everything is within the "Microsoft ecosystem" of things.
  • I'm confident the transformations needed to create the alternate projected data are possible in Power Query, but it's very messy.
    • The biggest issue is that the projection algorithm deals with one fact at a time, and the path is linear. Fact1 (in PQ as Query1) impacts Fact2 (Query2) which impacts Fact3 (Query3), and so on.
    • Generating the alternate data for each fact also requires complicated logic - grouping, joins with initial facts and dim tables, calculations based on data in prior rows, etc.
  • I'm ok figuring out this logic, but it is quite time consuming to see each step get performed in the data preview. I realize this is because Power Query doesn't cache intermediate query results in memory by default. I'm new, but I've been able to improve this a fair bit with Table.Buffer on intermediate steps, especially a few using recursion and List.Generate.
    • It currently takes about a minute per new step added. I wouldn't mind if it took a few minutes to load the full data model every time, but this development is brutal. I'm still very new and like building my queries one step at a time, using the UI as much as possible. It's brutal to be constantly waiting in the development stage, where I know this would be near instantaneous in other tools.
  • So far, I've been able to create the updated logic to replace the 1st fact table and half of the second. I'm wondering if it's worth my time to continue on this track or if I should just do this in Python instead. Since the second fact table references the first, I'm already seeing the time required grow (exponentially?).
  • My fact tables are actually quite small too - there's 5 and the max size might end up being 25MB. I'm confident that all data could be read into memory once and transformed in Python no problem.
    • The downside to this would be adding another application to the mix of running alternate scenarios. It is difficult to get support from IT on one-off issues like this, and we are definitely not developers. It would be harder for us to ensure all users have Python installed, and deal with bugs or setup issues. We don't know Python much either but can get by. 🙂
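For comparison, the chained fact-table update above can be sketched in a few lines of pandas. This is only an illustration, not the actual budget logic - the table names, columns, and the 1.5x driver are all invented; it just shows the "Override replaces Fact1, Fact2 rebuilt from adjusted Fact1" shape running entirely in memory:

```python
import pandas as pd

# Hypothetical sketch (table and column names invented): apply scenario
# overrides to Fact1 in memory, then rebuild Fact2 from the adjusted Fact1,
# mirroring the linear Fact1 -> Fact2 dependency described above.

fact1 = pd.DataFrame({"dept": ["A", "B"], "amount": [100.0, 200.0]})
overrides = pd.DataFrame({"dept": ["B"], "amount": [250.0]})

# Left-join the override rows, then coalesce: keep the override amount
# where one exists, otherwise the original projection.
merged = fact1.merge(overrides, on="dept", how="left", suffixes=("", "_ovr"))
fact1_adj = merged.assign(
    amount=merged["amount_ovr"].fillna(merged["amount"])
)[["dept", "amount"]]

# Fact2 depends on the adjusted Fact1 (a made-up 1.5x driver for illustration).
fact2_adj = fact1_adj.assign(amount=fact1_adj["amount"] * 1.5)
```

With 25MB files, every table in the chain fits in memory at once, so each downstream fact is just another cheap DataFrame operation rather than a re-evaluated query.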

My ultimate questions:

  • Is this just too far off from the intended use case of Power Query? I already know it won't work well with DAX. Ultimately, I'm trying to figure out whether this step should be done in Power Query or Python.

Thanks!!

2 REPLIES
Anonymous
Not applicable

My friend,

  • Power Query can do all of the things that you require efficiently and effectively. Nearly every transformation you mentioned can be "folded" back to native SQL if you are using SQL data sources - everything except the prior-row calculations, but again, if you know your SQL, you can write a LAG(...)/LEAD(...) OVER (PARTITION BY ...) window function for those. You mention only Excel as a data source, but Power Query is great with .xlsx data sources as well, especially if you have a bunch of files with similar layouts within folders.
  • You say you don't want to go into the deep end; my question would be, would you go into a jiu-jitsu dojo and say, "I just need to know how to beat this one guy. I can't show you what he does, but lots of punches and kicks."
  • Just learn the jiu-jitsu, and the M. We'll help you. 
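Since the OP is weighing Python anyway, here is a rough pandas equivalent of that prior-row window function - a hedged sketch with invented names, where groupby + shift stands in for LAG(...) OVER (PARTITION BY ... ORDER BY ...):

```python
import pandas as pd

# Hypothetical sketch of the prior-row calculation in pandas: groupby + shift
# is a rough equivalent of LAG(amount) OVER (PARTITION BY dept ORDER BY month).
# All table and column names here are invented for illustration.

df = pd.DataFrame({
    "dept":   ["A", "A", "B", "B"],
    "month":  [1, 2, 1, 2],
    "amount": [100.0, 110.0, 200.0, 180.0],
})

df = df.sort_values(["dept", "month"])
df["prev_amount"] = df.groupby("dept")["amount"].shift(1)  # prior row per dept
df["change"] = df["amount"] - df["prev_amount"]            # NaN on first month
```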

--Nate

Thanks @Anonymous. Omg, I forgot to include an important detail: my inputs are flat CSV files and one Excel file. So not something that can be folded, though that aspect of Power Query definitely looks intriguing.


I know Power Query CAN do everything; I'm just wondering if it's the right tool here. I've got small files that I already know how to transform easily in Python without worrying about memory management. Without a database source, I get the sense that I'm just contorting PQ into something it's not really designed to do. Is that a fair assessment?
