Power Query - Compare capacity units (CU) usage

frithjof_v · ‎04-17-2024

Hi all,

Let's say I need to do data transformations in Power Query for my new report.

Let's say I am focusing specifically on one table in my semantic model, which requires some transformations in Power Query before I can load it into my semantic model.

There may be different alternative routes I can take (in terms of which Power Query functions to use, sequence of transformation steps in Power Query, etc.) in order to achieve the same data output to be loaded into my semantic model.

I want to compare my different alternative routes (i.e. different M code scripts) in terms of how they will impact the capacity unit (CU) usage on my premium capacity.

I want to choose the M query (M code) which minimizes CU usage.

How can I compare the CU usage of two (or more) alternative Power Query (M) queries, which are all giving me the same required data output (the same output table) but they do so by using different transformation logic.

I am willing to create two or more alternative versions of the .pbix file (and similarly, two or more alternative versions of a dataflow) in order to compare the CU usage of the alternative M codes, if necessary, while I am developing the solution.

What will be the useful ways of comparing the CU usage?

Should I do the comparison using Power BI desktop, or should I publish the alternative .pbix files to the premium service and do the comparison of CU usage there (to get real world numbers)?

Should I

Use the premium (fabric) capacity metrics app?
Look at how long the refresh time takes in the service?
Use diagnostic tool in Power Query?
Look at the live memory/CPU usage figures in the bottom of the screen in dataflows (while in edit mode)?
Look at memory/CPU usage in Task Manager in Windows?
Other approaches?

Thank you 😀

frithjof_v · ‎04-18-2024

Thank you, @lbendlin!

I am curious, what will be the benefit of placing the pbix files in different workspaces?

My plan is to continue to explore this topic.

I also see Log Analytics mentioned in some blog posts, however Log Analytics is not included in the Power BI (Fabric) license (so it will be an additional cost) and the Log Analytics only seems to cover semantic models (not dataflows) from what I read.

So I guess I will use the Fabric Capacity Metrics App, or create my own report connected to the Fabric Capacity Metrics App semantic model (however that will require an additional effort).

If I want to use the CU based approach.

And also for me, it makes totally sense to use the CU based approach. Because my primary aim in this optimization is to ensure I am not wasting the precious CU's in my capacity 😉

I assume that adhering to best practices (like taking advantage of query folding, removing unnecessary data before complex transformations, use incremental refresh, etc.) will help me a lot to avoid excessive usage of CU.

But other than that, there are still some cases where I am not sure about what is the best choice between two alternative M code scripts (regarding CU usage). So it will be interesting to test the alternative M scripts.

Thank you for your insights!

Additional suggestions are also very welcome 😀

lbendlin · ‎04-19-2024

 what will be the benefit of placing the pbix files in different workspaces?

To reduce measurement bias. Same reason why you shouldn't place the metrics app into a premium/fabric workspace.

frithjof_v · ‎04-19-2024

I'm not sure if I understand how that will reduce measurement bias in this case. My first thought is that I should put the pbix files in the same workspace, to give the pbix files the same playing field (give all the pbix files equal test conditions). I may be missing something here. Could you please elaborate on why placing the pbix files in different workspaces will reduce the measurement bias?

I am also trying to understand why placing the Fabric Capacity Metrics App in a Premium/Fabric workspace introduces measurement bias.

Per my understanding, the data source of the Fabric Capacity Metrics App is a Kusto database which is placed somewhere else outside of the workspace.

So I don't understand why placing the Fabric Capacity Metrics App in a pro or premium/fabric workspace will make a difference. I may be missing something here. Could you please elaborate on this as well?

Thank you 😃

Best regards, Frithjof

lbendlin · ‎04-19-2024

You want to run the refreshes in parallel. You can choose to do so in the same workspace if you think that they will not impact each other.

Putting the watcher into the watched environment violates a basic tenet of monitoring. You just don't do that.

frithjof_v · ‎04-20-2024

Thank you @lbendlin !

I don't think I have read or heard anywhere about items in the same workspace impacting each other when it comes to "competing for compute resources".
After all, the items will still be on the same capacity even if I put them into separate workspaces.

My current understanding is that all items on the same capacity are "competing for the compute resources", regardless of whether they are in the same workspace or in separate workspaces.
Maybe I'm missing something here. Please do let me know if you have any further information/insights on this.

Best regards, Frithjof

lbendlin · ‎04-20-2024

As I said - you can choose to do that in the same workspace. Your call.

lbendlin · ‎04-18-2024

The answer is a definitive maybe.

- Ideally place your comparative pbix into different workspaces

- Ideally run all of the tests at the same time

- repeat your tests at least ten times, throw away the outliers and average the rest

(yes, assuming you have Premium/Fabric capacity and are using the metric app)

Or - examine your Power Query code and optimize it manually. There are lots of things you can do like rearranging the order of the steps and liberally applying Table.Buffer and its siblings.

I still like the CU based approach more as it gives you a true cost number. It will be a combination of time spent and computational complexity.