dbeavon3
Memorable Member

Surprising cost of Gen2 dataflows running in the on-premises gateway

Are Gen2 dataflows common yet?  I'm having trouble fitting them into the available CUs in my capacity.  They are consuming a shocking amount of capacity, and it is counterintuitive.  Here is a chart:

 

[image: capacity utilization chart -- background % consumption (blue bars at the bottom) rising over the past two days]

 

In the past two days I've enabled these two GEN2 dataflows to run in this capacity, and the "background %" is increasing rapidly (blue bars at the bottom)!

 

The most troubling thing is that these dataflows spend over 90% of their execution time simply waiting for the on-prem gateway to respond.  The end results, from an internal web API, are subsequently stored in parquet format in the premium workspace.  Given that almost all of the work is happening on-premises, in our own local environment, I was really not expecting Microsoft to charge this arbitrarily high number of CUs for us to use our own local resources!
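For a sense of scale, the service-side portion of this flow is roughly equivalent to the tiny Python sketch below (the endpoint, file name, and timeout are placeholders, not our real API):

```python
# Rough stand-in for what the service side of this dataflow actually does:
# receive the rows that the on-prem gateway already fetched and mashed up
# from the internal web API, and land them as a parquet file.
# Endpoint and paths are placeholders, not the real API.
import pandas as pd
import requests

rows = requests.get("https://internal.example.com/api/report-data", timeout=600).json()

df = pd.DataFrame(rows)
df.to_parquet("report-data.parquet")  # needs pyarrow or fastparquet installed
```

That final write is essentially the only compute the service itself contributes.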

Is there any way to drill into the CU cost of Gen2 dataflows?  By row?  By CPU usage?  By storage consumption?
As of now there isn't a good way for me to wrap my head around why these CUs are being consumed so greedily.  The Gen2 dataflow (on the Microsoft side) is doing almost nothing besides saving to a parquet file at the very end.  It spends its time waiting for a response from the on-prem gateway, which runs the mashup.  It shouldn't be charging me for just waiting on a response.

 

Please let me know how to dig deeper into the CUs that are consumed in this example.

 

6 REPLIES
Anonymous
Not applicable

Hi @dbeavon3 ,

 

Here is the doc for Dataflow Gen2 pricing.

https://learn.microsoft.com/en-us/fabric/data-factory/pricing-dataflows-gen2

 

Hope it helps.

 

Best Regards,

Wearsky

Thanks for the link.

I will try to do a review of my gen2 dataflow to see if the behavior can be correlated to that pricing model.

 

One thing that seems clear to me is that customers are now paying for the passage of time.  If a mashup is blocked waiting on a service to respond (for locking, scheduling, or other reasons) and is NOT using the compute (or the RAM or the network), then it appears that the "meter is still running" and the background operation keeps accruing a CU charge during that period.
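To put rough numbers on it, here is a back-of-the-envelope sketch of how a purely duration-based meter plays out. The CU-per-hour rate and run profile below are assumptions for illustration; plug in the actual figures from the pricing doc linked above and your own refresh times.

```python
# Back-of-the-envelope: what a duration-based meter charges for one refresh.
# ALL numbers are placeholders/assumptions -- substitute the rates from the
# pricing doc and your own observed durations.

CU_RATE_PER_HOUR = 16            # assumed meter rate: CUs per hour of mashup duration
run_hours = 2.0                  # hypothetical total elapsed refresh time
share_waiting_on_gateway = 0.90  # share of that time spent blocked on the on-prem gateway

cu_seconds = run_hours * CU_RATE_PER_HOUR * 3600
cu_seconds_while_blocked = cu_seconds * share_waiting_on_gateway

# Compare against a capacity's daily budget: an F64 provides 64 CUs,
# i.e. 64 CU-seconds of throughput for every second of the day.
capacity_cus = 64
daily_budget_cu_seconds = capacity_cus * 24 * 3600

print(f"charged for the run:        {cu_seconds:,.0f} CU-seconds")
print(f"charged while just waiting: {cu_seconds_while_blocked:,.0f} CU-seconds")
print(f"share of an F64 day:        {cu_seconds / daily_budget_cu_seconds:.2%}")
```

If that is roughly the model in play, the gateway wait time accounts for nearly all of the charge, which would line up with the "background %" growth in the chart above.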

 

Paying for the passage of time is directly contrary to what customers expect when we pay for dedicated capacity.  We expect to be paying for the underlying data center resources themselves (compute, RAM, network).  When an operation is blocked and is not consuming any resources (compute, RAM, network), it is not a substantial cost to Microsoft, and it should not be costing customers anything either.  We should be able to make use of that period of time for our other workloads in that workspace (i.e. get an effective CU credit) rather than get double-billed for resources that would otherwise be idle.  We expect to be entrusted with the power to leverage our limited capacity resources to run as many workloads as possible, assuming there is still some compute, RAM, and network available.

 

 

There are some very misleading explanations provided to customers about what we are paying for.  As an example, check out the following table.  It is currently referenced by managers of low-code software teams, and it reflects how they might think about "dedicated capacity":

 

https://learn.microsoft.com/en-us/fabric/enterprise/licenses

 

[image: SKU table from the licensing doc, mapping each Fabric SKU to its capacity units (CUs) and v-cores]

 

 

Notice that managers are shown a table suggesting that the cost of a Fabric SKU reflects the purchase of a dedicated v-core (or some multiple of v-cores).  Nowhere does it show that we are also paying for the passage of time on any Fabric item that was started but is currently idle.

 

An idle Fabric item is not something customers expect to be paying for.  Taking that one step further, even when the Fabric item is executing and is consuming CPU on the on-premises gateway, that should NOT be decrementing our Fabric compute either.  The table above leads managers to think they are paying for the v-cores at the remote service.  It doesn't give anyone reason to believe that Microsoft is also charging customers to use our own on-prem CPU.

 

I'm glad I asked the question, because I would never have believed Microsoft would simply be double-charging customers for the passage of time.  I would have thought that a dedicated capacity referred to the underlying resources, i.e. some sort of tangible product or service.  If I go into a fast food restaurant and buy a hamburger, I don't expect the staff to start a timer and charge me a secondary fee based on the passage of time while I'm eating it!

 

 

Paying for the passage of time is directly contrary to what customers expect when we pay for dedicated capacity.

I think "dedicated"  is not the right term here.  I think "consumption based"  is better.

 

We have dedicated P SKUs, we love them, and we gladly pay for the passage of time because they are dedicated capacities. There is no anxiety about sudden additional costs. We do not look forward to the time when we will be forced onto F SKUs.

@lbendlin 

Yes, I find that words like "dedicated", "capacity", and "v-cores" are all totally misleading/dishonest in the context of the new Fabric items like Gen2 dataflows.  They should drop those words from all of their communication about Fabric, yet they are still very heavily used.

"Consumption based" can mean anything Microsoft wants it to mean.  If we are consuming nothing more than the hours of our day, they'll happily charge us for that even if there is absolutely no product or service being delivered during that time.  The part that bothers me is when these dataflows are blocked altogether, or they are only consuming the CPU in our own gateway servers.  That should not be decrementing from our capacity at the PBI service during these moments.

 

It is no surprise this stuff is overpriced.  Much of this functionality was moved from ADF, which was massively overpriced as well, especially when running activities in a VNET environment.  The problem is that Microsoft is hoping the target audience won't have a framework for understanding the relative value of what they are buying.  Better yet, they hope users won't even try to understand why CUs are being consumed at a given rate, or think about how the CUs relate to the underlying resources being used (CPU, RAM).

 

To your point about P SKUs: at least they were somehow tied to the underlying Azure resources (image below).

[image: Power BI Premium P SKU table, showing the v-cores associated with each SKU]

 

 

... even in the days of P SKUs, it was not clear what kind of VM was used under the hood for their "v-cores".  But at least customers had a reasonable expectation that we were purchasing a tangible product or service, rather than just paying for the arbitrary passage of time (i.e. "consumption based" charges coming out of thin air).

 

 

Helping customers manage our costs doesn't seem to be a goal; it seems Microsoft could hardly make it any more difficult than it is right now.

There is virtually no transparency into the rate of conversion from dollars to CUs, and subsequently from CUs to some tangible service.
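For anyone trying to reverse-engineer it, the conversion chain looks roughly like the sketch below. The price is a placeholder to be looked up for your own region and SKU; nothing here is an official figure.

```python
# Rough dollars -> CU conversion for a pay-as-you-go F SKU.
# PRICE_PER_HOUR is an assumption -- look up the list price for your region/SKU.

PRICE_PER_HOUR = 11.52   # placeholder pay-as-you-go price for an F64, USD per hour
CAPACITY_CUS = 64        # an F64 provides 64 CUs

usd_per_cu_hour = PRICE_PER_HOUR / CAPACITY_CUS
usd_per_cu_second = usd_per_cu_hour / 3600

print(f"${usd_per_cu_hour:.4f} per CU-hour")
print(f"${usd_per_cu_second:.8f} per CU-second")

# A given operation's cost is then its CU-seconds (as reported by the
# Capacity Metrics app) multiplied by usd_per_cu_second.
```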

 

This stuff seems very similar to Wyndham dollars which have an arbitrary and unknown value once you've purchased them with your hard-earned USD.

bcdobbs
Community Champion

Not sure you can drill down further. I think you might get more information if you hook the workspace into Log Analytics in Azure, but I'm not 100% sure on that.
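If you do go the Log Analytics route, a sketch like the one below can pull the workspace logs back out for analysis. The KQL table and column names are assumptions; check what actually lands in your Log Analytics workspace once the integration is enabled.

```python
# Sketch: query the Log Analytics workspace that a Power BI/Fabric workspace logs into.
# Requires: pip install azure-identity azure-monitor-query
# The table/column names in the KQL are assumptions -- inspect your workspace to confirm.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1d)
| summarize totalDurationMs = sum(DurationMs) by OperationName
| order by totalDurationMs desc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # placeholder
    query=query,
    timespan=timedelta(days=1),
)

# Assumes a successful (non-partial) result for brevity.
for table in response.tables:
    for row in table.rows:
        print(dict(zip(table.columns, row)))
```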

 

I find a better pattern is to use a data pipeline to do a pure data ingest from on-prem into a bronze layer, and then do any transformations in the service (I mostly use a notebook, but I think you should see a cost saving with Gen2 dataflows as well).
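As a rough sketch of the notebook half of that pattern (lakehouse paths, table names, and columns are made up for illustration):

```python
# Sketch of the "transform in the service" step: read raw parquet that a pipeline
# landed in the bronze area of the default lakehouse, clean it up, write a silver table.
# Paths, table names, and columns are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # already provided as `spark` in a Fabric notebook

bronze = spark.read.parquet("Files/bronze/orders/")  # raw files landed by the pipeline

silver = (
    bronze
    .dropDuplicates(["OrderId"])
    .withColumn("OrderDate", F.to_date("OrderDate"))
    .filter(F.col("Quantity") > 0)
)

silver.write.mode("overwrite").format("delta").saveAsTable("silver_orders")
```

The ingest itself would be a plain copy activity in the pipeline, so the gateway only moves bytes rather than hosting a long-running mashup.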



Ben Dobbs

