Hi all, here are some tests and questions regarding the pricing of data copy activities within Fabric pipelines.
tl;dr:
I'd like to know your opinion on the pricing of data copy activities. In my view, the consumption of pipeline data copy tasks is not billed correctly: almost all operations are metered at the same number of CUs, independent of duration and optimization. This leads to the conclusion that optimizing individual copy activities has no real impact on billing, while reducing the number of copy operations absolutely does.
I did some investigations on a Fabric pipeline with a copy data task loading 12 tables from a test database to parquet. The copy data task is executed within a ForEach loop.
The goal was to investigate how the CU usage of copy data tasks can be optimized.
What does Microsoft say? In the price breakdown on how the “Data movement” is charged, Microsoft states that the metrics are…
But what is “intelligent optimization”? According to Microsoft’s “Copy activity performance and scalability guide”, there are several things to consider, like parallel copy for partitioned sources or intelligent throughput optimization (ITO). Source: Copy activity performance and scalability guide - Microsoft Fabric | Microsoft Learn
So I ran three tests with different settings, changing the intelligent throughput optimization setting (Max vs. Auto) and comparing that against setting the batch count of the ForEach loop to 6.
The batch count has a big impact on the duration, but the ITO setting does not:
Let’s have a look at the consumed CUs in the Fabric Metrics App:
All pipelines are charged the same. If we drill down into the details, we can check how many CUs are used by the individual activities. According to Microsoft’s pricing calculation, the duration of an operation should matter for the cost calculation:
https://learn.microsoft.com/en-us/fabric/data-factory/pricing-pipelines
That means the duration should have a direct impact on the CU calculation and the costs. But if we look at the individual operations, they all show 360 CUs, independent of the runtime:
Not what I expected.
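For what it's worth, a flat 360 would be consistent with the documented “utilized ITO × duration in hours × 1.5 CU hours” formula if there is a minimum billing granularity of one minute. The following Python sketch is my own assumption, not an official calculation: it assumes the default ITO of 4 and rounds every run up to at least 60 seconds.

```python
# Hedged sketch of Fabric's documented data-movement formula:
#   CU-hours = utilized ITO * copy duration (hours) * 1.5
# The one-minute minimum and the default ITO of 4 are my assumptions,
# introduced only to explain the flat 360 CU(s) readings.

def copy_cu_seconds(duration_s: float, ito: int = 4,
                    rate_cu: float = 1.5,
                    min_billed_s: float = 60.0) -> float:
    """CU-seconds billed for one copy run (hypothetical, not official)."""
    billed_s = max(duration_s, min_billed_s)  # assumed per-minute minimum
    # ITO * (seconds/3600) * 1.5 CU-hours, converted back to CU-seconds,
    # simplifies to ITO * seconds * 1.5:
    return ito * billed_s * rate_cu

# Any run up to 60 s would bill the same 4 * 60 * 1.5 = 360 CU(s):
print(copy_cu_seconds(5))    # -> 360.0
print(copy_cu_seconds(59))   # -> 360.0
print(copy_cu_seconds(120))  # -> 720.0
```

Under these assumptions, every copy that finishes within a minute lands at exactly 360 CU(s), which is what the screenshots show.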
The following statement from a community post matches what I had assumed about the calculation:
In my eyes:
Source: Solved: Minimum CU (s) billing per copy? Or am I just bad ... - Microsoft Fabric Community
Let’s have a look at a real-life scenario up and running at a customer. If we check the correlation between the duration of an operation and its CUs, we see that almost all data movement operations are billed at 360 CUs!
Actually, 99% of the operations at the customer result in 360 CUs.
Looking at the duration, at least the operations with higher CU counts are also the long-running ones:
Here we see another thing: the CUs seem to be calculated in steps of 360 (maybe this is somehow linked to a time calculation in seconds: (60*60)/10 = 360?):
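If durations were additionally rounded up to whole minutes, that would also explain the 360-step pattern. Again a hedged sketch, purely under that assumption (billing per started minute at ITO 4 × 1.5 CU):

```python
import math

# Hypothesis only: CU(s) quantized in steps of 360 if each copy is billed
# per started minute at ITO 4 * 1.5 CU = 6 CU, i.e. 360 CU(s) per minute.
def stepped_cu_seconds(duration_s: float) -> int:
    started_minutes = math.ceil(duration_s / 60)  # round up to whole minutes
    return started_minutes * 360                  # 360 CU(s) per minute

print([stepped_cu_seconds(s) for s in (10, 60, 61, 200)])
# -> [360, 360, 720, 1440]
```

That would reproduce exactly the staircase visible in the customer data, but I have not found this rounding documented anywhere.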
Confused by CU-seconds calculation - Microsoft Fabric Community
Since you have a Pro license, you can open a Pro ticket at https://admin.powerplatform.microsoft.com/newsupportticket/powerbi
Somewhere they advise against having too many small files. But this sure looks like a bug.