We run F64 in the North Central region. I have had a two-month outage in the "capacity metrics app". It broke at the start of November. We spoke with the Mindtree organization and were given two different ETAs for a fix, but both of those dates were missed.
This app seems to be very buggy and needs a great deal of support, yet we can't figure out how it is being supported. The contacts we have at Mindtree seem to be unable to give us accurate information about these outages, and their ETA guesstimates are totally unreliable.
The capacity metrics app is pretty critical to PBI customers. It is one of the most unfriendly admin tools I have ever used, yet there is no other tool for managing the (expensive) resource hogs in Fabric, like pyspark notebooks, vnet gateways, and gen2 dataflows.
So a two-month outage in this app is crippling to Fabric customers. I have never seen another cloud-hosted product where this would be considered acceptable. For some reason this Fabric SaaS gets free rein to be extremely buggy, and to lack any sort of communication or accountability.
Here are some questions about this app:
- Who should we be contacting to get our metrics app fixed when it is broken? ... I have been using the support tickets in the admin portal, but these tickets don't actually go to Microsoft. They go to the Mindtree partners, who don't seem to have any awareness of our outages. The tickets result in a lot of second-hand miscommunication, which is doing more harm than good. (e.g. ETAs were shared with us which probably did not come from Microsoft to begin with)
- Where would these outages be published? Should we expect to find them in the "known issues" list?
- How are the versions and lifecycles expected to work? Is there any place I can find a table with the historical list of versions, and the version numbers that are currently under full support? I believe I'm still on a recent version with full support (v.1.2.1). But knowing how rapidly Microsoft iterates on this buggy app, I'm guessing there are other versions as well. If someone can point me to reference material that lists the versions of the app (including current version and supported versions) then that may resolve some confusion. I have not seen such a thing, despite many attempts to find it in the past. The version number that I'm using, "1.2.1", hardly shows up in any of my searches. (image below)
Any information would be greatly appreciated. I have often wasted a lot of my time trying to resolve the bugs in this app, and I'm hoping to reduce that time in the future. I appreciate the assistance from Mindtree, but I've found that they are unlikely to know anything more about the bugs and outages than the customer knows. Their back-end PG (.. "power bi administration"?) is ultimately the weak link. The Mindtree engineers may be eager to help, yet they are unable to do so if the PG is not being supportive, or is not providing accurate information in the back-end ("IcM") channels.
Hi @dbeavon3
What I have done for enterprise customers, or customers running premium or fabric capacities, is I have a custom solution which extracts all the data from the metrics app into a different location so that it is not only easier to report on, but also it keeps all the history. That is something I would suggest looking into.
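The custom solution itself is not shown in this thread, so the following is only a minimal sketch of the "keep all the history" idea, assuming the timepoint rows have already been extracted from the metrics app by some other means. The table name, the column names (`capacity_id`, `timepoint`, `cu_percent`), and the use of SQLite as the store are all hypothetical choices made for illustration.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row shape: (capacity_id, timepoint as ISO-8601 text, cu_percent).
# How the rows are pulled out of the metrics app is NOT covered here.
Row = Tuple[str, str, float]


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history store with a key on capacity + timepoint."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cu_history (
               capacity_id TEXT NOT NULL,
               timepoint   TEXT NOT NULL,
               cu_percent  REAL NOT NULL,
               PRIMARY KEY (capacity_id, timepoint)
           )"""
    )
    return conn


def archive_timepoints(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Upsert one extract into the history table and return the total row count.

    Re-running an overlapping extract replaces existing rows instead of
    duplicating them, so the job can be scheduled more often than the
    retention window of the source.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO cu_history VALUES (?, ?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM cu_history").fetchone()[0]
```

Keying on capacity and timepoint is the design choice that matters here: it lets overlapping extracts be loaded repeatedly without inflating the history.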
Hi @GilbertQ ,
Thanks for the post.
I once saw a blog about this as well. I think it relies on the semantic model in that workspace.
However, I'm guessing that my immediate problem is a lot deeper than that. I have a two-month-long outage at the moment (North Central US). The "timepoint" data is simply not there.
... bugs are not the problem, per se. Bugs will always arise, in any software. I'm more interested to learn about the supportability of this app. It is critical to our operations, and the fact that we are experiencing an extensive outage without support is very concerning. Have you ever had the need to get in touch with Mindtree to get support for this app? How did it go? Did you ever get to talk to the PG at Microsoft?
After a two-month outage, I'm under the impression that the app is not well supported. I don't think it is the fault of Mindtree either. There is no engagement whatsoever from the back-end PG (at Microsoft). It feels like they live on a distant planet. The time that it takes to get communication back and forth is at least two months. I really wish I understood how things came to be this way.
As a sidebar, I tried another version of the app today (1.5.1 rather than 1.2.1). It isn't working either.
Hi @dbeavon3
What happens if you create a new app and start using that? You could potentially then start seeing the data.
Hi @GilbertQ
... I tried another version of the app today (1.5.1 rather than 1.2.1). See my last post where I mentioned that.
At first it didn't seem to be picking up the timepoint details. But I checked again this evening, and the timepoint details are in there now!
Thanks for the tip. Can I ask how you came to know about this?
I wish the Mindtree folks would have walked me thru the steps two months ago. Is it possible that they don't know about the trick of creating a new app?
None of this actually answers my questions about the supportability of this metrics app. Who was supposed to help me with this? It is not ideal if customers must spend two months on tickets with Mindtree and it takes them nowhere. I'm glad I'm back up and running again.... but the only lesson I learned out of this is to avoid wasting time with the Mindtree support channels. (I'm not sure that was the right take-away....)
hi @dbeavon3
I guess I know this from working with Microsoft Fabric since its inception, and from having played around a lot with the metrics app in order to do custom analysis for customers. I spent a significant amount of time working through all the details.
@GilbertQ I think exporting the data from the Kusto database is counterproductive. The data needs to be acted on in real time (plus the very aggravating 8-minute delay). Looking at data of a capacity crash two days ago is not really useful.
What irks me most is the half-bakedness. The semantic model is a really bad mess of two different data models mashed together, the data between them is inconsistent, and KQL queries are very, very slow. We have had a ticket open since March 2024 to get rid of the "Power BI Service" user CUs. Agree with @dbeavon3 that this app is seriously under-supported, which makes Power BI and Fabric governance unnecessarily hard.
Hi @lbendlin
I totally agree with you. It is currently not great, which is why building a custom solution makes it easier to manage.
Thanks again for the pointers
I spoke with an engineering manager today after my two-month outage.
He wanted to share that in Q1 Microsoft will be deploying the capacity app as part of the core product.
This will be a very welcome change! Sometimes customers feel like we ourselves are responsible for the software’s reliability, since we had “installed it”. In reality we have little control over the behavior of the app.
Another interesting change that is coming is an event hub (kafka?) source that would allow customers to consume our own CU events in a near-realtime way.
Once these two items are delivered, customers should spend a lot less time with the ongoing maintenance of the capacity metrics app. The EM said we may be able to create our own alerting, or even automate the up-scaling and down-scaling of our Fabric capacity, based on capacity usage. This will probably cost a lot less, and give more flexibility than the current implementation of "autoscale" in premium capacities.
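To make the "automate the up-scaling and down-scaling" idea concrete, here is a minimal sketch of the decision step only, assuming a list of recent CU utilization percentages is already available from whatever feed ends up being delivered. The function name and both thresholds are hypothetical, and the actual resize call against the capacity is deliberately left out.

```python
from statistics import mean
from typing import Sequence

# Fabric capacity sizes, smallest to largest.
FABRIC_SKUS = ["F2", "F4", "F8", "F16", "F32", "F64",
               "F128", "F256", "F512", "F1024", "F2048"]


def recommend_sku(
    current: str,
    recent_cu_percent: Sequence[float],
    scale_up_at: float = 90.0,    # hypothetical threshold
    scale_down_at: float = 40.0,  # hypothetical threshold
) -> str:
    """Suggest the next SKU from recent CU utilization readings (0-100+).

    Scale up one step when the average reading is at or above scale_up_at.
    Scale down one step only when even the peak reading is at or below
    scale_down_at, so a single quiet sample does not trigger a downsize.
    """
    if not recent_cu_percent:
        return current  # no data, no change
    i = FABRIC_SKUS.index(current)
    if mean(recent_cu_percent) >= scale_up_at:
        i = min(i + 1, len(FABRIC_SKUS) - 1)
    elif max(recent_cu_percent) <= scale_down_at:
        i = max(i - 1, 0)
    return FABRIC_SKUS[i]
```

The rule is asymmetric on purpose: it scales up on sustained load but scales down only when the whole window is quiet, which is one simple way to avoid flapping between sizes.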
As far as the two-month outage is concerned, it seems to be triggered by overages. It also seems to be region-specific (it affects North Central US but not East US). Reinstalling the application is NOT a reliable solution: while there is a small chance it may work, the app will stop working again after the next overage.
I'll believe it when I see it. This deliverable has slipped a couple times already.
Did the EM say Kafka or Kusto?
He said Azure Event Hubs, but it isn't clearly stated in the announcement. I'm guessing the decision is finalized if they have a tentative Q1 delivery date. ... but he also mentioned "Log Analytics" once. I think that was in regard to the status quo (the implementation under the capacity metrics app today).
... I don't actually use it yet, but heard that Azure Event Hubs has a full Kafka API. Event Hubs is a proprietary platform, but I think Microsoft was forced to implement the Kafka API for compatibility reasons. Nobody wants to have a dependency on a super-proprietary messaging API. Messaging isn't all that interesting, to be honest.
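For anyone who wants to try the Kafka-compatible endpoint, the sketch below builds the client settings that Event Hubs generally expects (TLS on port 9093, SASL PLAIN, with the literal user name `$ConnectionString`). The keys follow the librdkafka naming used by clients such as confluent-kafka. The helper name and the namespace value are placeholders, and whether the Fabric CU event feed will actually be exposed this way is an assumption, not something confirmed in this thread.

```python
from typing import Dict


def event_hubs_kafka_config(
    namespace: str, connection_string: str, group_id: str
) -> Dict[str, str]:
    """Build librdkafka-style settings for an Event Hubs Kafka endpoint.

    namespace is the Event Hubs namespace name (placeholder in this sketch);
    connection_string is the namespace or hub connection string.
    """
    return {
        # Event Hubs serves the Kafka protocol on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The user name is this literal string, not an account name.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
        "group.id": group_id,
    }
```

The returned dictionary would typically be passed to a Kafka consumer constructor; the topic name would then be the event hub name.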
Here is a Google result that supports the information that I had heard about "Event Hubs" in the past:
@lbendlin
He said Real-Time hub, but I interpreted that to be Event Hubs. Probably just Fabric's own marketing term:
https://learn.microsoft.com/en-us/fabric/real-time-hub/real-time-hub-overview
At 1:20 in the video (haven't watched it all myself yet):
https://www.youtube.com/live/pSyi7d5NFQM