Hi.
I've tried two methods to scale our Power BI Embedded SKU level dynamically. Both have a lag of several minutes, during which users either can't use the reports or face long waiting times.
First, I used Azure Alerts to trigger a runbook that increases the SKU level. This had a major lag of roughly six minutes from the alert firing to the SKU level actually increasing. It also had the downside that it might fail to raise the level far enough. Say the CPU trigger is at 80%, but load reaches 170% before the SKU level rises. After the SKU change, CPU drops by half, to 85%. If the level now needs to rise again, there won't be a second trigger, because CPU never fell below 80% after the first change: for the alert to fire, the CPU level has to rise from under 80% to over 80%.
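The re-trigger problem above comes from the alert being edge-triggered rather than level-triggered. A minimal sketch (with the hypothetical numbers from the example) of that behaviour:

```python
# Edge-triggered alert: fires only when CPU crosses the threshold from
# below, not while it merely stays above it. Numbers are illustrative.

THRESHOLD = 80.0  # alert trigger, percent CPU


def alert_fires(prev_cpu: float, cpu: float, threshold: float = THRESHOLD) -> bool:
    """True only on an upward crossing of the threshold."""
    return prev_cpu < threshold <= cpu


# CPU climbs from 75% to 170% before the first scale-up completes:
assert alert_fires(prev_cpu=75.0, cpu=170.0)  # first alert fires

# Doubling capacity halves utilization to 85%, still above 80%, so there
# is never a downward crossing and no second alert ever fires:
assert not alert_fires(prev_cpu=170.0, cpu=85.0)
```

This is why a level-triggered check (poll the metric and act whenever it is above the limit) avoids the stuck state, at the cost of needing its own polling loop.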
Hoping to fix both issues, I switched to a Python script on our server that triggers a runbook webhook every two minutes. I created an Azure Log Analytics workspace to poll the logs from. The runbook checks the current Embedded CPU one-minute average utilization, then raises or lowers the SKU level according to the limits I've set. But this method also has a lag of about three minutes, which causes the SKU level to increase when it shouldn't, unless I slow the webhook polling to every three or four minutes.
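The scaling decision in that runbook can be sketched as a simple step up/down against fixed limits. This is only an illustration: the SKU ladder and both thresholds are hypothetical, and the real runbook would call the Azure management API to apply the change.

```python
# Level-triggered scaling decision: given the 1-minute average CPU,
# step one SKU up or down, or stay put. All limits are hypothetical.

SKU_LADDER = ["A1", "A2", "A3", "A4", "A5", "A6"]  # Power BI Embedded tiers
SCALE_UP_AT = 80.0    # percent CPU; scale up at or above this
SCALE_DOWN_AT = 30.0  # percent CPU; scale down at or below this


def next_sku(current_sku: str, cpu_avg_1min: float) -> str:
    """Return the SKU the capacity should move to (or the current one)."""
    i = SKU_LADDER.index(current_sku)
    if cpu_avg_1min >= SCALE_UP_AT and i < len(SKU_LADDER) - 1:
        return SKU_LADDER[i + 1]
    if cpu_avg_1min <= SCALE_DOWN_AT and i > 0:
        return SKU_LADDER[i - 1]
    return current_sku
```

For example, `next_sku("A2", 92.0)` returns `"A3"`, while `next_sku("A2", 50.0)` leaves the capacity at `"A2"`. The gap between the two thresholds acts as hysteresis, so a scale-up doesn't immediately trigger a scale-down; the lag problem described above is separate, and comes from how stale the CPU metric is by the time this decision runs.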
I hope there's some method of achieving automatic SKU scaling without such significant lag? We're already experiencing quite a few problems because of this with only about 40 potential users, and we may need to support about 10,000 in the near future. Halting or slowing everything down for all of them because of the lag is out of the question. Polling the webhook every minute could work, if only the data weren't so far behind.
Actually, while writing this I had the idea to research Hybrid Runbook Workers and whether they would help with the lag. Any other ideas?
Could using Azure Logic Apps instead of our own server help at all? I don't think the lag comes from that phase, though, and Logic Apps seemed like overkill for such a simple thing.
Hi @ebee,
Thank you for reaching out to Microsoft Fabric Community.
Thank you @lbendlin for the prompt response.
As we haven't heard back from you, we wanted to kindly follow up and check whether the issue is resolved. Please let us know if you need any further assistance.
Thanks and regards,
Anjan Kumar Chippa
Hi. Yes, I still need help. I don't think we can scale our Power BI business unless I get the lag down to 30 seconds or less.
As for the Hybrid Runbook Workers I mentioned, I don't think they will help. As far as I know, the culprit is logging, or rather getting access to the Embedded CPU logs in a timely manner. I will look into a streaming method mentioned somewhere, but won't have time to work on this soon.
Hi @ebee,
Thank you for the response. Scaling is the best step for now to help reduce the lag. And yes, logging and access to the Embedded CPU logs can introduce latency; once you get a chance to look into it, a more streamlined or lightweight logging method could help a lot. For now, I would recommend continuing with capacity scaling and monitoring how the lag changes.
Thanks and regards,
Anjan Kumar Chippa
Hi @ebee,
We wanted to kindly follow up and check whether the suggestion I provided resolved the issue. Please let us know if you need any further assistance.
Thanks and regards,
Anjan Kumar Chippa
That lag is systemic, and it is even worse in regular Power BI/Fabric (eight minutes or more). Your telemetry will always be too far in the past. I'm not sure whether this is caused by batch ingestion in the telemetry eventhouses or by something else. It sure is infuriating.