Hi.
I've tried two methods to scale our Power BI Embedded SKU level dynamically. Both have a lag of several minutes, during which users either can't use the reports or face long waiting times.
First, I used Azure Alerts to trigger a runbook that increases the SKU level. This had a major lag of roughly six minutes from the alert firing to the SKU level actually increasing. It also had the downside that it might fail to raise the level far enough. Say the CPU trigger is at 80%, but load reaches 170% before the SKU level rises. After the SKU change, CPU drops by half, to 85%. If the level now needs to rise again, there won't be a second trigger, because CPU never fell below 80% after the first change: for the alert to fire, the CPU level has to rise from under 80% to over 80%.
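The re-trigger problem above comes from the alert being edge-triggered rather than level-triggered. A minimal sketch (with the hypothetical numbers from the example) of that behaviour:

```python
# Edge-triggered alert: fires only when CPU crosses the threshold from
# below, not while it merely stays above it. Numbers are illustrative.

THRESHOLD = 80.0  # alert trigger, percent CPU


def alert_fires(prev_cpu: float, cpu: float, threshold: float = THRESHOLD) -> bool:
    """True only on an upward crossing of the threshold."""
    return prev_cpu < threshold <= cpu


# CPU climbs from 75% to 170% before the first scale-up completes:
assert alert_fires(prev_cpu=75.0, cpu=170.0)  # first alert fires

# Doubling capacity halves utilization to 85%, still above 80%, so there
# is never a downward crossing and no second alert ever fires:
assert not alert_fires(prev_cpu=170.0, cpu=85.0)
```

This is why a level-triggered check (poll the metric and act whenever it is above the limit) avoids the stuck state, at the cost of needing its own polling loop.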
Hoping to fix both issues, I switched to a Python script on our server that triggers a runbook webhook every two minutes. I created an Azure Log Analytics workspace to poll the logs from. The runbook checks the current Embedded CPU one-minute average utilization, then raises or lowers the SKU level according to the limits I've set. But this method also has a lag of about three minutes, which causes the SKU level to increase when it shouldn't, unless I slow the webhook polling to every three or four minutes.
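The scaling decision in that runbook can be sketched as a simple step up/down against fixed limits. This is only an illustration: the SKU ladder and both thresholds are hypothetical, and the real runbook would call the Azure management API to apply the change.

```python
# Level-triggered scaling decision: given the 1-minute average CPU,
# step one SKU up or down, or stay put. All limits are hypothetical.

SKU_LADDER = ["A1", "A2", "A3", "A4", "A5", "A6"]  # Power BI Embedded tiers
SCALE_UP_AT = 80.0    # percent CPU; scale up at or above this
SCALE_DOWN_AT = 30.0  # percent CPU; scale down at or below this


def next_sku(current_sku: str, cpu_avg_1min: float) -> str:
    """Return the SKU the capacity should move to (or the current one)."""
    i = SKU_LADDER.index(current_sku)
    if cpu_avg_1min >= SCALE_UP_AT and i < len(SKU_LADDER) - 1:
        return SKU_LADDER[i + 1]
    if cpu_avg_1min <= SCALE_DOWN_AT and i > 0:
        return SKU_LADDER[i - 1]
    return current_sku
```

For example, `next_sku("A2", 92.0)` returns `"A3"`, while `next_sku("A2", 50.0)` leaves the capacity at `"A2"`. The gap between the two thresholds acts as hysteresis, so a scale-up doesn't immediately trigger a scale-down; the lag problem described above is separate, and comes from how stale the CPU metric is by the time this decision runs.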
I hope there's some method of achieving automatic SKU scaling without such significant lag? We're already experiencing quite a few problems because of this with only about 40 potential users, and we may need to support about 10,000 in the near future. Halting or slowing everything down for all of them because of the lag is out of the question. Polling the webhook every minute could work, if only the data weren't so far behind.
Actually, while writing this I had the idea to research Hybrid Runbook Workers and whether they would help with the lag. Any other ideas?
Could using Azure Logic Apps instead of our own server help at all? I don't think the lag comes from that phase, though, and Logic Apps seemed like overkill for such a simple thing.
Hi @ebee,
Thank you for reaching out to Microsoft Fabric Community.
Thank you @lbendlin for the prompt response.
As we haven't heard back from you, we wanted to kindly follow up and check whether the issue is resolved. Please let us know if you need any further assistance.
Thanks and regards,
Anjan Kumar Chippa
Hi. Yes, I still need help. I don't think we can scale our Power BI business unless I get the lag down to 30 seconds or less.
As for the Hybrid Runbook Workers I mentioned, I don't think they will help. As far as I know, the culprit is logging, or rather getting access to the Embedded CPU logs in a timely manner. I will look into a streaming method mentioned somewhere, but won't have time to work on this soon.
Hi @ebee,
Thank you for the response. Scaling is the best step for now to help reduce the lag. And yes, logging and access to the Embedded CPU logs can introduce latency; once you get a chance to look into it, a more streamlined or lightweight logging method could help a lot. For now, I would recommend continuing with capacity scaling and monitoring how the lag changes.
Thanks and regards,
Anjan Kumar Chippa
Hi @ebee,
We wanted to kindly follow up and check whether the suggestion I provided resolved the issue. Please let us know if you need any further assistance.
Thanks and regards,
Anjan Kumar Chippa
That lag is systemic, and it is even worse in regular Power BI/Fabric (eight minutes or more). Your telemetry will always be too far in the past. I'm not sure whether this is caused by batch ingestion in the telemetry eventhouses or by something else. It sure is infuriating.