
Rfingerhut
Frequent Visitor

Intermittent long-running dataflow refreshes, with too little detail in the logs to troubleshoot thoroughly

We have been experiencing intermittent issues with a particular dataflow in the service: it randomly runs much longer than usual, for no apparent reason. This dataflow uses an IBM DB2 mainframe as its data source, and the query is "select * from table", which returns about 50k rows at any given time. Our database team ran a trace on the query to make sure the problem wasn't on their side; they confirmed that it runs in under a second and shouldn't be causing any delays. We also confirmed that the end time of the Power BI refresh matched the start time of the query against the database. So there is a gap between the time Power BI starts the refresh and the time the database actually receives the query, but we don't know what is happening during that gap or why.
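One way to make that gap visible across many runs is to pair each refresh start time from the refresh history with the time the DB2 trace first saw the query, and flag runs where the difference is large. The sketch below assumes you have exported both timestamps by hand; the sample data, the 5-minute threshold, and the function name `pre_query_gaps` are all hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical sample data: (Power BI refresh start, time the DB2 trace
# first saw the query). Replace with your own exported timestamps.
RUNS = [
    ("2024-12-02 06:00:00", "2024-12-02 06:00:04"),
    ("2024-12-03 06:00:00", "2024-12-03 06:41:10"),
    ("2024-12-04 06:00:00", "2024-12-04 06:00:05"),
]


def pre_query_gaps(runs, threshold=timedelta(minutes=5)):
    """Return (refresh_start, gap, flagged) for each run.

    gap is the time between Power BI starting the refresh and the
    database receiving the query; flagged is True when gap > threshold.
    """
    results = []
    for refresh_start, query_seen in runs:
        start = datetime.strptime(refresh_start, FMT)
        seen = datetime.strptime(query_seen, FMT)
        gap = seen - start
        results.append((start, gap, gap > threshold))
    return results


if __name__ == "__main__":
    for start, gap, flagged in pre_query_gaps(RUNS):
        marker = "SLOW" if flagged else "ok"
        print(f"{start:%Y-%m-%d %H:%M}  gap={gap}  {marker}")
```

If the flagged runs cluster at particular times of day, that points toward queuing before the query is sent rather than at the source.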

On the Power BI side, we enabled additional logging on the gateways to see what might be causing the issue, but the logs don't contain enough detail to give any insight. Some research into concurrent dataflows in the service led us to check whether too many dataflows were refreshing at the same time, but we confirmed we were nowhere near the concurrency limit during the refresh windows, or at any other time.
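To check the concurrency claim against actual data rather than a single snapshot, the refresh history for everything on the capacity can be run through a simple sweep-line count. This is a sketch that assumes you have the start and end times of each dataflow and dataset refresh; the item names, times, and the function `peak_concurrency` are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical refresh history: (item name, start, end) for every refresh
# on the capacity during the window of interest.
REFRESHES = [
    ("DF_Mainframe", "2024-12-03 06:00", "2024-12-03 06:45"),
    ("DF_Sales", "2024-12-03 06:00", "2024-12-03 06:20"),
    ("DS_Finance", "2024-12-03 06:10", "2024-12-03 06:30"),
    ("DF_HR", "2024-12-03 07:00", "2024-12-03 07:05"),
]


def peak_concurrency(refreshes):
    """Sweep-line count of simultaneous refreshes.

    Returns (peak, time_of_peak). Each start adds 1 and each end subtracts 1;
    sorting (time, delta) tuples processes an end (-1) before a start (+1)
    at the same instant, so back-to-back runs are not counted as overlapping.
    """
    events = []
    for _, start, end in refreshes:
        events.append((datetime.strptime(start, FMT), 1))
        events.append((datetime.strptime(end, FMT), -1))
    events.sort()
    running = peak = 0
    peak_at = None
    for when, delta in events:
        running += delta
        if running > peak:
            peak, peak_at = running, when
    return peak, peak_at


if __name__ == "__main__":
    peak, when = peak_concurrency(REFRESHES)
    print(f"peak of {peak} concurrent refreshes at {when}")
```

A low peak across the whole history supports ruling out the capacity-level limit, though it says nothing about contention inside a single workspace.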

What are we missing?  Is there a setting or additional logging we can look into to find the root cause and fix the intermittent long-running refreshes?

(Image: SC Capacity Dataflow Refresh Times)

cassidy
Power Participant

I'd bet it's related to other Dataflows (and Datasets) running. I know you mentioned that, but in my experience Dataflows and Datasets running in the same Workspace impact each other. Since two of your long runs happened at the same time, something must be competing.
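To see what was actually competing during a slow run, one option is to list every refresh whose time window intersects it. A minimal sketch, assuming refresh history exported as (name, start, end); the sample data and the function name `overlapping` are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical refresh history for one workspace.
REFRESHES = [
    ("DF_Mainframe", "2024-12-03 06:00", "2024-12-03 06:45"),
    ("DF_Sales", "2024-12-03 06:00", "2024-12-03 06:20"),
    ("DS_Finance", "2024-12-03 06:10", "2024-12-03 06:30"),
    ("DF_HR", "2024-12-03 07:00", "2024-12-03 07:05"),
]


def overlapping(refreshes, target):
    """Names of refreshes whose interval intersects the target's interval.

    Two intervals [s1, e1) and [s2, e2) overlap when s1 < e2 and s2 < e1.
    """
    spans = {
        name: (datetime.strptime(s, FMT), datetime.strptime(e, FMT))
        for name, s, e in refreshes
    }
    t_start, t_end = spans[target]
    return sorted(
        name
        for name, (s, e) in spans.items()
        if name != target and s < t_end and t_start < e
    )


if __name__ == "__main__":
    print(overlapping(REFRESHES, "DF_Mainframe"))
```

Running this for each slow run and each normal run can show whether the same neighbors keep turning up only when the refresh lags.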

However, and I've never gotten an explanation for this, I see far more success when high-priority Dataflows/Datasets run in their own Workspace, even though they are on the same Capacity. It seems to prevent a sort of "blocking". It may be worth testing.

For what it's worth, we have dedicated workspaces for dataflows, organized by subject matter. We don't have any datasets in the same workspaces. We pull the dataflows into our models only as needed, and those models are published to separate workspaces.

@cassidy the results are in...

Here is the refresh history for the exact same dataflow, deployed as the only dataflow or object in a new workspace.
(Image: New test dataflow in its own workspace)

And here is the existing dataflow, on the exact same refresh schedule, in a workspace with multiple other dataflows.
(Image: Existing dataflow)

Yes, that is good management of Workspaces in my opinion, but I'd still run a copy of that Dataflow independently in its own Workspace for a few days and see what happens. Scheduling it to run concurrently with the original could give an interesting result: if the test Dataflow refreshes quickly at times when the original lags, that may point to blocking within the original Workspace.
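Once both copies have run on the same schedule for a few days, the comparison can be made mechanical: for each scheduled slot both ran in, flag slots where the original took several times longer than the isolated copy. This is a sketch only; the durations, the 3x ratio, and the function name `divergent_slots` are hypothetical and would come from your own refresh history.

```python
# Hypothetical refresh durations in minutes, keyed by scheduled slot.
ORIGINAL = {"12-02 06:00": 3, "12-03 06:00": 44, "12-04 06:00": 4, "12-05 06:00": 38}
TEST = {"12-02 06:00": 3, "12-03 06:00": 4, "12-04 06:00": 3, "12-05 06:00": 5}


def divergent_slots(original, test, ratio=3.0):
    """Slots present in both histories where the original dataflow took at
    least `ratio` times as long as the isolated test copy."""
    shared = original.keys() & test.keys()
    return sorted(slot for slot in shared if original[slot] >= ratio * test[slot])


if __name__ == "__main__":
    print(divergent_slots(ORIGINAL, TEST))
```

Slots where both copies were slow would instead point back toward the gateway or the source, since workspace placement was the only thing that differed.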

You'd think it's either your Source prioritizing inbound queries or the BI Service prioritizing Dataflows. I believe you've confirmed the first is not happening.

I will try that.  Thank you for the suggestion.

The primary complaint is that the Service and logs fail to provide sufficient information to easily determine the root cause.  If it's a prioritization issue, it would be nice to see that detailed out somewhere.  Running these tests may not lead to a definitive answer, which puts me right back at square one.  😑

Oh, I hear you. A lot of the BI Service is "magic optimization", which both allows less experienced folks (like me) to do great things and, at the same time, hides what's really happening (especially around refreshing).

For example, Dataset refreshes are also a black box, but at least one helpful person figured out how to analyze refresh progress and display the bottlenecks: https://dax.tips/2021/02/15/visualise-your-power-bi-refresh/
