Hello everyone,
I’m currently using the Microsoft Fabric Free Trial and have been encountering an issue with a pipeline I developed for ingesting data from a web server. The pipeline intermittently experiences significant delays during execution. In most cases, the majority of the tasks within a "For Each" loop execute in less than 30 minutes. However, at random intervals, one of the notebook calls will take an excessive amount of time to complete—sometimes exceeding 4 hours.
For example, as shown in the attached image, the pipeline was stuck on a single notebook execution for over 4 hours, even though the notebook successfully connected to the server and ingested the data. The pipeline seemed to be hanging at that same step, despite the data ingestion being complete. I eventually had to cancel the run after waiting for a long period. What makes this even more perplexing is that the issue occurs randomly. At times, the pipeline runs efficiently, while other times it drags significantly.
Has anyone else encountered similar issues with Microsoft Fabric pipelines, particularly inconsistent or prolonged execution times? I’d appreciate any advice or troubleshooting tips that might help resolve this erratic behaviour.
Thanks in advance for your help!
I think there are two distinct issues at play here, one you have noticed and one you might not be aware of.
Issue #1: Inconsistent processing time between invocations of any particular notebook:
This can be caused by a few things, including the time it takes to create a Spark session (though that shouldn't take more than a few minutes). You could also be hitting your Spark pool maximum or your Fabric capacity limits. If you have access, you can view compute usage in the Capacity Metrics Report. If you open the individual notebook activities in the monitoring details, you can go to the Spark monitoring view; there is a "resources (preview)" tab in there that shows how efficiently your notebook utilized its allocated resources over the duration of its execution. There you would also see whether your Spark pool was running out of resources, leaving notebooks 'waiting' or 'queueing' until the pool's resources freed up and could be allocated.
There is also the real possibility that some of your data ingestions include more data than others, which you would logically expect to take longer to handle. Have you looked into the amount of data being written/read by the notebooks that are taking longer?
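If it helps, here is a minimal sketch of how you could log per-run data volume from inside the notebook. The lakehouse path is a hypothetical placeholder, and I'm assuming the `spark` session object that Fabric notebooks provide by default:

```python
# Minimal sketch: log how much data each ingestion run handled, so slow
# runs can be correlated with input size. "Files/ingested/" is a
# hypothetical placeholder path; `spark` is the session object Fabric
# notebooks provide by default.
from datetime import datetime, timezone
from notebookutils import mssparkutils

df = spark.read.json("Files/ingested/run_output.json")  # whatever this run wrote

row_count = df.count()
total_bytes = sum(f.size for f in mssparkutils.fs.ls("Files/ingested/"))

print(f"{datetime.now(timezone.utc).isoformat()} rows={row_count} bytes={total_bytes}")
```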
There is also the possibility that the API doesn't like being hit with a burst of requests in quick succession and is throttling or delaying its responses. Have you logged the API calls from the notebook? You might want to record when the API was called, when Spark got a response, and so on.
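Along the same lines, a minimal timing wrapper could make slow or throttled responses visible in the logs. This assumes the notebook calls the API over HTTP with `requests`; the endpoint URL is a placeholder:

```python
# Minimal sketch: time each API call and surface throttling hints, so a
# slow or delayed response is visible in the run logs. The URL is a
# hypothetical placeholder.
import time
import requests

url = "https://example.com/api/data"

start = time.monotonic()
response = requests.get(url, timeout=300)  # fail fast rather than hang for hours
elapsed = time.monotonic() - start

print(f"GET {url} -> {response.status_code} in {elapsed:.1f}s")
if "Retry-After" in response.headers:  # a common throttling signal
    print(f"Server asked us to back off: Retry-After={response.headers['Retry-After']}")
```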
Issue #2: Notebook parallelization via the Pipeline for-each:
This is what I think you should look into if you really want to increase performance. Executing one notebook at a time, each in its own session, is not an effective use of the notebook's ability to perform dynamic parallelization and resource allocation on its own. I highly suggest using one invocation of a notebook which calls mssparkutils.notebook.runMultiple() to execute multiple invocations of a notebook, all in a single Spark session, rather than calling one notebook repeatedly in a for-each loop. This way, Spark can really leverage its ability to perform concurrent tasks efficiently. This is a pretty decent article on the subject, which includes some examples of this method (and others) and has good explanations:
https://learn-it-all.medium.com/parallelism-in-spark-notebook-execution-in-microsoft-fabric-8fb6ac3f...
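For reference, here is a minimal sketch of that pattern. It assumes a parameterized ingestion notebook named "IngestFromApi" (a hypothetical name) that reads a `source` argument via a parameter cell:

```python
# Minimal sketch: one driving notebook fans out N runs of an ingestion
# notebook inside a single Spark session. "IngestFromApi" and the source
# list are hypothetical placeholders.
from notebookutils import mssparkutils

sources = ["orders", "customers", "invoices"]

dag = {
    "activities": [
        {
            "name": f"ingest_{source}",
            "path": "IngestFromApi",           # the notebook to invoke
            "timeoutPerCellInSeconds": 1800,   # cap runaway cells
            "args": {"source": source},        # read via a parameter cell
        }
        for source in sources
    ],
    "concurrency": 3,  # how many runs execute at once
}

results = mssparkutils.notebook.runMultiple(dag)
print(results)
```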
Best of luck!
Thank you so much for your insightful response! Your detailed explanation of the two issues is incredibly helpful. I'll definitely look into the compute usage and check the Capacity Metrics Report to see if we're hitting any resource limits. The tip about the "resources (preview)" tab in the spark monitoring is great—I wasn't aware of that feature.
Regarding the data ingestion sizes and API call logs, those are excellent points that I hadn't fully considered. I'll investigate the data volumes and monitor the API interactions to see if they're contributing to the inconsistent processing times.
Your suggestion on notebook parallelization using mssparkutils.notebook.runMultiple() is exactly what I needed. Consolidating the tasks into a single Spark session sounds more efficient. I'll read the article you shared for more insight.
Thanks again for your assistance and the best wishes!
Hi @v-huijiey-msft,
Thank you for your response and suggestions. Unfortunately, I am still experiencing the same issue, even after enabling High Concurrency Mode. As shown in the image, the pipeline recently got stuck on a single notebook execution for over 8 hours, despite the notebook successfully connecting to the server and completing the data ingestion.
This erratic behaviour continues to be unpredictable, with prolonged delays occurring at seemingly random intervals. Any further guidance or troubleshooting steps would be greatly appreciated.
Best regards,
Ifeanyi
You can open the Spark history server for the executing script, and you should be able to figure out where the script is getting stuck based on the currently running job or the job timeline (look for green), which might help you troubleshoot.
Hi @Anyi ,
Can you try Spark Autotune? Does the troubleshooting documentation I posted below help you?
Best Regards,
Yang
Community Support Team
If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!
Hi @ifeanyi ,
It seems that you are experiencing significant delays while running your notebook in Fabric. These delays can often be attributed to the time it takes to start and manage a Spark session.
You can try the following:
Enable High Concurrency Mode. This mode allows multiple notebooks to share a single Spark session, removing the overhead of starting a new session for every notebook run.
Configure high concurrency mode for notebooks - Microsoft Fabric | Microsoft Learn
Use Spark Autotune. Fabric's Autotune feature automatically fine-tunes Spark settings to optimize performance and reduce execution time, without human intervention.
Fabric Spark Autotune and Run Series Job Analysis | Microsoft Fabric Blog | Microsoft Fabric
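As a side note, I believe Autotune can also be switched on per session with a Spark property. This is a sketch based on the Fabric Autotune documentation; please verify the property applies to your runtime version:

```python
# Sketch: enable Autotune for the current session. `spark` is the session
# object Fabric notebooks provide; the property name is per the Fabric
# Autotune documentation and may vary by runtime version.
spark.conf.set("spark.ms.autotune.enabled", "true")
```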
More information on troubleshooting techniques can be found in this document:
General troubleshooting - Microsoft Fabric | Microsoft Learn
If you have any other questions please feel free to contact me.
Best Regards,
Yang
Community Support Team
If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!