WDixon2025
New Member

T-SQL Notebook vs. PySpark Notebook - Pipeline Performance Comparison

Hello Everyone,

 

I am new to this forum, so please feel free to direct my question elsewhere if it is more appropriate.

 

BACKGROUND: My team created a Pipeline that triggers our T-SQL Notebook, which uses DROP TABLE IF EXISTS and CREATE TABLE statements to overwrite/update our table in the designated Warehouse.
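Roughly, the Notebook does something like the following (dbo.SalesSummary and dbo.SalesStaging are placeholder names for illustration, not our real tables):

    -- Drop the old copy of the table if it exists
    DROP TABLE IF EXISTS dbo.SalesSummary;

    -- Rebuild it from the latest source data (CTAS is supported in Fabric Warehouse)
    CREATE TABLE dbo.SalesSummary
    AS
    SELECT CustomerID, SUM(Amount) AS TotalAmount
    FROM dbo.SalesStaging
    GROUP BY CustomerID;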

 

ISSUE: When we run the T-SQL Notebook manually, it works perfectly and refreshes with the latest data on the first attempt. However, when we run it through the Pipeline, the first attempt does not pick up the new data. Sometimes we have to run the Pipeline three times before we finally get refreshed results. We have tried putting delay activities between the steps in the Pipeline, but that doesn't work consistently either.

 

TESTING: We performed a comparison of the T-SQL Notebook vs. a PySpark Notebook. The PySpark Notebook works perfectly with the Pipeline and has no problem refreshing the data on the first attempt. However, as stated in the ISSUE section above, the T-SQL Notebook does not work on the first Pipeline refresh attempt.

 

QUESTION: Has anyone run into this issue using T-SQL Notebooks? If so, what solution(s) have you found to ensure your Pipeline refreshes and works successfully on the first attempt?

 

THANK YOU so much for your time, effort, and guidance,

WDixon2025

1 ACCEPTED SOLUTION

Hi @WDixon2025, 

Thank you for the update. I’m glad to hear that upgrading to Spark Runtime 1.3 has enhanced the consistency of the Pipeline refreshes. 

Given that the issue was resolved following the upgrade, it’s likely that the previous runtime version experienced execution delays or metadata commit inconsistencies affecting the T-SQL operations. In contrast, PySpark may have managed metadata updates more efficiently, which could explain why it performed without any issues. 

To ensure continued stability, I recommend the following actions: 

  • Monitor Execution Logs: Leverage Fabric's monitoring tools to verify that all runs complete successfully and to detect any hidden errors or delays. 
  • Validate Metadata Commit Times: If feasible, introduce a WAITFOR DELAY '00:00:05' statement after table creation to assess whether metadata commit timing was a factor (see the sketch after this list).
  • Maintain Up-to-Date Runtime Versions: Since the upgrade had a positive impact, staying on the latest stable version is advisable to mitigate the risk of similar issues in the future.
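For illustration, a minimal sketch of that delay check, assuming WAITFOR DELAY is available in your Warehouse (the table and column names are placeholders):

    -- Rebuild the table as before
    CREATE TABLE dbo.SalesSummary
    AS
    SELECT CustomerID, SUM(Amount) AS TotalAmount
    FROM dbo.SalesStaging
    GROUP BY CustomerID;

    -- Pause briefly so the metadata commit can settle before any downstream
    -- Pipeline activity reads the new table; 5 seconds is an arbitrary
    -- starting point and may need tuning
    WAITFOR DELAY '00:00:05';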

If this resolves your issue, kindly consider accepting this response as the solution. Doing so will help other community members facing similar challenges.

 

Thank you. 


3 REPLIES
WDixon2025
New Member

Thank you so much for the thorough response and all the great suggestions! We did evaluate/test everything you mentioned before I posted. We did just change our Spark settings today (Runtime upgraded to 1.3) and tested the Pipeline again with new data, and it refreshed on the first attempt - YAHOO!!! Only time will tell, but we are hopeful that this Runtime upgrade resolved our issue.

 

THANK YOU again for the collaboration!!!


v-saisrao-msft
Community Support

Hi @WDixon2025,

 

Welcome to the Microsoft Fabric forum and thank you for your thorough explanation of the issue! It's great to see that you've conducted some testing and found that PySpark performs well within the same Pipeline. This is valuable information. Here are a few things worth checking:

  • When using DROP TABLE IF EXISTS and CREATE TABLE, make sure no other processes are accessing the table at the same time. 
  • Although you have tried delay activities, consider configuring a Retry Policy in your Pipeline: go to your Pipeline activity → Settings → Retry and adjust the number of retry attempts and the interval. This can help with transient issues. 
  • Ensure your Warehouse has adequate resources during Pipeline runs. Check for any throttling or capacity issues using the monitoring tools available in Fabric. 
  • After executing T-SQL Notebooks, data may not be reflected immediately due to caching, so it is worth verifying freshness before downstream steps read the table (see the sketch after this list).
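For example, a simple freshness check like the one below could run as the final cell of the Notebook, so the Pipeline only proceeds once the rebuilt table is visible and populated (dbo.SalesSummary and LoadDate are hypothetical names):

    -- Confirm the rebuilt table exists and contains the expected data
    SELECT COUNT(*)       AS RowsLoaded,
           MAX(LoadDate)  AS LatestLoadTime
    FROM dbo.SalesSummary;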

If this post helps, then please give us ‘Kudos’ and consider accepting it as a solution to help other members find it more quickly.

 

Thank you. 
