Hi,
I have some Dataflow Gen2 pipelines which are scheduled to run every day. On some days, I can see the status of the pipeline as succeeded, but the data does not refresh; when I run it manually later in the morning, the data refreshes fine. Any ideas why this could be happening and what we can do to prevent it?
Thanks
Sirisha
Hi @SirishaMurali,
We would like to confirm whether our community member's answer resolves your query or if you need further help. If you still have any questions or need more support, please feel free to let us know. We are happy to help you.
Thank you for your patience; we look forward to hearing from you.
Best Regards,
Prashanth Are
MS Fabric community support
Thank you @deborshi_nag for your reply. I have kept a 45-minute gap between these two pipelines (one from the source to our bronze, and one from bronze to silver). Do you suggest keeping more than that?
The first pipeline usually runs fine, but I have this issue with the second pipeline sometimes.
The first is a copy job that gets data from the source into the lakehouse, and the second (a pipeline that calls several other Dataflow Gen2 pipelines) takes this data, adds some transformations to it, and loads it into silver tables in the same lakehouse.
Please let me know if you need any further details.
Thanks for your help!
Cheers
Sirisha
Thank you @SirishaMurali for providing the additional information. You have set up a Copy Job (bronze) activity followed by Dataflow Gen2 (silver) activities, which indicates you may be encountering a metadata synchronisation issue.
When the Copy job writes to your Bronze Lakehouse tables, it creates parquet data files and delta logs and updates the table definition. However, these changes may not be immediately visible to downstream processes, such as Dataflow Gen2, due to background synchronisation and commit operations within the Lakehouse storage layer. This is likely why your manual Dataflow Gen2 run in the morning correctly reads the updated bronze data.
Rather than relying on a fixed time gap, it is recommended to perform an explicit metadata refresh and a quick validation query. You can do this by inserting a notebook activity between the Copy job and the Dataflow Gen2 job that runs something like:
REFRESH TABLE bronze.tableName;
This will prompt the Lakehouse engine to reread the delta logs and finalise the file state. Additionally, you may wish to check row counts or the latest timestamp before proceeding with the silver stage.
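As a rough sketch (bronze.tableName and the load_timestamp column are placeholders for whatever your bronze table actually uses), the notebook cell could look something like this:
# Runs in a Fabric notebook attached to the bronze Lakehouse.
# bronze.tableName and load_timestamp are placeholders - substitute your own names.
# Ask the engine to re-read the Delta log for the freshly written table
spark.sql("REFRESH TABLE bronze.tableName")
# Quick validation before the silver stage kicks off
row_count = spark.sql("SELECT COUNT(*) AS cnt FROM bronze.tableName").collect()[0]["cnt"]
latest_load = spark.sql("SELECT MAX(load_timestamp) AS max_ts FROM bronze.tableName").collect()[0]["max_ts"]
print(f"bronze.tableName rows: {row_count}, latest load: {latest_load}")
# Failing the notebook here also fails the pipeline, so the silver load never runs on stale data
if row_count == 0:
    raise ValueError("bronze.tableName is empty - aborting before the silver load")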
Thank you for your reply. Let me add the refresh table script, and I will come back to you if that fixes my issue.
Unfortunately, that refresh script didn't work either. The notebook succeeded, but the downstream tables didn't update; they were still showing yesterday's data. Any other suggestions, please?
Thanks
Sirisha
Hello @SirishaMurali, sorry to know it didn't work. Could you programmatically force a refresh of the SQL Analytics Endpoint using the script below? You can use a Python notebook for this.
import requests

# Authenticate with the Microsoft Fabric API from within the notebook
token = notebookutils.credentials.getToken("https://api.fabric.microsoft.com")

# Configuration
workspace = "<workspace_id>"
lakehouse_sql_endpoint = "<sql_analytics_endpoint_id>"

# API request headers
shared_headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Request body with timeout configuration
json_body = {
    "timeout": {
        "timeUnit": "Minutes",
        "value": 2
    }
}

# Refresh the SQL Analytics Endpoint metadata
sync_sql_analytics_endpoint = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace}/sqlEndpoints/{lakehouse_sql_endpoint}/refreshMetadata",
    headers=shared_headers,
    json=json_body
)

# Display the response
display(sync_sql_analytics_endpoint.json())
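If it helps, you could also fail the notebook when the call itself does not succeed, so the pipeline stops instead of carrying on. This only relies on the standard requests response object; the exact response payload may vary:
# Optional: stop the notebook (and therefore the pipeline) if the refresh call failed
if sync_sql_analytics_endpoint.status_code not in (200, 202):
    raise RuntimeError(
        f"SQL Analytics Endpoint refresh failed: "
        f"{sync_sql_analytics_endpoint.status_code} - {sync_sql_analytics_endpoint.text}"
    )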
Hi @SirishaMurali,
Adding to @deborshi_nag's follow-up, can you confirm whether the 45-minute gap is between the completion of the first pipeline and the start of the second? If not, it may be that the first pipeline, or the underlying tables' Delta logs, are still being written when the second pipeline kicks in.
Hi @stoic-harsh
Yes, that 45-minute gap is between the completion of the first pipeline and the kick-off time of the second pipeline.
Cheers
Sirisha
Hello @SirishaMurali
It's challenging to determine the root cause of the issue without further details about your data pipelines. Generally, this situation arises when the pipeline runs before the upstream system has provided new data; as a result, running it manually the next morning works because the data is then available.
Other factors such as schema drift, source timeouts, or missing parameters in the Dataflow could also lead to your Dataflow running without errors when triggered by your pipeline.
If you could share more information about your data pipeline's processes, I would be happy to offer some guidance on what might be causing this behaviour.