Dear Fabricators,
I am facing the following challenge.
I am copying .json files from an API into my lakehouse. The API uses the OAuth2 authentication protocol.
When the pipeline starts, I refresh my tokens within a notebook and save them to Azure Key Vault.
The access token is only valid for 1 hour, so if my copy takes more than 1 hour, my next Copy Activity in the ForEach loop will fail.
My question is: how can I rerun the whole pipeline if the hour runs out and a Copy Activity fails?
Note that I would like the rerun to execute automatically, without the need to press the rerun button.
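The token-refresh step described above could be sketched roughly as follows (a minimal sketch assuming a standard OAuth2 refresh_token grant per RFC 6749; the endpoint, credential names, and the Key Vault write are placeholders, not the actual notebook code):

```python
# Minimal sketch of a token-refresh notebook step, assuming a standard
# OAuth2 refresh_token grant. All names and URLs are placeholders.
import json
import urllib.parse
import urllib.request

def build_refresh_request(token_url, client_id, client_secret, refresh_token):
    """Build the POST request for the OAuth2 refresh_token grant."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("utf-8")
    return urllib.request.Request(
        token_url, data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

def refresh_tokens(token_url, client_id, client_secret, refresh_token):
    req = build_refresh_request(token_url, client_id, client_secret, refresh_token)
    with urllib.request.urlopen(req) as resp:
        payload = json.loads(resp.read())
    # The new secrets would then be written back to Azure Key Vault here,
    # e.g. via azure-keyvault-secrets' SecretClient.set_secret(...).
    return payload["access_token"], payload.get("refresh_token", refresh_token)
```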
Here is a picture of the pipeline for reference.
Thank you in advance!
Hi @x_mark_x
You can try implementing a refresh token in Azure Functions.
First, create the Azure Function App.
From the Azure portal menu or the Home page, select Create a resource.
On the New page, select Compute > Function App.
Then create the timer trigger function.
In your function app, select Overview, and then select + Create under Functions.
Under Select a template, scroll down and choose the Timer trigger template.
Timer trigger for Azure Functions | Microsoft Learn
After the Azure Function has successfully refreshed the token, save the new access token to Azure Key Vault or global parameters.
In your data pipeline, add a Web activity or HTTP activity to invoke the Azure Function.
In other activities in the pipeline, use the latest access token.
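The core decision such a timer-triggered function would make on each tick can be sketched like this (the 30-minute schedule and 5-minute safety skew are assumptions, not a prescribed configuration):

```python
# Sketch of the decision a timer-triggered Azure Function could make on each
# tick (e.g. a CRON schedule such as "0 */30 * * * *" to fire every 30 min):
# refresh proactively whenever the token is expired or close to expiring.
from datetime import datetime, timedelta, timezone

def needs_refresh(expires_at, now=None, skew=timedelta(minutes=5)):
    """True when the access token is expired or within `skew` of expiring."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at - skew
```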
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
If I understand correctly, you proposed to have the token refresh on a timer, and instead of rerunning a pipeline, this would automatically refresh the tokens, so they are up to date?
Do I understand correctly that there is no simple way to just rerun the pipeline instead, without manually clicking Run again?
It would be so easy to just use Invoke Pipeline on fail and run the same pipeline again.
Or to run the RefreshTokens notebook on fail and, if that specific notebook succeeds, connect it back to an earlier activity. All of these create a cycle, so the pipeline cannot run.
Your solution makes sense, but it is a big roundabout way of achieving something (I consider) simple. Unfortunately our Azure platform team doesn't have time to go through with this, and I don't have admin rights in Azure.
Is there any other more simple way to rerun a pipeline?
My shortcut idea now is to just simply schedule the pipeline to run 3 times (1:00 AM, 2:05 AM, 3:10 AM), so the whole pipeline runs, and each time the tokens are refreshed. In the worst case the forEach loop just doesn't run, as there are no new items to Copy...
Hi @x_mark_x
Are you familiar with the Retry policy?
Set a retry policy for critical activities, such as obtaining tokens and copying data, to ensure automatic retries in the event of temporary failures.
If you set the number of retries to 3, the activity automatically retries up to 3 times after the first failure.
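As a plain-Python illustration of that semantic (not Fabric's internals), a retry count of 3 means up to four total attempts:

```python
# Plain-Python illustration of a retry policy (not Fabric's implementation):
# with retries=3 the activity runs at most 4 times: 1 initial try + 3 retries.
def run_with_retries(activity, retries=3):
    last_exc = None
    for _ in range(retries + 1):
        try:
            return activity()
        except Exception as exc:
            last_exc = exc
    raise last_exc
```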
This may be helpful to you.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Thank you @v-nuoc-msft,
I am familiar with the Retry Policy for individual Activities, and have already set the retries to 3 for the token refresh Notebook Activity.
This will make sure that if the notebook fails, it will be run again.
My question was regarding the whole pipeline, not a single Activity.
Let's suppose that all goes well: the Notebook activity creates a new token and saves it into the Key Vault. (Fingers crossed that the notebookutils.mssparkutils.credentials.putSecret method is implemented soon, but that is another topic 😅)
The pipeline continues until the ForEach loop, and the files are copied sequentially, one after the other. See my initial screenshot of the pipeline for reference.
If the Copy Activity takes more than an hour and my access token has expired, the API response will be "message: unauthorized". Even if I set the Retry Policy on the Copy Activity to 3, no token refresh happens, because the token refresh Notebook activity is upstream.
To the best of my knowledge, I cannot just jump back to an upstream activity to rerun the whole or a partial pipeline.
I see the following quirky option for now:
On Copy Activity failure (within the ForEach activity) I could call the refresh token Notebook activity, then the GetSecret Web activity to fetch the current token, and pass that output parameter into another Copy Activity which does practically the same as the first. I would essentially be recreating the same structure inside the ForEach loop to mimic the pipeline run.
Or, as mentioned before, run the pipeline 3 times at 3 different times.
Is there any other less quirky way?
Thank you in advance
Hi @x_mark_x
Is it possible to set up a loop condition in the Microsoft Fabric data pipeline, so that the token refresh is invoked when the ForEach activity fails and the ForEach activity is then executed again?
Add a Copy activity to the ForEach activity to copy files.
In the "Activity failure" path of the Copy activity, add a Notebook activity to refresh the token.
Configure the Notebook activity to get the new token and store it in the Azure Key Vault.
After the Notebook activity, add a Web activity that gets the new token from the Azure Key Vault and stores it in a variable.
After the Web activity that obtains the token, add an Execute Pipeline activity that invokes the subpipeline containing the ForEach activity.
Make sure to pass the new token variable in the subpipeline so that the ForEach activity uses the latest token.
In the main pipeline, add a conditional activity to check whether the ForEach activity completed successfully.
If the ForEach activity fails, the Notebook activity that refreshes the token and the Web activity that retrieves the token are invoked, and the ForEach activity is re-executed.
If you have any other ideas, please continue to discuss them with me.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Thanks @v-nuoc-msft,
I don't think you can invoke the pipeline you are currently running, as it would be a recursive or circular reference, but I could add another pipeline which is a copy of the original, and thus achieve the same result without referencing the currently running pipeline.
This is very quirky in my opinion, so in the end I solved the issue by replicating the Copy Activity in a notebook. The notebook code handles the token refresh, and loads and saves the API response into a .json file for each item.
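A rough sketch of that notebook logic, with the HTTP call, token refresh, and file write passed in as helpers (all names here are illustrative, not the actual code):

```python
# Illustrative sketch of a notebook copy loop that refreshes the token when
# the API rejects it mid-run. `fetch`, `refresh`, and `save` stand in for the
# real HTTP call, the OAuth2/Key Vault refresh, and the lakehouse file write.
class Unauthorized(Exception):
    """Raised by `fetch` when the API answers 401 (token expired)."""

def copy_items(items, fetch, token, refresh, save):
    """Copy each item; on a 401, refresh the token once and retry the item.

    fetch(item, token) -> response bytes, raising Unauthorized on a 401
    refresh()          -> a new access token
    save(item, data)   -> persist the response, e.g. Files/<item>.json
    """
    for item in items:
        try:
            data = fetch(item, token)
        except Unauthorized:
            token = refresh()        # token ran out mid-loop: get a fresh one
            data = fetch(item, token)
        save(item, data)
    return token
```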
Hopefully OAuth2 authentication, ForEach conditions, pipeline reruns (and writing secrets to Key Vault) will be handled later in Fabric's development. For now, the notebook saved the day. Thank you for your time and answer, Nono 🙂
Hi @x_mark_x
Thank you very much for sharing! If you have any other questions please continue to ask in the forum.
If your problem has been solved, please accept it as the solution.
Regards,
Nono Chen