Solved: Refresh Lakehouse after load data with copy activi...

jgarcia-alvarez · 11-26-2024

Hi all,

I have created a Data Factory pipeline in Fabric that moves data between the different layers (Lakehouses) using Notebooks and Copy activities. Once the data is in the Gold Lakehouse, I need to replicate it in a Warehouse. For that I am using a stored procedure, but it cannot read the most recent data until I refresh the Lakehouse manually. I added a Wait (60 seconds) activity before the Stored Procedure, but the Lakehouse still does not refresh automatically.

Could you let me know how to create an activity (or a set of activities) to refresh the Lakehouse? Since there is no dedicated activity for that, is there a best practice?

Thanks in advance for your help.

Anonymous · 11-26-2024

Hi @jgarcia-alvarez

It looks like you want to make sure the data in the Lakehouse is up to date before running the stored procedure.

You might consider adding a custom script activity or a notebook to the pipeline that programmatically triggers a refresh of the Lakehouse. For example (the endpoint URL and access token below are placeholders to replace with your own):

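# Example script to refresh the Lakehouse
import requests

# Placeholders: replace with your Lakehouse refresh endpoint and a valid access token
refresh_url = "https://your-lakehouse-endpoint/refresh"
headers = {
    "Authorization": "Bearer your_access_token",
    "Content-Type": "application/json",
}

# Send the refresh request; the timeout stops the activity from hanging indefinitely
response = requests.post(refresh_url, headers=headers, timeout=60)

# Asynchronous REST operations often return 202 (Accepted), so treat any 2xx code as success
if 200 <= response.status_code < 300:
    print(f"Lakehouse refresh initiated successfully (HTTP {response.status_code}).")
else:
    # Raise rather than print so the activity fails and the stored procedure
    # does not run against stale data
    raise RuntimeError(
        f"Failed to refresh Lakehouse: HTTP {response.status_code} {response.text}"
    )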

Make sure this activity runs after the data has been loaded into the Gold Lakehouse and before the Stored Procedure activity that replicates it into the Warehouse.
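A 2xx response usually only confirms that a refresh request was accepted, not that it has finished, and a fixed Wait activity (like the 60-second one in the question) is either wasted time or too short. A minimal sketch of an alternative, assuming you can write your own check that reports whether the freshly loaded data is visible to the stored procedure: poll that check with exponential backoff until it succeeds or a timeout expires. The function `wait_until_visible` and its parameters are illustrative names, not part of any Fabric API.

```python
import time
from typing import Callable


def wait_until_visible(
    check: Callable[[], bool],
    timeout_s: float = 600.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call check() until it returns True, doubling the delay between attempts.

    Returns the number of attempts made. Raises TimeoutError if check()
    still returns False once timeout_s seconds have elapsed.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"Data not visible after {attempts} attempts")
        # Never sleep past the deadline; cap the backoff at max_delay_s.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

In a notebook you would call it as `wait_until_visible(latest_batch_visible, timeout_s=900)`, where `latest_batch_visible` is a hypothetical function you write yourself, for example one that queries the table through the SQL endpoint and compares its newest load timestamp with the one the current run wrote. The `sleep` and `clock` parameters are injectable only so the backoff can be tested without real waiting.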

I hope this helps.

Regards,

Nono Chen


Anonymous · 11-26-2024