Hi, we are using the code below to refresh the Fabric lakehouse after loading the lakehouse tables, and we then execute stored procedures that consume those lakehouse tables and warehouse tables to populate records. We want to make sure the SQL endpoint for the lakehouse is refreshed properly before we start the procedure executions. For now we are using the code below to refresh 2 custom tables, but we now want to do this for a whole schema and make sure the lakehouse tables have data before we even start calling the procedures. How can we optimize the code below? Any suggestions would be great.
This is the proper API to refresh the Lakehouse SQL Analytics Endpoint:
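For reference, here is a minimal sketch of calling that API from Python. The route shown reflects the public-preview refresh-metadata endpoint as documented at the time of writing, and the workspace ID, SQL analytics endpoint ID, and token are placeholders you would supply from your own environment; please verify the exact path and request body against the current Fabric REST reference.

import requests

# Assumptions: you already have a Microsoft Entra access token with the Fabric
# API scope, plus the workspace ID and the SQL analytics endpoint ID of the
# lakehouse. All three values below are placeholders.
WORKSPACE_ID = "<workspace-id>"
SQL_ENDPOINT_ID = "<sql-analytics-endpoint-id>"
ACCESS_TOKEN = "<bearer-token>"

# Public-preview "refresh SQL endpoint metadata" route; confirm it against the
# official Fabric REST documentation before depending on it.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/sqlEndpoints/{SQL_ENDPOINT_ID}/refreshMetadata?preview=true"
)

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={},  # optional settings (e.g. a timeout) can be supplied per the docs
)

# The call is a long-running operation: a 202 response carries a Location
# header that can be polled until the refresh completes.
print(response.status_code)
print(response.headers.get("Location"))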
We know that MS have released an API for endpoint refresh, but we cannot refresh all the tables at once because our pipelines are bifurcated across different schemas and sets of tables, with executions performed at the ADF level. We need to refresh by schema or by set of tables. Is there any way to call this API for different sets of tables and refresh only those tables?
I think perhaps it will only refresh the tables which have changed since the last time.
In that case, it shouldn't matter much if you specify a few tables or refresh the entire SQL Analytics Endpoint.
Are you seeing long durations for the refresh SQL Analytics Endpoint API call?
To answer your question specifically, I haven't seen any option to specify which tables to refresh. But I'm thinking maybe that's not needed if it only refreshes the tables which have changed since the last time.
When we perform incremental loads, our pipelines are scheduled so that their timings overlap with each other and we cannot segregate them. You can assume that all lakehouse tables keep reloading data, which means we cannot call that API once for all the lakehouse tables. We would need to call it again and again whenever a pipeline reaches its lakehouse data load step, and I don't think the API can run multiple instances at the same time. That is why we want to refresh only the lakehouse tables related to a particular pipeline at a time. If we cannot specify table-level details in the API, then it will be a no-go for us at this point in time.
"and I dont think an API can run multiple instances"
Have you tried it?
(I haven't, but I would try it before writing it off).
Or you can decide to refresh the SQL Analytics Endpoint only on the last pipeline, if you are concerned about refreshing the SQL Analytics Endpoint too often.
You can also create an Idea for this feature, or ask in the Microsoft Fabric community on Reddit.
Use this code to refresh all tables in a schema dynamically:
tables_df = spark.sql("SHOW TABLES IN dbo")
for row in tables_df.collect():
    spark.sql(f"REFRESH TABLE dbo.{row.tableName}")
✅ No hardcoding
✅ Scales with schema changes
✅ Ensures metadata is synced before procedures run
Instead of hardcoding table names, query the catalog for all tables in a given schema and loop through them.
from pyspark.sql import SparkSession

# Create (or get) the Spark session
spark = SparkSession.builder \
    .appName("Refresh Lakehouse SQL Endpoint") \
    .getOrCreate()

# Define the schema name
schema_name = "dbo"

# Get all tables in the schema
tables_df = spark.sql(f"SHOW TABLES IN {schema_name}")
tables = [row.tableName for row in tables_df.collect()]

# Refresh each table
for table in tables:
    print(f"Refreshing table {schema_name}.{table} ...")
    spark.sql(f"REFRESH TABLE {schema_name}.{table}")

print(f"Metadata refresh completed for schema: {schema_name}")
If you are running this right after data ingestion, wait until the write jobs finish.
If ingestion is done within the same notebook, call spark.catalog.clearCache() to avoid stale metadata. If ingestion is asynchronous, consider implementing a checkpoint/audit table to track job completion and trigger the refresh only after validating that the data has actually landed.
Please note that REFRESH TABLE refreshes only the metadata; it does not reload the data unless there are structural changes. You can also force a query compilation reset if you want (spark.catalog.refreshTable), but for heavy pipelines, limit the refresh to only the tables that have changed.
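As a rough illustration of the checkpoint/audit-table idea above, the sketch below assumes a hypothetical audit table dbo.load_audit with table_name, load_date and status columns that your pipelines write to; adapt the names and the 'Completed' status value to whatever you actually log. It reuses the Spark session from the earlier snippet.

from datetime import date

schema_name = "dbo"
# Placeholder list of the tables this pipeline is expected to load
expected_tables = ["table_a", "table_b"]

# Hypothetical audit table written by the ingestion pipelines
audit_df = spark.sql(f"""
    SELECT table_name
    FROM {schema_name}.load_audit
    WHERE load_date = '{date.today()}'
      AND status = 'Completed'
""")
completed = {row.table_name for row in audit_df.collect()}

# Stop early if any expected load has not been confirmed yet
missing = [t for t in expected_tables if t not in completed]
if missing:
    raise RuntimeError(f"Loads not finished yet for: {missing}")

# All loads confirmed, so the metadata refresh is safe to run now
for table in expected_tables:
    spark.sql(f"REFRESH TABLE {schema_name}.{table}")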
Please 'Kudos' and 'Accept as Solution' if this answered your query.
How can I check whether a table has been refreshed successfully after running the REFRESH TABLE command? I have seen data not reflecting in the lakehouse even after running REFRESH TABLE for that table. Is there any time delay involved? How can I keep running the refresh until a particular table has data, for example by checking the record count and refreshing again if the count is 0?
Also, we don't have any notebooks for data load; we use ADF executions after the lakehouse data populates. We need to add an interim step that refreshes all the tables for a schema (or for a particular query that returns a list of tables), keeps checking the counts, and runs the table refresh again if any table still has a 0 count.
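Something along these lines is what we are imagining as the interim step (a rough sketch only; the schema, table list, retry count and wait time are placeholders):

import time

schema_name = "dbo"
# Placeholder: tables loaded by this particular pipeline
tables_to_check = ["table_a", "table_b"]
max_attempts = 5
wait_seconds = 60

for table in tables_to_check:
    for attempt in range(max_attempts):
        count = spark.sql(
            f"SELECT COUNT(*) AS c FROM {schema_name}.{table}"
        ).collect()[0].c
        if count > 0:
            print(f"{schema_name}.{table} has {count} rows after attempt {attempt + 1}")
            break
        # Table is still empty: refresh its metadata and wait before rechecking
        spark.sql(f"REFRESH TABLE {schema_name}.{table}")
        time.sleep(wait_seconds)
    else:
        raise RuntimeError(
            f"{schema_name}.{table} still has 0 rows after {max_attempts} attempts"
        )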