Hi, we are using the code below to refresh the Fabric lakehouse after loading the lakehouse tables, and we then execute stored procedures that consume those lakehouse tables and warehouse tables to populate records. We want to make sure the SQL endpoint for the lakehouse is refreshed properly before we start the procedure executions. For now we are using the code below to refresh 2 custom tables, but we now want to do this for a whole schema and make sure the lakehouse tables have data before we even start calling the procedures. How can we optimize the code below? Any suggestions would be great.
This is the proper API to refresh the Lakehouse SQL Analytics Endpoint:
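For reference, here is a minimal sketch of calling that API from Python. The route shown reflects the public-preview refresh-metadata endpoint as documented at the time of writing, and the workspace ID, SQL analytics endpoint ID, and token are placeholders you would supply from your own environment; please verify the exact path and request body against the current Fabric REST reference.

import requests

# Assumptions: you already have a Microsoft Entra access token with the Fabric
# API scope, plus the workspace ID and the SQL analytics endpoint ID of the
# lakehouse. All three values below are placeholders.
WORKSPACE_ID = "<workspace-id>"
SQL_ENDPOINT_ID = "<sql-analytics-endpoint-id>"
ACCESS_TOKEN = "<bearer-token>"

# Public-preview "refresh SQL endpoint metadata" route; confirm it against the
# official Fabric REST documentation before depending on it.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/sqlEndpoints/{SQL_ENDPOINT_ID}/refreshMetadata?preview=true"
)

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={},  # optional settings (e.g. a timeout) can be supplied per the docs
)

# The call is a long-running operation: a 202 response carries a Location
# header that can be polled until the refresh completes.
print(response.status_code)
print(response.headers.get("Location"))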
We know that MS have released an API for endpoint refresh, but we cannot refresh all the tables at once because our pipelines are bifurcated across different schemas and sets of tables, with executions performed at the ADF level. We need to refresh by schema or by set of tables. Is there any way to call this API for different sets of tables and refresh only those tables?
I think perhaps it will only refresh the tables which have changed since the last time.
In that case, it shouldn't matter much if you specify a few tables or refresh the entire SQL Analytics Endpoint.
Are you seeing long durations for the refresh SQL Analytics Endpoint API call?
To answer your question specifically, I haven't seen any option to specify which tables to refresh. But I'm thinking maybe that's not needed if it only refreshes the tables which have changed since the last time.
When we perform incremental loads, our pipelines are scheduled so that their timings overlap with each other and we cannot segregate them. You can assume that all lakehouse tables keep reloading data, which means we cannot call that API once for all the lakehouse tables. We would need to call it again and again whenever a pipeline reaches its lakehouse data load step, and I don't think the API can run multiple instances at the same time. That is why we want to refresh only the lakehouse tables related to a particular pipeline at a time. If we cannot specify table-level details in the API, then it will be a no-go for us at this point in time.
"and I dont think an API can run multiple instances"
Have you tried it?
(I haven't, but I would try it before writing it off).
Or you can decide to refresh the SQL Analytics Endpoint only on the last pipeline, if you are concerned about refreshing the SQL Analytics Endpoint too often.
You can also create an Idea for this feature, or ask in the Microsoft Fabric community on Reddit.
Use this code to refresh all tables in a schema dynamically:
tables_df = spark.sql("SHOW TABLES IN dbo")
for row in tables_df.collect():
    spark.sql(f"REFRESH TABLE dbo.{row.tableName}")
✅ No hardcoding
✅ Scales with schema changes
✅ Ensures metadata is synced before procedures run
Instead of hardcoding table names, query the catalog for all tables in a given schema and loop through them.
from pyspark.sql import SparkSession

# Create (or get) the Spark session
spark = SparkSession.builder \
    .appName("Refresh Lakehouse SQL Endpoint") \
    .getOrCreate()

# Define the schema name
schema_name = "dbo"

# Get all tables in the schema
tables_df = spark.sql(f"SHOW TABLES IN {schema_name}")
tables = [row.tableName for row in tables_df.collect()]

# Refresh each table
for table in tables:
    print(f"Refreshing table {schema_name}.{table} ...")
    spark.sql(f"REFRESH TABLE {schema_name}.{table}")

print(f"Metadata refresh completed for schema: {schema_name}")
If you are running this right after data ingestion, wait until the write jobs finish.
If ingestion is done within the same notebook, call spark.catalog.clearCache() to avoid stale metadata. If ingestion is asynchronous, consider implementing a checkpoint/audit table to track job completion and trigger the refresh only after validating that the data has actually landed.
Please note that REFRESH TABLE refreshes only the metadata; it does not reload the data unless there are structural changes. You can also force a query compilation reset if you want (spark.catalog.refreshTable), but for heavy pipelines, limit the refresh to only the tables that have changed.
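As a rough illustration of the checkpoint/audit-table idea above, the sketch below assumes a hypothetical audit table dbo.load_audit with table_name, load_date and status columns that your pipelines write to; adapt the names and the 'Completed' status value to whatever you actually log. It reuses the Spark session from the earlier snippet.

from datetime import date

schema_name = "dbo"
# Placeholder list of the tables this pipeline is expected to load
expected_tables = ["table_a", "table_b"]

# Hypothetical audit table written by the ingestion pipelines
audit_df = spark.sql(f"""
    SELECT table_name
    FROM {schema_name}.load_audit
    WHERE load_date = '{date.today()}'
      AND status = 'Completed'
""")
completed = {row.table_name for row in audit_df.collect()}

# Stop early if any expected load has not been confirmed yet
missing = [t for t in expected_tables if t not in completed]
if missing:
    raise RuntimeError(f"Loads not finished yet for: {missing}")

# All loads confirmed, so the metadata refresh is safe to run now
for table in expected_tables:
    spark.sql(f"REFRESH TABLE {schema_name}.{table}")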
Please 'Kudos' and 'Accept as Solution' if this answered your query.
How can I check whether a table has been refreshed successfully after running the REFRESH TABLE command? I have seen data not reflecting in the lakehouse even after running REFRESH TABLE for that table. Is there any time delay involved? How can I keep running the refresh until a particular table has data, for example by checking the record count and refreshing again if the count is 0?
Also, we don't have any notebooks for data load; we use ADF executions after the lakehouse data populates. We need to add an interim step that refreshes all the tables for a schema (or for a particular query that returns a list of tables), keeps checking the counts, and runs the table refresh again if any table still has a 0 count.
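Something along these lines is what we are imagining as the interim step (a rough sketch only; the schema, table list, retry count and wait time are placeholders):

import time

schema_name = "dbo"
# Placeholder: tables loaded by this particular pipeline
tables_to_check = ["table_a", "table_b"]
max_attempts = 5
wait_seconds = 60

for table in tables_to_check:
    for attempt in range(max_attempts):
        count = spark.sql(
            f"SELECT COUNT(*) AS c FROM {schema_name}.{table}"
        ).collect()[0].c
        if count > 0:
            print(f"{schema_name}.{table} has {count} rows after attempt {attempt + 1}")
            break
        # Table is still empty: refresh its metadata and wait before rechecking
        spark.sql(f"REFRESH TABLE {schema_name}.{table}")
        time.sleep(wait_seconds)
    else:
        raise RuntimeError(
            f"{schema_name}.{table} still has 0 rows after {max_attempts} attempts"
        )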