
dzav
Advocate II

SQL endpoint sync issues

I've been experiencing three different issues related to the SQL endpoint not being refreshed quickly enough, or at all, when the underlying delta table is updated with new data.

1) A pipeline will fail when a task (Dataflow Gen2, Copy, or Notebook) populates the delta table and a subsequent Stored Procedure task attempts to query the table via the SQL endpoint:

 

Failed to complete the command because the underlying location does not exist. Underlying data description: table '<tablename>', file 'https://onelake.dfs.fabric.microsoft.com/.../.../Tables/<tablename>/part-....snappy.parquet'.

 

This occurs randomly, and sometimes a delay of 1-2 minutes will be enough for the task to begin succeeding, while other times even tiny tables (under 20 records) can take 15+ minutes before the task succeeds. Adding retries and Wait tasks can help, but since delays are inconsistent, this doesn't always work.

 

2) A pipeline populates the delta table, but a subsequent Stored Procedure task that queries the table via the SQL endpoint returns stale data.

 

Again, this is random, but since there's no failure to retry on, only a Wait task helps remedy most cases of this.

 

3) Using Synapse Link with the Link to Fabric feature will sync and generate a table in a Fabric lakehouse; however, querying the table via SSMS returns the following error:

Msg 24701, Level 16, State 21, Line 1

The query failed because the access is denied on table 'dbo.<tablename>, file 'https://onelake.dfs.fabric.microsoft.com/.../.../Tables/<tablename>/part-....snappy.parquet'.

 

To remedy this, the table must be opened via the Lakehouse SQL endpoint in the Power BI service.

 

Ultimately, the problem is that (1) pipeline tasks succeed without verifying that the SQL endpoint is in sync, and (2) SQL endpoint sync time is inconsistent and non-performant. A table with 20 records shouldn't take 15+ minutes to reflect changes in the SQL endpoint.

1 ACCEPTED SOLUTION

@frithjof_v, thank you! Looks like that was the optimal solution. I now have the following pattern:

1) Load the bronze lakehouse

2) A copy task to load the bronze warehouse from the bronze lakehouse

3) A stored procedure to copy from the bronze warehouse to the silver warehouse

 

I don't know how scalable this is because we have over a hundred tables from Synapse Link and adding tables to a copy task is a bit onerous, but for now this seems to be working.
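
For anyone curious, step 3 is just an ordinary warehouse stored procedure. A minimal sketch, with placeholder warehouse and table names (not my real ones), looks roughly like this:

CREATE PROCEDURE dbo.usp_load_silver_customer   -- placeholder name; created in the silver warehouse
AS
BEGIN
    -- full reload for simplicity; the real logic varies per table
    DELETE FROM dbo.customer;

    -- cross-warehouse read from the bronze warehouse in the same workspace
    INSERT INTO dbo.customer (customer_id, customer_name, modified_on)
    SELECT customer_id, customer_name, modified_on
    FROM bronze_wh.dbo.customer;
END;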


11 REPLIES
dzav
Advocate II

Since this was still not working, I suspected that the OBJECT_ID function was checking a system table that was updated before the sql analytics endpoint table was fully created and synced. So, I updated the logic to query the sql analytics endpoint table directly via a select statement (e.g., SELECT TOP 1 1 FROM dbo.table). As before, I put this into a stored procedure that uses a WHILE loop to check for table creation every 30 seconds, up to 20 times, failing after 10 minutes.
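
Roughly, the check procedure looks like this (a simplified sketch; the table name and retry settings are placeholders, since the real version is parameterized):

CREATE PROCEDURE dbo.usp_wait_for_endpoint_table   -- placeholder name
AS
BEGIN
    DECLARE @tries INT = 0, @max_tries INT = 20, @found BIT = 0;   -- 20 checks x 30 seconds = 10 minutes

    WHILE @found = 0 AND @tries < @max_tries
    BEGIN
        BEGIN TRY
            -- dynamic SQL so a missing-object error is catchable in this scope;
            -- [lakehouse].dbo.[table] is a placeholder for the bronze lakehouse table
            EXEC ('SELECT TOP 1 1 FROM [lakehouse].dbo.[table];');
            SET @found = 1;
        END TRY
        BEGIN CATCH
            SET @tries += 1;
            WAITFOR DELAY '00:00:30';
        END CATCH
    END;

    IF @found = 0
        THROW 50000, 'Table not found in the SQL analytics endpoint after 10 minutes.', 1;
END;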

 

The check would succeed on the first try, then proceed to try to load the table and fail. In other words, despite a SELECT TOP 1 1 succeeding in the check, the subsequent SELECT/JOIN query of that table still failed to find it. I then suspected that using 1 instead of * erroneously succeeded because it wasn't checking any of the column metadata, so I changed this to SELECT TOP 1 * FROM dbo.table WHERE 1 = 2. What I found was that each query now took 2+ minutes and would sometimes succeed and other times fail with the same old "Failed to complete the command because the underlying location does not exist." message.
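
So the probe effectively changed from a query that never touches the table's columns to one that does:

-- first probe: suspected to succeed only because it doesn't reference any columns (placeholder table name)
SELECT TOP 1 1 FROM dbo.[table];

-- stricter probe: forces the column metadata to resolve, but returns no rows
SELECT TOP 1 * FROM dbo.[table] WHERE 1 = 2;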

 

It turns out that this is intermittent and misleading. I now suspect that each query is hitting a hidden query timeout limit when it fails and it has nothing to do with the table not existing.

 

Back to square one. Again.

 

EDIT: After further testing, I can confirm that this is the sequence of events.

1) The Notebook task succeeds in dropping and reloading the table (~2.5 minutes)

2) Queries against the table in SQL produce stale results (for 0-15 minutes)

3) Certain queries begin to fail (for another 0-15 minutes)

4) Queries begin to succeed again with fresh data

Thanks for sharing!

 

I'm struggling a bit to keep up with all the details. Also, I'm not so well versed with SQL.

 

Does it seem to you that when your WHILE loop finally finds the newly created table, then subsequent SQL queries will also consistently find the table?

Or does the table seem to sometimes "go missing" again (i.e. not stable)?

 

I guess step 3) is what I am confused about. Is the table "unstable" in this period?

I'm finding that the table goes missing again (in step 3) after being found earlier in the process (step 2).

 

When I tested this, I found that when the table is found in step 2, it's actually returning stale data (from before the load in step 1).

 

The latest and most stable (but still unstable) version of my process goes like this:

1) Drop the tables (using Spark SQL, so it's doing this via the delta lake)

2) Recreate and reload the tables

3) Wait 5 minutes (this is crucial)

4) Query each loaded table in SQL to make sure it exists every 30 seconds for up to 20 minutes

5) Query these tables in SQL to load into the next set of tables (silver warehouse)

 

With this much delay (up to 25 minutes), the process tends to succeed about 50% of the time (better than the <10% before).
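
For reference, steps 1) and 2) above happen in the Notebook task with ordinary Spark SQL, roughly along these lines (placeholder lakehouse, table, and source names; the actual reload logic varies per table):

-- step 1: drop via the lakehouse (delta lake); tables can't be dropped through the SQL analytics endpoint
DROP TABLE IF EXISTS bronze_lakehouse.my_table;

-- step 2: recreate and reload (CTAS shown here purely as an example)
CREATE TABLE bronze_lakehouse.my_table AS
SELECT * FROM bronze_lakehouse.my_staging_view;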

I've seen some people suggest to only use Lakehouse and Notebook, and not use the SQL Analytics Endpoint. So only Lakehouse + Direct Lake for Power BI.

 

I've also seen some people suggest to use Data Pipeline Copy Activity when they need to move data from Lakehouse to Warehouse, instead of using SQL query.

And then do stored procedure inside the Warehouse, if needed to further transform the data.

 

These are some suggested methods to avoid being dependent on the Lakehouse SQL Analytics Endpoint.

Interesting! I will test it out!


dzav
Advocate II
Advocate II

Well, this is still not working. Despite using this statement to wait until the sql analytics endpoint table is created:

WHILE OBJECT_ID('[lakehouse].dbo.[table]') IS NULL
    WAITFOR DELAY '00:00:30';

 

Subsequent queries still fail with:

Failed to complete the command because the underlying location does not exist. Underlying data description: table 'dbo.table', file 'https://onelake.dfs.fabric.microsoft.com/guid/guid/Tables/table/part-guid.snappy.parquet'.

 

This makes no sense.

dzav
Advocate II

Quick update for anyone who's curious.

 

  • Per the suggestions found in the linked Reddit thread, I added a stored procedure that queries the bronze lakehouse source table(s) (through the sql endpoint) to see if they exist, repeating this with a WHILE loop and a WAITFOR DELAY until it succeeds or runs out of tries
    • I've parameterized all of this, as some processes will have different requirements
    • In this case I do up to 20 checks every 30 seconds and fail the process if it doesn't locate the table within 10 minutes
  • This works well because an earlier step already explicitly drops the source table(s) using Spark SQL in a Notebook task (you can only drop tables via the lakehouse, not via the SQL analytics endpoint, hence the Spark SQL)
  • Additionally, the subsequent step that populates the target silver table has some retries because, despite the earlier check (and I don't know why), it can still sometimes fail to find the table

I'm going to keep monitoring this and see if I missed something, but the process is more stable and tight (as I'm only waiting as long as necessary), yet still not completely reliable. Ultimately, if Fabric was a more robust product, I shouldn't have to add all of these unnecessary checks. The sql endpoint objects should just be in sync with their lakehouse counterparts. I'll be speaking with our Microsoft reps.

dzav
Advocate II

Thanks! Is anyone aware if Microsoft is treating this issue as a bug and something that they are working on fixing? Support has basically said that since the sql endpoint has to sync metadata, we shouldn't expect it to immediately match the delta table. In other words, according to Microsoft support this is working as intended. Practically, from an end user perspective, this is super broken.

Voice your opinion with your Microsoft reps.  Raise tickets.  Vote for ideas.
