RM_dwalsh
New Member

Load table from shortcut to ADLS Gen2 storage using Spark SQL not working

So I have created a shortcut to some Delta tables in ADLS Gen2 storage using the storage account address and account key. I used 'Import schema' and now have a number of tables in DLHDEV02 under the schema AzureDevOps.

 

When I click on the lakehouse in my workspace, I can see the tables, preview the data, view the files, etc.

 

I can also query these tables via the SQL analytics endpoint, which returns results perfectly:

 

SELECT TOP (100) [Id],
                 ...
FROM [DLHDEV02].[AzureDevOps].[Teams]

 

I can create a notebook, see all the tables, etc. I can click the arrow beside a table to see the column names and data types. I can get the path of the table and load it using:

df = spark.read.format("delta").load("abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams")

 

The issue arises when I right-click on the table and select 'Load data via Spark', which populates the following:

df = spark.sql("SELECT * FROM DLHDEV02.AzureDevOps.Teams LIMIT 1000")

display(df)

 

This does not work; it says the table is not found. What am I missing here?

 

The error is below. How can I resolve this? The end users will want to be able to query using Spark SQL.

 

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[8], line 1
----> 1 df = spark.sql("SELECT * FROM DLHDEV02.AzureDevOps.Teams LIMIT 1000")
      2 display(df)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py:1440, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   1438 try:
   1439     litArgs = {k: _to_java_column(lit(v)) for k, v in (args or {}).items()}
-> 1440     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   1441 finally:
   1442     if len(kwargs) > 0:

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
   171 converted = convert_exception(e.java_exception)
   172 if not isinstance(converted, UnknownException):
   173     # Hide where the exception came from that shows a non-Pythonic
   174     # JVM exception message.
--> 175     raise converted from None
   176 else:
   177     raise

AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `DLHDEV02`.`AzureDevOps`.`Teams` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 1 pos 14;
'GlobalLimit 1000
+- 'LocalLimit 1000
   +- 'Project [*]
      +- 'UnresolvedRelation [DLHDEV02, AzureDevOps, Teams], [], false

5 REPLIES
v-pbandela-msft
Community Support

Hi @RM_dwalsh,

Thank you for reaching out in the Microsoft Community Forum.

Please follow the steps below to resolve the issue:

1. You can view and query the tables in the SQL analytics endpoint and interact with them using the Spark DataFrame API (spark.read.format("delta").load(...)). This shows that the shortcut and storage access are configured correctly.

2. The error message indicates that Spark SQL is unable to resolve the fully qualified table name DLHDEV02.AzureDevOps.Teams.

3. When using Spark SQL, the catalog and schema context matters. If the table is located in a catalog or schema that is not the session default, it must be referenced explicitly.

Troubleshooting Steps:
1. Run the following commands to check the active catalog and schema:

print(spark.sql("SELECT current_catalog()").collect())
print(spark.sql("SELECT current_schema()").collect())

If the active catalog or schema is not set to DLHDEV02 or AzureDevOps, you will need to qualify the table name fully or set the context.

2. Explicitly set the catalog and schema in your session:

spark.sql("USE CATALOG DLHDEV02")
spark.sql("USE SCHEMA AzureDevOps")

After setting this, run the query again:

df = spark.sql("SELECT * FROM Teams LIMIT 1000")
display(df)

3. If setting the context is not feasible, always qualify the table name fully:

df = spark.sql("SELECT * FROM DLHDEV02.AzureDevOps.Teams LIMIT 1000")
display(df)

This ensures that Spark SQL knows where to look for the table.

4. If the table is not registered in the metastore, Spark SQL may not find it. To check if the table exists in the catalog:

spark.sql("SHOW TABLES IN DLHDEV02.AzureDevOps").show()

If the table Teams is not listed, it may not be registered.

5. If the table is missing in the metastore, you can register it manually:

spark.sql("""
CREATE TABLE DLHDEV02.AzureDevOps.Teams
USING DELTA
LOCATION 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams'
""")

Then retry the query.

6. Make sure the path in the shortcut matches the one used by the query:

df = spark.read.format("delta").load("abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams")
df.show()
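
If the three-part name still cannot be resolved, one session-scoped workaround (a sketch, assuming the OneLake path from step 6 is correct) is to load the table by path and register it as a temporary view, so it can still be queried with Spark SQL:

path = "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams"
df = spark.read.format("delta").load(path)

# Register a session-scoped temporary view; it disappears when the
# Spark session ends, so each notebook must register it again.
df.createOrReplaceTempView("Teams")

df = spark.sql("SELECT * FROM Teams LIMIT 1000")
display(df)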

If you found this post helpful, please consider marking it as "Accept as Solution" and selecting "Yes"; this helps other members find it more easily.

Regards,
Pavan.

Hi, I have already tried all these suggestions, unfortunately.

 

1. Run the following commands to check the active catalog and schema:

print(spark.sql("SELECT current_catalog()").collect())
print(spark.sql("SELECT current_schema()").collect())


[Row(current_catalog()='spark_catalog')]

[Row(current_database()='8hgn8o..................................................................')]

 

2. Explicitly set the catalog and schema in your session:

spark.sql("USE CATALOG DLHDEV02")
spark.sql("USE SCHEMA AzureDevOps")

 

---------------------------------------------------------------------------
ParseException                            Traceback (most recent call last)
Cell In[8], line 3
      1 # Welcome to your new notebook
      2 # Type here in the cell editor to add code!
----> 3 spark.sql("USE CATALOG DLHDEV02")
      4 spark.sql("USE SCHEMA AzureDevOps")

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py:1440, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   1438 try:
   1439     litArgs = {k: _to_java_column(lit(v)) for k, v in (args or {}).items()}
-> 1440     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   1441 finally:
   1442     if len(kwargs) > 0:

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
   171 converted = convert_exception(e.java_exception)
   172 if not isinstance(converted, UnknownException):
   173     # Hide where the exception came from that shows a non-Pythonic
   174     # JVM exception message.
--> 175     raise converted from None
   176 else:
   177     raise

ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'DLHDEV02': extra input 'DLHDEV02'.(line 1, pos 12)

== SQL ==
USE CATALOG DLHDEV02
------------^^^

 

3. If setting the context is not feasible, always qualify the table name fully:

df = spark.sql("SELECT * FROM DLHDEV02.AzureDevOps.Teams LIMIT 1000")
display(df)

This ensures that Spark SQL knows where to look for the table.

 

This is what the 'load table' feature was already doing.

 

5. If the table is missing in the metastore, you can register it manually:

 

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Cell In[17], line 1
----> 1 spark.sql(f"""
      2 CREATE TABLE DLHDEV02.AzureDevOps.Teams
      3 USING DELTA
      4 LOCATION 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams'
      5 """)

 

Py4JJavaError: An error occurred while calling o324.sql. : java.lang.AssertionError: assertion failed: Only managed tables are supported

Hi @RM_dwalsh,

Thank you for reaching out in the Microsoft Community Forum.

Based on the issues you're encountering, here’s the updated solution:

1. You should fully qualify the table name in your queries, since USE CATALOG and USE SCHEMA are not working:

df = spark.sql("SELECT * FROM DLHDEV02.AzureDevOps.Teams LIMIT 1000")
display(df)
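
If the fully qualified name still cannot be resolved, Spark SQL can also address a Delta table directly by its path (a sketch, assuming the same OneLake path you already load with the DataFrame API), which bypasses the catalog lookup entirely:

# Query the Delta table by path; no catalog or metastore entry is needed.
df = spark.sql("SELECT * FROM delta.`abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams` LIMIT 1000")
display(df)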

2. The "Only managed tables are supported" error suggests that Spark is expecting a managed table. Since your table is stored externally, you can try registering it as an external table, like below:

spark.sql("""
CREATE EXTERNAL TABLE DLHDEV02.AzureDevOps.Teams
USING DELTA
LOCATION 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams'
""")

3. Check whether the table is properly registered using the code below:

spark.sql("SHOW TABLES IN DLHDEV02.AzureDevOps").show()

Please continue using the Microsoft Community Forum.

If you found this post helpful, please consider marking it as "Accept as Solution", selecting "Yes", and adding kudos; this helps other members find it more easily.

Regards,
Pavan.

 

Hi, thanks for the reply. This also doesn't work, as the 'table path already exists'. I would have expected that using the Import schema function would define the schema and tables?


 

 

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Cell In[47], line 1
----> 1 spark.sql(f"""
      2 CREATE EXTERNAL TABLE DLHDEV02.AzureDevOps.Teams
      3 USING DELTA
      4 LOCATION 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams'
      5 """)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py:1440, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   1438 try:
   1439     litArgs = {k: _to_java_column(lit(v)) for k, v in (args or {}).items()}
-> 1440     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   1441 finally:
   1442     if len(kwargs) > 0:

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
   167 def deco(*a: Any, **kw: Any) -> Any:
   168     try:
--> 169         return f(*a, **kw)
   170     except Py4JJavaError as e:
   171         converted = convert_exception(e.java_exception)

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
   324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
   325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
   327         "An error occurred while calling {0}{1}{2}.\n".
   328         format(target_id, ".", name), value)
   329 else:
   330     raise Py4JError(
   331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
   332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o324.sql.
: com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 409, "{"error":{"code":"PathAlreadyExists","message":"The specified path already exists.\nRequestId:97d86a26-301f-000d-3c72-72ce0c000000\nTime:2025-01-29T17:26:17.6573026Z"}}"




If I try to change the name of the table being created, there is a different error, again stating that only managed tables are supported.

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Cell In[11], line 1
----> 1 spark.sql(f"""
      2 CREATE EXTERNAL TABLE DLHDEV02.AzureDevOps.TeamsTest
      3 USING DELTA
      4 LOCATION 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams'
      5 """)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py:1440, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   1438 try:
   1439     litArgs = {k: _to_java_column(lit(v)) for k, v in (args or {}).items()}
-> 1440     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   1441 finally:
   1442     if len(kwargs) > 0:

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
   167 def deco(*a: Any, **kw: Any) -> Any:
   168     try:
--> 169         return f(*a, **kw)
   170     except Py4JJavaError as e:
   171         converted = convert_exception(e.java_exception)

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
   324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
   325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
   327         "An error occurred while calling {0}{1}{2}.\n".
   328         format(target_id, ".", name), value)
   329 else:
   330     raise Py4JError(
   331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
   332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o324.sql.
: java.lang.AssertionError: assertion failed: Only managed tables are supported

Hi @RM_dwalsh,

Thank you for reaching out in the Microsoft Community Forum.

Please follow the steps below to resolve the 'table path already exists' error:

1. Check whether the table already exists in the metastore using the code below:
spark.sql("SHOW TABLES IN DLHDEV02.AzureDevOps").show()

If the table exists, drop it before re-creating:
spark.sql("DROP TABLE IF EXISTS DLHDEV02.AzureDevOps.Teams")

2. If you're registering the table as an external table, ensure the path is correct and doesn't conflict with any existing data:

spark.sql("""
CREATE EXTERNAL TABLE DLHDEV02.AzureDevOps.Teams
USING DELTA
LOCATION 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams'
""")

If the error persists and Spark expects a managed table, create the table as a managed table:
spark.sql("""
CREATE TABLE DLHDEV02.AzureDevOps.Teams
USING DELTA
LOCATION 'abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams'
""")

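Alternatively, if metastore registration keeps failing, the table can be opened directly by its path with the Delta Lake Python API (a sketch, assuming the delta-spark package bundled with the Fabric Spark runtime; the path is the one from the original post):

from delta.tables import DeltaTable

# Open the Delta table by path; no metastore entry is required.
dt = DeltaTable.forPath(spark, "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams")
dt.history(5).show()  # confirms the path points at a valid Delta log
dt.toDF().show(10)    # read the data without any catalog lookup
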
3. Make sure there are no conflicts with existing data paths when registering the table (see the sketch below).
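
A quick way to check for such a conflict (a sketch using the mssparkutils filesystem helper available in Fabric notebooks; the path is the one from the original post) is to list the target location before creating the table:

from notebookutils import mssparkutils

# List the target location; if files already exist here, CREATE TABLE
# with this LOCATION will fail with PathAlreadyExists.
path = "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/DLHDEV02.Lakehouse/Tables/AzureDevOps/Teams"
for f in mssparkutils.fs.ls(path):
    print(f.name, f.size)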

If the issue still persists, please raise a support ticket and provide all the troubleshooting steps you've already taken; this will help the support team understand the issue and provide a resolution.

Below is the link to create a Microsoft support ticket:
https://learn.microsoft.com/en-us/power-bi/support/create-support-ticket

If you found this post helpful, please consider marking it as "Accept as Solution" and selecting "Yes"; this helps other members find it more easily.

Regards,
Pavan.

 

 
