Pushkar_AK
New Member

Spark SQL Queries for Tables in Custom Schema of a Lakehouse (Public Preview)

Hi Community,

Hope you are all doing well.

 

I need your help with how to read a table in a custom schema using Spark SQL.

Below are the details of the error I'm getting while reading the table:

I have created a lakehouse in my workspace with Lakehouse schemas (Public Preview) enabled.

I have added a new schema in the lakehouse, named wind.

(Screenshot: the lakehouse showing the new wind schema)

I have created a table using the PySpark code below:

df_write.write.format("delta") \
    .mode("overwrite") \
    .option("mergeSchema", "true") \
    .save('abfss://RWECEDAP_DataEngineering@onelake.dfs.fabric.microsoft.com/rwece_dap_silver.Lakehouse/Tables/wind/bbii')

Please suggest if there is a better way of writing data to a custom schema of a lakehouse.

My read code is as follows, and it throws an exception when run (I got this snippet by dragging and dropping the table into a code cell, which generates code that refers to the schema where the table is located):

df = spark.sql("SELECT * FROM rwece_dap_silver.wind.bbii LIMIT 1000")
display(df)

Error I'm getting while reading the above table:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[167], line 1
----> 1 df = spark.sql("SELECT * FROM rwece_dap_silver.wind.bbii LIMIT 1000")
      2 display(df)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py:1631, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   1627         assert self._jvm is not None
   1628         litArgs = self._jvm.PythonUtils.toArray(
   1629             [_to_java_column(lit(v)) for v in (args or [])]
   1630         )
-> 1631     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   1632 finally:
   1633     if len(kwargs) > 0:

File ~/cluster-env/trident_env/lib/python3.11/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:185, in capture_sql_exception.<locals>.deco(*a, **kw)
    181 converted = convert_exception(e.java_exception)
    182 if not isinstance(converted, UnknownException):
    183     # Hide where the exception came from that shows a non-Pythonic
    184     # JVM exception message.
--> 185     raise converted from None
    186 else:
    187     raise

AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `rwece_dap_silver`.`wind`.

Solutions tried:

I also get a similar issue with this:

df = spark.sql("SELECT * FROM RWECEDAP_DataEngineering.rwece_dap_silver.wind.bbii LIMIT 1000")
display(df)

# Output:
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[173], line 1
----> 1 df = spark.sql("SELECT * FROM RWECEDAP_DataEngineering.rwece_dap_silver.wind.bbii LIMIT 1000")
      2 display(df)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py:1631, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   1627         assert self._jvm is not None
   1628         litArgs = self._jvm.PythonUtils.toArray(
   1629             [_to_java_column(lit(v)) for v in (args or [])]
   1630         )
-> 1631     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   1632 finally:
   1633     if len(kwargs) > 0:

File ~/cluster-env/trident_env/lib/python3.11/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:185, in capture_sql_exception.<locals>.deco(*a, **kw)
    181 converted = convert_exception(e.java_exception)
    182 if not isinstance(converted, UnknownException):
    183     # Hide where the exception came from that shows a non-Pythonic
    184     # JVM exception message.
--> 185     raise converted from None
    186 else:
    187     raise

AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `RWECEDAP_DataEngineering`.`rwece_dap_silver`.`wind`.

 

1 ACCEPTED SOLUTION
spencer_sa
Super User

Hi,
I'm guessing you've not attached the lakehouse to the notebook. If I don't attach the lakehouse, I get exactly that error. In my case, it's the following:
AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `test_schema`.`dim`.

 

Your options are:

  • Attach the relevant lakehouse, or
  • Load the table into a dataframe and use createTempView so you can then reference the loaded table

 

source_df = spark.read.format('delta').load('abfss://test@onelake.dfs.fabric.microsoft.com/test_schema.Lakehouse/Tables/dim/user_behavior_dataset')
source_df.createTempView("view_i_created")
df = spark.sql("SELECT * FROM view_i_created LIMIT 1000")
display(df)

Note that just doing a SELECT * here will net you the same original dataframe. The real power of approach 2 is that you can connect to tables in different lakehouses dynamically without having to attach them first *and* you can join/merge different tables using the full power of Spark SQL. That said, I normally prefer to use Python/PySpark myself.
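For example, a minimal sketch of that cross-lakehouse pattern could look like the following; the workspace, lakehouse, schema, table and column names here are all placeholders, so swap in your own:

# Load two tables from different (unattached) lakehouses via their abfss paths
orders_df = spark.read.format('delta').load('abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/lakehouse_a.Lakehouse/Tables/sales/orders')
customers_df = spark.read.format('delta').load('abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/lakehouse_b.Lakehouse/Tables/crm/customers')

# Register temp views so Spark SQL can reference them by name
orders_df.createOrReplaceTempView('orders')
customers_df.createOrReplaceTempView('customers')

# Join across the two lakehouses with plain Spark SQL
joined_df = spark.sql("""
    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
""")
display(joined_df)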

 

If this helps, please consider Accepting as a Solution to help others find it more easily.

 


3 REPLIES
spencer_sa
Super User

I believe saveAsTable requires a lakehouse to be attached, the same as using Spark SQL. So the following works when you've attached a default lakehouse:

df.write.saveAsTable('dim.user_behavior_dataset2')
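Presumably the read side mirrors that: with the default lakehouse attached, something like spark.read.table should resolve the schema-qualified name. An untested sketch, using the same placeholder table name as above:

# With a default lakehouse attached, the schema-qualified name should resolve
df = spark.read.table('dim.user_behavior_dataset2')
display(df)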
 

Using the abfss path (pretty much required if you have no attached lakehouse) means you have to use the full save functionality (or use createTempView and write from that).
The following works:

df.write.format('delta').mode('overwrite').save('abfss://test@onelake.dfs.fabric.microsoft.com/test_schema.Lakehouse/Tables/dim/user_behavior_dataset3')
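As a rough sketch of that temp-view route (again with placeholder paths, a placeholder column name, and a hypothetical target table), you could load via the abfss path, shape the data with Spark SQL, and save the result back under the custom schema:

# Load the existing table from its abfss path (no attached lakehouse needed)
source_df = spark.read.format('delta').load('abfss://test@onelake.dfs.fabric.microsoft.com/test_schema.Lakehouse/Tables/dim/user_behavior_dataset')
source_df.createOrReplaceTempView('source_view')

# Shape the data with Spark SQL against the temp view ('some_column' is a placeholder)
result_df = spark.sql("SELECT * FROM source_view WHERE some_column IS NOT NULL")

# Save the result back to a new table under the custom schema via its abfss path
result_df.write.format('delta').mode('overwrite').save('abfss://test@onelake.dfs.fabric.microsoft.com/test_schema.Lakehouse/Tables/dim/user_behavior_dataset4')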




Thanks @spencer_sa, the solution was helpful; reading the table with the path worked.

So do other ways of reading and writing the Delta table, like spark.read.table and saveAsTable respectively, not work for a custom schema in a lakehouse?
