Pushkar_AK
New Member

Spark SQL Queries for Tables in Custom Schema of a Lakehouse (Public Preview)

Hi Community,

Hope you are all doing well.

 

I need your help with how to read a table in a custom schema using Spark SQL.

Below are the details of the error I'm getting while reading the table:

I have created a lakehouse in my workspace with Lakehouse schemas (Public Preview) enabled.

I have added a new schema in the lakehouse, named wind.

(Screenshot: the lakehouse showing the new wind schema)

I have created a table using the PySpark code below:

df_write.write.format("delta") \
    .mode("overwrite") \
    .option("mergeSchema", "true") \
    .save('abfss://RWECEDAP_DataEngineering@onelake.dfs.fabric.microsoft.com/rwece_dap_silver.Lakehouse/Tables/wind/bbii')

Please suggest if there is a better way of writing data to a custom schema of a lakehouse.

My read code is as follows, and it throws an exception when run (I got this snippet by dragging and dropping the table into a code cell, which generates code that refers to the schema where the table is located):

df = spark.sql("SELECT * FROM rwece_dap_silver.wind.bbii LIMIT 1000")
display(df)

Error I'm getting while reading the above table:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[167], line 1
----> 1 df = spark.sql("SELECT * FROM rwece_dap_silver.wind.bbii LIMIT 1000")
      2 display(df)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py:1631, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   1627         assert self._jvm is not None
   1628         litArgs = self._jvm.PythonUtils.toArray(
   1629             [_to_java_column(lit(v)) for v in (args or [])]
   1630         )
-> 1631     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   1632 finally:
   1633     if len(kwargs) > 0:

File ~/cluster-env/trident_env/lib/python3.11/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:185, in capture_sql_exception.<locals>.deco(*a, **kw)
    181 converted = convert_exception(e.java_exception)
    182 if not isinstance(converted, UnknownException):
    183     # Hide where the exception came from that shows a non-Pythonic
    184     # JVM exception message.
--> 185     raise converted from None
    186 else:
    187     raise

AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `rwece_dap_silver`.`wind`.

Solutions tried:

I also get a similar issue with this:

df = spark.sql("SELECT * FROM RWECEDAP_DataEngineering.rwece_dap_silver.wind.bbii LIMIT 1000")
display(df)

# Output:
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[173], line 1
----> 1 df = spark.sql("SELECT * FROM RWECEDAP_DataEngineering.rwece_dap_silver.wind.bbii LIMIT 1000")
      2 display(df)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py:1631, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   1627         assert self._jvm is not None
   1628         litArgs = self._jvm.PythonUtils.toArray(
   1629             [_to_java_column(lit(v)) for v in (args or [])]
   1630         )
-> 1631     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   1632 finally:
   1633     if len(kwargs) > 0:

File ~/cluster-env/trident_env/lib/python3.11/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:185, in capture_sql_exception.<locals>.deco(*a, **kw)
    181 converted = convert_exception(e.java_exception)
    182 if not isinstance(converted, UnknownException):
    183     # Hide where the exception came from that shows a non-Pythonic
    184     # JVM exception message.
--> 185     raise converted from None
    186 else:
    187     raise

AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `RWECEDAP_DataEngineering`.`rwece_dap_silver`.`wind`.

 

1 ACCEPTED SOLUTION
spencer_sa
Super User

Hi,
I'm guessing you've not attached the lakehouse to the notebook. If I don't attach the lakehouse, I get exactly that error. In my case, it's the following:
AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `test_schema`.`dim`.

 

Your options are:

  • Attach the relevant lakehouse, or
  • Load the table into a dataframe and use createTempView so you can then reference the loaded table

 

source_df = spark.read.format('delta').load('abfss://test@onelake.dfs.fabric.microsoft.com/test_schema.Lakehouse/Tables/dim/user_behavior_dataset')
source_df.createTempView("view_i_created")
df = spark.sql("SELECT * FROM view_i_created LIMIT 1000")
display(df)

Note that just doing a SELECT * here will net you the same original dataframe. The real power of approach 2 is that you can connect to tables in different lakehouses dynamically without having to attach them first *and* you can join/merge different tables using the full power of Spark SQL. That said, I normally prefer to use Python/PySpark myself.
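For example, a minimal sketch of that cross-lakehouse pattern could look like the following; the workspace, lakehouse, schema, table and column names here are all placeholders, so swap in your own:

# Load two tables from different (unattached) lakehouses via their abfss paths
orders_df = spark.read.format('delta').load('abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/lakehouse_a.Lakehouse/Tables/sales/orders')
customers_df = spark.read.format('delta').load('abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/lakehouse_b.Lakehouse/Tables/crm/customers')

# Register temp views so Spark SQL can reference them by name
orders_df.createOrReplaceTempView('orders')
customers_df.createOrReplaceTempView('customers')

# Join across the two lakehouses with plain Spark SQL
joined_df = spark.sql("""
    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
""")
display(joined_df)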

 

If this helps, please consider Accepting as a Solution to help others find it more easily.

 


3 REPLIES
spencer_sa
Super User

I believe saveAsTable requires a lakehouse to be attached, the same as using Spark SQL. So the following works when you've attached a default lakehouse:

df.write.saveAsTable('dim.user_behavior_dataset2')
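Presumably the read side mirrors that: with the default lakehouse attached, something like spark.read.table should resolve the schema-qualified name. An untested sketch, using the same placeholder table name as above:

# With a default lakehouse attached, the schema-qualified name should resolve
df = spark.read.table('dim.user_behavior_dataset2')
display(df)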
 

Using the abfss path (pretty much required if you have no attached lakehouse) means you have to use the full save functionality (or use createTempView and write from that).
The following works:

df.write.format('delta').mode('overwrite').save('abfss://test@onelake.dfs.fabric.microsoft.com/test_schema.Lakehouse/Tables/dim/user_behavior_dataset3')
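As a rough sketch of that temp-view route (again with placeholder paths, a placeholder column name, and a hypothetical target table), you could load via the abfss path, shape the data with Spark SQL, and save the result back under the custom schema:

# Load the existing table from its abfss path (no attached lakehouse needed)
source_df = spark.read.format('delta').load('abfss://test@onelake.dfs.fabric.microsoft.com/test_schema.Lakehouse/Tables/dim/user_behavior_dataset')
source_df.createOrReplaceTempView('source_view')

# Shape the data with Spark SQL against the temp view ('some_column' is a placeholder)
result_df = spark.sql("SELECT * FROM source_view WHERE some_column IS NOT NULL")

# Save the result back to a new table under the custom schema via its abfss path
result_df.write.format('delta').mode('overwrite').save('abfss://test@onelake.dfs.fabric.microsoft.com/test_schema.Lakehouse/Tables/dim/user_behavior_dataset4')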




Thanks @spencer_sa, the solution was helpful; reading the table with the path worked.

So do other ways of reading and writing the Delta table, like spark.read.table and saveAsTable respectively, not work for a custom schema in a lakehouse?
