I was testing the lakehouse schemas feature recently. After creating the lakehouse and a schema, I wanted to create an empty Delta table using the DeltaTableBuilder API. Here is the code I executed in my notebook:
from delta.tables import DeltaTable
from pyspark.sql.types import *
DeltaTable.createIfNotExists(spark) \
.tableName("test.table1") \
.addColumn("col1", LongType()) \
.addColumn("col2", StringType()) \
.addColumn("col3", StringType()) \
.execute()
The Delta table was created successfully, but the DeltaTable call raised the following error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
Cell In[17], line 9
1 from delta.tables import DeltaTable
2 from pyspark.sql.types import StructType, StructField, StringType, ArrayType, LongType, IntegerType, BooleanType
4 DeltaTable.createIfNotExists(spark) \
5 .tableName("test.table1") \
6 .addColumn("col1", LongType()) \
7 .addColumn("col2", StringType()) \
8 .addColumn("col3", StringType()) \
----> 9 .execute()
File /usr/hdp/current/spark3-client/jars/delta-core_2.12-2.4.0.14.jar/delta/tables.py:1330, in DeltaTableBuilder.execute(self)
1321 @since(1.0) # type: ignore[arg-type]
1322 def execute(self) -> DeltaTable:
1323 """
1324 Execute Table Creation.
1325
(...)
1328 .. note:: Evolving
1329 """
-> 1330 jdt = self._jbuilder.execute()
1331 return DeltaTable(self._spark, jdt)
File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
167 def deco(*a: Any, **kw: Any) -> Any:
168 try:
--> 169 return f(*a, **kw)
170 except Py4JJavaError as e:
171 converted = convert_exception(e.java_exception)
File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling o4870.execute.
: java.lang.AssertionError: assertion failed: Invalid namespace �
at scala.Predef$.assert(Predef.scala:223)
at com.microsoft.fabric.spark.metadata.NamespaceResolver.inferNamespace(pathResolvers.scala:83)
at com.microsoft.fabric.spark.metadata.NamespaceResolver.$anonfun$toNamespace$1(pathResolvers.scala:78)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
at com.microsoft.fabric.spark.metadata.NamespaceResolver.toNamespace(pathResolvers.scala:78)
at com.microsoft.fabric.spark.metadata.DefaultSchemaMetadataManager.getSchema(DefaultSchemaMetadataManager.scala:72)
at com.microsoft.fabric.spark.metadata.MetadataManager.getSchema(MetadataManager.scala:187)
at com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.super$getSchema(MetadataManager.scala:314)
at com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.$anonfun$getSchema$1(MetadataManager.scala:314)
at com.microsoft.fabric.spark.metadata.Helpers$.timed(Helpers.scala:29)
at com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.getSchema(MetadataManager.scala:314)
at com.microsoft.fabric.spark.catalog.OnelakeExternalCatalog.getDatabase(OnelakeExternalCatalog.scala:78)
at com.microsoft.fabric.spark.catalog.OnelakeExternalCatalog.databaseExists(OnelakeExternalCatalog.scala:84)
at com.microsoft.fabric.spark.catalog.InstrumentedExternalCatalog.$anonfun$databaseExists$1(OnelakeExternalCatalog.scala:417)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at com.microsoft.fabric.spark.metadata.Helpers$.timed(Helpers.scala:29)
at com.microsoft.fabric.spark.catalog.InstrumentedExternalCatalog.databaseExists(OnelakeExternalCatalog.scala:417)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:69)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:363)
at org.apache.spark.sql.delta.DeltaTableUtils$.isDeltaTable(DeltaTable.scala:94)
at io.delta.tables.DeltaTable$.forName(DeltaTable.scala:830)
at io.delta.tables.DeltaTableBuilder.execute(DeltaTableBuilder.scala:366)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Is it safe to ignore this error?
Hi @HenryC,
Have you set and pinned the default Lakehouse before executing this code? AFAIK, some common libraries are not referenced and initialized in the notebook environment if you have not set the default Lakehouse.
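For reference, the default lakehouse can also be attached from a configuration cell instead of the UI pin. This is a minimal sketch of the %%configure session magic; the lakehouse name and the optional IDs below are placeholders, and the cell has to run before the Spark session starts (or with -f to force a restart):
%%configure
{
    "defaultLakehouse": {
        "name": "your_lakehouse_name",
        "id": "<optional-lakehouse-id>",
        "workspaceId": "<optional-workspace-id>"
    }
}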
BTW, I also tested your code and it failed. After I double-checked and changed the table name, it worked well. (It seems the 'dot' character is not supported in table names.)
from delta.tables import DeltaTable
from pyspark.sql.types import *
DeltaTable.createIfNotExists(spark) \
.tableName("test_table1") \
.addColumn("col1", LongType()) \
.addColumn("col2", StringType()) \
.addColumn("col3", StringType()) \
.execute()
Regards,
Xiaoxin Sheng
Hi @Anonymous, thank you for your reply.
I was testing out the lakehouse schemas feature that was released in July 2024. I need to specify the schema name "test" in the code, otherwise the table will go to the default dbo schema. I suspect this is a bug in the delta.tables library.
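To illustrate what I mean, here is a rough sketch using saveAsTable instead of the builder API; the routing of unqualified names to dbo matches my understanding of the schema preview, and I have not verified the schema-qualified write path separately:
from pyspark.sql.types import StructType, StructField, LongType, StringType
# Empty DataFrame that only carries the desired column definitions
columns = StructType([
    StructField("col1", LongType()),
    StructField("col2", StringType()),
    StructField("col3", StringType()),
])
empty_df = spark.createDataFrame([], columns)
# Unqualified name: the table is expected to land in the default dbo schema
empty_df.write.format("delta").saveAsTable("table1")
# Schema-qualified name: the table should land in the "test" schema (not verified)
empty_df.write.format("delta").saveAsTable("test.table1")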
Henry
Hi @HenryC,
OK, I have reproduced this on my side.
The notebook code processing breaks (the following steps and code are not executed), and the Delta table does not seem to be created correctly, so I don't think you can ignore this error.
It seems the 'DeltaTable.createIfNotExists' function does not work with 'schema.tablename' names in a lakehouse that uses the public preview 'lakehouse schemas' feature.
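As a possible workaround until that is fixed, you could try creating the empty table with Spark SQL DDL instead of the builder API. This is only a sketch and I have not verified it against the schema preview:
# Create an empty Delta table in the "test" schema via Spark SQL DDL
# (sketch only; not verified against the lakehouse schemas preview)
spark.sql("""
    CREATE TABLE IF NOT EXISTS test.table1 (
        col1 BIGINT,
        col2 STRING,
        col3 STRING
    ) USING DELTA
""")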
You can also take a look at the lakehouse schema limitations if that helps:
Lakehouse schemas (Preview) - Microsoft Fabric | Microsoft Learn
Regards,
Xiaoxin Sheng