I was testing the lakehouse schemas feature recently. After creating the lakehouse and a schema, I wanted to create an empty Delta table using the DeltaTableBuilder API. Here is the code I executed in my notebook:
from delta.tables import DeltaTable
from pyspark.sql.types import *
DeltaTable.createIfNotExists(spark) \
.tableName("test.table1") \
.addColumn("col1", LongType()) \
.addColumn("col2", StringType()) \
.addColumn("col3", StringType()) \
.execute()
The Delta table was created successfully, but the DeltaTable call raised the following error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
Cell In[17], line 9
1 from delta.tables import DeltaTable
2 from pyspark.sql.types import StructType, StructField, StringType, ArrayType, LongType, IntegerType, BooleanType
4 DeltaTable.createIfNotExists(spark) \
5 .tableName("test.table1") \
6 .addColumn("col1", LongType()) \
7 .addColumn("col2", StringType()) \
8 .addColumn("col3", StringType()) \
----> 9 .execute()
File /usr/hdp/current/spark3-client/jars/delta-core_2.12-2.4.0.14.jar/delta/tables.py:1330, in DeltaTableBuilder.execute(self)
1321 @since(1.0) # type: ignore[arg-type]
1322 def execute(self) -> DeltaTable:
1323 """
1324 Execute Table Creation.
1325
(...)
1328 .. note:: Evolving
1329 """
-> 1330 jdt = self._jbuilder.execute()
1331 return DeltaTable(self._spark, jdt)
File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
167 def deco(*a: Any, **kw: Any) -> Any:
168 try:
--> 169 return f(*a, **kw)
170 except Py4JJavaError as e:
171 converted = convert_exception(e.java_exception)
File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling o4870.execute.
: java.lang.AssertionError: assertion failed: Invalid namespace �
at scala.Predef$.assert(Predef.scala:223)
at com.microsoft.fabric.spark.metadata.NamespaceResolver.inferNamespace(pathResolvers.scala:83)
at com.microsoft.fabric.spark.metadata.NamespaceResolver.$anonfun$toNamespace$1(pathResolvers.scala:78)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
at com.microsoft.fabric.spark.metadata.NamespaceResolver.toNamespace(pathResolvers.scala:78)
at com.microsoft.fabric.spark.metadata.DefaultSchemaMetadataManager.getSchema(DefaultSchemaMetadataManager.scala:72)
at com.microsoft.fabric.spark.metadata.MetadataManager.getSchema(MetadataManager.scala:187)
at com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.super$getSchema(MetadataManager.scala:314)
at com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.$anonfun$getSchema$1(MetadataManager.scala:314)
at com.microsoft.fabric.spark.metadata.Helpers$.timed(Helpers.scala:29)
at com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.getSchema(MetadataManager.scala:314)
at com.microsoft.fabric.spark.catalog.OnelakeExternalCatalog.getDatabase(OnelakeExternalCatalog.scala:78)
at com.microsoft.fabric.spark.catalog.OnelakeExternalCatalog.databaseExists(OnelakeExternalCatalog.scala:84)
at com.microsoft.fabric.spark.catalog.InstrumentedExternalCatalog.$anonfun$databaseExists$1(OnelakeExternalCatalog.scala:417)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at com.microsoft.fabric.spark.metadata.Helpers$.timed(Helpers.scala:29)
at com.microsoft.fabric.spark.catalog.InstrumentedExternalCatalog.databaseExists(OnelakeExternalCatalog.scala:417)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:69)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:363)
at org.apache.spark.sql.delta.DeltaTableUtils$.isDeltaTable(DeltaTable.scala:94)
at io.delta.tables.DeltaTable$.forName(DeltaTable.scala:830)
at io.delta.tables.DeltaTableBuilder.execute(DeltaTableBuilder.scala:366)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Is it safe to ignore this error?
Hi @HenryC,
Have you set and pinned the default Lakehouse before executing this code? AFAIK, some common libraries are not referenced and initialized in the notebook environment if you have not set the default Lakehouse.
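For reference, the default lakehouse can also be attached from a configuration cell instead of the UI pin. This is a minimal sketch of the %%configure session magic; the lakehouse name and the optional IDs below are placeholders, and the cell has to run before the Spark session starts (or with -f to force a restart):
%%configure
{
    "defaultLakehouse": {
        "name": "your_lakehouse_name",
        "id": "<optional-lakehouse-id>",
        "workspaceId": "<optional-workspace-id>"
    }
}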
BTW, I also tested your code and it failed. After I double-checked and changed the table name, it worked well. (It seems the 'dot' character is not supported in table names.)
from delta.tables import DeltaTable
from pyspark.sql.types import *
DeltaTable.createIfNotExists(spark) \
.tableName("test_table1") \
.addColumn("col1", LongType()) \
.addColumn("col2", StringType()) \
.addColumn("col3", StringType()) \
.execute()
Regards,
Xiaoxin Sheng
Hi @Anonymous, thank you for your reply.
I was testing out the lakehouse schemas feature that was released in July 2024. I need to specify the schema name "test" in the code, otherwise the table will go to the default dbo schema. I suspect this is a bug in the delta.tables library.
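To illustrate what I mean, here is a rough sketch using saveAsTable instead of the builder API; the routing of unqualified names to dbo matches my understanding of the schema preview, and I have not verified the schema-qualified write path separately:
from pyspark.sql.types import StructType, StructField, LongType, StringType
# Empty DataFrame that only carries the desired column definitions
columns = StructType([
    StructField("col1", LongType()),
    StructField("col2", StringType()),
    StructField("col3", StringType()),
])
empty_df = spark.createDataFrame([], columns)
# Unqualified name: the table is expected to land in the default dbo schema
empty_df.write.format("delta").saveAsTable("table1")
# Schema-qualified name: the table should land in the "test" schema (not verified)
empty_df.write.format("delta").saveAsTable("test.table1")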
Henry
Hi @HenryC,
OK, I have reproduced this on my side.
The notebook code processing breaks (the following steps and code are not executed), and the Delta table does not seem to be created correctly, so I don't think you can ignore this error.
It seems the 'DeltaTable.createIfNotExists' function does not work with 'schema.tablename' names in a lakehouse that uses the public preview 'lakehouse schemas' feature.
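As a possible workaround until that is fixed, you could try creating the empty table with Spark SQL DDL instead of the builder API. This is only a sketch and I have not verified it against the schema preview:
# Create an empty Delta table in the "test" schema via Spark SQL DDL
# (sketch only; not verified against the lakehouse schemas preview)
spark.sql("""
    CREATE TABLE IF NOT EXISTS test.table1 (
        col1 BIGINT,
        col2 STRING,
        col3 STRING
    ) USING DELTA
""")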
You can also take a look at the lakehouse schema limitations if that helps:
Lakehouse schemas (Preview) - Microsoft Fabric | Microsoft Learn
Regards,
Xiaoxin Sheng