
NEJO
Helper I

Unable to apply a Z-Order to a delta table in a lakehouse

Hi readers

 

Fairly new to the Fabric world, but diving into the deep end with the development of a Datalake.

 

Attempting to apply some optimisations to my tables through V-ORDER and, on the larger tables, some use of Z-ORDER. The V-ORDER appears to work fine, but applying the Z-ORDER fails with the below error:

 

IllegalArgumentException: java.io.IOException: Invalid input length 6.

 

The code I'm using to apply this is as follows; regardless of the field or table I try to apply it to, it always fails to run.

 

 

from delta.tables import DeltaTable

# Specify the table name
table_name = "silver.timecard"

# Create a DeltaTable object
delta_table = DeltaTable.forName(spark, table_name)

# Apply Z-Ordering on the 'WorkDt' column
delta_table.optimize().executeZOrderBy("WorkDt")

 

 

I'd welcome any thoughts to push me in the right direction.

 

Thanks in advance for all your help.

 

Neil


v-shex-msft
Community Support

Hi @NEJO,

In fact, I copied your code and tested it on my side, and it ran without any errors.


Have you checked that the delta table was successfully initialized and that data was loaded into the 'DeltaTable' object before you applied Z-Order? Could you please share some more detailed information about this issue?
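
For example, a quick sanity check along these lines (a minimal sketch, reusing the table name from your post) can confirm the object resolves before you run OPTIMIZE:

from delta.tables import DeltaTable

# Confirm the name resolves to a DeltaTable object and the data loads;
# if this step already fails, the problem is in name resolution,
# not in Z-Order itself.
delta_table = DeltaTable.forName(spark, "silver.timecard")
delta_table.toDF().printSchema()  # should list WorkDt among the columns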

BTW, you can also try using a SQL query to apply Z-Order to your delta table:

 

%%sql
OPTIMIZE deltatable ZORDER BY (ColumnName)
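
With the table and column from your post substituted in, that would be, for example:

%%sql
OPTIMIZE silver.timecard ZORDER BY (WorkDt)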

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

Thanks @v-shex-msft 

 

I've also tried this option and I'm getting the same error message returned. I have also attempted the same across other tables, but with the same results.

 

Some more info supplied with the error:

 

java.io.IOException: Invalid input length 6
com.google.common.io.BaseEncoding.decode(BaseEncoding.java:237)
com.microsoft.azure.trident.core.TridentHelper.decodeMultipartName(TridentHelper.java:485)
com.microsoft.fabric.spark.metadata.NamespaceResolver.inferNamespace(pathResolvers.scala:99)
com.microsoft.fabric.spark.metadata.NamespaceResolver.$anonfun$toNamespace$1(pathResolvers.scala:95)
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
com.microsoft.fabric.spark.metadata.NamespaceResolver.toNamespace(pathResolvers.scala:95)
com.microsoft.fabric.spark.metadata.DefaultSchemaMetadataManager.getSchema(DefaultSchemaMetadataManager.scala:83)
com.microsoft.fabric.spark.metadata.MetadataManager.getSchema(MetadataManager.scala:195)
com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.super$getSchema(MetadataManager.scala:325)
com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.$anonfun$getSchema$1(MetadataManager.scala:325)
com.microsoft.fabric.spark.metadata.Helpers$.timed(Helpers.scala:83)
com.microsoft.fabric.spark.metadata.InstrumentedMetadataManager.getSchema(MetadataManager.scala:325)
com.microsoft.fabric.spark.catalog.OnelakeExternalCatalog.getDatabase(OnelakeExternalCatalog.scala:85)
com.microsoft.fabric.spark.catalog.OnelakeExternalCatalog.databaseExists(OnelakeExternalCatalog.scala:91)
com.microsoft.fabric.spark.catalog.InstrumentedExternalCatalog.$anonfun$databaseExists$1(OnelakeExternalCatalog.scala:424)
scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
com.microsoft.fabric.spark.metadata.Helpers$.timed(Helpers.scala:83)
com.microsoft.fabric.spark.catalog.InstrumentedExternalCatalog.databaseExists(OnelakeExternalCatalog.scala:424)
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:69)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:363)
org.apache.spark.sql.delta.DeltaTableUtils$.isDeltaTable(DeltaTable.scala:94)
org.apache.spark.sql.delta.DeltaTableIdentifier$.$anonfun$apply$1(DeltaTableIdentifier.scala:115)
org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141)
org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139)
org.apache.spark.sql.delta.DeltaTableIdentifier$.recordFrameProfile(DeltaTableIdentifier.scala:80)
org.apache.spark.sql.delta.DeltaTableIdentifier$.apply(DeltaTableIdentifier.scala:113)
org.apache.spark.sql.delta.commands.DeltaCommand.getDeltaLog(DeltaCommand.scala:247)
org.apache.spark.sql.delta.commands.DeltaCommand.getDeltaLog$(DeltaCommand.scala:234)
org.apache.spark.sql.delta.commands.OptimizeTableCommandBase.getDeltaLog(OptimizeTableCommand.scala:47)
org.apache.spark.sql.delta.commands.OptimizeTableCommand.run(OptimizeTableCommand.scala:122)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:152)
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:125)
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:214)
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:100)
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:67)
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:152)
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:145)
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:145)
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:129)
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:123)
org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:671)
org.apache.livy.repl.SQLInterpreter.execute(SQLInterpreter.scala:163)
org.apache.livy.repl.Session.$anonfun$executeCode$1(Session.scala:877)
scala.Option.map(Option.scala:230)
org.apache.livy.repl.Session.executeCode(Session.scala:874)
org.apache.livy.repl.Session.$anonfun$execute$10(Session.scala:569)
org.apache.livy.repl.Session.withRealtimeOutputSupport(Session.scala:1103)
org.apache.livy.repl.Session.$anonfun$execute$3(Session.scala:569)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
scala.util.Success.$anonfun$map$1(Try.scala:255)
scala.util.Success.map(Try.scala:213)
scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)

Thanks

Neil

Hi @NEJO,

Ok, I think I can reproduce the issue. It seems you are working with a schema-enabled Lakehouse and your table is stored in the 'silver' schema, right?

After testing with the DeltaTable.forName function, I found that the 'table name' parameter does not support certain special characters (such as the '.' that separates schema and table).

So I think the current Lakehouse 'new schema' table paths will not work with the DeltaTable functions. (Tables in the default dbo schema can be invoked directly by table name.)
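
To illustrate (a hedged sketch based on the behaviour described above, not verified for every configuration):

from delta.tables import DeltaTable

# In a lakehouse without the schema preview, the plain table name resolves:
dt = DeltaTable.forName(spark, "timecard")  # default dbo schema

# In a schema-enabled lakehouse, the schema-qualified name trips the
# metadata decode and raises "Invalid input length 6":
# dt = DeltaTable.forName(spark, "silver.timecard")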

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

Hi @NEJO,

Any update on this? Did the above suggestions help with your scenario? If so, please consider giving Kudos to the helpful suggestions to help others who face similar requirements.

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

Hi @v-shex-msft 

 

Many thanks for the responses, and apologies for the delayed reply.

 

You are correct - it seems my use of the 'preview' schema option for Lakehouses caused this to fail. Back to the drawing board: I will split these into two lakehouses and manage them individually (which is what I had done in the first place, before I noticed this option).

 

Many thanks again for your time.

 

Neil

I had the same issue with a Merge. Switching to DeltaTable.forPath(spark, 'Tables/silver/timecard') works fine.
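
For reference, a minimal sketch of that workaround applied to the Z-Order case from the original post (assuming the schema-enabled lakehouse lays tables out as Tables/<schema>/<table>):

from delta.tables import DeltaTable

# Load the table by its storage path instead of by name, which bypasses
# the schema-name resolution that fails in DeltaTable.forName.
delta_table = DeltaTable.forPath(spark, "Tables/silver/timecard")

# Z-Ordering then runs against the path-based handle.
delta_table.optimize().executeZOrderBy("WorkDt")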
