During development & testing, I regularly have to delete lakehouse tables through the Explorer side panel and then recreate them with Python code, but this often fails because leftover files remain in the table's folder on the filesystem (probably the _delta_log folder).
Is there a better way to create lakehouse tables from a Python notebook that doesn't raise an error when recreating a table with the same name?
write_deltalake(
    delta_table_path,
    df,
    mode="overwrite",
    schema_mode="overwrite",
    engine="rust",
    storage_options={"allow_unsafe_rename": "true"}
)
---------------------------------------------------------------------------
DeltaError Traceback (most recent call last)
Cell In[40], line 12
7 df = df.apply(lambda col: col if col.name in ["revision","_ETL_Time"]
8 else col.apply(lambda x: json.dumps(x)
9 if not isinstance(x, str) else x))
10 #df.info()
---> 12 write_deltalake(
13 delta_table_path,
14 df,
15 mode="overwrite",
16 schema_mode="overwrite",
17 engine="rust",
18 storage_options={"allow_unsafe_rename": "true"}
19 )
21 display(df)
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/deltalake/writer.py:304, in write_deltalake(table_or_uri, data, schema, partition_by, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, schema_mode, storage_options, partition_filters, predicate, large_dtypes, engine, writer_properties, custom_metadata)
301 return
303 data = RecordBatchReader.from_batches(schema, (batch for batch in data))
--> 304 write_deltalake_rust(
305 table_uri=table_uri,
306 data=data,
307 partition_by=partition_by,
308 mode=mode,
309 table=table._table if table is not None else None,
310 schema_mode=schema_mode,
311 predicate=predicate,
312 name=name,
313 description=description,
314 configuration=configuration,
315 storage_options=storage_options,
316 writer_properties=(
317 writer_properties._to_dict() if writer_properties else None
318 ),
319 custom_metadata=custom_metadata,
320 )
321 if table:
322 table.update_incremental()
DeltaError: Delta transaction failed, version 0 already exists.
Hi @MangoMagic ,
I hope the information provided is helpful. I wanted to check whether you were able to resolve the issue with the provided solutions. Please let us know if you need any further assistance.
Thank you.
Hi @MangoMagic ,
Thank you for reaching out to Microsoft Fabric Community.
Thank you @BhaveshPatel @Element115 for the prompt response.
May I ask if you have resolved this issue? Please let us know if you have any further issues; we are happy to help.
Thank you.
Let's first programmatically delete the table(s) properly, and thus let the underlying system perform all the necessary cleanup.
Single table
PySpark
spark.sql("DROP TABLE IF EXISTS schema_name.table_name")
If there are multiple tables in a given schema:
PySpark
schema = "your_schema_name"
# List all tables in schema
tables = spark.catalog.listTables(schema)
# Drop each table
for table in tables:
table_name = table.name
print(f"Dropping table {schema}.{table_name}")
spark.sql(f"DROP TABLE IF EXISTS {schema}.{table_name}")
Combining these, let's look at a full PySpark solution:
schema = "your_schema_name"
# Step 1: List all tables in the schema
tables = spark.catalog.listTables(schema)
# Step 2: Drop all tables
for t in tables:
print(f"Dropping table {schema}.{t.name}")
spark.sql(f"DROP TABLE IF EXISTS {schema}.{t.name}")
# Step 3: Recreate all tables with a sample schema
# WARNING: Replace sample_schema with actual schemas if you want original structures back
sample_schema = "id INT, name STRING"
for t in tables:
table_full_name = f"{schema}.{t.name}"
print(f"Recreating table {table_full_name}")
spark.sql(f"""
CREATE TABLE {table_full_name} ({sample_schema})
USING DELTA
LOCATION '/mnt/datalake/{schema}/{t.name}'
""")
If you still get complaints about the table names, it means the lakehouse hasn't updated its state yet.
Hence, what I would do:
0_create a pipeline
1_create a Notebook activity to run the delete PySpark code above
2_use a Web activity with the new lakehouse REST API to force the lakehouse to sync (instead of a Wait activity, because we can't know how long it will take)
3_on success, the next activity is a Notebook that recreates the tables (you would need to save the table names somewhere, maybe in another lakehouse dedicated only to storing metadata)
I had the same issue with lakehouse tables whose write I/O wasn't finished by the time the next activity was ready to run. That was before the new REST API, so at the time I used a PySpark script to force the lakehouse to refresh and sync all metadata, and it worked (a rough sketch of that refresh approach is below).
I haven't used the Web activity for this, so it's just a suggestion and would need to be tested.
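For reference, the metadata-refresh script mentioned above could look roughly like this (a minimal sketch, assuming a Spark session in a Fabric notebook; the schema and table names are placeholders):
schema = "your_schema_name"

# Ask Spark to drop any cached metadata and re-read the current state
# of every table in the schema before the next activity uses them.
for t in spark.catalog.listTables(schema):
    spark.catalog.refreshTable(f"{schema}.{t.name}")

# Equivalent SQL form for a single table:
# spark.sql(f"REFRESH TABLE {schema}.your_table_name")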
Hi @MangoMagic
You can delete a single table from the notebook's Explorer side panel by right-clicking the Delta table and choosing Delete.
You can only delete one Delta table at a time from the Explorer side panel; there is no alternative way there.
Alternatively, you can drop and recreate the Delta table:
DROP TABLE IF EXISTS DimTable;

-- Placeholder columns: replace with your actual schema
CREATE TABLE IF NOT EXISTS DimTable (id INT, name STRING)
USING DELTA;
This creates a managed Delta table (a table whose metadata is managed by the lakehouse).
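Since the question is about a Python notebook, here is a hedged sketch of the same drop-and-recreate pattern using the DataFrame API (assumes a Spark session; the sample data is illustrative and in practice would come from your pandas frame via spark.createDataFrame(df)):
# Drop the managed table if it exists; Fabric cleans up both metadata and files
spark.sql("DROP TABLE IF EXISTS DimTable")

# Illustrative data only; replace with your real DataFrame
spark_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

# Recreate DimTable as a managed Delta table, overwriting data and schema
spark_df.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable("DimTable")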
Hi @MangoMagic ,
Thank you for reaching out to Microsoft Fabric Community.
Thank you @lbendlin for the prompt response.
I wanted to check if you had the opportunity to review the information provided and resolve the issue. Please let us know if you need any further assistance. We are happy to help.
Thank you.
In your notebook code, include a "remove if exists" step that clears up the remaining artifacts before you attempt to create the new artifact (a minimal sketch follows below).
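Applied to the write_deltalake snippet from the question, that "remove if exists" step could look like this (a sketch only, assuming a Fabric notebook where notebookutils is available and delta_table_path and df are the same variables used in the question):
from deltalake import write_deltalake
import notebookutils  # preinstalled in Fabric notebooks (older alias: mssparkutils)

# Clear out any leftover table folder (data files plus the _delta_log directory)
# so the Delta writer can start again from version 0 on a clean directory.
try:
    notebookutils.fs.rm(delta_table_path, True)  # True = remove recursively
except Exception:
    pass  # nothing to clean up if the folder does not exist yet

write_deltalake(
    delta_table_path,
    df,
    mode="overwrite",
    schema_mode="overwrite",
    engine="rust",
    storage_options={"allow_unsafe_rename": "true"}
)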