Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
frithjof_v
Continued Contributor
Continued Contributor

Dropping and recreating lakehouse table

Hi,

 

I have read, when needing to rename or remove columns in a Lakehouse table, that a suggested solution is to drop the table and create a new table with the new table schema.

 

I am wondering what will be the consequences of dropping and recreating a Lakehouse table?

 

  • If the table is part of a relationship in a direct lake semantic model, or
  • If the table is the target of a OneLake shortcut

 

Will the relationship in the direct lake semantic model be broken, after the table has been dropped and recreated?

 

Will the shortcut be broken, after the table has been dropped and recreated?

 

Anyone have some information or documentation about this?

 

(Provided the table name will be exactly the same after recreating the table, but some column names and the number of columns may have changed. Let's assume the relationship key column still has the same name and type.)

 

Thank you! 😀

11 REPLIES 11
frithjof_v
Continued Contributor
Continued Contributor

Thank you @v-gchenna-msft,

 

I tried using ALTER TABLE command in Spark notebook.

 

ADD COLUMN worked for me, but RENAME COLUMN or DROP COLUMN did not work for me.

 

Basically I get an error message telling me I need to enable column mapping.

If I enable column mapping then I am able to rename or drop columns in the Notebook interface, however the table doesn't work together with the SQL Analytics Endpoint and Direct Lake Semantic Model after enabling column mapping.

 

 

You can read about it in my most recent comment here:

 

https://community.fabric.microsoft.com/t5/General-Discussion/SQL-ALTER-command/m-p/3748079

 

Thank you 😀

Hi @frithjof_v ,

Why don't you try using pyspark features -

vgchennamsft_0-1712298589218.png

 

 

 

df = df.drop("new_columns2")

df = df.withColumnRenamed("new_column1", "updated_column1")

display(df)

 

 

Docs to refer -
pyspark.sql.DataFrame.drop — PySpark master documentation (apache.org) 

Instead of deleting and creating newlakehouse table. 
Please let me know if this works or still creates any issues or limitations.

@v-gchenna-msft thank you!

 

Is this a good approach (see below)?
(I am also thinking about performance on large tables)
Are there any better approaches?

 

Drop column:

 

spark.read.table("tableName")\
   .drop("columName")\
   .write\
   .mode("overwrite")\
   .option("overwriteSchema", "true")\
   .saveAsTable("tableName")

 

 

Change column name:

 

spark.read.table("tableName")\
   .withColumnRenamed("oldColumnName", "newColumnName")\
   .write\
   .mode("overwrite")\
   .option("overwriteSchema", "true")\
   .saveAsTable("tableName")

 

 

Change column type:

 

spark.read.table("tableName")\
   .withColumn("columnName", col("columnName").cast("columnType"))\
   .write\
   .mode("overwrite")\
   .option("overwriteSchema", "true")\
   .saveAsTable("tableName")

 

 

Add column (see also my next comment in this thread):

 

spark.read.table("tableName")\
   .withColumn("columnName", lit(None).cast("columnType"))\
   .write\
   .mode("overwrite")\
   .option("overwriteSchema", "true")\
   .saveAsTable("tableName")

 

 

I am new to PySpark, but this code seems to work for the Lakehouse tables, and the tables also seem to continue working with the SQL Analytics Endpoint and Direct Lake Semantic Model 😀
(although dropping columns, renaming columns or changing column types may of course cause calculations or relationships in the semantic model to break, but that is only if the altered column is directly involved in the calculation or relationship).

 

I think this might work 😀

 

In addition to your reply, I was inspired by this article: https://learn.microsoft.com/en-us/azure/databricks/delta/update-schema#--explicitly-update-schema-to...


However I am interested to learn more about the performance implications of this method (is this method basically overwriting all the data in the table)?

  • I think for adding columns, this code might be more efficient because it doesn't require overwriting the table data?
%%sql
ALTER TABLE tableName 
ADD COLUMN columnName columnType; 

 

 

Thank you @v-gchenna-msft,

 

I want to use the data in a Direct Lake Semantic Model for Power BI reporting.

 

To be able to use the data in a Direct Lake Semantic Model, the data needs to be in Lakehouse tables as far as I know.

 

So I am looking for a method to update table schema of a Lakehouse table, which will not break this table's functionality, relationships etc. in the SQL Analytics Endpoint and Direct Lake Semantic Model.

 

When working with import mode power bi semantic models in Power BI desktop, it is normal to be able to rename columns or remove columns from tables in the semantic model, without having to delete the table and create a new table. I am looking for the same functionality to remove or rename columns with direct lake semantic model.

 

For now, only adding columns seems to work.

 

Thank you 😀

Hi @frithjof_v ,

See the fabric_cat_tools library here. It has functions for adding data columns (to the semantic model) and removing columns as well.

GitHub - m-kovalsky/fabric_cat_tools: Documentation for the fabric_cat_tools library

vgchennamsft_0-1712307547780.png


Hope this is helpful. Please let me know incase if you have further queries.

 

Hi @frithjof_v ,

We haven’t heard from you on the last response and was just checking back to see if your query was answered.
Otherwise, will respond back with the more details and we will try to help .

Hi @frithjof_v ,

We haven’t heard from you on the last response and was just checking back to see if your query was answered.
Otherwise, will respond back with the more details and we will try to help .

Thank you @v-gchenna-msft,

 

I tried the fabric_cat_tools and I enjoyed testing these functions. It's really interesting and it obviously has many use cases. Thank you for pointing me towards this.

 

However it didn't quite solve my wish for removing columns, with my objective being to work with Lakehouse and Direct Lake Semantic Model in a coherent way.  

 

Here are my findings:

 

I tried using the remove_column function, and it does remove the column from the custom direct lake semantic model, however it doesn't delete it from the Lakehouse.

 

Also, if I refresh the objects of the semantic model (see screenshot below), then the removed column reappears.

 

frithjof_v_0-1712860000550.png

 

 

Furthermore, the remove_column function doesn't seem to work with the Lakehouse's default semantic model. I got this error message:

frithjof_v_1-1712860531113.png

 


I would prefer a solution for removing columns in the Lakehouse, which also works together with the SQL Analytics Endpoint and Direct Lake Semantic Model.
E.g. %%sql ALTER TABLE tableName DROP COLUMN columnName;
or a user interface way of dropping columns from a Lakehouse table.
That will be a holistic solution in my opinion, for working with Lakehouse in conjunction with Direct Lake Semantic Model.

I created an idea here, please vote if anyone else needs this: https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=746601c5-6ba9-ee11-92bd-6045bdb0416e

 

Thank you 😀

v-gchenna-msft
Community Support
Community Support

Hi  @frithjof_v ,

Thanks for using Fabric Community.You're correct that dropping and recreating a Lakehouse table can be a one of the solution for renaming or removing columns, but it's important to be aware of the consequences, especially in the context of MS Fabric. Here's a breakdown of the impacts:

Consequences of Dropping and Recreating a Lakehouse Table:

  •  Relationships in Direct Lake Semantic Model: Yes, dropping the table will break any existing relationships in your Direct Lake semantic model. This is because the model relies on the table schema, and changing the schema (even with the same name) breaks the connection. You'll need to re-establish the relationships after recreating the table.
  •  OneLake Shortcuts: Similar to semantic models, OneLake shortcuts will also be broken after dropping the table. The shortcut points to a specific schema, and recreating the table with a different schema will render the shortcut unusable. You'll need to recreate the shortcut pointing to the new table.

Alternative Approaches:

While dropping and recreating might seem like a straightforward solution, it can be disruptive. Here are some alternative approaches to consider:

  •  Alter Table: Depending on your specific needs, you might be able to use the ALTER TABLE statement to modify the existing table schema. This could involve renaming columns, adding new columns, or dropping existing ones (with some limitations). This approach avoids breaking relationships and shortcuts.

    Doc to refer - ALTER TABLE - Spark 3.0.0-preview Documentation (apache.org)

    We can use Alter commands in Spark SQL inorder to change the schema of table present in Lakehouse.

    vgchennamsft_3-1712296074224.png

Remember: Before making any changes, it's crucial to back up your data and thoroughly test your approach in a non-production environment.


Hope this is helpful. Please let me know incase of further queries.

I did some testing and here are my observations so far:

 

I created a Lakehouse A and created some tables in Lakehouse A by using Dataflows Gen2. I also created another Lakehouse B, and in Lakehouse B I created shortcuts to all the tables in Lakehouse A. 

 

I deleted a table from Lakehouse A by clicking the ellipsis for the table and then clicking Delete. The table disappeared in Lakehouse A, and the shortcut table also disappeared in Lakehouse B.


Then I recreated the table in Lakehouse A by using Dataflow Gen2. The new table had a slightly changed schema: I had removed some columns and I had added some new columns compared to the original table. The table name was identical.

 

I observed that the shortcut table had reappeared in Lakehouse B. So in my case the shortcut in Lakehouse B is still working, after I had dropped and recreated the target table in Lakehouse A with a slightly altered schema.

 

 

Regarding the semantic model:

After dropping and recreating the table, the table was not in the default direct lake semantic model anymore. I needed to manually add the table to the default direct lake semantic model again after I had recreated the table.

 

For the custom direct lake semantic model, the table was actually still in the semantic model after dropping and recreating the table.
However the table now had an orange warning triangle, and I had to refresh the semantic model objects in order for the orange warning triangle to disappear.

 

 

Is this the expected behaviour when dropping and recreating a lakehouse table with an altered table schema (added/removed columns)?

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Kudoed Authors