BryanCarmichael
Advocate I

Lakehouse Add or Remove columns from table

Not sure if this is the right forum or not, but here is the issue.

 

We are loading data into a Lakehouse using Gen2 dataflows (for now they just point at existing Gen1 dataflows and then do the Lakehouse insert; we will rectify this later on).

 

Over time it is typical for columns to be added, removed and/or updated in a dataflow. With a datamart these changes are reflected automatically in the schema. With a Lakehouse, however, when I add a new column to the dataflow I can see no way to bring it into the Lakehouse table.

 

What do I need to do here? The only options I can see are:
1: Import it as a new table, but that seems very clunky, as you would need to update the queries / stored procedures on your SQL endpoint to cater for this.
2: Delete the existing table in the Lakehouse and then add a new one with the same name.

 

Am I missing something?

10 REPLIES
frithjof_v
Continued Contributor

It now seems to be possible to add columns to a Lakehouse table by using a notebook.

 

I am able to use the following type of command in a Notebook:

 

%%sql
ALTER TABLE tableName
ADD COLUMN columnName dataType

 

The table also gets updated in the SQL analytics endpoint and in the Direct Lake semantic model, which was a problem before.
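If several columns have been added to the dataflow, the same command can be generated once per missing column. Below is a minimal sketch of that idea in Python. The helper name `build_add_column_statements`, the table name `sales` and the column names are all hypothetical, and it assumes simple column names that need no quoting. It only builds the statements; in a notebook each one could then be passed to `spark.sql(...)`.

```python
def build_add_column_statements(table_name, table_columns, source_columns):
    """Build one ALTER TABLE ... ADD COLUMN statement per missing column.

    table_columns: names of the columns already in the Lakehouse table.
    source_columns: dict mapping column name -> Spark SQL data type,
                    describing the columns the dataflow now produces.
    """
    # Compare names case-insensitively so "Region" and "region" match.
    existing = {name.lower() for name in table_columns}
    statements = []
    for name, data_type in source_columns.items():
        if name.lower() not in existing:
            statements.append(
                f"ALTER TABLE {table_name} ADD COLUMN {name} {data_type}"
            )
    return statements


# Hypothetical example: the table has two columns, the dataflow now has three.
statements = build_add_column_statements(
    "sales",
    ["id", "amount"],
    {"id": "INT", "amount": "DOUBLE", "region": "STRING"},
)
for statement in statements:
    # In a notebook, each statement could be run with spark.sql(statement).
    print(statement)
```

This only covers added columns; it does nothing for renamed or removed ones.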

 

Ref. this thread:

https://community.fabric.microsoft.com/t5/General-Discussion/SQL-ALTER-command/m-p/3748079#M4861

 

However, I get an error if I try to rename or remove (drop) a column.

This thread may offer a solution for renaming columns, dropping columns and changing column types in Lakehouse tables:

https://community.fabric.microsoft.com/t5/General-Discussion/Dropping-and-recreating-lakehouse-table...

 

funtomas
Frequent Visitor

There is a 3rd option which worked for me:

 

3. Rename the original table. For example, rename "Table" to "Table1".

 

Then go to your dataflow and set up its destination again, choosing to create a new table called "Table".

 


 

 

BryanCarmichael
Advocate I

Where we ended up with this is moving to a Warehouse for almost everything and creating our own tables in it.

This allows you to alter them and create primary keys (not enforced).
A Lakehouse is good for very unstructured data, but if you have structured data then a Warehouse is a much better option.

asittrivedi
Regular Visitor

I have used a Python notebook to add a column to an existing table and that works just fine. You can use a Spark DataFrame or a pyspark.pandas DataFrame to get the desired outcome.

Would really appreciate any additional insight or links to resources that could be provided on this subject.

Could you share how to do this in Python?

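import pyspark.pandas as ps

# Read the existing Delta table into a pandas-on-Spark DataFrame
psdf = ps.read_delta('path_to_table')
psdf.head(10)

# Add the new column, filled with empty strings
psdf['new_col'] = ''
psdf.head(10)

# Convert to a Spark DataFrame so it can be written back as a table
sdf = psdf.to_spark()

# Overwrite the existing table. The overwriteSchema option is an addition to the
# original snippet: Delta normally needs it when an overwrite changes the schema.
sdf.write.mode('overwrite').option('overwriteSchema', 'true').saveAsTable('existing_table')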
 
One can use a Spark DataFrame in lieu of pandas.
 
However, the easiest option now is to use a SQL notebook to add the column.
marcuspaivio
New Member

I was wondering the same. 🙂
