March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Not sure if this is the right forum, but here is the issue.
We are loading data to a Lakehouse using Gen2 dataflows (for now they just point at existing Gen1 dataflows and then do the Lakehouse insert - we will rectify this later on).
Over time it is typical for columns to be added, removed and/or updated in a dataflow. With a datamart these changes are reflected automatically in the schema; however, with a Lakehouse, when adding a new column to the dataflow I can see no way to bring that into the Lakehouse.
What do I need to do here - the only options I can see are:
1: Import it as a new table, but that seems very clunky as you would need to update queries / stored procedures on your SQL endpoint to cater for this.
2: Delete the existing table in the Lakehouse and then add a new one with the same name.
Am I missing something?
It seems to be possible to add columns to a Lakehouse table now by using a notebook.
I am able to use the following type of command in a Notebook:
%%sql
ALTER TABLE tableName
ADD COLUMN columnName dataType
And the table will also get updated in the SQL Analytics Endpoint and the Direct Lake semantic model, something which was a problem before.
Ref. this thread:
https://community.fabric.microsoft.com/t5/General-Discussion/SQL-ALTER-command/m-p/3748079#M4861
However, I get an error if I try to rename or remove (drop) a column.
Maybe this is a solution for renaming columns, dropping columns and changing column type in Lakehouse tables:
https://community.fabric.microsoft.com/t5/General-Discussion/Dropping-and-recreating-lakehouse-table...
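For what it's worth, on Delta tables (which back Lakehouse tables), RENAME COLUMN and DROP COLUMN generally fail until column mapping is enabled on the table. A hedged sketch of that, assuming a Spark notebook against the Lakehouse - tableName, oldName, newName and columnName are all placeholders:

```sql
-- Enable name-based column mapping first (a Delta requirement for rename/drop)
ALTER TABLE tableName SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5'
);

-- After that, these statements should be accepted
ALTER TABLE tableName RENAME COLUMN oldName TO newName;
ALTER TABLE tableName DROP COLUMN columnName;
```

Note that enabling column mapping upgrades the table's Delta protocol version, and downstream readers such as the SQL Analytics Endpoint may not support tables with column mapping enabled - worth testing on a throwaway table first.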
There is a 3rd option which worked for me:
3. Rename the original table, for example rename "Table" to "Table1".
Then go to your Dataflow and set up the destination of your Dataflow again (of course, create a new table called "Table").
So where we ended up with this is moving to using a warehouse for almost everything - and creating our own tables in it.
This allows you to alter them and create primary keys (not enforced).
Lakehouse is good for super unstructured data but if you have structured data then a warehouse is a much better option.
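As a sketch of what that looks like in a Fabric Warehouse - the table and column names here are illustrative, and Fabric requires key constraints to be NONCLUSTERED and NOT ENFORCED:

```sql
CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT NOT NULL,
    CustomerName VARCHAR(100)
);

-- Fabric Warehouse accepts primary keys only as metadata (not enforced)
ALTER TABLE dbo.DimCustomer
ADD CONSTRAINT PK_DimCustomer PRIMARY KEY NONCLUSTERED (CustomerKey) NOT ENFORCED;
```

The NOT ENFORCED constraint is informational (useful to the optimizer and to modeling tools), so uniqueness still has to be guaranteed by your load process.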
I have used a Python notebook to add a column to an existing table and that works just fine. One can use a Spark DataFrame or a pyspark.pandas DataFrame to get the desired outcome.
Would really appreciate any additional insight or links to resources that could be provided on this subject.
could you share how to do this in python?
I was wondering the same. 🙂
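A minimal sketch of the notebook approach described above, assuming a Fabric notebook with the Lakehouse attached (so the spark session exists) - tableName and newColumn are placeholders:

```python
from pyspark.sql import functions as F

# Read the existing Lakehouse (Delta) table
df = spark.read.table("tableName")

# Add the new column with a default value; adjust the cast to the data type you need
df = df.withColumn("newColumn", F.lit(None).cast("string"))

# Overwrite the table, allowing the schema to change
(df.write
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("tableName"))
```

Be aware that overwrite rewrites the whole table; for large tables the %%sql ALTER TABLE ... ADD COLUMN approach shown earlier in the thread is cheaper since it is metadata-only.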