Not sure if this is the right forum, but here is the issue.
We are loading data into a Lakehouse using Gen2 dataflows (for now they just point at existing Gen1 dataflows and then do the Lakehouse insert; we will rectify this later on).
Over time it is typical for columns to be added, removed, and/or updated in a dataflow. With a datamart these changes are reflected automatically in the schema. With a Lakehouse, however, when I add a new column to the dataflow I can see no way to bring that column into the Lakehouse table.
What do I need to do here? The only options I can see are:
1: Import it as a new table. That seems very clunky, because you would need to update the queries / stored procedures on your SQL endpoint to cater for it.
2: Delete the existing table in the Lakehouse and then add a new one with the same name.
Am I missing something?
It now seems to be possible to add columns to a Lakehouse table by using a notebook.
I am able to use the following type of command in a notebook:
```sql
%%sql
-- tableName, columnName and dataType are placeholders
ALTER TABLE tableName
ADD COLUMN columnName dataType
```
The table then also gets updated in the SQL analytics endpoint and in the Direct Lake semantic model, which was a problem before.
Ref. this thread:
https://community.fabric.microsoft.com/t5/General-Discussion/SQL-ALTER-command/m-p/3748079#M4861
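To tie the ALTER TABLE approach back to the original problem (a dataflow gaining new columns), the schema comparison can be scripted. This is a minimal sketch under stated assumptions: both schemas are passed in as name-to-type dicts (in a notebook, `dict(spark.table(...).dtypes)` produces one), `missing_column_statements` is a hypothetical helper name, and it only handles added columns, since ADD COLUMN does not cover removals or type changes.

```python
"""Generate ALTER TABLE ... ADD COLUMN statements for columns that a
dataflow now produces but the Lakehouse table does not yet have."""


def missing_column_statements(table_name, table_columns, source_columns):
    """Return one Spark SQL statement per column present in `source_columns`
    but absent from `table_columns`.

    Both arguments map column name -> Spark SQL type string, for example the
    output of dict(df.dtypes). Names are compared case-insensitively, matching
    Spark's default (spark.sql.caseSensitive=false).
    """
    existing = {name.lower() for name in table_columns}
    statements = []
    for name, data_type in source_columns.items():
        if name.lower() not in existing:
            # Backticks quote the identifier so names with spaces still parse.
            statements.append(
                f"ALTER TABLE {table_name} ADD COLUMN `{name}` {data_type}"
            )
    return statements


# Hypothetical usage in a Fabric notebook, assuming the notebook's `spark`
# session and a placeholder table "sales":
#   table_cols = dict(spark.table("sales").dtypes)
#   source_cols = dict(new_df.dtypes)  # e.g. the dataflow output as a DataFrame
#   for stmt in missing_column_statements("sales", table_cols, source_cols):
#       spark.sql(stmt)
```

Keeping the comparison separate from `spark.sql` makes it easy to print and review the statements before running them against a shared table.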
However, I get an error if I try to rename or remove (drop) a column.
Maybe this is a solution for renaming columns, dropping columns, and changing column types in Lakehouse tables:
https://community.fabric.microsoft.com/t5/General-Discussion/Dropping-and-recreating-lakehouse-table...
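Renaming, dropping, or re-typing columns can also be done as a single "drop and recreate" step from a notebook: read the table, change the columns in a DataFrame, and overwrite the table with Delta's `overwriteSchema` option, a read-then-overwrite pattern the Delta Lake documentation shows for replacing a table's schema. This is a sketch, not a tested Fabric recipe: `rewrite_table_schema` is a hypothetical helper, `spark` is assumed to be the notebook's session, the rewrite touches every data file (slow on large tables), and whether the SQL analytics endpoint and Direct Lake model pick up the new schema automatically should be checked on your own tables.

```python
def rewrite_table_schema(spark, table_name, renames=None, drops=None, casts=None):
    """Rewrite a Delta table with renamed, dropped, or re-typed columns.

    renames: dict old_name -> new_name
    drops:   iterable of column names to remove
    casts:   dict column_name -> Spark SQL type string (applied after renames)
    Returns the DataFrame that was written.
    """
    df = spark.read.table(table_name)
    for old, new in (renames or {}).items():
        df = df.withColumnRenamed(old, new)
    if drops:
        df = df.drop(*drops)
    for name, data_type in (casts or {}).items():
        # Column.cast changes the type; values that cannot be converted become NULL.
        df = df.withColumn(name, df[name].cast(data_type))
    # Delta reads a consistent snapshot, so overwriting the table the DataFrame
    # was read from is allowed; overwriteSchema lets the write replace the table
    # schema instead of failing on the mismatch.
    (df.write.format("delta")
       .mode("overwrite")
       .option("overwriteSchema", "true")
       .saveAsTable(table_name))
    return df
```

For example, `rewrite_table_schema(spark, "sales", renames={"Amt": "Amount"}, drops=["Temp"])`, where the table and column names are placeholders.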
There is a third option that worked for me:
3: Rename the original table, for example rename "Table" to "Table1".
Then go to your Dataflow and set up its destination again (this time choosing to create a new table called "Table").
Where we ended up is moving to a Warehouse for almost everything and creating our own tables in it.
This allows you to alter them and to create primary keys (not enforced).
A Lakehouse is good for highly unstructured data, but if you have structured data, a Warehouse is a much better option.
I have used a Python notebook to add a column to an existing table, and that works just fine. You can use a Spark DataFrame or a pyspark.pandas DataFrame to get the desired outcome.
Would really appreciate any additional insight or links to resources that could be provided on this subject.
Could you share how to do this in Python?
I was wondering the same. 🙂
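A minimal PySpark sketch of the DataFrame route, assuming a Fabric notebook where `spark` is the provided session and the table is a Delta table in the attached Lakehouse; `add_column_via_dataframe` is a hypothetical helper name, not the poster's actual code. It reads the table, appends a NULL column of the given type, and overwrites the table with Delta's `mergeSchema` option so the new column is added to the table schema. Because it rewrites every data file, the `ALTER TABLE ... ADD COLUMN` command shown earlier in the thread is likely cheaper when all you need is an empty column.

```python
def add_column_via_dataframe(spark, table_name, column_name, data_type):
    """Add a NULL-filled column to a Delta table by rewriting it from a DataFrame.

    data_type is a Spark SQL type string such as "STRING" or "DECIMAL(18,2)".
    Returns the DataFrame that was written.
    """
    df = spark.read.table(table_name)
    # Keep every existing column ("*") and append a typed NULL column; this is
    # equivalent to df.withColumn(column_name, F.lit(None).cast(data_type)).
    df = df.selectExpr("*", f"CAST(NULL AS {data_type}) AS `{column_name}`")
    # mergeSchema tells Delta to add columns that exist in the DataFrame but
    # not yet in the table as part of this write.
    (df.write.format("delta")
       .mode("overwrite")
       .option("mergeSchema", "true")
       .saveAsTable(table_name))
    return df
```

For example, `add_column_via_dataframe(spark, "sales", "region", "STRING")`, where `sales` and `region` are placeholder names.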