The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event! Join us in Stockholm, Sweden from September 24-27, 2024.
2-for-1 sale on June 20 only!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hej,
How do you actually truncate data in Lakehouse Tables the best way?
I have used the below sql coding in Notebook, but it takes 1 minute to delete 1 row.
There are just 3 rows in total in this test-table.
And without the WHERE-clause, it just seems to take forever sometimes. No matter how big the table is.
How do you truncate your data in Lakehouse table without deleting the table
Hi @sk_few
Thanks for using Microsoft Fabric Community.
Apologies for the inconvenience.
Notebooks being a viable approach for truncating data in Lakehouse tables in Fabric.
Lakehouse tables typically use append-only storage formats like Delta Lake. You can use DELETE statements with appropriate WHERE clauses to target specific rows for deletion. The DELETE statement doesn't physically remove data. Instead, it marks rows as deleted. This can be slow for small tables as it still needs to manage the metadata.
You can write PySpark scripts to manipulate dataframes and achieve truncating data in Lakehouse.
For additional information please refer : pyspark.sql.DataFrame.drop — PySpark master documentation (apache.org)
I hope this information helps. Please do let us know if you have any further questions.
Thanks.
Hi,
Thank you for taking your time to answer.
That can explain why, I keep maxing out Fabric Capacity, if it goes and remap all rows. I have like 100 million of rows that I delete quite often.
I have looked into the link of DrataDfram.drop, but that is only to be used for cropping columns,.
Please correct me if I am wrong. Is there a specific script that can empty a whole table permanently instead of marking them as Delete?
Additionally, I have looked into how to delete a parquet file, as they just keep piling up, taking up more and more storage space. Can you advise on this too?
I don't have experience with this, but I am guessing that the VACUUM command could be used here?
Hope someone else can confirm 😀
I just read about the vacuum command, and from what I read it seems to be fit for this kind of purpose, but I don't have the practical experience so don't trust me blindly on this 😊
Hi @sk_few
Apologies for the delay in response from my end.
As @frithjof_v mentioned, you can use VACCUM command.
For more details please refer : Use table maintenance feature to manage delta tables in Fabric
: Delta Lake table optimization and V-Order
And also please refer to the similar thread realted to VACCUM command : Solved: Re: Delta Lake VACUUM, how does it work? - Microsoft Fabric Community
I hope this information helps. Please do let us know if you have any further queries.
Thank you.
Hi @sk_few
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution please do share that same with the community as it can be helpful to others.
Otherwise, will respond back with the more details and we will try to help.
Thanks.
Hi @sk_few
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution please do share that same with the community as it can be helpful to others .
If you have any question relating to the current thread, please do let us know and we will try out best to help you.
In case if you have any other question on a different issue, we request you to open a new thread.
Thanks.
Hi @sk_few
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet.
In case if you have any resolution please do share that same with the community as it can be helpful to others.
Otherwise, will respond back with the more details and we will try to help.
Thanks.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Ask questions in Data Engineering, Data Science, Data Warehouse and General Discussion.
Ask questions in Eventhouse and KQL, Eventstream, and Reflex.
User | Count |
---|---|
2 | |
2 | |
1 | |
1 | |
1 |
User | Count |
---|---|
7 | |
6 | |
4 | |
3 | |
3 |