I have multiple Dataflow Gen2 flows that insert data into the same Lakehouse table. Before inserting the data, I would like to truncate the Lakehouse table. How can I do that? The Lakehouse table already has 40 million rows.
Notebook is the only option to drop the table, right?
What are the repercussions downstream if the destination table doesn't exist? E.g. I'm thinking that dropping the table beforehand and then recreating it via the dataflow may have less overhead.
Notebook is the only option to drop the table, right?
Yep, I think so. But again, a Gen2 dataflow set to "replace" will truncate the table before it starts loading the new data.
Hope this helps!
Scott
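For anyone who does need the notebook route, here is a minimal sketch of what that cell could look like, assuming a Fabric notebook attached to the target Lakehouse and a Delta table named lakehouse_table (placeholder name):

```python
# Minimal sketch - assumes a Fabric notebook attached to the target Lakehouse,
# and a Delta table named "lakehouse_table" (placeholder name).

# Option 1: drop the table entirely; the next dataflow/pipeline load recreates it.
spark.sql("DROP TABLE IF EXISTS lakehouse_table")

# Option 2: keep the table definition and just remove all rows (Delta DELETE).
spark.sql("DELETE FROM lakehouse_table")
```

Either cell can be run from a notebook activity in a pipeline before the dataflows start.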
Just saw this thread pop up again, and I will say that we 100% don't use gen2 dataflows any more. Months ago (March or April 2024 I think) Microsoft released the capability for a copy activity in a pipeline to source on-prem data.
All of our copy jobs use this now, and with dynamic partitioning we're able to load data very quickly - I can load 10+ billion rows of data in a few hours. And we don't have to worry about updating a ton of gen2 dataflows when something changes.
Hope this helps,
Scott
@AndyDDC not really sure - I decided to keep as few moving parts as possible. Didn't really notice any issues with using the "replace" option to clear out the data. That job didn't seem to run abnormally long or anything.
Hope this helps!
Scott
That's exactly what I'm doing - running multiple jobs in parallel to populate a large table (in my case just over 1 billion rows). The first Gen2 dataflow does a "replace", which truncates the data, and then loads its slice. After it's done, I launch the other Gen2 dataflows, all of which append in parallel.
Hope this helps,
Scott
Let me be precise. In one Dataflow Gen2 I have multiple tables. When the Dataflow Gen2 refresh triggers, all the tables within this flow start inserting data into the same Lakehouse table in parallel. I made this decision because my source table has 40 million rows, so I divided this huge table into chunks. I am thinking that we would need to spin up a Spark cluster and run a PySpark script to wipe out all the rows in the Lakehouse table before inserting the data, but I am not sure what script to write in the notebook. I also don't want log files kept for tracing back - the data will always be available in the source system.
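A minimal sketch of that notebook script, assuming the Lakehouse table is named lakehouse_table (placeholder) and the notebook is attached to the same Lakehouse. The VACUUM step is optional and only needed if you also want the old data files removed rather than kept for time travel:

```python
# Minimal sketch - assumes a Fabric notebook attached to the Lakehouse and a
# Delta table named "lakehouse_table" (placeholder name).

# Remove all rows; the table definition stays in place for the parallel appends.
spark.sql("DELETE FROM lakehouse_table")

# Optional: also remove the old underlying data files so they can't be read back
# via time travel. Disabling the retention check is required for RETAIN 0 HOURS
# and should be used with care (it breaks time travel and can affect concurrent readers).
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM lakehouse_table RETAIN 0 HOURS")
```

This could run from a notebook activity at the start of the pipeline, before the append dataflows kick off.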
I just create a copy of the dataflow and set the method to "replace" instead of append. Run that first, and then have the other "Append" dataflows after that (using pipelines can really help keep this straight)
Hope this helps!
Scott
p.s. what I completely DON'T know how to do is delete "some" of the data out. For example "I accidentally loaded Jan 2023 twice, I need to delete just that month and then reload it once". No clue on that - maybe partitioning and deleting the partition files, but that doesn't help if the tables aren't partitioned by the slice you want to delete
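For what it's worth, since Lakehouse tables are Delta tables, a notebook can delete just a slice with a plain predicate, with no partitioning required. A minimal sketch, assuming a table named lakehouse_table with a date column named order_date (both placeholder names):

```python
# Minimal sketch - assumes a Delta Lakehouse table named "lakehouse_table"
# with a date column named "order_date" (placeholder names).
# Delta supports DELETE with an arbitrary predicate, so the table does not
# need to be partitioned by the slice being removed.
spark.sql("""
    DELETE FROM lakehouse_table
    WHERE order_date >= '2023-01-01'
      AND order_date <  '2023-02-01'
""")
```

After the delete, that month can be reloaded with an append dataflow or copy activity.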
Thanks for adding your thoughts and the workaround.