March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Hi,
According to the documentation about managed settings for new tables:
"Drop and recreate table: To allow for these schema changes, on every dataflow refresh the table is dropped and recreated. Your dataflow refresh might cause the removal of relationships or measures that were added previously to your table."
Dataflow Gen2 data destinations and managed settings - Microsoft Fabric | Microsoft Learn
However, in my Lakehouse I can still query old versions of the table by using a notebook.
So it seems the table is not actually dropped and recreated when the Dataflow Gen2 refreshes?
Instead, two operations seem to occur: ReplaceTable and Update. Note that I can still query old versions of the table using time travel (e.g. '%%sql SELECT * FROM Table_aa VERSION AS OF 1' works fine).
So it doesn't seem that the table is actually dropped and recreated.
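For reference, the operations recorded in the Delta log can be inspected from a notebook cell like this (a sketch, using the Table_aa name from the example above; the exact operation names shown depend on how the destination writes the table):

```sql
-- List the operations recorded in the Delta transaction log for the table.
-- After a Dataflow Gen2 refresh you would expect entries such as
-- ReplaceTable and Update rather than a fresh, single-entry history.
DESCRIBE HISTORY Table_aa;

-- Older versions remain queryable because the log survives the refresh.
SELECT * FROM Table_aa VERSION AS OF 1;
```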
I am using the managed settings for new tables (automatic settings) in my dataflow destination settings.
Is the documentation incorrect on this point, or am I missing something?
Thank you 😀
Hi @frithjof_v
Each operation that modifies the Delta Lake table creates a new version of the table.
You can use the history information to audit operations, roll back tables, or query tables at specific points in time using time travel.
Table history retention is determined by the table's retention setting, which is 30 days by default.
The original table has been replaced by a newer one, but you can still view the original table in a notebook by querying an earlier table version.
This is normal and does not mean the table was not deleted and recreated.
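A minimal sketch of both points, assuming standard Delta Lake time-travel syntax and the delta.logRetentionDuration table property (the property name is not stated in this thread, so treat it as an assumption):

```sql
-- Query the table as of an earlier version or timestamp (time travel).
SELECT * FROM Table_aa VERSION AS OF 0;
SELECT * FROM Table_aa TIMESTAMP AS OF '2024-12-01';

-- History retention is controlled per table; the default is 30 days.
SHOW TBLPROPERTIES Table_aa;
ALTER TABLE Table_aa
  SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 30 days');
```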
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
If a table gets dropped and recreated, there should be no history anymore (and time travel should not be possible), isn't that correct?
Because dropping a table deletes the table, including its version history?
If I run a command to manually drop a table, e.g. '%%sql DROP TABLE Table_aa'
then there will be no table history anymore because the table has been dropped.
What I am saying is I don't think the Dataflow Gen2 really drops the table, as the documentation states.
Instead, it seems to me that Dataflow Gen2 does some kind of overwrite or replace of the table, which is different from a drop, because overwrite and replace operations keep the history.
Maybe similar to replace which is mentioned here:
https://docs.databricks.com/en/delta/drop-table.html#when-to-replace-a-table
So I would like to know if the behaviour I see is the expected behaviour. In that case, I think the documentation is incorrect, because it states that the table will be dropped and recreated, while the table doesn't actually seem to get dropped.
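The difference can be sketched in a notebook cell (demo_tbl is a hypothetical table name; standard Delta Lake SQL is assumed):

```sql
-- Start from a small Delta table (hypothetical name demo_tbl).
CREATE TABLE demo_tbl AS SELECT 1 AS id;             -- version 0

-- CREATE OR REPLACE rewrites the definition and data but keeps the
-- Delta log, so earlier versions remain reachable via time travel.
CREATE OR REPLACE TABLE demo_tbl AS SELECT 2 AS id;  -- version 1
SELECT * FROM demo_tbl VERSION AS OF 0;              -- still returns id = 1

-- DROP deletes the table together with its log; after recreating it,
-- the old history is gone and time travel to old versions fails.
DROP TABLE demo_tbl;
```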
Hi @frithjof_v
Your idea is reasonable.
"Drop and recreate table: To allow for these schema changes, on every dataflow refresh the table is dropped and recreated. Your dataflow refresh might cause the removal of relationships or measures that were added previously to your table."
The documentation mentions that tables are deleted and recreated when the dataflow is refreshed, which you can actually think of as an overwriting operation. The new table overwrites the original table.
You mentioned:
"If I run a command to manually drop a table, e.g. '%%sql DROP TABLE Table_aa'
Then there will be no table history anymore because the table has been dropped."
This is also true, because the table has been deleted. A manual drop and a dataflow refresh are actually two different operations.
I hope I have clarified the matter for you.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
I'm not quite sure I understand.
"A manual drop and a dataflow refresh are actually two different operations."
"The documentation mentions that tables are deleted and recreated when the dataflow is refreshed, which you can actually think of as an overwriting operation. The new table overwrites the original table."
Per my understanding, deleting (dropping) and recreating a table is a different concept than overwriting a table, especially in the Delta Lake world, where old versions are kept when overwriting a table but deleted when dropping it.
So I would like to get clarification about this part of the documentation:
"Drop and recreate table: To allow for these schema changes, on every dataflow refresh the table is dropped and recreated."
Does the table actually get dropped and recreated?
Or it just gets replaced?
Ref.
https://docs.databricks.com/en/delta/drop-table.html#when-to-replace-a-table
If the table doesn't really get dropped, then I think this part of the documentation should be changed, because it says the table gets dropped.
However, my understanding of what it means for a table to get dropped may be wrong.
But so far, my understanding is that dropping a table deletes it, including all version history. So the documentation here seems incorrect, because the version history actually seems to be retained after refreshing the dataflow.
Hi @frithjof_v
We may need more time to discuss this problem.
Thank you for your findings and questions!
If you still have questions about the official documentation, you can provide product feedback in the Feedback section below the documentation.
Regards,
Nono Chen