Dataflows Gen2 data destinations and managed setti...

mideboer · ‎02-22-2024

We are very excited to announce a lot of new improvements to data destinations in Dataflows Gen2. Here is an overview of the new improvements and how to get started.

After you have cleaned and prepared your data with dataflows gen 2, you want to land your data into a destination. This is possible with the data destination capabilities in dataflows gen 2. With this capability you can pick from different destinations, like Azure SQL, Fabric Lakehouse and many more. Dataflows gen2 will write your data to the destination, and from there you can use your data for further analysis and reporting.

Supported data destinations.

Azure SQL databases
Azure data explorer (Kusto)
Fabric Lakehoue
Fabric Warehouse
Fabric KQL database

Entry points

Every data query in your Dataflow Gen2 can have a data destination. Functions and list are not supported. You can specify the data destination for every query individually and you can use multiple different destinations within the dataflow.

There are 3 main entry-points to specify the data destination.

Through the top ribbon

Add_data_destination_entry_point_inside_of_the_Power_Query_ribbon

Select data destination from ribbon.

Through query settings

Add_Data_destination_entry_point_inside_of_the_applied_steps_section

Select data destination from query setting pane.

Through the diagram view

Add_data_destination_inside_of_the_query_node_of_the_diagram_view

Select data destination from diagram view.

Connect to the data destination

Connecting to the data destination is similar to connecting to a data source. Connections can be used for both reading and writing your data, given that you have the right permissions on the data source. You need to create a new connection or pick an existing connection and click next.

Connect_to_data_destination_dialog_when_connecting_to_a_lakehouse

Make a connection to the Fabric Lakehouse.

Create a new table or pick an existing table

When loading into your data destination, you have the possibility to either create a new table or pick an existing table.

New table

When choosing to create a new table, during the dataflow gen 2 refresh, a new table will be created in your data destination. If the table gets deleted in the future by manually going into the destination, the dataflow will recreate the table on the next dataflow refresh.

By default, you table name will have the same name as your query name. If you have any invalid characters in your table name that are not supported by the destination, it will automatically be adjusted. For example, many destinations do not support spaces or special characters.

Choose_destination_target_for_the_data_destination_feature_when_using_a_Lakehous

Create a new table in the data destination.

Next, you must select the destination container. If you chose any of the Fabric data destination, you could use the navigator to select the Fabric artifact you want to load your data into. For Azure destinations, you can either specify the database during connection creation, or select the database from the navigator experience.

Existing table

To choose an existing table, use the toggle at the top of the navigator. When choosing an existing table, you need to pick both the Fabric artifact/database and table using the navigator.

When using an existing table, the table will not be recreated in any scenario. If you delete the table manually from the data destination, the dataflow gen 2 will not recreate the table on the next refresh.

previewing_a_table_when_using_the_choose_destination_target_table

Select an existing table from the data destination.

Managed settings for new tables

When loading into a new table, by default the automatic settings are on. Using the automatic settings, dataflows gen 2 manages the mapping for you. This will allow you the following behavior:

Update method replace: Data will be replaced at every dataflow refresh. Any data in the destination will be removed. The data in the destination will be replaced with the output data of the dataflow.
Managed mapping: Mapping is managed for you. When you need to make changes to your data/query to add an additional column or change a data type, mapping is automatically adjusted for this when you republish your dataflow. You do not have to go into the data destination experience every time you make changes to your dataflow, allowing you for easy schema changes when you republish the dataflow.
Drop and recreate table: To allow for these schema changes, on every dataflow refresh, the table will be dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

NOTE: currently this is only supported for Lakehouse and Azure SQL database as data destination.

choose_destination_settings_dialog_and_the_automatic_settings_being_enabled

Use automatic setting for data destination.

Manual settings

By un-toggle the use automatic setting, you get full control over how to load your data into the data destination. You can make any changes to the column mapping by changing the source type or excluding any column that you do not need in your data destination.

choose_destination_settings_dialog_when_manually_selecting_the_updated_method_an

Use manual settings to control the data destination.

Update methods.

Most destinations support both Append and Replace as update methods. Fabric KQL databases and Azure data explorer do not support replace as update method.

Replace: On every dataflow refresh, your data will be dropped from the destination and replaced by the output data of the dataflow.

Append: On every dataflow refresh, the output data from the dataflow will be appended to the existing data in the data destination table.

Schema options on publish
Schema options on publish only apply when the update method in replace. When appending data, changes to the schema are not possible.

Dynamic schema: When choosing dynamic schema, you allow for schema changes in the data destination when you republish the dataflow. Because you are not using managed mapping, you still need to update the column mapping in dataflow destination flow when you make any changes to your query. When the dataflow is refreshed, your table will be dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

Fixed schema: When choosing fixed schema, schema changes are not possible. When the dataflow gets refreshed, only the rows in the table will be dropped and replaced with the output data from the dataflow. Any relationships or measures on the table will stay intact. If you make any changes to your query in the dataflow, the dataflow publish will fail if it detects that the query schema does not match the data destination schema. Use this setting when you do not plan to change the schema and have relationships or measure added to you destination table.

NOTE: when loading data into the warehouse, only fixed schema is supported:

schema_options_on_publish_section_that_define_what_can_be_accomplished_during_a

Dynamic schema is not supported for Fabric Warehouse.

Supported data source types per destination

Supported data types per storage location:	DataflowStagingLakehouse	Azure DB (SQL) Output	Azure Data Explorer Output	Fabric Lakehouse (LH) Output	Fabric Warehouse (WH) Output
Action	No	No	No	No	No
Any	No	No	No	No	No
Binary	No	No	No	No	No
Currency	Yes	Yes	Yes	Yes	No
DateTimeZone	Yes	Yes	Yes	No	No
Duration	No	No	Yes	No	No
Function	No	No	No	No	No
None	No	No	No	No	No
Null	No	No	No	No	No
Time	Yes	Yes	No	Yes	Yes
Type	No	No	No	No	No
Structured (List, Record, Table)	No	No	No	No	No

List of supported data source types by destination types.

Advanced topics:

Using staging before loading to a destination

To enhance performance of query processing, staging can be used within dataflows gen 2 to leverage Fabric compute to execute your queries.

When staging is enabled on your queries (this is the default behavior), you data will be loading into the staging location, which is an internal Lakehouse only accessible by dataflows itself.

Using staging locations can be either beneficial or the opposite when loading data into the data destination.

Loading data into the Lakehouse

When you are loading data into the Lakehouse, it is advised to disable staging on the query to avoid loading twice into a similar destination, once for staging and once for data destination. To improve the dataflow performance, disable staging for any query that has Lakehouse as the data destination.

To disable staging, right-click on the query and disable staging, by clicking on the enable staging button. Your query will turn grey.

Enable_staging_flag_enabled_for_the_query_named_Query_inside_of_the_Diagram_view

Enable staging through the context menu.

Loading data into the Warehouse

When loading data into the Warehouse, staging is required before the write operation to the data destination. This is to improve performance. Currently, only loading into the same warehouse as the dataflow is supported. Ensure staging is enabled for all queries that load into the warehouse.

When staging is disabled, and you choose warehouse as output destination, you will get a warning to enable staging first before you can configure the data destination:

Add_data_destination_pop-up_dialog_to_confirm_that_you_want_to_enable_the_stagin

Warning to enable staging bore adding Fabric warehouse as a data destination.

If you already have a warehouse as destination and try to disable staging, a warning will be shown where you can either remove warehouse as the destination or dismiss the staging action:

Warning_to_remove_warehouse_as_a_destination_before_disabeling_staging

Warning to remove warehouse as a destination before disabling staging.

Nullable issues

When you have a nullable column, in some cases, it gets detected by Power Query as non-nullable and when writing to the data destination, the column type will be non-nullable. During refresh, the following error will occur:

E104100 Couldn’t refresh entity because of an issue with the mashup document MashupException.Error: DataFormat.Error: Error in replacing table’s content with new data in a version: #{0}., InnerException: We can’t insert null data into a non-nullable column., Underlying error: We can’t insert null data into a non-nullable column. Details: Reason = DataFormat.Error;Message = We can’t insert null data into a non-nullable column.; Message.Format = we can’t insert null data into a non-nullable column.

To force nullable columns you can try to do the following steps:

Delete the table from the data destination
Remove the data destination from the dataflow
Go into the dataflow and update the data types by leveraging the following PQ code:

Table.TransformColumnTypes(#"PREVIOUS STEP", {{"COLLUMNNAME1", type nullable text}, {"COLLUMNNAME2", type nullable Int64.Type}})
Add the data destination

Join us in Barcelona for FabCon and SQLCon, September 28 - October 1, 2026.