Hi,
I have a SQL source from which, for this example, I want to load Table A and Table B, but I also want to add some columns from Table B to Table A via a merge.
First workflow I tried:
1. Staging dataflow that loads Table A and Table B without further transformation (except for some 'remove other columns' steps)
2. Set up Incremental Refresh on both tables, with partitioning on month level and detect data changes.
3. New Dataflow that gets tables from first staging dataflow as linked tables, Enhanced Compute Engine set to On.
4. Merge Table A and Table B into a new calculated table; I believe this uses the SQL endpoint of the staging dataflow (a sketch of this step follows the list)
5. Set up Incremental Refresh on the new Calculated Table, with partitioning on month level and detect data changes.
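For reference, a minimal Power Query M sketch of the merge step (step 4), assuming the linked tables come through as TableA and TableB and that the shared key and the extra columns are called KeyB, Col1 and Col2 (placeholder names, not my actual schema):

```
// Hedged sketch of step 4: merge the linked tables and expand the
// extra columns from Table B. All column names are placeholders.
let
    Merged = Table.NestedJoin(
        TableA, {"KeyB"},
        TableB, {"KeyB"},
        "TableB", JoinKind.LeftOuter
    ),
    Expanded = Table.ExpandTableColumn(
        Merged, "TableB",
        {"Col1", "Col2"},
        {"B.Col1", "B.Col2"}
    )
in
    Expanded
```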
I figured this should be the fastest approach, since the staging dataflow keeps both tables up to date and the merge is then done by the enhanced compute engine on the dataflow instead of at the source. But it turns out to be slower than the following workflow:
1. Single dataflow that loads Table A and Table B, but with load disabled on Table B
2. Merge Table B into Table A
3. Reference Table B again in a separate query, this time with load enabled
4. Set up incremental refresh on Table A and Table B with detect data changes
In this case the merge is queried at the source. This is much faster, but on its own it does not account for changes in Table B. If I simply enable load on Table B, the merged table becomes a calculated table on a linked table again and the same thing happens as in the first workflow. That's why I reference Table B twice: once with load disabled (used in the merge) and once with load enabled.
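Roughly what that looks like in M, assuming a SQL source accessed via Sql.Database (server, database, table and column names below are placeholders, not the real ones):

```
// Query "TableB_ForMerge" - Enable Load switched OFF in the dataflow.
// It only exists so the merge below can be evaluated at the SQL source.
let
    Source = Sql.Database("myserver", "mydb"),
    B = Source{[Schema = "dbo", Item = "TableB"]}[Data]
in
    B

// Query "Table A" - merges in the non-loaded reference to Table B,
// so the join is pushed to the source rather than run in the dataflow.
let
    Source = Sql.Database("myserver", "mydb"),
    A = Source{[Schema = "dbo", Item = "TableA"]}[Data],
    Merged = Table.NestedJoin(A, {"KeyB"}, TableB_ForMerge, {"KeyB"}, "B", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "B", {"Col1", "Col2"}, {"B.Col1", "B.Col2"})
in
    Expanded

// Query "Table B" - the same source table again, this time with
// Enable Load ON, so it is materialized and gets incremental refresh.
let
    Source = Sql.Database("myserver", "mydb"),
    B = Source{[Schema = "dbo", Item = "TableB"]}[Data]
in
    B
```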
When dealing with merges in combination with incremental refresh and detect data changes, what is the proper approach? I want to minimize the data load while keeping both tables up to date. The documentation recommends a staging dataflow followed by a transformation dataflow for any further ETL such as joins, but as stated that is 2 to 3 times slower for me.
The proper workflow for this is not to do the merge. Let the data model do the work for you.
Fair point, and that's usually what I do, so this is an exception. That's why I wanted to do it as far upstream as possible (I can't change the source views, so the dataflow is where it ends) rather than as a calculated column in DAX.
In my personal opinion a dataflow is only useful to shield the developers (NOT the report users) from a slow data source. If your data source is not slow then eliminate the dataflow and use a dataset with incremental refresh.
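For context, if you go the dataset route, incremental refresh is driven by filtering the query on the reserved RangeStart/RangeEnd datetime parameters. A minimal sketch, assuming the table has a ModifiedDate column (placeholder name):

```
// Hedged sketch of a dataset query set up for incremental refresh.
// RangeStart and RangeEnd are the datetime parameters Power BI
// substitutes per partition; ModifiedDate is a placeholder column.
let
    Source = Sql.Database("myserver", "mydb"),
    A = Source{[Schema = "dbo", Item = "TableA"]}[Data],
    Filtered = Table.SelectRows(
        A,
        each [ModifiedDate] >= RangeStart and [ModifiedDate] < RangeEnd
    )
in
    Filtered
```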
I plan on using the dataflow in multiple datasets; that's the reason I chose a dataflow.