Incremental refresh on computed tables in dataflow...

RobertSlattery · ‎12-29-2023

I am wondering what is best practice when I have a two layer architecture, comprising Extract the Transform dataflows, and I have incremental refresh on the extract layer. If I want to aggregate, for example, in the Transform layer, based on a linked entity in the Extract layer, should I set up the aggregated table to have incremental refresh as well? Both layers are implimented as dataflows in the same workspace and the Extract layer dataflows have enhanced compute engine forced on and I am careful to make sure query folding is not broken.

Should I even bother with incremental refresh for the Transform layer given that the data is already loaded into the service in the extract layer? If yes, are there considerations to ensure the partitions are compatible?

lbendlin · ‎12-30-2023

Should I even bother with incremental refresh for the Transform layer given that the data is already loaded into the service in the extract layer?

Having the data in the service and getting the data into the service are different things. If your data source is fast enough to support flush and fill then you don't need incremental refresh. If your data source is slow or very big then incremental refresh can be used if you take special care of data changes (that may not happen in the "hot" partition).

I haven't heard definitive guidance but a partition size around 200M rows seems to be a bit of a sweet spot.

RobertSlattery · ‎12-30-2023

Hi @lbendlin , thanks for the reply, but my question is specifically about guidance on incremental refresh for a dataflow that only consumes a linked entity in another dataflow in the same workspace.

In other words, the data is already loaded to the service, via incremental refresh, in the first dataflow.

The first dataflow also has enhanced compute engine forced to on to ensure query folding is an option.

lbendlin · ‎12-31-2023

in that scenario there is no point in having a second dataflow, or having incremental refresh in it. (You can of course still do either if you want).

RobertSlattery · ‎12-31-2023

Apologies, I should have specified that I would normally be doing more than simply aggregating, and would usually be combining dimension entities, for example, from other dataflows, with the extracted fact table. It is actually recommended by MS to have a second dataflow and I have found that I have a lot less issues if I follow that practice. The only thing missing in this article is the question I am asking here about incremental refresh.

lbendlin · ‎12-31-2023

Look at it from the maintenance perspective. Any additional piece of process will require additional maintenance. They can recommend that and probably have good motives, but it adds to the maintenance.

RobertSlattery · ‎12-31-2023

OK, thanks, but in my experience the oposite is true. As I mentioned above, I have found that I have a lot less issues if I follow their best practice. Most of the issues I have with dataflows are due to bugs in the service, and these reduced significantly when I limited each dataflow to a single datasource, i.e. a single ODBC datasource in the staging layer, and Power BI Dataflows in the transformation layer.

Incremental refresh on computed tables in dataflows

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024