I am wondering what the best practice is when I have a two-layer architecture, comprising Extract and Transform dataflows, with incremental refresh on the Extract layer. If I want to aggregate in the Transform layer, for example based on a linked entity in the Extract layer, should I set up the aggregated table with incremental refresh as well? Both layers are implemented as dataflows in the same workspace, the Extract-layer dataflows have the enhanced compute engine forced on, and I am careful to make sure query folding is not broken.
Should I even bother with incremental refresh for the Transform layer given that the data is already loaded into the service in the Extract layer? If so, are there considerations to ensure the partitions are compatible?
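For concreteness, here is a minimal sketch of the kind of Transform-layer query I mean (the entity and column names are illustrative, not my real ones):

```
let
    // Reference the linked entity from the Extract-layer dataflow
    // ("Sales" is an illustrative entity name)
    Source = Sales,
    // Keep only the columns needed for the aggregate
    Selected = Table.SelectColumns(Source, {"OrderDate", "ProductKey", "Amount"}),
    // Aggregate amount per day and product; with the enhanced compute
    // engine on, a group-by like this should fold to the SQL cache
    Grouped = Table.Group(
        Selected,
        {"OrderDate", "ProductKey"},
        {{"TotalAmount", each List.Sum([Amount]), type nullable number}}
    )
in
    Grouped
```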
Having the data in the service and getting the data into the service are different things. If your data source is fast enough to support flush-and-fill (a full reload) then you don't need incremental refresh. If your data source is slow or very big, then incremental refresh can be used, provided you take special care of data changes that may occur outside the "hot" partition.
I haven't heard definitive guidance, but a partition size of around 200 million rows seems to be a bit of a sweet spot.
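For reference, incremental refresh works by partitioning the entity on a datetime column and only reloading the partitions inside the refresh window. Conceptually the filter looks like the sketch below; I'm borrowing the RangeStart/RangeEnd parameter convention from dataset incremental refresh here, and the source names are illustrative. Dataflows generate the equivalent filter from the entity's incremental refresh settings:

```
let
    // Illustrative source; in a dataflow the filter column is chosen
    // in the entity's incremental refresh settings
    Source = Sql.Database("server", "db"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Only rows in the current refresh window are reloaded; older
    // partitions are kept as-is, so late-arriving changes outside
    // this window are missed unless handled separately
    Filtered = Table.SelectRows(
        Orders,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered
```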
Hi @lbendlin, thanks for the reply, but my question is specifically about guidance on incremental refresh for a dataflow that only consumes a linked entity in another dataflow in the same workspace.
In other words, the data is already loaded to the service, via incremental refresh, in the first dataflow.
The first dataflow also has enhanced compute engine forced to on to ensure query folding is an option.
In that scenario there is no point in having a second dataflow, or in having incremental refresh in it. (You can, of course, still do either if you want.)
Apologies, I should have specified that I would normally be doing more than simply aggregating; I would usually also be combining dimension entities from other dataflows with the extracted fact table. It is actually recommended by MS to have a second dataflow, and I have found that I have far fewer issues when I follow that practice. The only thing missing from that article is an answer to the question I am asking here about incremental refresh.
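As a sketch of that pattern (entity names are illustrative), the Transform dataflow typically merges a linked dimension entity into the extracted fact table before any aggregation:

```
let
    // Linked entities from the Extract-layer dataflows
    // (FactSales and DimProduct are illustrative names)
    Fact = FactSales,
    Dim  = DimProduct,
    // Left-join the dimension onto the fact table; joins like this
    // generally fold when the enhanced compute engine is on for
    // the source entities
    Merged = Table.NestedJoin(
        Fact, {"ProductKey"},
        Dim,  {"ProductKey"},
        "Product", JoinKind.LeftOuter
    ),
    Expanded = Table.ExpandTableColumn(
        Merged, "Product", {"Category"}, {"ProductCategory"}
    )
in
    Expanded
```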
Look at it from the maintenance perspective: any additional piece of process requires additional maintenance. They may recommend that, probably with good motives, but it still adds overhead.
OK, thanks, but in my experience the opposite is true. As I mentioned above, I have far fewer issues when I follow their best practice. Most of the issues I have with dataflows are due to bugs in the service, and these reduced significantly once I limited each dataflow to a single source type, i.e. a single ODBC data source in the staging layer and Power BI dataflows in the transformation layer.