Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
RobertSlattery
Resolver III
Resolver III

Incremental refresh on computed tables in dataflows

I am wondering what is best practice when I have a two layer architecture, comprising Extract the Transform dataflows, and I have incremental refresh on the extract layer.  If I want to aggregate, for example, in the Transform layer, based on a linked entity in the Extract layer, should I set up the aggregated table to have incremental refresh as well?  Both layers are implimented as dataflows in the same workspace and the Extract layer dataflows have enhanced compute engine forced on and I am careful to make sure query folding is not broken.

 

Should I even bother with incremental refresh for the Transform layer given that the data is already loaded into the service in the extract layer?  If yes, are there considerations to ensure the partitions are compatible?

6 REPLIES 6
lbendlin
Super User
Super User

Should I even bother with incremental refresh for the Transform layer given that the data is already loaded into the service in the extract layer?

Having the data in the service and getting the data into the service are different things.  If your data source is fast enough to support flush and fill then you don't need incremental refresh.  If your data source is slow or very big  then incremental refresh can be used if you take special care of data changes (that may not happen in the "hot"  partition).

 

I haven't heard definitive guidance but a partition size around 200M rows seems to be a bit of a sweet spot.

Hi @lbendlin , thanks for the reply, but my question is specifically about guidance on incremental refresh for a dataflow that only consumes a linked entity in another dataflow in the same workspace.

 

In other words, the data is already loaded to the service, via incremental refresh, in the first dataflow.

The first dataflow also has enhanced compute engine forced to on to ensure query folding is an option.

in that scenario there is no point in having a second dataflow, or having incremental refresh in it.  (You can of course still do either if you want).

Apologies, I should have specified that I would normally be doing more than simply aggregating, and would usually be combining dimension entities, for example, from other dataflows, with the extracted fact table.  It is actually recommended by MS to have a second dataflow and I have found that I have a lot less issues if I follow that practice.   The only thing missing in this article is the question I am asking here about incremental refresh.

 

Look at it from the maintenance perspective.  Any additional piece of process will require additional maintenance. They can recommend that and probably have good motives, but it adds to the maintenance.

OK, thanks, but in my experience the oposite is true.  As I mentioned above,  I have found that I have a lot less issues if I follow their best practice.  Most of the issues I have with dataflows are due to bugs in the service, and these reduced significantly when I limited each dataflow to a single datasource, i.e. a single ODBC datasource in the staging layer, and Power BI Dataflows in the transformation layer.

Helpful resources

Announcements
Fabcon_Europe_Social_Bogo

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Power BI Carousel June 2024

Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

PBI_Carousel_NL_June

Fabric Community Update - June 2024

Get the latest Fabric updates from Build 2024, key Skills Challenge voucher deadlines, top blogs, forum posts, and product ideas.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

Top Solution Authors