Hi, thanks in advance for your support
We have an ETL process set up using Dataflows, not without its issues but generally working OK:
1. Ingest - S1 Dataflows, data from source
2. Transform - S2 Dataflows, data from S1 Dataflows, transformed
3. Datasets using S2 Dataflows
We have now started to update these to use the Enhanced Compute Engine, with promising results
Key questions
a. Should the Enhanced be turned on for S1 AND S2, or ONLY S1, or ONLY S2? Or different settings (Optimised) for each?
b. What type of entity should the S2 dataflow use? Linked etc?
c. What method should the S2 dataflow use - DirectQuery or Import? It would be good to be able to filter the S1 data for use in S2 (to reduce its size) depending on the requirement.
d. We have set up Power Automate to trigger S2 after S1 completes - is that still the best solution given the answers to any of the above?
Thanks again
Hi, @CrazyHorse66
a. The main purpose of using the Enhanced Compute Engine is to drastically reduce the refresh time required for long-running ETL steps on computed entities. If you perform many ETL operations (such as joins, distinct, or filter) in S2, you should enable the Enhanced Compute Engine for S2.
b. The following list describes some of the reasons you may choose to use linked tables:
If you want to reuse a table across multiple dataflows, such as a date table or a static lookup table, you should create a table once and then reference it across the other dataflows.
If you want to avoid creating multiple refreshes to a data source, it's better to use linked tables to store the data and act as a cache. Doing so allows every subsequent consumer to leverage that table, reducing the load to the underlying data source.
c. You can use DirectQuery to connect directly to your dataflow without having to import its data. DirectQuery avoids separate refresh schedules, and it is also useful for working on a filtered view of the data inside a dataflow.
Limitation: large dataflows may run into timeout issues when viewing visualizations. Large dataflows that hit these timeouts should use Import mode instead.
d. Personally, I also prefer to use Power Automate to trigger multiple dataflows and Power BI datasets sequentially.
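As an alternative (or complement) to Power Automate, the sequencing can also be driven through the Power BI REST API's "Refresh Dataflow" endpoint. The sketch below is a hypothetical, minimal illustration: the workspace and dataflow IDs are placeholders, and authentication (acquiring an Azure AD token for the `session`) is omitted for brevity.

```python
# Minimal sketch of chaining S1 -> S2 refreshes via the Power BI REST API.
# group_id / dataflow IDs are placeholders; AAD authentication is omitted.

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def refresh_url(group_id: str, dataflow_id: str) -> str:
    """Build the 'Dataflows - Refresh Dataflow' endpoint URL."""
    return f"{POWER_BI_API}/groups/{group_id}/dataflows/{dataflow_id}/refreshes"


def trigger_sequence(session, group_id: str, s1_id: str, s2_id: str) -> None:
    """Kick off the S1 refresh, then (once it completes) the S2 refresh."""
    # Start the S1 (ingest) dataflow refresh.
    session.post(refresh_url(group_id, s1_id),
                 json={"notifyOption": "MailOnFailure"})
    # In practice you would poll the dataflow's transactions endpoint here
    # until S1 reports success; Power Automate's "When a dataflow refresh
    # completes" trigger handles exactly this waiting step for you.
    # Then start the S2 (transform) dataflow refresh.
    session.post(refresh_url(group_id, s2_id),
                 json={"notifyOption": "MailOnFailure"})
```

Power Automate's built-in dataflow connector does the completion-polling for you, which is why it remains the simpler option; the API route is mainly useful if you want to orchestrate refreshes from an existing scheduler.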
Please refer to the tutorials below for more details.
Premium features of dataflows
Create a dataflow using linked tables
Best Regards,
Community Support Team _ Eason
I really appreciate your reply, thanks
Please could you provide further information:
a. Should the engine only be enabled for S2 then? Or S1 AND S2?
Hi, @CrazyHorse66
If S1 is used only to extract all of the required data from a data source and all ETL operations are performed in S2, you just need to enable the Enhanced Compute Engine for S2.
By the way, in Premium Gen2 the Enhanced Compute Engine is turned off by default; it is automatically turned on when the dataflow is connected to another dataflow.
Best Regards,
Community Support Team _ Eason
Hi, a follow-up question please - if Enhanced is enabled for both S1 (ingest) AND S2 (transform), is this a bad thing in terms of performance, or can it cause any conflict between the two working together?
Hi, @CrazyHorse66
It is not recommended to enable it for S1, as it could lead to slower refresh times:
When the Enhanced Compute Engine is enabled, it requires some memory to function properly. As such, the memory available to perform a refresh is reduced, which increases the likelihood of refreshes being queued and in turn reduces the number of dataflows that can refresh concurrently. To address this, when enabling enhanced compute, increase the memory assigned to dataflows so that the memory available for concurrent dataflow refreshes remains the same.
Another reason you may encounter slower refreshes is that the compute engine only works on top of existing entities. If your dataflow references a data source that isn't a dataflow, you won't see an improvement; in some big-data scenarios the initial read from the source can even be slower, because the data needs to be passed to the Enhanced Compute Engine.
Best Regards,
Community Support Team _ Eason