Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
CrazyHorse66
Helper I
Helper I

Optimum set up for Dataflows

Hi, thanks in advance for your support

 

We have an ETL process set up using Dataflows, not without it's issues but generally working ok

1. Ingest - S1 Dataflows, data from source

2. Transform - S2 Dataflows, data from S1 Dataflows, transformed

3. Datasets using S2 Dataflows

 

We have now started to update these to use the Enhanced Compute Engine, with promising results

Key questions

a. Should the Enhanced be turned on for S1 AND S2, or ONLY S1, or ONLY S2? Or different settings (Optimised) for each?

b. What type of entity should the S2 dataflow use? Linked etc?

c. What method should S2 dataflow use - Direct Query or Import? Would be good to be able to filter the S1 data for use in S2 (reduce the size) depending on the requirement

d. We have set up Power Automate to trigger the S2 after S1 completes - is that the best solution based on the updates to any of the above?

 

Thanks again

1 ACCEPTED SOLUTION
v-easonf-msft
Community Support
Community Support

Hi, @CrazyHorse66 

a. The main purpose of using the Enhanced Compute Engine is to drastically reduce the refresh time required for long-running ETL steps on compute entities. If you have many ETL operations (such as performing joins, distinct, filte) on S2, you should enable the enhanced compute engine for S2.

b.The following list describes some of the reasons you may choose to use linked tables:

  • If you want to reuse a table across multiple dataflows, such as a date table or a static lookup table, you should create a table once and then reference it across the other dataflows.

  • If you want to avoid creating multiple refreshes to a data source, it's better to use linked tables to store the data and act as a cache. Doing so allows every subsequent consumer to leverage that table, reducing the load to the underlying data source.

  • If you need to perform a merge between two tables.

c.You can use DirectQuery to connect directly to your dataflow, and thereby connect directly to your dataflow without having to import its data. Direct query avoids separate refresh schedules and it is aslo useful for working on a filtered view of data inside a dataflow. 
Limitation: Large dataflows may have trouble with timeout issues when viewing visualizations. Large dataflows that run into trouble with timeout issues should use Import mode.

d. Personally, I also prefer to use Power Automate to trigger multiple dataflows and Power BI datasets sequentially.

Please refer to below tutorials for more details.
Premium features of dataflows 

Create a dataflow using linked tables 

 

Best Regards,
Community Support Team _ Eason

View solution in original post

5 REPLIES 5
CrazyHorse66
Helper I
Helper I

I really appreciate your reply, thanks

 

Please could you provide further information:

a. Should the engine only be enabled for S2 then? Or S1 AND S2?

 

Hi, @CrazyHorse66 

If S1 is used to extract all of the required from a data sourcee and all ETL operations are performed in S2, you just need to enable the Enhanced Compute Engine for S2.

 

By the way, in Premium Gen2, Enhanced Compute Engine is turned off by default. It is automatically turned on when the dataflow is connected to another dataflow.

 

Best Regards,
Community Support Team _ Eason

Hi, a follow up questions please - if enhanced is enabled for both s1 (ingest) AND s2 (transform) is this a bad thing in terms of performance, or can cause conflict between the 2 working together in any way?

Hi, @CrazyHorse66 

It is not recommended to enable it for S1, it could lead to slower refresh times:

 

  • When the enhanced compute engine is enabled, it requires some memory to function properly. As such, memory available to perform a refresh is reduced and therefore increases the likelihood of refreshes to be queued, which in turn reduces the number of dataflows that can refresh concurrently. To address this, when enabling enhanced compute, increase the memory assigned for dataflows to ensure the memory available for concurrent dataflow refreshes remains the same.

  • Another reason you may encounter slower refreshes is that the compute engine only works on top of existing entities. If your dataflow references a data source that's not a dataflow, you won't see an improvement. There will be no performance increase, since in some big data scenarios, the initial read from a data source would be slower because the data needs to be passed to the enhanced compute engine.


Best Regards,
Community Support Team _ Eason

v-easonf-msft
Community Support
Community Support

Hi, @CrazyHorse66 

a. The main purpose of using the Enhanced Compute Engine is to drastically reduce the refresh time required for long-running ETL steps on compute entities. If you have many ETL operations (such as performing joins, distinct, filte) on S2, you should enable the enhanced compute engine for S2.

b.The following list describes some of the reasons you may choose to use linked tables:

  • If you want to reuse a table across multiple dataflows, such as a date table or a static lookup table, you should create a table once and then reference it across the other dataflows.

  • If you want to avoid creating multiple refreshes to a data source, it's better to use linked tables to store the data and act as a cache. Doing so allows every subsequent consumer to leverage that table, reducing the load to the underlying data source.

  • If you need to perform a merge between two tables.

c.You can use DirectQuery to connect directly to your dataflow, and thereby connect directly to your dataflow without having to import its data. Direct query avoids separate refresh schedules and it is aslo useful for working on a filtered view of data inside a dataflow. 
Limitation: Large dataflows may have trouble with timeout issues when viewing visualizations. Large dataflows that run into trouble with timeout issues should use Import mode.

d. Personally, I also prefer to use Power Automate to trigger multiple dataflows and Power BI datasets sequentially.

Please refer to below tutorials for more details.
Premium features of dataflows 

Create a dataflow using linked tables 

 

Best Regards,
Community Support Team _ Eason

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors