Hi, thanks in advance for your support
We have an ETL process set up using Dataflows, not without its issues but generally working OK:
1. Ingest - S1 Dataflows, data from source
2. Transform - S2 Dataflows, data from S1 Dataflows, transformed
3. Datasets using S2 Dataflows
We have now started to update these to use the Enhanced Compute Engine, with promising results
Key questions
a. Should the Enhanced be turned on for S1 AND S2, or ONLY S1, or ONLY S2? Or different settings (Optimised) for each?
b. What type of entity should the S2 dataflow use? Linked etc?
c. What method should the S2 dataflow use - DirectQuery or Import? It would be good to be able to filter the S1 data for use in S2 (to reduce its size) depending on the requirement.
d. We have set up Power Automate to trigger S2 after S1 completes - is that still the best solution given the answers to any of the above?
Thanks again
Hi, @CrazyHorse66
a. The main purpose of using the Enhanced Compute Engine is to drastically reduce the refresh time required for long-running ETL steps on computed entities. If you perform many ETL operations (such as joins, distinct, or filter) in S2, you should enable the Enhanced Compute Engine for S2.
b. The following list describes some of the reasons you may choose to use linked tables:
If you want to reuse a table across multiple dataflows, such as a date table or a static lookup table, you should create a table once and then reference it across the other dataflows.
If you want to avoid creating multiple refreshes to a data source, it's better to use linked tables to store the data and act as a cache. Doing so allows every subsequent consumer to leverage that table, reducing the load to the underlying data source.
c. You can use DirectQuery to connect directly to your dataflow without having to import its data. DirectQuery avoids separate refresh schedules, and it is also useful for working on a filtered view of the data inside a dataflow.
Limitation: large dataflows may run into timeout issues when viewing visualizations. Large dataflows that hit these timeouts should use Import mode instead.
d. Personally, I also prefer to use Power Automate to trigger multiple dataflows and Power BI datasets sequentially.
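As an alternative (or complement) to Power Automate, the sequencing can also be driven through the Power BI REST API's "Refresh Dataflow" endpoint. The sketch below is a hypothetical, minimal illustration: the workspace and dataflow IDs are placeholders, and authentication (acquiring an Azure AD token for the `session`) is omitted for brevity.

```python
# Minimal sketch of chaining S1 -> S2 refreshes via the Power BI REST API.
# group_id / dataflow IDs are placeholders; AAD authentication is omitted.

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def refresh_url(group_id: str, dataflow_id: str) -> str:
    """Build the 'Dataflows - Refresh Dataflow' endpoint URL."""
    return f"{POWER_BI_API}/groups/{group_id}/dataflows/{dataflow_id}/refreshes"


def trigger_sequence(session, group_id: str, s1_id: str, s2_id: str) -> None:
    """Kick off the S1 refresh, then (once it completes) the S2 refresh."""
    # Start the S1 (ingest) dataflow refresh.
    session.post(refresh_url(group_id, s1_id),
                 json={"notifyOption": "MailOnFailure"})
    # In practice you would poll the dataflow's transactions endpoint here
    # until S1 reports success; Power Automate's "When a dataflow refresh
    # completes" trigger handles exactly this waiting step for you.
    # Then start the S2 (transform) dataflow refresh.
    session.post(refresh_url(group_id, s2_id),
                 json={"notifyOption": "MailOnFailure"})
```

Power Automate's built-in dataflow connector does the completion-polling for you, which is why it remains the simpler option; the API route is mainly useful if you want to orchestrate refreshes from an existing scheduler.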
Please refer to the tutorials below for more details.
Premium features of dataflows
Create a dataflow using linked tables
Best Regards,
Community Support Team _ Eason
I really appreciate your reply, thanks
Please could you provide further information:
a. Should the engine only be enabled for S2 then? Or S1 AND S2?
Hi, @CrazyHorse66
If S1 is used only to extract all of the required data from a data source and all ETL operations are performed in S2, you just need to enable the Enhanced Compute Engine for S2.
By the way, in Premium Gen2 the Enhanced Compute Engine is turned off by default; it is automatically turned on when the dataflow is connected to another dataflow.
Best Regards,
Community Support Team _ Eason
Hi, a follow-up question please - if Enhanced is enabled for both S1 (ingest) AND S2 (transform), is this a bad thing in terms of performance, or can it cause any conflict between the two working together?
Hi, @CrazyHorse66
It is not recommended to enable it for S1, as it could lead to slower refresh times:
When the Enhanced Compute Engine is enabled, it requires some memory to function properly. As such, the memory available to perform a refresh is reduced, which increases the likelihood of refreshes being queued and in turn reduces the number of dataflows that can refresh concurrently. To address this, when enabling enhanced compute, increase the memory assigned to dataflows so that the memory available for concurrent dataflow refreshes remains the same.
Another reason you may encounter slower refreshes is that the compute engine only works on top of existing entities. If your dataflow references a data source that isn't a dataflow, you won't see an improvement; in some big-data scenarios the initial read from the source can even be slower, because the data needs to be passed to the Enhanced Compute Engine.
Best Regards,
Community Support Team _ Eason