jeffshieldsdev
Solution Sage

When to split dataflows in ETL chain?

I have multiple dataflows, as a part of an ETL chain.

 

I'm following this great pattern from @MatthewRoche here: https://ssbipolar.com/2019/10/07/quick-tip-factoring-your-dataflow-entities/

 

For each entity, I have at least 3 dataflows:

 

  • 1-Ingest
  • 2-Cleanse
  • 3-Final

 

This is a great setup, because it gives me injection points if I need to add new or change data midstream.

 

At this point, Ingest simply ingests. I have a few Ingest dataflows with Incremental Refresh enabled.

 

Cleanse converts data types: my data source stores some numerical IDs as strings instead of integers, so I convert those here.

 

Final (at this point) simply renames the columns to business-friendly names.
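
To make the three stages concrete, here is a minimal sketch in Python rather than in Power Query M. It only illustrates the Ingest, Cleanse, Final factoring described above; the column names, the sample rows, and the function names are hypothetical and are not taken from the actual dataflows.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical mapping from source column names to business-friendly names.
RENAMES = {"cust_id": "Customer ID", "cust_nm": "Customer Name"}


def ingest() -> List[Row]:
    """Stage 1: return source rows untouched (IDs arrive as strings)."""
    return [
        {"cust_id": "101", "cust_nm": "Contoso"},
        {"cust_id": "102", "cust_nm": "Fabrikam"},
    ]


def cleanse(rows: List[Row]) -> List[Row]:
    """Stage 2: cast numeric IDs stored as strings to integers."""
    return [{**row, "cust_id": int(row["cust_id"])} for row in rows]


def final(rows: List[Row]) -> List[Row]:
    """Stage 3: rename columns to business-friendly names."""
    return [{RENAMES.get(key, key): value for key, value in row.items()} for row in rows]


# Each stage consumes only the previous stage's output, so a new step
# can be injected between any two stages without touching the others.
result = final(cleanse(ingest()))
```

Because each stage depends only on the output of the one before it, this shape is what provides the "injection points" mentioned earlier. Whether the stages live in one dataflow or three is a separate question.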

 

This is a great pattern, but I wonder whether my minimal transformations require this many steps, or whether I am sacrificing performance by generating 3 different computed entities in this chain.

 

Should Ingest always just ingest? Or, if I'm simply casting and renaming columns, should I just use one dataflow? Does anyone have any recommendations in this space? Thanks.

 

EDIT:

I think I answered my question on Ingest with Incremental Refresh. When I enable Incremental Refresh, additional steps and queries are added ("_Canary", "RangeStart", and "RangeEnd") and a Table.SelectRows() filter step is added to my main query. This step is added last, so any other transformations have to be performed first, meaning the Table.SelectRows() filter will not fold and all records will have to be downloaded before they can be filtered.
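
The concern above can be shown with a toy model in Python. This is only a sketch of the difference between a filter that is pushed down to the source and one that is applied after download. It does not model Power Query's actual folding engine, and the sample rows, the `read_source` helper, and the half-open date range are assumptions made for illustration.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]

# Hypothetical source table; "modified" plays the role of the refresh date column.
SOURCE_ROWS: List[Row] = [
    {"id": "1", "modified": date(2024, 1, 5)},
    {"id": "2", "modified": date(2024, 2, 10)},
    {"id": "3", "modified": date(2024, 2, 20)},
]


def read_source(date_range: Optional[Tuple[date, date]] = None) -> List[Row]:
    """Return rows; when a range is given, the 'source' filters before sending."""
    if date_range is None:
        return list(SOURCE_ROWS)
    start, end = date_range
    return [row for row in SOURCE_ROWS if start <= row["modified"] < end]


range_start, range_end = date(2024, 2, 1), date(2024, 3, 1)

# Filter pushed to the source: only the matching rows are transferred.
folded = read_source((range_start, range_end))

# Filter applied locally: every row is transferred first, then filtered.
downloaded = read_source()
unfolded = [row for row in downloaded if range_start <= row["modified"] < range_end]
```

Both paths end with the same rows, but the second one transfers the whole table first. That is the cost of a date filter that cannot be pushed to the source.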

 

EDIT2:

Although, I could have two queries in my dataflow: Customers_Ingest and Customers_Cleanse, where _Ingest is untransformed and has Incremental Refresh enabled, and _Cleanse is linked and has the transformations. Since these transformations are happening within the same dataflow though, I assume I wouldn't get the benefit of the enhanced compute engine.

1 REPLY
v-xuding-msft
Community Support

Hi @jeffshieldsdev ,

If you need timely help, I think you could create a support ticket to get dedicated support from Microsoft. You could refer to the blog about how to create one. I don't have much experience in ETL. Sorry that I have not been able to help you.


Best Regards,

Xue Ding

If this post helps, then please consider accepting it as the solution to help other members find it more quickly. Kudos are nice too.
