jeffshieldsdev
Solution Sage

When to split dataflows in ETL chain?

I have multiple dataflows as part of an ETL chain.

I'm following this great pattern from @MatthewRoche here: https://ssbipolar.com/2019/10/07/quick-tip-factoring-your-dataflow-entities/

For each entity, I have at least 3 dataflows:

  • 1-Ingest
  • 2-Cleanse
  • 3-Final

This is a great setup because it gives me injection points if I need to add new data or change data midstream.

At this point, Ingest simply ingests. I have a few Ingest dataflows with Incremental Refresh enabled.

Cleanse converts data types: my data source stores some numerical IDs as strings instead of integers, so I convert those here.

Final (at this point) simply renames the columns to business-friendly names.
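
As a rough illustration outside Power Query, the three stages can be sketched in Python as plain functions over rows. This is only a model of the pattern, not how dataflows run: the column names `CustID` and `CustName`, the sample rows, and the friendly-name mapping are all hypothetical, and real dataflows express these steps in Power Query M.

```python
from typing import Any

Row = dict[str, Any]


def ingest() -> list[Row]:
    # Stage 1 (Ingest): return source rows untouched.
    # Hypothetical sample data; the numeric IDs arrive as strings.
    return [
        {"CustID": "101", "CustName": "Contoso"},
        {"CustID": "102", "CustName": "Fabrikam"},
    ]


def cleanse(rows: list[Row]) -> list[Row]:
    # Stage 2 (Cleanse): cast the string IDs to integers; other columns pass through.
    return [{**row, "CustID": int(row["CustID"])} for row in rows]


# Hypothetical mapping from source column names to business-friendly names.
FRIENDLY_NAMES = {"CustID": "Customer ID", "CustName": "Customer Name"}


def final(rows: list[Row]) -> list[Row]:
    # Stage 3 (Final): rename columns only; values are unchanged.
    return [{FRIENDLY_NAMES.get(k, k): v for k, v in row.items()} for row in rows]


result = final(cleanse(ingest()))
print(result)
```

In this model each function stands in for one dataflow; collapsing the chain into a single dataflow amounts to composing the same functions inside one query.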

This is a great pattern, but I wonder whether my minimal transformations really require this many steps. Am I sacrificing performance by generating three different computed entities in this chain?

Should Ingest always just ingest, or, if I'm simply casting and renaming columns, should I just use one dataflow? Does anyone have recommendations in this space? Thanks.

EDIT:

I think I answered my question about Ingest with Incremental Refresh. When I enable Incremental Refresh, additional steps and queries are added ("_Canary", "RangeStart", and "RangeEnd"), and a Table.SelectRows() step is added to my main query. This step is added last, so any other transformations are performed before it. That means the Table.SelectRows() filter will not fold, and all records have to be downloaded before they can be filtered.
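
The folding concern can be modeled with a toy source in Python. `FakeSource` is a hypothetical stand-in for a data source that can evaluate a filter itself, and it counts how many rows leave the source. Pushing the range filter to the source (the folded case) transfers only matching rows. Applying the same filter after a local step that the model treats as non-foldable transfers every row first. This is a conceptual sketch under those assumptions, not how the Power Query engine is implemented, and the start-inclusive, end-exclusive window is assumed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class FakeSource:
    """Hypothetical stand-in for a source that can filter rows server-side."""
    rows: list
    rows_transferred: int = 0

    def fetch(self, predicate: Optional[Callable[[dict], bool]] = None) -> list:
        # With a predicate, the "server" filters before sending (folded case);
        # without one, every row is sent to the client.
        selected = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_transferred += len(selected)
        return selected


def in_range(row: dict, range_start: date, range_end: date) -> bool:
    # Start-inclusive, end-exclusive window (an assumption for this sketch).
    return range_start <= row["OrderDate"] < range_end


def cast_ids(rows: list) -> list:
    # Local transformation: string IDs to integers.
    return [{**r, "CustID": int(r["CustID"])} for r in rows]


source_rows = [{"OrderDate": date(2023, m, 1), "CustID": str(m)} for m in range(1, 13)]
start, end = date(2023, 10, 1), date(2024, 1, 1)

# Folded: the range filter reaches the source; only matching rows are transferred.
folded = FakeSource(source_rows)
folded_result = cast_ids(folded.fetch(lambda r: in_range(r, start, end)))

# Not folded: the filter runs after the local step, so every row is transferred first.
unfolded = FakeSource(source_rows)
unfolded_result = [r for r in cast_ids(unfolded.fetch()) if in_range(r, start, end)]

print(folded.rows_transferred, unfolded.rows_transferred)  # 3 12
```

Both paths produce the same three rows; only the number of rows pulled from the source differs.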

EDIT2:

Then again, I could have two queries in one dataflow, Customers_Ingest and Customers_Cleanse, where _Ingest is untransformed with incremental refresh enabled, and _Cleanse is linked to it and holds the transformations. Since these transformations happen within the same dataflow, though, I assume I wouldn't get the benefit of the enhanced compute engine.
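
The two-query layout from EDIT2 can be sketched in Python as well. The names mirror Customers_Ingest and Customers_Cleanse, but the `ingest_partitions` store, the refresh function, and the sample rows are hypothetical. The ingest step applies only the range filter and stores raw rows per window. The cleanse step transforms whatever has been stored. The sketch says nothing about which engine would evaluate the cleanse step.

```python
from datetime import date

# Hypothetical source rows: numeric IDs stored as strings, one order per month.
SOURCE = [{"OrderDate": date(2023, m, 1), "CustID": str(m)} for m in range(1, 13)]

# Stored output of Customers_Ingest: raw rows keyed by refreshed window.
ingest_partitions: dict[tuple[date, date], list[dict]] = {}


def refresh_customers_ingest(range_start: date, range_end: date) -> None:
    # Only the range filter is applied, so no transformation sits in front of it.
    ingest_partitions[(range_start, range_end)] = [
        r for r in SOURCE if range_start <= r["OrderDate"] < range_end
    ]


def customers_cleanse() -> list[dict]:
    # Transformations run over the stored ingest output, not over the source.
    return [
        {**r, "CustID": int(r["CustID"])}
        for rows in ingest_partitions.values()
        for r in rows
    ]


refresh_customers_ingest(date(2023, 1, 1), date(2023, 7, 1))
refresh_customers_ingest(date(2023, 7, 1), date(2024, 1, 1))
cleansed = customers_cleanse()
print(len(cleansed))  # 12
```

Re-running `refresh_customers_ingest` for the latest window replaces only that partition, while the cleanse step always sees the full stored history.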

1 REPLY
v-xuding-msft
Community Support

Hi @jeffshieldsdev,

If you need timely help, I suggest creating a support ticket to get dedicated support from Microsoft. You can refer to the blog post on how to create one. I don't have much experience with ETL, so I'm sorry I couldn't be more helpful.

Best Regards,

Xue Ding

If this post helps, please consider accepting it as the solution to help other members find it more quickly. Kudos are nice too.