I have to load data from an SAP HANA DB to a Fabric warehouse.
There are more than 15 tables to be loaded, each with a large number of rows.
Is it better to create one pipeline with 15+ Dataflow Gen 2 items, one per table, or to create a separate pipeline for each table?
Ideally, the best way would have been a single pipeline with a single Dataflow Gen 2 driven by a metadata framework, but as of today Dataflow Gen 2 cannot be parameterized.
So owing to that, you would have to create a separate dataflow per table.
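For reference, "metadata-driven" just means keeping the list of source tables in a small control structure and looping over it, so adding a table becomes a config change instead of a new dataflow. Below is a minimal sketch of that idea in Python; the schema/table names and the load_table() helper are purely hypothetical placeholders, since Dataflow Gen 2 itself cannot be driven this way today.

```python
# Illustrative control metadata: one entry per SAP HANA table to load.
# Schema and table names here are hypothetical placeholders.
TABLES_TO_LOAD = [
    {"source_schema": "SAPHANADB", "source_table": "MARA", "target_table": "dim_material"},
    {"source_schema": "SAPHANADB", "source_table": "KNA1", "target_table": "dim_customer"},
    {"source_schema": "SAPHANADB", "source_table": "VBAK", "target_table": "fact_sales_header"},
]

def load_table(entry: dict) -> None:
    """Placeholder for whatever copies one table into the warehouse
    (a Copy activity, a notebook, or a per-table dataflow)."""
    print(f"Loading {entry['source_schema']}.{entry['source_table']} "
          f"-> {entry['target_table']}")

# A metadata-driven load is simply a loop over the control list.
for entry in TABLES_TO_LOAD:
    load_table(entry)
```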
Now the question is whether those dataflows should be triggered from (integrated within) a single pipeline or from separate pipelines.
That depends on your requirements:
1) Is the load for all tables supposed to be scheduled at the same time?
2) Is there any dependency between the tables, i.e. do some flows need to run sequentially one after another, or are all tables independent of each other?
If the tables are to be loaded at the same time, the best way would be to create a single pipeline and integrate all the dataflows within it.
I really appreciate your prompt response. Thank you for that.
Let's keep the dependency aside for a minute.
Will it still be OK to have multiple dataflows, one per table, inside a single pipeline if those tables have millions of records?
The Dataflow Gen 2 items would use the same capacity tied to the workspace.
So whether you run them within the same pipeline or in parallel via different pipelines, the capacity being utilized is the same, so it won't matter. But if the capacity is getting throttled by so many runs, keeping them all within the same pipeline at least lets you make them sequential or dependent and manage the capacity utilization.
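To make the difference concrete, here is a minimal sketch of running the dataflows one after another instead of all at once. run_dataflow() is a hypothetical stand-in for however each Dataflow Gen 2 refresh is actually triggered (a Dataflow activity in the pipeline, or an API call); the point is only that sequential execution spreads the load on the capacity over time.

```python
import time

# Hypothetical dataflow names, one per source table.
DATAFLOWS = ["df_load_mara", "df_load_kna1", "df_load_vbak"]

def run_dataflow(name: str) -> None:
    """Placeholder for triggering one Dataflow Gen 2 refresh and
    waiting for it to finish."""
    print(f"Refreshing {name} ...")
    time.sleep(1)  # simulate the refresh duration

# Sequential execution: only one dataflow hits the capacity at a time,
# which is the safer option when parallel runs cause throttling.
for name in DATAFLOWS:
    run_dataflow(name)
```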
Okay, got it. 👍
Thank you once again for the prompt responses.
Hi @ManasiL ,
Glad to know that your query got resolved. Please continue using the Fabric Community for your further queries.
If I need to integrate multiple dataflows with very little data, and not dependent on each other, into one pipeline, how can I implement that? By running them in sequence? Can I run them in parallel? If yes, which would be the best approach?
The major thing to take into consideration is that all the dataflows use the same workspace capacity.
If the dataflows are independent, executing them in parallel would be the best approach, but executing all of them in parallel might choke up the workspace capacity, so you might have to run them in parallel in batches rather than all at the same time.
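One common way to express "parallel in batches" is to fan the dataflows out in fixed-size groups so only a few refreshes hit the capacity at once. A rough sketch of that pattern is below; run_dataflow() is again a hypothetical trigger for one Dataflow Gen 2 refresh, and the batch size of 3 is an assumed value you would tune to your capacity.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataflow names; in practice, one per source table.
DATAFLOWS = [f"df_table_{i:02d}" for i in range(1, 16)]
BATCH_SIZE = 3  # assumed limit; tune to what your capacity can absorb

def run_dataflow(name: str) -> str:
    """Placeholder for triggering one Dataflow Gen 2 refresh and
    blocking until it completes."""
    return f"{name} finished"

# Each batch runs in parallel, and the next batch only starts once
# the previous one has finished, which caps concurrent capacity usage.
for start in range(0, len(DATAFLOWS), BATCH_SIZE):
    batch = DATAFLOWS[start:start + BATCH_SIZE]
    with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
        for result in pool.map(run_dataflow, batch):
            print(result)
```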
Yes, they are within the same workspace. Can you please share a link showing how to implement them in parallel in batches?