CONTEXT:
lakehouse (LH) == exists; no tables yet.
pipeline (PL) == exists; create tables in LH for first time, then ingest data incrementally thereafter.
ISSUE:
LH and PL do not sync at the same time. In other words, after a Copy data activity writes data to the LH, that data is not immediately available to downstream activities in the same PL. One must therefore add a Wait activity and guess how many seconds to pause the PL, which wastes run time when the guess is too long and risks reading incomplete data when it is too short.
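To illustrate the alternative to a guessed Wait, here is a minimal sketch that polls a readiness check with a timeout instead of pausing for a fixed number of seconds. The `is_ready` callable is hypothetical: it stands in for whatever lookup confirms that the copied rows are visible in the LH. Nothing in this sketch is a Fabric API.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_ready() until it returns True and return the seconds waited.

    is_ready is a hypothetical check (not a Fabric call) that reports
    whether the freshly copied data can be read downstream.
    Raises TimeoutError if the check never succeeds within timeout_s.
    """
    start = time.monotonic()
    while True:
        if is_ready():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"data not visible after {timeout_s} s")
        sleep(interval_s)
```

The PL would then pause only as long as the sync actually takes, which is the behavior a fixed Wait can only approximate.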
Furthermore, the PL already contains conditional branching for multiple tables in order to distinguish the first-time full ingest from later incremental ingests. The user now has to add a condition + Wait activity after each Copy data activity, which adds 2 new activities to every branch already in existence and easily breaks the maximum threshold of 80 activities per PL.
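The arithmetic is easy to check with a quick sketch. The PL size used here (60 activities, 12 Copy data branches) is a made-up example; only the limit of 80 and the "+2 per branch" rule come from the description above.

```python
MAX_ACTIVITIES = 80  # per-PL limit cited in this post


def activities_after_workaround(base_activities: int, copy_branches: int) -> int:
    """Each Copy data branch gains one condition and one Wait activity."""
    return base_activities + 2 * copy_branches


# Hypothetical PL: 60 activities in total, 12 of them Copy data branches.
total = activities_after_workaround(base_activities=60, copy_branches=12)
print(total, total > MAX_ACTIVITIES)  # 84 True
```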
So the user breaks the PL into multiple PLs, and a new problem arises. Various PL variables hold internal state values that need to be available further downstream in the original PL. Once that original PL is broken apart into 2 or more PLs, how do you pass these values to the other PLs in a manner that is both elegant and USER FRIENDLY, without having to resort to some kludgy feature added to the PL machinery as an afterthought?
Currently 2 PL activities are available: Invoke PL (Legacy) and Invoke PL (Preview).
When trying to pass PL variables or params with Invoke PL (Preview), Fabric tells the user to use Invoke PL (Legacy) instead.
What happens if that legacy activity is deprecated in a few years? By then some users will have dozens of PLs with complex data flows, and they will have to go back in and modify every single PL that passes values to other PLs, wasting hours. That scenario is not acceptable.
SUGGESTION:
0__How about increasing the max allowed activity limit in a PL from 80 to 512, for example?
1__How about exposing the LH sync clock to PLs as an activity, so the PL Wait activity can use it to wait precisely until the LH reports that the data is ready for read access?
2__How about designing Invoke PL (Preview) so that passing params or variables between PLs is both elegant and user friendly?
3__How about a menu option to transform the PL JSON definition into Python code, for example, so that the user could drop into Visual Studio Code, continue adding to or modifying the PL directly as Python code, then upload it to Fabric and have Fabric show it visually as usual, as a PL with a bunch of activities? Also, how about letting Copilot analyze the code and provide suggestions and/or improvements (especially valuable for users who cannot afford a Premium SKU)?
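To make suggestion 3 a little more concrete, here is a rough sketch of the kind of starting point such a feature might build on: reading a PL definition as JSON in Python and listing its activities in an order that respects their dependencies. The JSON shape (`properties.activities`, each with `name`, `type`, and `dependsOn[].activity`) is an assumption about how a PL definition is laid out, and the sample PL and its activity names are made up.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, heavily trimmed PL definition (assumed shape, not an export).
SAMPLE_PL = """
{
  "name": "pl_ingest",
  "properties": {
    "activities": [
      {"name": "WaitForSync", "type": "Wait",
       "dependsOn": [{"activity": "CopyOrders"}]},
      {"name": "CopyOrders", "type": "Copy", "dependsOn": []},
      {"name": "ReadOrders", "type": "Lookup",
       "dependsOn": [{"activity": "WaitForSync"}]}
    ]
  }
}
"""


def execution_order(pl_json: str) -> list[str]:
    """Return activity names ordered so each one follows its dependencies."""
    activities = json.loads(pl_json)["properties"]["activities"]
    # Map every activity to the set of activities it depends on.
    graph = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in activities
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(SAMPLE_PL))
# ['CopyOrders', 'WaitForSync', 'ReadOrders']
```

Once the activities are plain Python data like this, editing them in an editor and serializing back to JSON is straightforward; the missing piece is Fabric accepting the result and rendering it visually.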