Existing functionality:
AWS: Currently we have a Lambda function (a Python script) with many conditions (it fetches data from an on-prem SQL Server), which generates parameters and passes them to a shell script containing the Sqoop import query. A similar Lambda exists for the Sqoop export.
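For context, here is a minimal sketch of what such a Lambda can look like; the table, column, and environment-variable names below are hypothetical placeholders, and the hand-off to the shell script is site-specific, so it is elided:

```python
import os
import pyodbc  # assumes the Lambda has network access to the on-prem SQL Server

def lambda_handler(event, context):
    # Read condition/control data from the on-prem SQL Server
    conn = pyodbc.connect(os.environ["ONPREM_SQLSERVER_CONN_STR"])
    row = conn.cursor().execute(
        "SELECT source_table, target_path, where_clause "
        "FROM import_config WHERE job_name = ?",
        event["job_name"],
    ).fetchone()

    # Parameters the shell script splices into its `sqoop import` command
    sqoop_args = {
        "--table": row.source_table,
        "--target-dir": row.target_path,
        "--where": row.where_clause,
    }
    # How the args reach the shell script is site-specific and elided here
    return {"sqoop_args": sqoop_args}
```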
Target:
We are planning to achieve the same in a Fabric Data pipeline using one of the three options below. Please suggest which would be the best one to proceed with, and why.
1) Notebook-activity-only approach (for the existing Sqoop import and export), calling a copyData API for the data. The Python notebook handles all the conditional statements the Lambda handled earlier, then calls an API with parameters such as source and target to copy the data from SQL Server to ADLS Gen2 (see the sketch after this list).
2) Notebook activity plus Copy Data activity approach (for the existing Sqoop import and export): same as option 1, but with the data movement done by a Copy Data activity in the pipeline.
3) Azure Function activity (replacing the Lambda functions) plus Copy Data activity approach for the existing Sqoop import and export.
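To make option 1 concrete, here is a rough sketch of what the notebook could look like. Note that the copy endpoint, payload shape, and authentication are hypothetical placeholders, not a documented Fabric API:

```python
import requests

def run_copy(job_name: str, token: str) -> None:
    # Conditional logic formerly in the Lambda (simplified placeholder)
    if job_name == "daily_orders":
        source_query = ("SELECT * FROM dbo.orders "
                        "WHERE order_date = CAST(GETDATE() AS date)")
        target_path = "Files/orders/"  # ADLS Gen2 / OneLake folder, placeholder
    else:
        raise ValueError(f"Unknown job: {job_name}")

    # Hypothetical copy endpoint; the real call depends on the API you expose
    resp = requests.post(
        "https://<your-copy-endpoint>/copyData",  # placeholder URL
        headers={"Authorization": f"Bearer {token}"},
        json={"source": {"query": source_query}, "target": {"path": target_path}},
        timeout=60,
    )
    resp.raise_for_status()
    print("Copy job submitted:", resp.json())
```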
Hi @Anonymous , this is pretty helpful. Could you please elaborate more on the cons of the copyData API, apart from streamlining? Also, we have many parameters that need to be passed from the notebook to the Copy Data activity (sometimes queries as well); would this be feasible with Option 1? Any thoughts would be much appreciated. Thank you very much.
Hi @AmruthaVarshini ,
Cons of copyData API (compared to Copy Data activity):
- You own the plumbing: error handling, retries, and monitoring have to be coded in the notebook, whereas the Copy Data activity provides them out of the box.
- More development and maintenance overhead as requirements change.
- Less visibility in the pipeline run history, since the transfer happens inside notebook code rather than as a first-class pipeline activity.
Feasibility of Option 1 with Many Parameters:
Yes, this is feasible: the notebook builds the full parameter set (including dynamically generated queries) in code before making the call, so the number of parameters is not a blocker. The same need is also covered in Option 2, where the notebook can hand the parameters to the pipeline for the Copy Data activity to consume; see the sketch below.
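As an illustration of that hand-off, a Fabric notebook can return an arbitrary JSON payload to the pipeline via its exit value. A minimal sketch, with hypothetical parameter names:

```python
import json
from notebookutils import mssparkutils  # available in Fabric notebooks

# Build however many parameters the Copy Data activity needs,
# including a dynamically generated source query
params = {
    "source_query": "SELECT * FROM dbo.orders WHERE order_date >= '2024-01-01'",
    "target_folder": "Files/orders/",
    "write_mode": "overwrite",
}

# Return the payload to the pipeline as the notebook's exit value
mssparkutils.notebook.exit(json.dumps(params))
```

In the pipeline, the Copy Data activity can then read these values with dynamic expressions over the notebook activity's output, for example @json(activity('Notebook1').output.result.exitValue).source_query (the exact expression path can vary by runtime, so verify it against your pipeline's actual output).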
Overall Recommendation:
If your primary concern is simplicity and leveraging built-in functionality, Option 2 (notebook with Copy Data activity) is still preferred. It provides a cleaner abstraction for data transfer, with robust error handling and monitoring.
If you require maximum control over the data transfer process and have the resources to handle the additional coding for error handling and monitoring, Option 1 (notebook with the copyData API) can be explored.
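To make that trade-off concrete, below is the kind of retry and status-polling code Option 1 obliges you to write yourself; the endpoint, status values, and statusUrl field are hypothetical placeholders:

```python
import time
import requests

def copy_with_retries(payload: dict, token: str, max_attempts: int = 3) -> dict:
    # Retry wrapper plus polling that the Copy Data activity
    # would otherwise give you for free
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.post("https://<your-copy-endpoint>/copyData",  # placeholder
                                 headers=headers, json=payload, timeout=60)
            resp.raise_for_status()
            job = resp.json()
            # Poll until the copy job reaches a terminal state
            while job.get("status") in ("Queued", "InProgress"):
                time.sleep(15)
                job = requests.get(job["statusUrl"], headers=headers,
                                   timeout=60).json()
            if job.get("status") == "Succeeded":
                return job
            raise RuntimeError(f"Copy failed: {job}")
        except (requests.RequestException, RuntimeError):
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying
```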
Additional Thoughts:
Ultimately, the best choice depends on your specific requirements and priorities. If you have a strong preference for the more low-level copyData approach, carefully weigh the added development and maintenance overhead.
Hope this helps. Do let me know in case of further queries.
Hi @AmruthaVarshini ,
We haven't heard back from you on the last response and wanted to check whether your query was answered.
If not, please reply with more details and we will try to help.
Thanks
Hi @Anonymous ,
Thanks a lot for the additional insights. We are currently running a POC of Option 1 and Option 2 to understand which is more affordable and performant for our particular requirement. We will come back with a few more questions once the POC is done, and hope for the same support from you and the team.
Hi @AmruthaVarshini ,
Glad to know that you got some insights on your query. Do let me know in case of further queries.
Hi @AmruthaVarshini ,
Thanks for using Fabric Community.
I would suggest Option 2: the notebook activity plus Copy Data activity approach.
Pros of Option 2:
- The Copy Data activity is a built-in, fully managed data movement step with error handling, retries, and run monitoring included.
- The notebook still handles all the conditional logic the Lambda handled, so the existing decision flow carries over cleanly.
- Parameters, including queries, can be passed from the notebook to the Copy Data activity through pipeline expressions.
Drawbacks of Other Options:
- Option 1 (copyData API): you must hand-code the error handling, retries, and monitoring that the Copy Data activity provides out of the box, which adds development and maintenance overhead.
- Option 3 (Azure Function activity): it introduces a dependency on a service outside Fabric, so the Function app is managed and billed separately from the pipeline.
Ultimately, it completely depends on which approach you choose.
Hope this is helpful. Do let me know in case of further queries.