Fabric Pipeline Activity-How to use API of activities in Notebook Activity

Existing functionality:

AWS: Currently, we have a Lambda function (a Python script) with many conditions (it fetches data from an on-premises SQL Server), which generates parameters and sends them to a shell script containing a Sqoop import query; similarly, there is one more Lambda for the Sqoop export.

Target:

We are planning to achieve the same in a Fabric Data pipeline using one of the three options below. Please suggest which would be the best one to proceed with and why.



We are thinking of 3 options:

1) Using only the notebook activity (for the existing Sqoop import and export) and calling a copyData API for the data movement.
The Python notebook will handle all the conditional statements that the Lambda was handling earlier and then call an API with parameters such as source and target to copy the data from SQL Server to ADLS Gen2.

2) Using a notebook activity and a Copy Data activity (for the existing Sqoop import and export) -- same as point 1, but using the pipeline's Copy Data activity for the data movement.

3) Using an Azure Function activity (for the Lambda functions) and a Copy Data activity for the existing Sqoop import and export.


Hi @Anonymous , this is pretty helpful. Could you please elaborate more on the cons of the copyData API, apart from it being less streamlined? Also, we have many parameters (and sometimes queries) that need to be passed from the notebook to the Copy Data activity; would this be feasible with Option 1? Any thoughts would be much appreciated. Thank you very much.

Anonymous
Not applicable

Hi @AmruthaVarshini ,

Cons of copyData API (compared to Datacopy Activity):

  1. Lower-Level Control: copyData offers more granular control over data transfer, but requires writing more code within the notebook to handle details like serialization, error handling, and progress tracking. Datacopy Activity simplifies these aspects.
  2. Error Handling: With copyData, you'll need to implement custom error handling logic within the notebook to capture and respond to potential issues during data transfer. Datacopy Activity provides built-in error handling and retries for robust data movement.
  3. Monitoring and Logging: Monitoring data transfer progress and logging details can be more cumbersome with copyData. Datacopy Activity offers better integration with Datapipeline's monitoring and logging capabilities for improved visibility.
  4. Security Considerations: When using copyData, ensure proper access control is in place for the API calls to prevent unauthorized data access. Datacopy Activity leverages Datapipeline's security model for secure data transfer.

Feasibility of Option 1 with Many Parameters:

  1. Passing Parameters: Option 1 can handle a large number of parameters from the notebook to the copyData API. You can leverage libraries like requests to build dynamic API calls with parameters passed as arguments or within the request body (see the sketch after this list).
  2. Passing Queries: While feasible, passing complex queries as parameters can become cumbersome and less readable. Consider storing reusable queries in separate files or leveraging configuration management tools for better maintainability.
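
To make the parameter question concrete, below is a minimal sketch (Python, as it would run in the notebook) of building a dynamic, parameter-heavy call with requests, including the basic retry handling that the Datacopy Activity would otherwise give you for free. The endpoint URL, payload shape, and token handling are placeholders for whatever copy API you actually call, not a documented contract.

```python
import time
import requests

# Placeholder endpoint -- substitute the real copyData / Fabric REST contract
# once it is confirmed against the documentation.
COPY_ENDPOINT = "https://api.example.com/copyData"

def build_copy_payload(source_query: str, target_path: str, **extra) -> dict:
    """Collect the many parameters produced by the notebook's conditional logic."""
    payload = {
        "source": {"type": "SqlServer", "query": source_query},
        "sink": {"type": "AdlsGen2", "path": target_path},
    }
    payload.update(extra)  # e.g. partition columns, load date, batch id
    return payload

def call_copy_api(payload: dict, token: str, retries: int = 3) -> dict:
    """POST the payload with simple retry/backoff -- the error handling you
    must write yourself when you bypass the Datacopy Activity."""
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(1, retries + 1):
        resp = requests.post(COPY_ENDPOINT, json=payload, headers=headers, timeout=60)
        if resp.ok:
            return resp.json()
        if attempt == retries:
            resp.raise_for_status()
        time.sleep(2 ** attempt)  # back off before retrying

# Example usage with values computed by the notebook's conditions:
# result = call_copy_api(
#     build_copy_payload("SELECT * FROM dbo.Orders", "Files/raw/orders", loadDate="2025-02-01"),
#     token="<access token>",
# )
```

Passing dozens of parameters this way is just a matter of adding keys to the payload dictionary; the real cost is the retry, logging, and monitoring code around the call, which is exactly the trade-off described in the list above.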


Overall Recommendation:

 

If your primary concern is simplicity and leveraging built-in functionalities, Option 2 (Notebook with Datacopy Activity) is still preferred. It provides a cleaner abstraction for data transfer with robust error handling and monitoring.

If you require maximum control over the data transfer process and have the resources to handle additional coding for error handling and monitoring, Option 1 (Notebook with copyData API) can be explored.



Additional Thoughts:

 

  • Explore Datapipeline's variable settings for storing frequently used parameters, reducing the number of values passed directly from the notebook.
  • If query complexity is a concern, consider pre-processing or storing reusable queries in separate locations and referencing them within the notebook for cleaner code (see the sketch below).
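
As a sketch of that suggestion: reusable queries can live in a small JSON file under the lakehouse Files area and be looked up by a short name inside the notebook, so only keys and runtime values travel as parameters. The file path and structure below are assumptions for illustration, not an existing artifact in your workspace.

```python
import json

# Hypothetical config file kept under the default lakehouse's Files area;
# in a Fabric notebook the attached lakehouse is typically reachable via this local mount path.
QUERY_CONFIG = "/lakehouse/default/Files/config/sqoop_queries.json"

with open(QUERY_CONFIG) as f:
    # e.g. {"orders_incremental": "SELECT * FROM dbo.Orders WHERE load_date = '{load_date}'"}
    queries = json.load(f)

def get_query(name: str, **params) -> str:
    """Resolve a query by short name and fill in runtime values, so the
    pipeline only needs to pass 'orders_incremental' plus a date, not SQL text."""
    return queries[name].format(**params)

# query = get_query("orders_incremental", load_date="2025-02-01")
```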


Ultimately, the best choice depends on your specific requirements and priorities. If you have a strong preference for a more low-level approach with copyData, carefully consider the added development and maintenance overhead. 

Hope this helps. Do let me know in case of further queries.

Anonymous
Not applicable

Hi @AmruthaVarshini ,

We haven't heard back from you on the last response and were just checking in to see if your query was answered.
If not, please respond back with more details and we will try to help.

Thanks


Hi @Anonymous , 

 

Thanks a lot for the additional insights. We are currently doing a POC of Option 1 and Option 2 to understand which is more affordable and performant for our particular requirement. We will get back with a few more questions once the POC is done. Hoping for the same support from you and the team.

Anonymous
Not applicable

Hi @AmruthaVarshini ,

Glad to know that you got some insights on your query. Do let me know in case of further queries.

Anonymous
Not applicable

Hi @AmruthaVarshini ,

Thanks for using Fabric Community.

I would suggest using Option 2: the notebook activity and datacopy activity approach.

Pros of Option 2:

 

  • Flexibility: Notebooks provide a familiar Python environment for handling complex logic and conditional statements, similar to your existing Lambda functions (see the hand-off sketch after this list).
  • Native Integration: Datacopy activity seamlessly integrates with Fabric Datapipeline, allowing efficient data movement from your on-premises SQL Server to ADLS Gen2. This eliminates the need for external shell scripts and simplifies the pipeline.
  • Cost-Effective: Notebooks within Datapipeline might be more cost-effective compared to Azure Functions, especially for simpler data transfer tasks. Functions incur separate execution costs.
  • Maintainability: Code resides within the pipeline, making it easier to manage and version control compared to separate Lambda functions.
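
For illustration, here is a minimal sketch of the Option 2 hand-off, assuming the notebook computes the copy parameters (the logic that used to live in the Lambda) and returns them through its exit value, which the pipeline then wires into the Copy Data activity via dynamic content. All names and values below are placeholders.

```python
import json
from notebookutils import mssparkutils  # available in Fabric notebooks

# Conditional logic previously handled by the Lambda function goes here,
# producing whatever the Copy Data activity needs for its source and sink.
copy_params = {
    "sourceQuery": "SELECT * FROM dbo.Orders WHERE load_date = '2025-02-01'",
    "sinkFolder": "Files/raw/orders/2025/02/01",
}

# Hand the parameters back to the pipeline as the notebook's exit value.
mssparkutils.notebook.exit(json.dumps(copy_params))
```

On the pipeline side, the Copy Data activity's source query and sink path would reference the notebook activity's output (its exit value) through dynamic content expressions; the exact property path is easiest to copy from a monitored run of the pipeline rather than typed from memory.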

 

Drawbacks of Other Options:

 

  • Option 1 (Only Notebook Activity): While feasible, using only notebooks with copyData API might require additional code to handle data transfer logic, making it less streamlined.
  • Option 3 (Azure Function Activity): Azure functions introduce additional complexity and potential cost compared to native datacopy within Datapipeline.

Ultimately, it completely depends on which approach you choose.

Hope this is helpful. Do let me know in case of further queries.
