- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Help related to copy acivity in data pipelines
I have different databases say db1, db2, db3 in sql server. I want to load multiple tables (always same table names from each database) to lakehouse.
currently I am writing a copy activity to load a table from one database. I am thinking of using a for loop activity to achieve this. by passing the databases as the array and table names in an array.
how can I achieve this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @nikhil0511
Can you tell me if your problem is solved? If yes, please accept it as solution.
Regards,
Nono Chen
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi I like to share different approch with you. I do this by concatenate a sql query using parameter.
So you call the pipline with different parameter for db, table, ... and build the sql query using concat
My Example...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Nikhil,
I feel the below steps helpful to you:
1. Create a Loop Activity:
- In your Azure Data Factory pipeline, add a ForEach activity (loop) to iterate over the list of databases (e.g., 'db1', 'db2', 'db3').
2. Define Arrays:
- Create two arrays:
- 'databases': Contains the names of your databases ('db1', 'db2', 'db3').
- 'table_names': Contains the corresponding table names (e.g., 'table1', 'table2', 'table3').
3. Connection Strings:
- Maintain a dictionary (or another suitable data structure) that maps each database name to its connection string.
For example: in python
connection_strings = {
"db1": "connection_string_db1",
"db2": "connection_string_db2",
"db3": "connection_string_db3"
}
4. Loop Logic:
- Inside the ForEach activity, use the current database name ('db_name') to retrieve the corresponding connection string and table name.
- Use these values to create the appropriate data pipeline activity (e.g., Copy Activity) to load the table from the database to the lakehouse.
- You can use the on-premises gateway to connect to the server.
5. Logging and Monitoring:
- Consider adding logging or monitoring activities to track the progress and any potential errors during the loop.
Remember to replace the placeholders with your actual logic for loading the tables.
If you encounter any issues or need further assistance, feel free to ask! 😊
Regards,
Chiranjeevi Kudupudi
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello Chiranjeevi,
Thank you for your response.
Correct me if I am wrong.
I am creating two arrays names Databases and Tables under Variables section in data pipeline.
I am clueless where to create the python code that uses the connection strings.
I am passing this gateway name as a variable as well named 'gateway'
My data connection normally contains like this : {gateway};{databasename}
I am novice to the data pipelines and it would be good if you can provide me the screenshots how you are applying these steps.
Thank you
Nikhil
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Nikhil,
I cannot post screenshots.
I can provide the steps to achieve it
1. Connection Strings:
- You mentioned that you have a gateway variable ('gateway') and a database name variable ('databasename').
- To construct your connection string, use Python to combine these variables:
#Coding
import os
# Get the gateway and database names
gateway = os.environ.get('gateway')
database = os.environ.get('databasename')
# Construct your connection string
connection_string = f"{gateway};{database}"
2. Azure Data Factory Pipeline:
- Create an Azure Data Factory pipeline to orchestrate the data movement.
- In your pipeline, use a ForEach activity to iterate over the list of databases (e.g., 'db1', 'db2', 'db3').
3. Copy Activity:
- Inside the ForEach activity, configure a 'Copy Data activity' for each database.
- Set the source dataset to the corresponding table in the database.
- Set the sink dataset to your lakehouse (for example Azure Blob Storage, Azure SQL Data Warehouse, or Delta Lake etc).
4. Dynamic Mapping:
- Since you're dealing with multiple tables, consider using dynamic mapping.
- Define a mapping data flow that dynamically maps columns based on the source table schema.
5. Logging and Monitoring:
- Add logging and monitoring activities to track the progress and any potential errors during the pipeline execution.
6. Resources:
- Refer to the python in data factory documentation -- https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-overview
for detailed information on configuring the Copy Data activity.
Explore the Azure Data Factory for practical examples
https://github.com/Azure/Azure-DataFactory
Remember to adapt these steps to your specific scenario, and feel free to ask if you need further assistance!
Regards,
Chiranjeevi Kudupudi
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello Chiranjeevi,
Thank you for the response.
Like I said earlier I am novice to Fabric and its artifacts. I would like to know where you are asking me to write the python code.
I am having three variables in Datapipeline itself, two array type variables that each contains database names and tables names and the other variable is gateway string.
Databases = ["dB1", "dB2"]
Tables = ["TB1","TB2"]
Gateway = "SQLServer"
Now in For activity o datapipeline, I am adding a copy activity. This copy activity asks the value. SQLServer;dB1 and database value: db1 and table name : TB1.
similarly for next table the first two values needs to be same and Table name as TB2.
In next iteration, the values should changes as follows : SQLServer;dB2, dB2 and TB1.
I tried to pass these variables using dynamic content. but still I am unable to pass them correctly.
I would like to achieve this.
Thank you
Nikhilesh
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Are you trying to pass the gateway name "SQLServer" into the copy activity dynamically as well? Because in Fabric, if you use any dynamic content for the connection, it looks like this:
So unfortunately you can't dynamically connect to multiple SQL connections, only Fabric artifacts.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
sorry forgot to mention, each database connection string is different as well and we use on premesis gateway to connect to the server.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As far as I know, you can currently only dynamically select Fabric artifacts as connections. See this idea Microsoft Idea and this thread Copy Data Activity: Unable to set dynamic connection string : r/MicrosoftFabric (reddit.com)
Helpful resources
Join us at the Microsoft Fabric Community Conference
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Microsoft Fabric Community Conference 2025
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
Subject | Author | Posted | |
---|---|---|---|
09-23-2024 07:06 AM | |||
11-05-2024 08:57 AM | |||
12-04-2024 12:38 AM | |||
08-28-2024 03:58 PM | |||
10-15-2024 03:16 PM |
User | Count |
---|---|
6 | |
2 | |
2 | |
2 | |
2 |
User | Count |
---|---|
11 | |
6 | |
6 | |
4 | |
4 |