The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
I have multiple files with same data structure in a folder and I would like to merge by row but not all files, so I need to filter the files before combine. And each file have more the one sheet, but I want to combine just one specific sheet from each file.
Any suggestion?
Solved! Go to Solution.
Hi @Anonymous ,
Please try to use a scripting or programming approach, such as a Python script, which can handle file and sheet selection and then perform the merge operation. Below are the steps you can follow to achieve this:
1. Identify the Files to Merge:
Make a list of the specific files you want to merge. You can do this manually or write a script to filter files based on a naming convention or a pattern.
2. Install Required Libraries:
If you're using Python, ensure you have libraries like `pandas` and `openpyxl` installed, which are great for handling Excel files.
You can install them using pip:
```bash
pip install pandas openpyxl
```
3. Write the Script:
Write a Python script that does the following:
Loops through the list of files.
Opens the specific sheet you want to merge from each file.
Appends the data from the sheet to a master dataframe.
Here's a basic structure of the script:
```python
import pandas as pd
import os
# List of files to merge
files_to_merge = ['file1.xlsx', 'file2.xlsx', ...]
# The name of the sheet you want to merge
sheet_name = 'Sheet1'
# Empty dataframe to hold merged data
merged_data = pd.DataFrame()
for file in files_to_merge:
# Construct the full file path if the files are not in the same directory as the script
file_path = os.path.join('path_to_directory', file)
# Read the specific sheet from the Excel file
data = pd.read_excel(file_path, sheet_name=sheet_name)
# Append the data to the merged_data dataframe
merged_data = merged_data.append(data, ignore_index=True)
# Save the merged data to a new Excel file
merged_data.to_excel('merged_data.xlsx', index=False)
```
4. Run the Script:
Execute the script to perform the merge. Ensure that you have the correct file paths and sheet names specified.
5. Check the Output:
After running the script, you should have a new Excel file named "merged_data.xlsx" with the combined data from the specified sheets.
After following these steps, you should have successfully merged the specified sheets from the selected files into a single Excel file.
Best regards,
Community Support Team_Binbin Yu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @Anonymous ,
Please try to use a scripting or programming approach, such as a Python script, which can handle file and sheet selection and then perform the merge operation. Below are the steps you can follow to achieve this:
1. Identify the Files to Merge:
Make a list of the specific files you want to merge. You can do this manually or write a script to filter files based on a naming convention or a pattern.
2. Install Required Libraries:
If you're using Python, ensure you have libraries like `pandas` and `openpyxl` installed, which are great for handling Excel files.
You can install them using pip:
```bash
pip install pandas openpyxl
```
3. Write the Script:
Write a Python script that does the following:
Loops through the list of files.
Opens the specific sheet you want to merge from each file.
Appends the data from the sheet to a master dataframe.
Here's a basic structure of the script:
```python
import pandas as pd
import os
# List of files to merge
files_to_merge = ['file1.xlsx', 'file2.xlsx', ...]
# The name of the sheet you want to merge
sheet_name = 'Sheet1'
# Empty dataframe to hold merged data
merged_data = pd.DataFrame()
for file in files_to_merge:
# Construct the full file path if the files are not in the same directory as the script
file_path = os.path.join('path_to_directory', file)
# Read the specific sheet from the Excel file
data = pd.read_excel(file_path, sheet_name=sheet_name)
# Append the data to the merged_data dataframe
merged_data = merged_data.append(data, ignore_index=True)
# Save the merged data to a new Excel file
merged_data.to_excel('merged_data.xlsx', index=False)
```
4. Run the Script:
Execute the script to perform the merge. Ensure that you have the correct file paths and sheet names specified.
5. Check the Output:
After running the script, you should have a new Excel file named "merged_data.xlsx" with the combined data from the specified sheets.
After following these steps, you should have successfully merged the specified sheets from the selected files into a single Excel file.
Best regards,
Community Support Team_Binbin Yu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
@Anonymous You should be able to filter the files before combining them. Also, in your transform sample file, you can just select the sheet that you want. Do you have a programmatic way to identify the files that you want to filter out or keep?