Solved: Import Excel file with changing columns

DouweMeer · ‎11-26-2024

How do you make sure if you import files from a folder, that all columns are kept? Say, first 10 files have column A, B, and C. Then after come 25 with A, B, C, and D, and eventually 5 more with A, B, D, and E.

If my expectation is to have 1 query where the results is 5 columns that dependable on the file's type, the other columns it doesn't have are left blank. Say, outcome is 5 columns, A, B, C, D, and E where with the first template that has A, B, and C, in the end result the column D, and E are left blank. Second template only E is left blank and third template has C left blank.

It also needs to continue to work, if there's a fourth template coming up in the future with column F, the outcome should automatically get 6 columns. If one of the files has the column F misspelled, there are 7 column. If you correct that misspelled column, it returns to 6.

How do you get that done?

ZhangKun · ‎11-26-2024

When we import a folder, several things actually happen:

Find all files
Generate a calculation template based on the user's operation, and only use this template to read the worksheet
Execute the calculation template in the previous step for all files
Expand the worksheet
Set the type and other steps

The problem you mentioned occurs when "expanding the worksheet". When expanding the worksheet (using Table.ExpandTableColumn), you need to provide the column name, whether the user writes code or selects through the UI.

Solution:

Check the steps automatically generated by Power Query and delete the steps after expanding the table column
Write code to collect all column names, and deduplicate and sort the column names (if you need)
Use this list of stored column names to expand the table

The sample code is as follows:

let
    源 = Folder.Files("C:\Users\Black\Desktop\新建文件夹 (3)"),
    筛选的隐藏文件1 = Table.SelectRows(源, each [Attributes]?[Hidden]? <> true),
    调用自定义函数1 = Table.AddColumn(筛选的隐藏文件1, "转换文件", each 转换文件([Content])),
    重命名的列1 = Table.RenameColumns(调用自定义函数1, {"Name", "Source.Name"}),
    删除的其他列1 = Table.SelectColumns(重命名的列1, {"Source.Name", "转换文件"}), 
    columnNameList = List.Distinct(List.Combine(List.Transform(删除的其他列1[转换文件], Table.ColumnNames))), 
    expandColumn = Table.ExpandTableColumn(删除的其他列1, "转换文件", columnNameList, columnNameList)
in
    expandColumn

data:

1.xlsx

A	B	C	D
21	22	23	24

2.xlsx

A	B	C
11	12	13

3.xlsx

A	B	D	E
31	32	33	34

result:

Source.Name	A	B	C	D	E
1.xlsx	21	22	23	24
2.xlsx	11	12	13
3.xlsx	31	32		33	34

View solution in original post

wdx223_Daniel · ‎11-26-2024

try to use the second argument of Table.Combine

check and amend your step of combining the tables.

=Table.Combine(YourTableList,{"A","B","C","D","E"})

DouweMeer · ‎11-27-2024

Perhaps you misunderstand? Or I misunderstand what you propose.

I'm not seeing the step you propose:

Like, I'm referring to this thingy:

ZhangKun · ‎11-26-2024

When we import a folder, several things actually happen:

Find all files
Generate a calculation template based on the user's operation, and only use this template to read the worksheet
Execute the calculation template in the previous step for all files
Expand the worksheet
Set the type and other steps

The problem you mentioned occurs when "expanding the worksheet". When expanding the worksheet (using Table.ExpandTableColumn), you need to provide the column name, whether the user writes code or selects through the UI.

Solution:

Check the steps automatically generated by Power Query and delete the steps after expanding the table column
Write code to collect all column names, and deduplicate and sort the column names (if you need)
Use this list of stored column names to expand the table

The sample code is as follows:

let
    源 = Folder.Files("C:\Users\Black\Desktop\新建文件夹 (3)"),
    筛选的隐藏文件1 = Table.SelectRows(源, each [Attributes]?[Hidden]? <> true),
    调用自定义函数1 = Table.AddColumn(筛选的隐藏文件1, "转换文件", each 转换文件([Content])),
    重命名的列1 = Table.RenameColumns(调用自定义函数1, {"Name", "Source.Name"}),
    删除的其他列1 = Table.SelectColumns(重命名的列1, {"Source.Name", "转换文件"}), 
    columnNameList = List.Distinct(List.Combine(List.Transform(删除的其他列1[转换文件], Table.ColumnNames))), 
    expandColumn = Table.ExpandTableColumn(删除的其他列1, "转换文件", columnNameList, columnNameList)
in
    expandColumn

data:

1.xlsx

A	B	C	D
21	22	23	24

2.xlsx

A	B	C
11	12	13

3.xlsx

A	B	D	E
31	32	33	34

result:

Source.Name	A	B	C	D	E
1.xlsx	21	22	23	24
2.xlsx	11	12	13
3.xlsx	31	32		33	34

DouweMeer · ‎11-27-2024

Tried to replicate your Chinese onto my English, but I tested it by having an extra column in one of the files with a newly added column, it doesn't pop-up :(.

Am surprised by what it keeps in English, and what it translates to Chinese :).

Edit: It does work! The modification I pushed onto the source file wasn't saved to the server apparently. Silly synchronization issue.

FarhanJeelani · ‎11-26-2024

Hi @DouweMeer ,

To achieve this in Power BI using Power Query, where the final output dynamically adjusts to changing columns across multiple Excel files in a folder, follow these steps:

Step-by-Step Process

Place Files in a Folder:
Store all Excel files in a single folder. Ensure they have consistent data structures (e.g., headers in the first row). This folder will serve as the data source.
Connect Power BI to the Folder:
- Open Power BI Desktop.
- Go to Home > Get Data > Folder.
- Navigate to the folder containing your files and click OK.
- Power BI will create a query listing the files in the folder.
Combine Files:
- In the query editor, click Combine Files. Power BI will use the first file as a sample and combine all files in the folder.
- Choose the sheet or table from the first file when prompted.
Promote Headers and Handle Missing Columns:
- In the Power Query editor, Power BI will attempt to standardize the columns across all files.
- Any columns missing in some files will be filled with null values automatically.
- To ensure consistency, click on Transform > Use First Row as Headers to promote the first row as headers.
Add a Step to Handle Future Columns:
- Go to the Advanced Editor in Power Query (on the Home tab).
- Locate the step where column names are hardcoded (e.g., Table.SelectColumns or Table.TransformColumnTypes).
- Modify or remove this step to ensure Power Query dynamically adapts to new columns. Specifically:
  - Remove any Table.SelectColumns step, which restricts the output to specific columns.
  - Replace it with a step that promotes all available columns dynamically:
```
Table.PromoteHeaders(#"Previous Step", [PromoteAllScalars=true])
```
Rename or Standardize Column Names (Optional):
If column names may be misspelled, consider applying transformations to standardize names:
- Use Transform > Replace Values to correct common misspellings.
- Add a custom step to clean column names dynamically:
```
Table.TransformColumnNames(#"Previous Step", Text.Proper)
```
Close and Load:
- Click Close & Apply to load the transformed data into Power BI.
- Power BI will now combine all files, with columns that are missing in some files left blank (null), and dynamically adapt to new columns in future files.

Dynamic Handling of Future Changes

When new files with additional columns (e.g., F) are added to the folder, Power Query will automatically include them in the output as long as they exist in the header row of at least one file.
If a column is misspelled in a file, it will appear as a new column in the output. Correcting the spelling in the source file will merge the data back into the appropriate column upon refresh.

Testing and Validation

Add files with varying column structures to the folder and refresh the query in Power BI.
Verify that all columns (existing and new) appear correctly, with missing values represented as null.

This setup ensures a flexible and scalable solution for combining Excel files with changing column structures, adapting to both new columns and corrected column names automatically.

Please mark this as solution if it helps. Appreciate Kudos.

DouweMeer · ‎11-27-2024

Believe yours doesn't work, I tried to work off your suggestion.

I believe the reason it would fail is this part:

If you choose the template from the first file, future files that do not contain the headers from the sample file will be automatically removed. Sure, the files will be processed, unless it is missing columns that are in the sample file.

If you'd select the last file, if the first file misses out of a column, say F, it will hit an error in the process.

I accepted another reply as a solution. I'm happy to review your proposal and accept it as well as a solution (if it allows me), but right now I don't see it solving my challenge.

Import Excel file with changing columns

Step-by-Step Process

Dynamic Handling of Future Changes

Testing and Validation

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - February 2025

Join our Community Sticker Challenge 2025

Fabric Community Update - February 2025

Join us at the 2025 Microsoft Fabric Community Conference

Import Excel file with changing columns

Step-by-Step Process

Dynamic Handling of Future Changes

Testing and Validation

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - February 2025

Join our Community Sticker Challenge 2025

Fabric Community Update - February 2025