Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
ValeriaBreve
Post Patron
Post Patron

combining Excel files with dfferent columns

Hello!

I am trying to combine Excel files in Powerquery that have different columns:

The year will always change, and that bit I was able to somehow capture dynamically based on the column position (current year, next year, etc) as long as the file format was kept the same.

 

But surprise, this year the file format is changing, so there will be a lot more columns in the new files and of course randomly added amongst other columns.

 

ValeriaBreve_0-1679047661295.png

 

I wanted to add blank columns in the previous Excel files to make it work, but then I thought that a better solution with PowerQuery might exist... 

Is it possible to catch this automatically with PowerQuery? If yes, can you please help me with the thought process?

 

Thanks!

Kind regards

Valeria

 

1 ACCEPTED SOLUTION

Hello - yes, this is possible.  You would just change the portion of the script that removes columns to only remove those that have duplicates, not to remove the empty columns.

BEFORE

    // Combine duplicate plan columns and generic columns into one list.
    ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
    // --------------------------------------------------------
    // Clean up the column names. 
    // --------------------------------------------------------
    // Remove specific columns from the source table.
    RemoveColumns = Table.RemoveColumns ( Source, ColumnNamesToRemove ),

AFTER

    // Combine duplicate plan columns and generic columns into one list.
    // ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
    // --------------------------------------------------------
    // Clean up the column names. 
    // --------------------------------------------------------
    // Remove specific columns from the source table.
    RemoveColumns = Table.RemoveColumns ( Source, DuplicatePlanColumns),

View solution in original post

5 REPLIES 5
jennratten
Super User
Super User

Hello - another option would be to normalize the tables without a mapping table prior appending.  This solution dynamically removes the years from the column names and places it in two normalized columns: Actuals Year and Forecast Year.  I have included explanations for each step in the query script below.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcgQCJR0lQyNjEyCFjAJyEvNAMgZAAKURFJIkjIqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Project Title" = _t, #"Project Number" = _t, Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, #"2022 PLAN" = _t, #"2022 Budget" = _t, #"Jan Actual Spend (2022)" = _t, #"Feb Actual Spend (2022)" = _t, #"2023 PLAN" = _t, #"Jan Forecast Spend (2023)" = _t, #"Feb Forecast Spend (2023)" = _t]),
    // --------------------------------------------------------
    // Store some variables 
    // --------------------------------------------------------
    // The column names actually appearing in the source file.
    FileColumnNames = Table.ColumnNames ( Source ),
    // The year appearing in the Actual Spend columns.
    GetYear_Actual = Text.Select (
        List.First (
            List.Select (
                FileColumnNames, each Text.Contains ( _, "Actual Spend", Comparer.OrdinalIgnoreCase ) 
            ) 
        ), {"0".."9"}
    ),
    // The year appearing in the Forecast Spend columns.
    GetYear_Forecast = Text.Select (
        List.First (
            List.Select (
                FileColumnNames, each Text.Contains ( _, "Forecast Spend", Comparer.OrdinalIgnoreCase ) 
            ) 
        ), {"0".."9"}
    ),
    // Get a list of column names including the word 'Plan' (case non-sensitive), excluding the first.
    DuplicatePlanColumns = List.RemoveFirstN ( 
        List.Select ( 
            FileColumnNames, each Text.Contains ( _, "Plan", Comparer.OrdinalIgnoreCase ) 
        )
    ),
    // Get a list of column names that start with 'Column' (case non-sensitive).
    GenericColumnNames = List.Select ( 
        FileColumnNames, each Text.StartsWith ( _, "Column", Comparer.OrdinalIgnoreCase ) 
    ),
    // Combine duplicate plan columns and generic columns into one list.
    ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
    // --------------------------------------------------------
    // Clean up the column names. 
    // --------------------------------------------------------
    // Remove specific columns from the source table.
    RemoveColumns = Table.RemoveColumns ( Source, ColumnNamesToRemove ),
    // Remove numbers (years), parenthesis and leading/trailing spaces.
    TransformedFileColumnNames = Table.TransformColumnNames (
        RemoveColumns,
        each Text.Trim ( Text.Remove ( _, {"0".."9", "(", ")"} ) )
    ),
    // --------------------------------------------------------
    // If they don't already exist, add years columns to the table. 
    // --------------------------------------------------------
    AddYear_Actual = 
        if not Table.HasColumns ( TransformedFileColumnNames, {"Actual Spend"} ) 
        then Table.AddColumn ( TransformedFileColumnNames, "Actual Year", each GetYear_Actual, Int64.Type )
        else TransformedFileColumnNames,
    AddYear_Forecast = 
        if not Table.HasColumns ( AddYear_Actual, {"Forecast Spend"} ) 
        then Table.AddColumn ( AddYear_Actual, "Forecast Year", each GetYear_Forecast, Int64.Type )
        else AddYear_Actual
in
    AddYear_Forecast

You can put it in a function like this:

let
    fn = ( Table as table ) =>
let
    Source = Table,
    // --------------------------------------------------------
    // Store some variables 
    // --------------------------------------------------------
    // The column names actually appearing in the source file.
    FileColumnNames = Table.ColumnNames ( Source ),
    // The year appearing in the Actual Spend columns.
    GetYear_Actual = Text.Select (
        List.First (
            List.Select (
                FileColumnNames, each Text.Contains ( _, "Actual Spend", Comparer.OrdinalIgnoreCase ) 
            ) 
        ), {"0".."9"}
    ),
    // The year appearing in the Forecast Spend columns.
    GetYear_Forecast = Text.Select (
        List.First (
            List.Select (
                FileColumnNames, each Text.Contains ( _, "Forecast Spend", Comparer.OrdinalIgnoreCase ) 
            ) 
        ), {"0".."9"}
    ),
    // Get a list of column names including the word 'Plan' (case non-sensitive), excluding the first.
    DuplicatePlanColumns = List.RemoveFirstN ( 
        List.Select ( 
            FileColumnNames, each Text.Contains ( _, "Plan", Comparer.OrdinalIgnoreCase ) 
        )
    ),
    // Get a list of column names that start with 'Column' (case non-sensitive).
    GenericColumnNames = List.Select ( 
        FileColumnNames, each Text.StartsWith ( _, "Column", Comparer.OrdinalIgnoreCase ) 
    ),
    // Combine duplicate plan columns and generic columns into one list.
    ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
    // --------------------------------------------------------
    // Clean up the column names. 
    // --------------------------------------------------------
    // Remove specific columns from the source table.
    RemoveColumns = Table.RemoveColumns ( Source, ColumnNamesToRemove ),
    // Remove numbers (years), parenthesis and leading/trailing spaces.
    TransformedFileColumnNames = Table.TransformColumnNames (
        RemoveColumns,
        each Text.Trim ( Text.Remove ( _, {"0".."9", "(", ")"} ) )
    ),
    // --------------------------------------------------------
    // If they don't already exist, add years columns to the table. 
    // --------------------------------------------------------
    AddYear_Actual = 
        if not Table.HasColumns ( TransformedFileColumnNames, {"Actual Spend"} ) 
        then Table.AddColumn ( TransformedFileColumnNames, "Actual Year", each GetYear_Actual, Int64.Type )
        else TransformedFileColumnNames,
    AddYear_Forecast = 
        if not Table.HasColumns ( AddYear_Actual, {"Forecast Spend"} ) 
        then Table.AddColumn ( AddYear_Actual, "Forecast Year", each GetYear_Forecast, Int64.Type )
        else AddYear_Actual
in
    AddYear_Forecast
in
    fn

 

Snip of Result

jennratten_0-1679242931981.png

 

@jennratten , thanks! Very interesting approach. However, I would like to include all other columns as well - so practically column 9,10 and 11 belonging to the second file should be there in the final combination of files with null columns for the first file.

Is this at all possible?

Thanks again!

Kind regards

Valeria

 

 

Hello - yes, this is possible.  You would just change the portion of the script that removes columns to only remove those that have duplicates, not to remove the empty columns.

BEFORE

    // Combine duplicate plan columns and generic columns into one list.
    ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
    // --------------------------------------------------------
    // Clean up the column names. 
    // --------------------------------------------------------
    // Remove specific columns from the source table.
    RemoveColumns = Table.RemoveColumns ( Source, ColumnNamesToRemove ),

AFTER

    // Combine duplicate plan columns and generic columns into one list.
    // ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
    // --------------------------------------------------------
    // Clean up the column names. 
    // --------------------------------------------------------
    // Remove specific columns from the source table.
    RemoveColumns = Table.RemoveColumns ( Source, DuplicatePlanColumns),

@jennratten thanks! I am still testing and testing, I can't just get it to work yet as I want but I see the logic - so it's just a matter of keeping trying from my side 🙂

Thanks! 🙂

 

AlienSx
Super User
Super User

Hello, Valeria

I'd create and manually maintain some kind of mapping table like this

YR2022 == YR2023 == Combined_report

2022 PLAN == 2023 PLAN == This YR PLAN

2023 PLAN == 2024 PLAN == Next YR PLAN

and so on. Then rename columns of your tables in PQ with Table.RenameColumns function. And finally Table.Combine your tables with new column names. 

 

2022_map = List.Zip({map_tbl[YR2022], map_tbl[Combined_report]}),
2023_map = List.Zip({map_tbl[YR2023], map_tbl[Combined_report]}),
renamed_2022 = Table.RenameColumns(tbl_2022, 2022_map),
renamed_2023 = Table.RenameColumns(tbl_2023, 2023_map),
combined_report = Table.Combine(renamed_2022, renamed_2023)

 

 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors