The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hello!
I am trying to combine Excel files in Powerquery that have different columns:
The year will always change, and that bit I was able to somehow capture dynamically based on the column position (current year, next year, etc) as long as the file format was kept the same.
But surprise, this year the file format is changing, so there will be a lot more columns in the new files and of course randomly added amongst other columns.
I wanted to add blank columns in the previous Excel files to make it work, but then I thought that a better solution with PowerQuery might exist...
Is it possible to catch this automatically with PowerQuery? If yes, can you please help me with the thought process?
Thanks!
Kind regards
Valeria
Solved! Go to Solution.
Hello - yes, this is possible. You would just change the portion of the script that removes columns to only remove those that have duplicates, not to remove the empty columns.
BEFORE
// Combine duplicate plan columns and generic columns into one list.
ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
// --------------------------------------------------------
// Clean up the column names.
// --------------------------------------------------------
// Remove specific columns from the source table.
RemoveColumns = Table.RemoveColumns ( Source, ColumnNamesToRemove ),
AFTER
// Combine duplicate plan columns and generic columns into one list.
// ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
// --------------------------------------------------------
// Clean up the column names.
// --------------------------------------------------------
// Remove specific columns from the source table.
RemoveColumns = Table.RemoveColumns ( Source, DuplicatePlanColumns),
Hello - another option would be to normalize the tables without a mapping table prior appending. This solution dynamically removes the years from the column names and places it in two normalized columns: Actuals Year and Forecast Year. I have included explanations for each step in the query script below.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcgQCJR0lQyNjEyCFjAJyEvNAMgZAAKURFJIkjIqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Project Title" = _t, #"Project Number" = _t, Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, #"2022 PLAN" = _t, #"2022 Budget" = _t, #"Jan Actual Spend (2022)" = _t, #"Feb Actual Spend (2022)" = _t, #"2023 PLAN" = _t, #"Jan Forecast Spend (2023)" = _t, #"Feb Forecast Spend (2023)" = _t]),
// --------------------------------------------------------
// Store some variables
// --------------------------------------------------------
// The column names actually appearing in the source file.
FileColumnNames = Table.ColumnNames ( Source ),
// The year appearing in the Actual Spend columns.
GetYear_Actual = Text.Select (
List.First (
List.Select (
FileColumnNames, each Text.Contains ( _, "Actual Spend", Comparer.OrdinalIgnoreCase )
)
), {"0".."9"}
),
// The year appearing in the Forecast Spend columns.
GetYear_Forecast = Text.Select (
List.First (
List.Select (
FileColumnNames, each Text.Contains ( _, "Forecast Spend", Comparer.OrdinalIgnoreCase )
)
), {"0".."9"}
),
// Get a list of column names including the word 'Plan' (case non-sensitive), excluding the first.
DuplicatePlanColumns = List.RemoveFirstN (
List.Select (
FileColumnNames, each Text.Contains ( _, "Plan", Comparer.OrdinalIgnoreCase )
)
),
// Get a list of column names that start with 'Column' (case non-sensitive).
GenericColumnNames = List.Select (
FileColumnNames, each Text.StartsWith ( _, "Column", Comparer.OrdinalIgnoreCase )
),
// Combine duplicate plan columns and generic columns into one list.
ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
// --------------------------------------------------------
// Clean up the column names.
// --------------------------------------------------------
// Remove specific columns from the source table.
RemoveColumns = Table.RemoveColumns ( Source, ColumnNamesToRemove ),
// Remove numbers (years), parenthesis and leading/trailing spaces.
TransformedFileColumnNames = Table.TransformColumnNames (
RemoveColumns,
each Text.Trim ( Text.Remove ( _, {"0".."9", "(", ")"} ) )
),
// --------------------------------------------------------
// If they don't already exist, add years columns to the table.
// --------------------------------------------------------
AddYear_Actual =
if not Table.HasColumns ( TransformedFileColumnNames, {"Actual Spend"} )
then Table.AddColumn ( TransformedFileColumnNames, "Actual Year", each GetYear_Actual, Int64.Type )
else TransformedFileColumnNames,
AddYear_Forecast =
if not Table.HasColumns ( AddYear_Actual, {"Forecast Spend"} )
then Table.AddColumn ( AddYear_Actual, "Forecast Year", each GetYear_Forecast, Int64.Type )
else AddYear_Actual
in
AddYear_Forecast
You can put it in a function like this:
let
fn = ( Table as table ) =>
let
Source = Table,
// --------------------------------------------------------
// Store some variables
// --------------------------------------------------------
// The column names actually appearing in the source file.
FileColumnNames = Table.ColumnNames ( Source ),
// The year appearing in the Actual Spend columns.
GetYear_Actual = Text.Select (
List.First (
List.Select (
FileColumnNames, each Text.Contains ( _, "Actual Spend", Comparer.OrdinalIgnoreCase )
)
), {"0".."9"}
),
// The year appearing in the Forecast Spend columns.
GetYear_Forecast = Text.Select (
List.First (
List.Select (
FileColumnNames, each Text.Contains ( _, "Forecast Spend", Comparer.OrdinalIgnoreCase )
)
), {"0".."9"}
),
// Get a list of column names including the word 'Plan' (case non-sensitive), excluding the first.
DuplicatePlanColumns = List.RemoveFirstN (
List.Select (
FileColumnNames, each Text.Contains ( _, "Plan", Comparer.OrdinalIgnoreCase )
)
),
// Get a list of column names that start with 'Column' (case non-sensitive).
GenericColumnNames = List.Select (
FileColumnNames, each Text.StartsWith ( _, "Column", Comparer.OrdinalIgnoreCase )
),
// Combine duplicate plan columns and generic columns into one list.
ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
// --------------------------------------------------------
// Clean up the column names.
// --------------------------------------------------------
// Remove specific columns from the source table.
RemoveColumns = Table.RemoveColumns ( Source, ColumnNamesToRemove ),
// Remove numbers (years), parenthesis and leading/trailing spaces.
TransformedFileColumnNames = Table.TransformColumnNames (
RemoveColumns,
each Text.Trim ( Text.Remove ( _, {"0".."9", "(", ")"} ) )
),
// --------------------------------------------------------
// If they don't already exist, add years columns to the table.
// --------------------------------------------------------
AddYear_Actual =
if not Table.HasColumns ( TransformedFileColumnNames, {"Actual Spend"} )
then Table.AddColumn ( TransformedFileColumnNames, "Actual Year", each GetYear_Actual, Int64.Type )
else TransformedFileColumnNames,
AddYear_Forecast =
if not Table.HasColumns ( AddYear_Actual, {"Forecast Spend"} )
then Table.AddColumn ( AddYear_Actual, "Forecast Year", each GetYear_Forecast, Int64.Type )
else AddYear_Actual
in
AddYear_Forecast
in
fn
Snip of Result
@jennratten , thanks! Very interesting approach. However, I would like to include all other columns as well - so practically column 9,10 and 11 belonging to the second file should be there in the final combination of files with null columns for the first file.
Is this at all possible?
Thanks again!
Kind regards
Valeria
Hello - yes, this is possible. You would just change the portion of the script that removes columns to only remove those that have duplicates, not to remove the empty columns.
BEFORE
// Combine duplicate plan columns and generic columns into one list.
ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
// --------------------------------------------------------
// Clean up the column names.
// --------------------------------------------------------
// Remove specific columns from the source table.
RemoveColumns = Table.RemoveColumns ( Source, ColumnNamesToRemove ),
AFTER
// Combine duplicate plan columns and generic columns into one list.
// ColumnNamesToRemove = List.Combine ( { DuplicatePlanColumns, GenericColumnNames } ),
// --------------------------------------------------------
// Clean up the column names.
// --------------------------------------------------------
// Remove specific columns from the source table.
RemoveColumns = Table.RemoveColumns ( Source, DuplicatePlanColumns),
@jennratten thanks! I am still testing and testing, I can't just get it to work yet as I want but I see the logic - so it's just a matter of keeping trying from my side 🙂
Thanks! 🙂
Hello, Valeria
I'd create and manually maintain some kind of mapping table like this
YR2022 == YR2023 == Combined_report
2022 PLAN == 2023 PLAN == This YR PLAN
2023 PLAN == 2024 PLAN == Next YR PLAN
and so on. Then rename columns of your tables in PQ with Table.RenameColumns function. And finally Table.Combine your tables with new column names.
2022_map = List.Zip({map_tbl[YR2022], map_tbl[Combined_report]}),
2023_map = List.Zip({map_tbl[YR2023], map_tbl[Combined_report]}),
renamed_2022 = Table.RenameColumns(tbl_2022, 2022_map),
renamed_2023 = Table.RenameColumns(tbl_2023, 2023_map),
combined_report = Table.Combine(renamed_2022, renamed_2023)
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.