Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi everybody,
I need help to finish my import of Excel files from a folder. My inspiration for this is the YouTube video from Goodly (Chandeep), that is importing data without creating a sample file.
This works fine for me, when I just need to import all files in folder, but in this case, I create a column with a date that is based on the name of the file.
But I need help to do a Table.Combine to expand all the files in the correct way, but to keep either the date column or the column that has the filename on all rows after combining data, when I do the Table.combine.
I can not use the Table.ExpandTableColumn as the headers for the different columns can and will vary over time. The Table.Combine handles this, but I can not get this extra column called "Date" to be included on all rows. Please Help
I can send you sample files of what I want to import (I can't find where to upload), but basically it is an export from a booking system, where I will skip the first 5 rows and the result is data formatted as a pivot table that I will convert into records afterwards (that part works fine).
I hope the community can help.
Best regards
Jørgen
let
/* Here is an alternative way to import a folder and all the filtered subfolders
Using the M Script instead of the User Interface provided in Power Query gives you a lot more options
This Script is created by Jørgen Wulff Rasmussen, Zealand Data ApS
Contact jwr@zealand-data.com phone +45 23732009
Inspired by the YouTube video "Combine Data from Multiple Excel Files | Most Dynamic Method (Dynamic Columns & Sheets)" fra Goodly (Chandeep) */
// Step 1: Source - Load files from specified folder
Source = Folder.Files(FolderName),
// Lowercase transformation for consistency
LowerCase_All_TextColumns = Table.TransformColumns(Source,{{"Name", Text.Lower, type text}, {"Extension", Text.Lower, type text}, {"Folder Path", Text.Lower, type text}}),
// Filter files that include 'balance' in the name
Filter_NameColumn_To_Files_Named_Sales = Table.SelectRows(LowerCase_All_TextColumns, each Text.StartsWith([Name], "product")),
// Step 2: Custom Function to process each Excel file
ProcessWorkbook = (excelFile as binary) =>
let
// Load the workbook, preserve existing headers for identification
ExcelContent = Excel.Workbook(excelFile, true),
// Apply transformations to each sheet in the workbook
TransformedSheets = Table.TransformColumns(ExcelContent, {"Data", each
let
// Skip the first row
SkippedFirstRow = Table.Skip(_, 3),
// Promote the next row as headers
PromotedHeaders = Table.PromoteHeaders(SkippedFirstRow, [PromoteAllScalars=true]),
// Remove columns that contain only null values
ColumnsToRemove = List.Select(Table.ColumnNames(PromotedHeaders), each List.NonNullCount(Table.Column(PromotedHeaders, _)) = 0),
CleanedTable = Table.RemoveColumns(PromotedHeaders, ColumnsToRemove)
in
CleanedTable
}),
// Combine data from all sheets
CombinedSheets = Table.Combine(TransformedSheets[Data])
in
CombinedSheets,
// Step 3: Transform the 'Content' column using the custom function
Transform_Content_Column_To_Table = Table.TransformColumns(
Filter_NameColumn_To_Files_Named_Sales,
{"Content", each ProcessWorkbook(_)}
),
#"Added Custom" = Table.AddColumn(Transform_Content_Column_To_Table, "Date", each Date.FromText("01-" & Text.Middle([Name],8,2) & "-" & Text.Middle([Name],11,2))),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Content", "Date"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns",{"Date", "Content"}),
#"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns",{{"Date", type date}})
in
#"Changed Type"
I can not use the Table.ExpandTableColumn as the headers for the different columns can and will vary over time.
Red flag. You will have endless problems trying to refresh this in the service. Use Table.SelectColumns (rather than Table.RemoveColumns) to ignore newly added columns and keep your Power Query output meta data stable. If you need these new columns then you need to factor in the manual maintenance effort.
The script you listed seems a bit excessive. Should not be needed for at least vagualy similar Excel files.
If you like more assistance please post a handful of the Excel file samples and indicate expected outcome.
Sorry for the late reply to this, but I actually ended up finding the solution myself with a little help from youtube and ChatGPT.
I hope that this can be used by others as well, since I can find a lot of use for it among my customers.
The solution was as follows:
Creating 2 custom functions that will be included in the end of this document and then the query, that is below:
let
/* Here is an alternative way to import a folder and all the filtered subfolders
Using the M Script instead of the User Interface provided in Power Query gives you a lot more options
This Script is created by Jørgen Wulff Rasmussen, Zealand Data ApS
Contact jwr@zealand-data.com phone +45 23732009
Inspired by the YouTube video "Combine Data from Multiple Excel Files | Most Dynamic Method (Dynamic Columns & Sheets)" fra Goodly (Chandeep) */
// Step 1: Source - Load files from specified folder
Source = Folder.Files(FolderName),
// Lowercase transformation for consistency
LowerCase_All_TextColumns = Table.TransformColumns(Source,{{"Name", Text.Lower, type text}, {"Extension", Text.Lower, type text}, {"Folder Path", Text.Lower, type text}}),
// Filter files that include 'balance' in the name
Filter_NameColumn_To_Files_Named_Sales = Table.SelectRows(LowerCase_All_TextColumns, each Text.StartsWith([Name], "product")),
// Step 2: Custom Function to process each Excel file
ProcessWorkbook = (excelFile as binary) =>
let
// Load the workbook, preserve existing headers for identification
ExcelContent = Excel.Workbook(excelFile, true),
// Apply transformations to each sheet in the workbook
TransformedSheets = Table.TransformColumns(ExcelContent, {"Data", each
let
// Skip the first row
SkippedFirstRow = Table.Skip(_, 3),
// Promote the next row as headers
PromotedHeaders = Table.PromoteHeaders(SkippedFirstRow, [PromoteAllScalars=true]),
// Remove columns that contain only null values
ColumnsToRemove = List.Select(Table.ColumnNames(PromotedHeaders), each List.NonNullCount(Table.Column(PromotedHeaders, _)) = 0),
CleanedTable = Table.RemoveColumns(PromotedHeaders, ColumnsToRemove)
in
CleanedTable
}),
// Combine data from all sheets
CombinedSheets = Table.Combine(TransformedSheets[Data])
in
CombinedSheets,
// Step 3: Transform the 'Content' column using the custom function
Transform_Content_Column_To_Table = Table.TransformColumns(
Filter_NameColumn_To_Files_Named_Sales,
{"Content", each ProcessWorkbook(_)}
),
AddedDateColumnBasedOnFileName = Table.AddColumn(Transform_Content_Column_To_Table, "Date", each Date.FromText("01-" & Text.Middle([Name],8,2) & "-" & Text.Middle([Name],11,2))),
#"Removed Meta Data Columns" = Table.SelectColumns(AddedDateColumnBasedOnFileName,{"Content", "Date"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Meta Data Columns",{"Date", "Content"}),
ChangedTypeForDateColumn = Table.TransformColumnTypes(#"Reordered Columns",{{"Date", type date}}),
#"Invoked Custom Function" = Table.AddColumn(ChangedTypeForDateColumn, "Filecontent", each fnGetFilDato([Date], [Content])),
// Step 4: Promote headers for all tables in the "Filecontent" column
PromotedHeaders = Table.TransformColumns(#"Invoked Custom Function", {"Filecontent", each Table.PromoteHeaders(_, [PromoteAllScalars=true])}),
// Step 5: Dynamically rename the last column of each table to "Date"
RenamedDateColumn = Table.TransformColumns(PromotedHeaders, {"Filecontent", each Table.RenameColumns(_, {{Table.ColumnNames(_) { List.Count(Table.ColumnNames(_)) - 1 }, "Date"}})}),
// Step 6: Dynamically rename the first column of each table to "Product"
RenamedProductColumn = Table.TransformColumns(RenamedDateColumn, {"Filecontent", each Table.RenameColumns(_, {{Table.ColumnNames(_) {0}, "Product"}})}),
// Step 7: Remove the original "Content" column
CleanedColumns = Table.RemoveColumns(RenamedProductColumn, {"Date", "Content"}),
ShowData = Table.Combine(CleanedColumns[Filecontent]),
// Step 8: Data is now imported and the rest of the code is preparing data in an unpivoted format and with the correct columns in the table
#"Filtered Rows" = Table.SelectRows(ShowData, each ([Product] <> null)),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Date", "Product"}, "Attribute", "Value"),
#"Filtered Rows1" = Table.SelectRows(#"Unpivoted Other Columns", each ([Attribute] <> "Total") and ([Value] <> "0")),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows1",{{"Attribute", "SalesPerson"}, {"Value", "SalesAmount"}}),
#"Changed Type" =
Table.TransformColumnTypes(
#"Renamed Columns",
{
{"Product", type text} ,
{"Date", type date},
{"SalesPerson", type text},
{"SalesAmount", Int64.Type}
}
),
#"Added Column with first 2 Characters from Country" = Table.AddColumn(#"Changed Type", "Country", each Text.Start([Product], 2), type text),
#"Replaced Country for Codes to Italy" = Table.ReplaceValue(#"Added Column with first 2 Characters from Country",each [Country],each if Text.StartsWith([Product], "HITA") or Text.StartsWith([Product], "HSICI") or Text.StartsWith([Product], "TITAL") then "IT" else [Country],Replacer.ReplaceValue,{"Country"}),
#"Replaced Country for Antarctic to Argentine" = Table.ReplaceValue(#"Replaced Country for Codes to Italy",each [Country],each if Text.StartsWith([Product], "AQ") then "AR" else [Country],Replacer.ReplaceValue,{"Country"}),
#"Replaced Country Codes for Non Travel Products" = Table.ReplaceValue(#"Replaced Country for Antarctic to Argentine",each [Country],each if Text.StartsWith([Product], "MOMS") or Text.StartsWith([Product], "IVODK") or Text.StartsWith([Product], "ICAN") or Text.StartsWith([Product], "IAREU")or Text.StartsWith([Product], "HEUR") or Text.StartsWith([Product], "CHOLI") or Text.StartsWith([Product], "NPDUN") then "" else [Country],Replacer.ReplaceValue,{"Country"}),
#"Replaced Country Codes for Non Travel Products Step 2" = Table.ReplaceValue(#"Replaced Country Codes for Non Travel Products",each [Country], each if [Country] = "EB" or [Country] = "EX" or [Country] = "IA" or [Country] = "IC" or [Country] = "IO" or [Country] = "IX" or [Country] = "OG" or [Country] = "OR" or [Country] = "GA" or [Country] = "UD" then "" else [Country],Replacer.ReplaceValue,{"Country"}),
#"Changed Country Type to Text" = Table.TransformColumnTypes(#"Replaced Country Codes for Non Travel Products Step 2",{{"Country", type text}})
in
#"Changed Country Type to Text"
And the first function called ProcessWorkbook
let
ProcessWorkbook = (excelFile as binary) =>
let
// Load the workbook content
ExcelContent = Excel.Workbook(excelFile, true),
// Transform each sheet
TransformedSheets = Table.TransformColumns(ExcelContent, {"Data", each Table.PromoteHeaders(Table.Skip(_, 1), [PromoteAllScalars=true])}),
// Select only the data column from the transformed sheets
DataOnly = TransformedSheets[Data]
in
DataOnly
in
ProcessWorkbook
And the second function called fnGetFilDato
(Fildato as date, Producttable as table )=>
let
#"Added Custom" = Table.AddColumn(Producttable, "Fildato", each Fildato)
in
#"Added Custom"