
lbendlin

Streamlined process for appending multiple files with similar structure in a folder


Let’s assume you have multiple files with a similar or identical structure that you need to combine into a single table. These files can be in a local folder or (preferably) on SharePoint or OneDrive, which eliminates the need for a gateway.

lbendlin_0-1746118999610.png

The standard approach would be to use the “Combine Files” option of the “Content” column in Power Query.

lbendlin_1-1746119037665.png

This works relatively well but comes with some unintended side effects. It makes assumptions about your file origin encoding and about the column types, based on a sample file (usually the first one) and its first 200 rows.

lbendlin_2-1746119063396.png

It then produces multiple folders and artifacts that can quickly clutter up your query list.

lbendlin_3-1746119084703.png

As you can see, it also messed up the column headers for the sample files. The Power Query code is less than optimal: invoking a custom function outside of our partition can later get us into trouble with the formula firewall, for example.

let
Source = Folder.Files("C:\Users\bendlin\Documents\Power BI"),
#"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Name], "data") and [Extension]=".txt"),
#"Filtered Hidden Files1" = Table.SelectRows(#"Filtered Rows", each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")))
in
#"Expanded Table Column1"

Since you know the structure of these files better than Power Query does, it makes sense to explore alternative options that achieve the same goal but with less clutter.
The first step is to pick one of the files for the basic transforms. The first file is usually the oldest, so it may make sense to pick the last file instead.
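If you prefer to pick the newest file in code rather than by clicking, one possible sketch (assuming the same folder path and filter as in this article) is to sort by the [Date modified] column that Folder.Files provides:

```m
let
    Source = Folder.Files("C:\Users\bendlin\Documents\Power BI"),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Name], "data") and [Extension]=".txt"),
    // Sort newest first, then take the first row's binary as the sample
    Sorted = Table.Sort(#"Filtered Rows", {{"Date modified", Order.Descending}}),
    Newest = Sorted{0}[Content]
in
    Newest
```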

Clicking the Binary link will recognize the file type (CSV in this case) and start the basic ingestion.

lbendlin_4-1746119145779.png

Let’s fix a couple of things. The encoding is UTF-8, not ANSI. The column count isn’t really needed (and keeping it would prevent the Power Query code from handling scenarios where columns are added to the source files). Lastly, QuoteStyle should always be set to ignore quoted line breaks.
Some of these settings can be changed through the dialog.

lbendlin_5-1746119172407.png

But the column count needs to be removed manually.
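In M terms, these fixes amount to editing the option record passed to Csv.Document. A sketch for comparison (the Columns=3 and Encoding=1252 values are illustrative; your generated code may differ):

```m
// As generated by the UI (illustrative values):
Csv.Document(file, [Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None])

// After the fixes: no Columns field (so new columns survive), UTF-8 encoding,
// and QuoteStyle.Csv so quoted line breaks are treated as data
Csv.Document(file, [Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv])
```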

lbendlin_6-1746119200288.png

Much better. The final transform is to elevate the first row to header status.

lbendlin_7-1746119224896.png

Note that all column types are listed as text (since this is a CSV source). We leave that alone for now.
Here is the code that was produced for us. Note that we sampled the third file.

let
Source = Folder.Files("C:\Users\bendlin\Documents\Power BI"),
#"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Name], "data") and [Extension]=".txt"),
#"C:\Users\bendlin\Documents\Power BI\_data3 txt" = #"Filtered Rows"{[#"Folder Path"="C:\Users\bendlin\Documents\Power BI\",Name="data3.txt"]}[Content],
#"Imported CSV" = Csv.Document(#"C:\Users\bendlin\Documents\Power BI\_data3 txt",[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
#"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true])
in
#"Promoted Headers"

Power Query code is usually written such that each line references the previous line. We can replace these references with the actual code, like so:

let
Source = Folder.Files("C:\Users\bendlin\Documents\Power BI"),
#"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Name], "data") and [Extension]=".txt"),
#"C:\Users\bendlin\Documents\Power BI\_data3 txt" = #"Filtered Rows"{[#"Folder Path"="C:\Users\bendlin\Documents\Power BI\",Name="data3.txt"]}[Content],
#"Promoted Headers" = Table.PromoteHeaders(Csv.Document(#"C:\Users\bendlin\Documents\Power BI\_data3 txt",[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]), [PromoteAllScalars=true])
in
#"Promoted Headers"

The #"Imported CSV" step is now part of the #"Promoted Headers" step. We can go one step further and replace the previous reference too:

= Table.PromoteHeaders(Csv.Document([Content],[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]), [PromoteAllScalars=true])

This is now the basis for our new approach. We remove all steps after the “Filtered Rows” step and then select only the columns we need: the [Name] column and the [Content] column.
(Pro tip: remove steps from the bottom up.)

lbendlin_8-1746119330623.png

lbendlin_9-1746119340217.png

Then we add a custom column and paste the ingestion code.

lbendlin_10-1746119372953.png

We end up with a column that contains all the data from our CSV files, with the correct encoding and headers. You can verify that by clicking to the right of one of the “Table” links. This will produce a preview of the contents of that table object.

lbendlin_11-1746119390140.png

We don’t need the [Content] column any more.
(Pro tip: more advanced users can use a replacer function to replace the [Content] binary with the transformation instead of adding a new column.)
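That replacement could look roughly like this with Table.TransformColumns, which swaps each [Content] binary for its parsed table in place (a sketch of the pro tip, not the exact steps shown in this article):

```m
#"Transformed Content" = Table.TransformColumns(
    #"Filtered Rows",
    {"Content", each Table.PromoteHeaders(
        Csv.Document(_, [Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
        [PromoteAllScalars=true]
    )}
)
```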

lbendlin_12-1746119414258.png

Next we expand the [Custom] column to rows, which automagically appends all the tables. Note that this will work even if the tables don’t have the exact same structure.

lbendlin_13-1746119439098.png

If you observe missing columns, the “List may be incomplete” warning can be cleared with the “Load more” link, which picks up the additional columns.
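If you want the expand step to be robust against differing file structures without depending on the preview’s “Load more”, you can compute the union of the column names across all tables yourself. A sketch, assuming the step names used in this query (note the trade-off: it has to read every file’s headers):

```m
AllColumnNames = List.Distinct(List.Combine(
    List.Transform(#"Removed Other Columns1"[Custom], Table.ColumnNames))),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns1", "Custom", AllColumnNames)
```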

This looks great. The last few steps are to assign the column types and to remove the now unnecessary groups and artifacts.

Here is our final version of the code:

let
Source = Folder.Files("C:\Users\bendlin\Documents\Power BI"),
#"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Name], "data") and [Extension]=".txt"),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content", "Name"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each Table.PromoteHeaders(Csv.Document([Content],[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]), [PromoteAllScalars=true])),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom",{"Name", "Custom"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns1", "Custom", {"Timestamp", "Temperature", "Battery"}, {"Timestamp", "Temperature", "Battery"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Custom",{{"Timestamp", type datetime}, {"Temperature", type number}, {"Battery", Int64.Type}})
in
#"Changed Type"

(Pro tip: the Table.ExpandTableColumn step can be streamlined further by referencing the column names of one of the “Table” objects from the previous step, instead of specifying all the column names.)

#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns1", "Custom", Table.ColumnNames(#"Removed Other Columns1"[Custom]{0})),


And that’s it. Yes, it does involve a couple of steps, but the result is cleaner code and a slight performance improvement, since we are no longer loading the sample file twice. You have control over the column types (instead of relying on sample-based guessing), and you avoid all the extra folders and expressions.
