Solved: Inconsistent file transformations from combine fil...

alee023 · ‎04-02-2024

I have an existing dataflow where the source is set to a Sharepoint folder. The dataflow then filters for the CSV file with the most recent "creation date" and I used the "Combine file" option to expand the CSV file into a table so I can work with the data inside the file. When I first created the dataflow, the latest CSV file had 20 columns. My goal is a dataflow that I can trigger every month - it will apply the same transformations to the most recent file uploaded to Sharepoint every month.

I refreshed the dataflow today after I uploaded a new CSV file to Sharepoint, but the dataflow ran into an error. I found out that this CSV file had 21 columns (the same columns as the last CSV file, but with an additional column). However, only 20 columns showed up when the "Combine file" step was run, rather than all 21 cols.

I tested creating a new dataflow with the same steps, and all 21 cols showed up. Since I created the original dataflow when the latest file only had 20 columns, is there some kind of internal logic/code that limits the transformation to only showing 20 columns? How can I fix the "Combine file" transformation to show all the columns in every future file I upload to Sharepoint?

miguel · ‎04-02-2024

Follow these steps:

go to the "Transform file from Query" > "Helper Queries" > (query) "Transform Sample File"
Find the Source step and in the formula bar remove the "Columns = 20," from the code

This change will make sure that you always get all the columns from every single file. It won't be set to a specific number of columns anymore.

View solution in original post

alee023 · ‎04-02-2024

I'm not sure how to get the Power Query code for each step. But I've copied the formulas for the steps:

(Here, I'm using the base URL for the sharepoint folder as the source, then filtering for the specific subfolder and for the latest file created)

SharePoint.Files("https://contoso.sharepoint.com/sites/folder", [ApiVersion = 15])Table.SelectRows(Source, each [Folder Path] = "https://contoso.sharepoint.com/sites/folder/subfolder/CSV Files/")
Table.Sort(#"Filtered rows", {{"Date created", Order.Descending}})Table.FirstN(#"Sorted rows", 1)Table.SelectRows(#"Kept top rows", each [Attributes]?[Hidden]? <> true)

The below steps are automatically created by the dataflow when I "combine file" - e.g. expand the 1 latest created file resulting from the filtering done above
Table.AddColumn(#"Filtered hidden files", "Transform file", each #"Transform file"([Content]))
Table.RenameColumns(#"Invoke custom function", {{"Name", "Source.Name"}})
Table.SelectColumns(#"Renamed columns", {"Source.Name", "Transform file"})

(this is where dataflow expands the file into a table)
Table.ExpandTableColumn(#"Removed other columns", "Transform file", Table.ColumnNames(#"Transform file"(#"Sample file")))

Table.TransformColumnTypes(#"Choose columns", {{"A", type text}, {"B", type text}, {"C", type text}, {"D", type text}, {"E", type datetime}, ... , {"T", type text}})

In the sample code above and when I first created this dataflow, there were only 20 columns in the original CSV file (A -> T). I refreshed the dataflow with the new CSV file for this month - which has 21 columns (for example: A -> ... -> J, J1 -> ... -> T), and the resulting columns from the "ExpandTableColumn" are A -> ... -> J, J1 -> ... -> S (no T).

======================================================

I wanted to add that I looked into the "Queries" panel of the dataflow more carefully, at "Transform file from Query" > "Helper Queries" > (function) "Transform File". The function in the "Transform file" is:

(Parameter as binary) => let

Source = Csv.Document(Parameter, [Delimiter = ",", Columns = 20, QuoteStyle = QuoteStyle.None]),

#Promoted headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])

in

#"Promoted headers"

So I think I found the "internal" logic that limits the column count to 20. Is there a way I can set this number to the variable total number of columns detected in the file - rather than a set number?

miguel · ‎04-02-2024

Follow these steps:

go to the "Transform file from Query" > "Helper Queries" > (query) "Transform Sample File"
Find the Source step and in the formula bar remove the "Columns = 20," from the code

This change will make sure that you always get all the columns from every single file. It won't be set to a specific number of columns anymore.

alee023 · ‎04-02-2024

I didn't know it was an optional field. Thanks!

miguel · ‎04-02-2024

Hi!

Could you please share the Power Query Script (M code) of your exemplar query (sample transform query)? That would help us understand what logic you have implemented.

Inconsistent file transformations from combine file in Dataflow

Helpful resources

Join us at the Microsoft Fabric Community Conference

Fabric Monthly Update - January 2025

Fabric Monthly Update - January 2025

Fabric Monthly Update - January 2025

New Offer! Become a Certified Fabric Data Engineer

Inconsistent file transformations from combine file in Dataflow

Helpful resources

Join us at the Microsoft Fabric Community Conference

Fabric Monthly Update - January 2025

Fabric Monthly Update - January 2025

Fabric Monthly Update - January 2025