I have several CSV files that I'm combining from a folder. Each CSV file has 3 rows of junk data / heading info which need to be removed. The 4th row of each file is the actual header row, but when the files are combined I need to remove the duplicate header rows. Please can you assist with the best way of achieving this?
Hi @Katie484,
I use the custom function below for processing CSV files.
You specify the number of a column that contains no 'junk' data above the headers, and the function uses it to find the header row dynamically. That way it doesn't matter if one file has 3 junk rows and another has 5.
Create a blank query and paste this code into the Advanced Editor...
// fProcessFiles
// Use with Add Column - Invoke Custom Function
// Specify the column number that has only blank rows before the headers/data to deal with inconsistent files.
let
    fProcessFiles = (myFile as binary, myColumn as text) =>
        let
            CSV = Csv.Document(myFile, [Encoding = 1252]),
            // Flag each row: 0 if the chosen column is blank, 1 if it is populated
            ConditionalBlankIndex =
                Table.AddColumn(
                    CSV,
                    "Custom",
                    each if Record.Field(_, "Column" & myColumn) = "" then 0 else 1
                ),
            // The first populated row in that column is the header row
            HeaderPosition = List.PositionOf(ConditionalBlankIndex[Custom], 1),
            // Drop everything above the header row, then promote it
            RemoveRows = Table.Skip(CSV, HeaderPosition),
            PromotedHeaders = Table.PromoteHeaders(RemoveRows, [PromoteAllScalars = true])
        in
            PromotedHeaders
in
    fProcessFiles
You can then invoke this function against each file in a folder query.
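For example, a minimal invocation sketch might look like the query below. The folder path and the column number "2" are placeholders you would replace with your own values; the Table.Distinct step at the end is optional and removes any fully duplicated rows across files.

```powerquery
let
    // Hypothetical folder path - replace with your own
    Source = Folder.Files("C:\CSV Files"),
    // Invoke fProcessFiles per file; "2" assumes Column2 is never blank above the headers
    Invoked = Table.AddColumn(Source, "Data", each fProcessFiles([Content], "2")),
    // Stack the cleaned per-file tables into one table
    Combined = Table.Combine(Invoked[Data]),
    // Optional: remove duplicate rows across files
    Distinct = Table.Distinct(Combined)
in
    Distinct
```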
Have I solved your problem? Please click Accept as Solution — others may find it useful too.
Hey @Katie484, just replace "C:\Users\Ahadk\OneDrive\Desktop\New folder (7)" with your own folder path. Let me know if there is any problem.
let
    Source = Folder.Files("C:\Users\Ahadk\OneDrive\Desktop\New folder (7)"),
    // Skip the 3 junk rows and promote the header row inside each file,
    // so every file is cleaned - not just the first one in the combined table
    #"Imported CSVs" = Table.AddColumn(
        Source,
        "Custom",
        each Table.PromoteHeaders(
            Table.Skip(
                Csv.Document([Content], [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]),
                3
            ),
            [PromoteAllScalars = true]
        )
    ),
    #"Removed Other Columns" = Table.SelectColumns(#"Imported CSVs", {"Custom"}),
    // Combine the cleaned tables; no stray junk or header rows remain
    #"Combined Tables" = Table.Combine(#"Removed Other Columns"[Custom])
in
    #"Combined Tables"