I have several CSV files that I'm combining from a folder. Each CSV file has 3 rows of junk data / heading info which need to be removed. The 4th row of each file is the actual header row, but when the files are combined I need to remove the duplicate header rows. Please can you assist with the best way of achieving this?
Hi @Katie484,
I use the custom function below for processing CSV files.
You specify the number of a column that contains no 'junk' data above the headers, and the function uses it to find the header row dynamically. That way it doesn't matter if one file has 3 junk rows and another has 5.
Create a blank query and paste this code into the Advanced Editor...
// fProcessFiles
// Use with Add Column - Invoke Custom Function
// Specify the column number that has only blank rows before the headers/data to deal with inconsistent files.
let
    fProcessFiles = (myFile as binary, myColumn as text) =>
        let
            CSV = Csv.Document(myFile, [Encoding = 1252]),
            // Flag each row: 0 if the chosen column is blank, 1 if it is populated
            ConditionalBlankIndex =
                Table.AddColumn(
                    CSV,
                    "Custom",
                    each if Record.Field(_, "Column" & myColumn) = "" then 0 else 1
                ),
            // The first populated row in that column is the header row
            HeaderPosition = List.PositionOf(ConditionalBlankIndex[Custom], 1),
            // Drop everything above the header row, then promote it
            RemoveRows = Table.Skip(CSV, HeaderPosition),
            PromotedHeaders = Table.PromoteHeaders(RemoveRows, [PromoteAllScalars = true])
        in
            PromotedHeaders
in
    fProcessFiles
You can then invoke this function against each file in a folder query.
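For example, a minimal invocation sketch might look like the query below. The folder path and the column number "2" are placeholders you would replace with your own values; the Table.Distinct step at the end is optional and removes any fully duplicated rows across files.

```powerquery
let
    // Hypothetical folder path - replace with your own
    Source = Folder.Files("C:\CSV Files"),
    // Invoke fProcessFiles per file; "2" assumes Column2 is never blank above the headers
    Invoked = Table.AddColumn(Source, "Data", each fProcessFiles([Content], "2")),
    // Stack the cleaned per-file tables into one table
    Combined = Table.Combine(Invoked[Data]),
    // Optional: remove duplicate rows across files
    Distinct = Table.Distinct(Combined)
in
    Distinct
```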
Have I solved your problem? Please click Accept as Solution — others may find it useful too.
Hey @Katie484, just replace "C:\Users\Ahadk\OneDrive\Desktop\New folder (7)" with your own folder path. Let me know if there is any problem.
let
    Source = Folder.Files("C:\Users\Ahadk\OneDrive\Desktop\New folder (7)"),
    // Skip the 3 junk rows and promote the header row inside each file,
    // so every file is cleaned - not just the first one in the combined table
    #"Imported CSVs" = Table.AddColumn(
        Source,
        "Custom",
        each Table.PromoteHeaders(
            Table.Skip(
                Csv.Document([Content], [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]),
                3
            ),
            [PromoteAllScalars = true]
        )
    ),
    #"Removed Other Columns" = Table.SelectColumns(#"Imported CSVs", {"Custom"}),
    // Combine the cleaned tables; no stray junk or header rows remain
    #"Combined Tables" = Table.Combine(#"Removed Other Columns"[Custom])
in
    #"Combined Tables"