I have several CSV files that I'm combining from a folder. Each CSV file has 3 rows of junk data / heading info that need to be removed. The 4th row of each file is the actual header row, but I need to remove the duplicate header rows when the files are combined. Could you please advise on the best way to achieve this?
Hi @Katie484,
I use the custom function below for processing CSV files.
It uses a column number that you specify, one that has no 'junk' data above the headers, to find the header row dynamically, so it won't matter if one file has 3 junk rows and another has 5.
Create a blank query and paste this code into the advanced editor...
// fProcessFiles
// Use with Add Column - Invoke Custom Function
// Specify the column number that has only blank rows before the headers/data to deal with inconsistent files.
let
    fProcessFiles =
        (myFile as binary, myColumn as text) =>
            let
                // Read the raw CSV without promoting headers yet
                CSV = Csv.Document(myFile, [Encoding = 1252]),
                // Flag each row: 0 while the specified column is still blank, 1 once it contains data
                ConditionalBlankIndex =
                    Table.AddColumn(
                        CSV,
                        "Custom",
                        each if Record.Field(_, "Column" & myColumn) = "" then 0 else 1
                    ),
                // Position of the first non-blank row, i.e. the header row
                HeaderPosition = List.PositionOf(ConditionalBlankIndex[Custom], 1),
                // Drop everything above the header row
                RemoveRows = Table.Skip(CSV, HeaderPosition),
                // Promote the first remaining row to column headers
                PromotedHeaders = Table.PromoteHeaders(RemoveRows, [PromoteAllScalars = true])
            in
                PromotedHeaders
in
    fProcessFiles
You can then apply this function to every file in a folder query, for example with Add Column > Invoke Custom Function, as sketched below.
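Here's a minimal sketch of that invocation, assuming the function query is named fProcessFiles, the files sit in a hypothetical folder C:\CSVs (replace with your own), and Column1 is blank in every junk row (the column number is passed as text):
let
    // Hypothetical folder path - replace with your own
    Source = Folder.Files("C:\CSVs"),
    // Keep only .csv files
    OnlyCsv = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
    // Run fProcessFiles on each file's binary content; "1" means Column1 is blank above the headers
    Processed = Table.AddColumn(OnlyCsv, "Data", each fProcessFiles([Content], "1")),
    KeptColumn = Table.SelectColumns(Processed, {"Data"}),
    // Expand using the column names of the first cleaned file
    Expanded = Table.ExpandTableColumn(KeptColumn, "Data", Table.ColumnNames(KeptColumn[Data]{0}))
in
    Expanded
If you still need to drop duplicate rows after combining, wrapping the result in Table.Distinct will do that.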
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post. Others may find it useful too ;)
Hey @Katie484, just swap in your own folder path in place of "C:\Users\Ahadk\OneDrive\Desktop\New folder (7)". This reads each CSV in the folder, skips the 3 junk rows and promotes the header row per file, then combines the files and removes duplicate rows. Let me know if there is any problem.
let
    Source = Folder.Files("C:\Users\Ahadk\OneDrive\Desktop\New folder (7)"),
    // Read each CSV, skip its 3 junk rows and promote row 4 to headers per file,
    // so every file is cleaned before the tables are combined
    #"Imported CSVs" = Table.AddColumn(Source, "Custom", each
        Table.PromoteHeaders(
            Table.Skip(
                Csv.Document([Content], [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]),
                3
            ),
            [PromoteAllScalars = true]
        )),
    #"Removed Other Columns" = Table.SelectColumns(#"Imported CSVs", {"Custom"}),
    // Expand using the column names of the first cleaned file
    #"Expanded Tables" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", Table.ColumnNames(#"Removed Other Columns"[Custom]{0})),
    // Remove duplicate rows across the combined files
    #"Removed Duplicates" = Table.Distinct(#"Expanded Tables")
in
    #"Removed Duplicates"