Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
asif999
Frequent Visitor

Combine CSV Files with Uneven Columns & Junk Rows

I have 4 different .csv files that sit in a folder. I want to combine all of these.

But there is a catch, the first 2 files have 4 top junk rows that need to be removed and 5 columns.
And the last 2 files have 5 top junks rows and 6 columns.

I am looking for a dynamic way of combining all my CSV files from a folder.

 

Download Files

 

Expecting Result

DateSettlement IDclosing feespromo rebatesTDStotal
01-01-202211122233312.5null3.5
02-01-202211122233312.5null3.5
03-01-202211122233312.5null3.5
04-01-202211122233312.5null3.5
05-01-202211122233312.5null3.5
01-02-202222233344423.5null4.5
02-02-202222233344423.5null4.5
03-02-202222233344423.5null4.5
04-02-202222233344423.5null4.5
01-03-202244455566635.519.5
02-03-202244455566635.519.5
03-03-202244455566635.519.5
01-04-20223337778887.51.40.759.65
02-04-20223337778887.51.40.759.65
03-04-20223337778887.51.40.759.65
1 ACCEPTED SOLUTION
AlienSx
Super User
Super User

Hello, @asif999 

let
    Source = Folder.Files(<path_to_your_folder>),
    file_names = List.Buffer(Table.AddColumn(Source, "fp", each [Folder Path] & [Name])[fp]),
    fx_csv = (path as text) as table =>
        let
        csv = Csv.Document(File.Contents(path),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
        remove_lines = Table.Skip( csv, List.PositionOf( csv[Column2], "Settlement ID")),
        promote = Table.PromoteHeaders(remove_lines)
    in
        promote,
    files = Table.Combine(List.Transform(file_names, fx_csv)),
    reorder = Table.ReorderColumns(files,{"Date", "Settlement ID", "closing fees", "promo rebates", "TDS", "total"})
in
    reorder

View solution in original post

4 REPLIES 4
AlienSx
Super User
Super User

Hello, @asif999 

let
    Source = Folder.Files(<path_to_your_folder>),
    file_names = List.Buffer(Table.AddColumn(Source, "fp", each [Folder Path] & [Name])[fp]),
    fx_csv = (path as text) as table =>
        let
        csv = Csv.Document(File.Contents(path),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
        remove_lines = Table.Skip( csv, List.PositionOf( csv[Column2], "Settlement ID")),
        promote = Table.PromoteHeaders(remove_lines)
    in
        promote,
    files = Table.Combine(List.Transform(file_names, fx_csv)),
    reorder = Table.ReorderColumns(files,{"Date", "Settlement ID", "closing fees", "promo rebates", "TDS", "total"})
in
    reorder

Thank @AlienSx for your answer.

 

I have done it this way.

 

let
    Source = Folder.Files("B:\Excel Learning\Power Query Learning\1. Combining CSV Files with Uneven Columns & Junk Rows\CSV Files"),
    Table_Cleaned = Table.AddColumn(Source, "Custom", each 
    let 
        MyTable = Csv.Document([Content]),
        MyList = MyTable[Column1],
        Position_Of_Date = List.PositionOf(MyList,"Date"),
        My_Table_Cleaned = Table.Skip(MyTable,Position_Of_Date),
        Promote_Headers = Table.PromoteHeaders(My_Table_Cleaned)
    in 
Promote_Headers),
    Col_Names = Table.AddColumn(Table_Cleaned, "Custom.1", each Table.ColumnNames([Custom])),
    Distinct_Col_Names = List.Distinct(List.Combine(Col_Names[Custom.1])),
    Expaned_Table = Table.ExpandTableColumn(Table.SelectColumns(Col_Names,"Custom"),"Custom",Distinct_Col_Names)
in
    Expaned_Table

 

 

I wouldn't be so confident about "Date" column in Position_Of_Date step because you have some junk data above. What if List.PositionOf finds "Date" text in there?

It won't ever find the date in junk rows. But I will consider your suggestion to remain 100% safe. I will use the second-column header instead.

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.