Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Learn from the best! Meet the four finalists headed to the FINALS of the Power BI Dataviz World Championships! Register now

Reply
manyad
New Member

decompress, skip lines, delimit

All, I have a bunch of .gz files in a folder that I have to import into Power BI. After looking up online, i have built the following function. My files have the very first record which is file info header (with about 4 strings concatenated by a pipe), while the rest of the data has about 20 odd fields split by a pipe.

Using the below function, I get only 4 fields because of the file header record. If i use the commented line, i end up with all data in a single column with a different challenge i.e. promoting headers becomes a problem.

In essence, i need to:

1. decompress the zips

2. skip the first line of the file

3. read the file after delimiting

4. promote headers as column names

Is anyone able to advise how the above is possible?

 

(gzFile) =>
let
Decompressed = Binary.Decompress(File.Contents(gzFile), Compression.GZip),
//Imported = Table.FromColumns({Lines.FromBinary(Decompressed,null,null,1252)}),
Imported = Csv.Document(Decompressed, [Delimiter="|", Encoding=1252]),
SkipRowFirst = Table.Skip(Imported,1),
SkipRowLast = Table.RemoveLastN(SkipRowFirst,1),
PromotedHeaders = Table.PromoteHeaders(SkipRowLast , [PromoteAllScalars=true])
in
PromotedHeaders

 

followed by this query to read the files:

let
Source = Folder.Files("C:\temp"),
gzFiles = Table.SelectRows(Source, each ([Extension] = ".gz")),
unpackGZ = Table.AddColumn(gzFiles, "Custom", each unzip([Folder Path]&[Name]))
in
unpackGZ

 

1 ACCEPTED SOLUTION
MarkLaf
Super User
Super User

Edit: actually, looking again at your issue + code (which is basically the same as mine), I think the main thing to change is to add a Columns field to your options record. When it's unspecified (as we are both doing in our M), I think it just looks at top row to determine column count (some number less than the ~20 you mentioned). So, I think the simple fix for you is:

 

Imported = Csv.Document(Decompressed, [Columns=20, Delimiter="|", Encoding=1252]), // <-- added Columns=#

 

Or, upstream, get the appropriate number of delimiters appended to first row such that it matches column count for primary data

 

Original post for references:

The following worked for me.

 

For reference, I made two simple csvs of format:

MarkLaf_2-1771797784606.png

 

Zipped each with 7zip into gz. Note: original csvs in same folder but we are filtering them out per your procedure.

MarkLaf_1-1771797087961.png

 

The following M successfully unpacks these.

let
    Source = Folder.Files( #"<FolderPath>" ),
    FilterGz = Table.SelectRows(Source, each [Extension] = ".gz"),
    AddGzParse = Table.AddColumn(
        FilterGz, "gz_tables", each 
        // steps for unpacking each gz - you can put these in a separate function if desired
        [ 
            gz = [Content],
            csv_binary = Binary.Decompress(gz, Compression.GZip), 
            csv_table = Csv.Document(csv_binary,null,"|"), 
            csv_table_trim = Table.RemoveLastN( Table.RemoveFirstN(csv_table,1), 1), 
            csv_table_promoted = Table.PromoteHeaders(csv_table_trim)
        ] [csv_table_promoted]
    ),
    // use our output from first row to automate column type and expansion
    // note, it's csv so all cols come in as text
    FirstTableType = Value.Type( List.First( AddGzParse[gz_tables] ) ),
    FixGzParseColType = Table.TransformColumns( 
        AddGzParse, 
        {{"gz_tables", each {_}{0}, FirstTableType }} 
    ),
    SelectMetaAndTables = Table.SelectColumns(
        FixGzParseColType,{"Name", "Date created", "gz_tables"}
    ),
    ExpandGz_tables = Table.ExpandTableColumn(
        SelectMetaAndTables, "gz_tables", 
        Type.TableSchema(FirstTableType)[Name] 
    )
in
    ExpandGz_tables

 

Output

MarkLaf_3-1771797831810.png

 

View solution in original post

5 REPLIES 5
v-pnaroju-msft
Community Support
Community Support

Thankyou, @lbendlin and @MarkLaf for your responses.

Hi manyad,

We are pleased to note that your issue has been resolved. Thank you for sharing your insights and approach in resolving the issue, which will be beneficial to other members of the community. Should you have any further queries, please feel free to contact the Microsoft Fabric community.

Thank you.

MarkLaf
Super User
Super User

Edit: actually, looking again at your issue + code (which is basically the same as mine), I think the main thing to change is to add a Columns field to your options record. When it's unspecified (as we are both doing in our M), I think it just looks at top row to determine column count (some number less than the ~20 you mentioned). So, I think the simple fix for you is:

 

Imported = Csv.Document(Decompressed, [Columns=20, Delimiter="|", Encoding=1252]), // <-- added Columns=#

 

Or, upstream, get the appropriate number of delimiters appended to first row such that it matches column count for primary data

 

Original post for references:

The following worked for me.

 

For reference, I made two simple csvs of format:

MarkLaf_2-1771797784606.png

 

Zipped each with 7zip into gz. Note: original csvs in same folder but we are filtering them out per your procedure.

MarkLaf_1-1771797087961.png

 

The following M successfully unpacks these.

let
    Source = Folder.Files( #"<FolderPath>" ),
    FilterGz = Table.SelectRows(Source, each [Extension] = ".gz"),
    AddGzParse = Table.AddColumn(
        FilterGz, "gz_tables", each 
        // steps for unpacking each gz - you can put these in a separate function if desired
        [ 
            gz = [Content],
            csv_binary = Binary.Decompress(gz, Compression.GZip), 
            csv_table = Csv.Document(csv_binary,null,"|"), 
            csv_table_trim = Table.RemoveLastN( Table.RemoveFirstN(csv_table,1), 1), 
            csv_table_promoted = Table.PromoteHeaders(csv_table_trim)
        ] [csv_table_promoted]
    ),
    // use our output from first row to automate column type and expansion
    // note, it's csv so all cols come in as text
    FirstTableType = Value.Type( List.First( AddGzParse[gz_tables] ) ),
    FixGzParseColType = Table.TransformColumns( 
        AddGzParse, 
        {{"gz_tables", each {_}{0}, FirstTableType }} 
    ),
    SelectMetaAndTables = Table.SelectColumns(
        FixGzParseColType,{"Name", "Date created", "gz_tables"}
    ),
    ExpandGz_tables = Table.ExpandTableColumn(
        SelectMetaAndTables, "gz_tables", 
        Type.TableSchema(FirstTableType)[Name] 
    )
in
    ExpandGz_tables

 

Output

MarkLaf_3-1771797831810.png

 

This is exactly what I needed. Thanks. I also added logic to read the number of fields from the second/good line and made the columns dynamic. Thanks for your help. Much appreciated. 

lbendlin
Super User
Super User

Can you post one or two sample gz files to a publicly available file store?

Thanks. Can't post the files. But based on the other response my problem was resolved by adding the columns attribute. 

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.

February Power BI Update Carousel

Power BI Monthly Update - February 2026

Check out the February 2026 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.