Data-Crunch
Frequent Visitor

Getting .txt files from zipped file

Hi all

 

I am looking to extract data from multiple ZIP files, all stored in the same folder. Researching online, I found Mark White's function, which should enable the extraction of CSV or TXT files from any ZIP: Reading Zip files in PowerQuery / M (sql10.blogspot.com)

 

When I test it with my files, the function returns only one file, and I am unable to extract its content.

The ZIP files have a folder structure, meaning the actual txt files are nested within folders. As far as I know, that should not cause an issue for the function.

Anyone have an idea?

 

For reference I've linked one of the ZIPs: Link to the ZIP 
And below you can find the function: 

 

(ZIPFile) =>
let
    Header = BinaryFormat.Record([
        MiscHeader = BinaryFormat.Binary(14),
        BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
        ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
    ]),
 
    HeaderChoice = BinaryFormat.Choice(
        BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
            then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
            else BinaryFormat.Choice(
                    BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                    each BinaryFormat.Record([
                        IsValid  = true,
                        Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                        Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                        Content  = BinaryFormat.Transform(
                            BinaryFormat.Binary(Header(_)[BinarySize]),
                            (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                        )
                        ]),
                        type binary                   // enable streaming
                )
    ),
 
    ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
 
    Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
    )
in
    Table.FromRecords(Entries)

 

 

 

1 ACCEPTED SOLUTION
OwenAuger
Super User

Hi @Data-Crunch 

The below post by @artemus contains a function that appears to work with ZIP files containing multiple files:

https://community.fabric.microsoft.com/t5/Power-Query/How-to-connect-Azure-DevOps-REST-API-in-to-pow...

 

It returns a list of records, each of which contains fields FileName and Content.

 

I have pasted the function here for convenience:

// Unzip
(ZIPFile) => 
let
    ushort = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
    uint = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
    EDOCfn = BinaryFormat.Record([
        ZipContent = BinaryFormat.Binary(Binary.Length(ZIPFile) - 22),
        Magic = uint,
        DiskNum = ushort,
        CDirectoryDiskId = ushort,
        CDirectoryRecordCountOnDisk = ushort,
        CDirectoryRecordCount = ushort,
        SizeOfCentralDirectory = uint,
        CentralDirectoryOffset = uint,
        CommentLength = ushort
    ]),
    EDOC = EDOCfn(ZIPFile),
    BeforeCentralDirectory = BinaryFormat.Binary(EDOC[CentralDirectoryOffset]),
    CentralDirectory = BinaryFormat.Length(BinaryFormat.Record(
        [
            ZipContent = BeforeCentralDirectory,
            Items = BinaryFormat.List(BinaryFormat.Record(
                [
                    Magic = uint,
                    CurrentVersion = ushort,
                    MinVersion = ushort,
                    Flags = ushort,
                    CompressionMethod = ushort,
                    FileModificationTime = ushort,
                    FileModificationDate = ushort,
                    CRC32 = uint,
                    BinarySize = uint,
                    FileSize   = uint,
                    FileInfo = BinaryFormat.Choice(
                    BinaryFormat.Record(
                        [
                            Len = ushort,
                            FieldsLen = ushort,
                            FileCommentLength = ushort,
                            Disk = ushort,
                            InternalFileAttr = ushort,
                            ExternalAttr = uint,
                            PosOfFileHeader = uint
                        ]),
                    (fileInfo) => BinaryFormat.Record(
                        [
                            FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
                            Fields = BinaryFormat.Binary(fileInfo[FieldsLen]),
                            FileComment = BinaryFormat.Text(fileInfo[FileCommentLength], TextEncoding.Ascii),
                            Disk = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
                            InternalFileAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[InternalFileAttr]),
                            ExternalAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[ExternalAttr]),
                            PosOfFileHeader = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[PosOfFileHeader])
                        ])
                    )
                ]), 
                EDOC[CDirectoryRecordCount]
            )
        ]), 
        EDOC[CentralDirectoryOffset] + EDOC[SizeOfCentralDirectory]),  
    Contents = List.Transform(
        CentralDirectory(ZIPFile)[Items],
            (cdEntry) => 
                let
                    ZipEntry = BinaryFormat.Record(
                    [
                        PreviousData = BinaryFormat.Binary(cdEntry[FileInfo][PosOfFileHeader]), 
                        Magic = uint,
                        ZipVersion = ushort,
                        ZipFlags = ushort,
                        CompressionMethod = ushort,
                        FileModificationTime = ushort,
                        FileModificationDate = ushort,
                        CRC32 = uint, 
                        BinarySize = uint,
                        FileSize   = uint,
                        FileName = BinaryFormat.Choice(
                            BinaryFormat.Record(
                                [
                                    Len = ushort,
                                    FieldsLen = ushort
                                ]),
                            (fileInfo) => BinaryFormat.Record(
                                [
                                    FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
                                    Fields = BinaryFormat.Binary(fileInfo[FieldsLen])
                                ]) 
                        ),
                        FileContent = BinaryFormat.Transform(
                            BinaryFormat.Binary(cdEntry[BinarySize]), 
                            each Binary.Decompress(_, Compression.Deflate)
                        )
                    ])(ZIPFile)
                in
                    [FileName=ZipEntry[FileName][FileName], Content=ZipEntry[FileContent]]
    )
in
    Contents
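
As a rough sketch of how the function could be applied to a whole folder of ZIPs, which is what the question asks for: the query below is not part of the linked post. It assumes the function above has been saved as a query named Unzip, and the folder path C:\Data\Zips is a placeholder. Because the function returns a list of records, each result is converted to a table before expanding.

```powerquery
let
    // Hypothetical folder path; point this at the folder holding the ZIP files.
    Source = Folder.Files("C:\Data\Zips"),
    // Keep only ZIP files (the Extension column includes the leading dot).
    ZipFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".zip"),
    // Unzip is assumed to be the function above, saved as a query with that name.
    // It returns a list of records, so convert it to a nested table per ZIP.
    WithEntries = Table.AddColumn(
        ZipFiles,
        "Entries",
        each Table.FromRecords(Unzip([Content]))
    ),
    Kept = Table.SelectColumns(WithEntries, {"Name", "Entries"}),
    Expanded = Table.ExpandTableColumn(Kept, "Entries", {"FileName", "Content"}),
    // Nested folders appear as part of FileName (e.g. "sub/data.txt"),
    // so filtering on the suffix also drops the folder entries themselves.
    TxtOnly = Table.SelectRows(
        Expanded,
        each Text.EndsWith(Text.Lower([FileName]), ".txt")
    ),
    // Split each extracted file into a list of text lines.
    WithLines = Table.AddColumn(TxtOnly, "Lines", each Lines.FromBinary([Content]))
in
    WithLines
```

If the text files are delimited, Csv.Document([Content]) could be used in place of Lines.FromBinary. Note that the function always decompresses with Compression.Deflate, so entries stored with a different method may show up as errors in the Content column.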

 


Owen Auger


This one worked like a charm! Thanks!
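
A possible explanation for why the first function returned only one entry, offered as an assumption since nothing in this thread confirms it: a ZIP writer may set bit 3 of the general-purpose flags, in which case the sizes in the local file header are zero and the real sizes appear only after the file data and in the central directory. A function that walks the local headers one after another would then read a zero-length content and lose its place, whereas the accepted function takes BinarySize from the central directory. The sketch below reads the first local header so the flag can be inspected; the name CheckFirstHeader and its output field names are made up for illustration.

```powerquery
// CheckFirstHeader: hypothetical helper, not from the thread.
(ZIPFile as binary) as record =>
let
    ushort = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
    uint = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
    // Fixed-size start of a local file header, in the same field order
    // as the ZipEntry record in the accepted function.
    LocalHeader = BinaryFormat.Record([
        Magic = uint,
        ZipVersion = ushort,
        ZipFlags = ushort,
        CompressionMethod = ushort,
        FileModificationTime = ushort,
        FileModificationDate = ushort,
        CRC32 = uint,
        BinarySize = uint,
        FileSize = uint
    ]),
    First = LocalHeader(ZIPFile),
    // Bit 3 (value 8) of the flags marks entries whose sizes follow the data.
    HasBit3 = Number.BitwiseAnd(First[ZipFlags], 8) = 8
in
    [
        // 67324752 is the local header signature also tested by the first function.
        IsLocalHeader = First[Magic] = 67324752,
        UsesDataDescriptor = HasBit3,
        // 8 means Deflate, 0 means stored without compression.
        CompressionMethod = First[CompressionMethod],
        BinarySizeInLocalHeader = First[BinarySize]
    ]
```

Invoked on the binary of one of the ZIPs, a result with UsesDataDescriptor = true and BinarySizeInLocalHeader = 0 would be consistent with this explanation. It only inspects the first entry, so it is a quick check rather than a full diagnosis.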
