Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
Data-Crunch
Frequent Visitor

Getting .txt files from zipped file

Hi all

 

I am looking to extract data from multiple ZIP files all stored in the same folder. REsearching online I I was able to find Mark White's function that should enable the extraction of csv or txt files from any zip: Reading Zip files in PowerQuery / M (sql10.blogspot.com)

 

Upon testing with my files, for some reason I only get one file returned from the function and also I am unable to extract its content.

The zip files has a folder structure, meaning the actual txt files are nexted within folders. But as per my knowledeg that should not cause an issue to the function. 

Anyone have an idea?

 

For reference I've linked one of the ZIPs: Link to the ZIP 
And below you can find the function: 

 

(ZIPFile) =>
let
    Header = BinaryFormat.Record([
        MiscHeader = BinaryFormat.Binary(14),
        BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
        ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
    ]),
 
    HeaderChoice = BinaryFormat.Choice(
        BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
            then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
            else BinaryFormat.Choice(
                    BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                    each BinaryFormat.Record([
                        IsValid  = true,
                        Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                        Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                        Content  = BinaryFormat.Transform(
                            BinaryFormat.Binary(Header(_)[BinarySize]),
                            (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                        )
                        ]),
                        type binary                   // enable streaming
                )
    ),
 
    ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
 
    Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
    )
in
    Table.FromRecords(Entries)

 

 

 

1 ACCEPTED SOLUTION
OwenAuger
Super User
Super User

Hi @Data-Crunch 

The below post by @artemus contains a function that appears to work with ZIP files containing multiple files:

https://community.fabric.microsoft.com/t5/Power-Query/How-to-connect-Azure-DevOps-REST-API-in-to-pow...

 

It returns a list of records, each of which contains fields FileName and Content.

 

I have pasted the function here for convenience:

// Unzip
(ZIPFile) => 
let
    ushort = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
    uint = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
    EDOCfn = BinaryFormat.Record([
        ZipContent = BinaryFormat.Binary(Binary.Length(ZIPFile) - 22),
        Magic = uint,
        DiskNum = ushort,
        CDirectoryDiskId = ushort,
        CDirectoryRecordCountOnDisk = ushort,
        CDirectoryRecordCount = ushort,
        SizeOfCentralDirectory = uint,
        CentralDirectoryOffset = uint,
        CommendLength = ushort
    ]),
    EDOC = EDOCfn(ZIPFile),
    BeforeCentralDirectory = BinaryFormat.Binary(EDOC[CentralDirectoryOffset]),
    CentralDirectory = BinaryFormat.Length(BinaryFormat.Record(
        [
            ZipContent = BeforeCentralDirectory,
            Items = BinaryFormat.List(BinaryFormat.Record(
                [
                    Magic = uint,
                    CurrentVersion = ushort,
                    MinVersion = ushort,
                    Flags = ushort,
                    CompressionMethod = ushort,
                    FileModificationTime = ushort,
                    FileModificationDate = ushort,
                    CRC32 = uint,
                    BinarySize = uint,
                    FileSize   = uint,
                    FileInfo = BinaryFormat.Choice(
                    BinaryFormat.Record(
                        [
                            Len = ushort,
                            FieldsLen = ushort,
                            FileCommentLength = ushort,
                            Disk = ushort,
                            InternalFileAttr = ushort,
                            ExternalAttr = uint,
                            PosOfFileHeader = uint
                        ]),
                    (fileInfo) => BinaryFormat.Record(
                        [
                            FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
                            Fields = BinaryFormat.Binary(fileInfo[FieldsLen]),
                            FileComment = BinaryFormat.Text(fileInfo[FileCommentLength], TextEncoding.Ascii),
                            Disk = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
                            InternalFileAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
                            ExternalAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[InternalFileAttr]),
                            PosOfFileHeader = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[PosOfFileHeader])
                        ])
                    )
                ]), 
                EDOC[CDirectoryRecordCount]
            )
        ]), 
        EDOC[CentralDirectoryOffset] + EDOC[SizeOfCentralDirectory]),  
    Contents = List.Transform(
        CentralDirectory(ZIPFile)[Items],
            (cdEntry) => 
                let
                    ZipEntry = BinaryFormat.Record(
                    [
                        PreviousData = BinaryFormat.Binary(cdEntry[FileInfo][PosOfFileHeader]), 
                        Magic = uint,
                        ZipVersion = ushort,
                        ZipFlags = ushort,
                        CompressionMethod = ushort,
                        FileModificationTime = ushort,
                        FileModificationDate = ushort,
                        CRC32 = uint, 
                        BinarySize = uint,
                        FileSize   = uint,
                        FileName = BinaryFormat.Choice(
                            BinaryFormat.Record(
                                [
                                    Len = ushort,
                                    FieldsLen = ushort
                                ]),
                            (fileInfo) => BinaryFormat.Record(
                                [
                                    FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
                                    Fields = BinaryFormat.Binary(fileInfo[FieldsLen])
                                ]) 
                        ),
                        FileContent = BinaryFormat.Transform(
                            BinaryFormat.Binary(cdEntry[BinarySize]), 
                            each Binary.Decompress(_, Compression.Deflate)
                        )
                    ])(ZIPFile)
                in
                    [FileName=ZipEntry[FileName][FileName], Content=ZipEntry[FileContent]]
    )
in
    Contents

 


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
LinkedIn

View solution in original post

2 REPLIES 2
OwenAuger
Super User
Super User

Hi @Data-Crunch 

The below post by @artemus contains a function that appears to work with ZIP files containing multiple files:

https://community.fabric.microsoft.com/t5/Power-Query/How-to-connect-Azure-DevOps-REST-API-in-to-pow...

 

It returns a list of records, each of which contains fields FileName and Content.

 

I have pasted the function here for convenience:

// Unzip
(ZIPFile) => 
let
    ushort = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
    uint = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
    EDOCfn = BinaryFormat.Record([
        ZipContent = BinaryFormat.Binary(Binary.Length(ZIPFile) - 22),
        Magic = uint,
        DiskNum = ushort,
        CDirectoryDiskId = ushort,
        CDirectoryRecordCountOnDisk = ushort,
        CDirectoryRecordCount = ushort,
        SizeOfCentralDirectory = uint,
        CentralDirectoryOffset = uint,
        CommendLength = ushort
    ]),
    EDOC = EDOCfn(ZIPFile),
    BeforeCentralDirectory = BinaryFormat.Binary(EDOC[CentralDirectoryOffset]),
    CentralDirectory = BinaryFormat.Length(BinaryFormat.Record(
        [
            ZipContent = BeforeCentralDirectory,
            Items = BinaryFormat.List(BinaryFormat.Record(
                [
                    Magic = uint,
                    CurrentVersion = ushort,
                    MinVersion = ushort,
                    Flags = ushort,
                    CompressionMethod = ushort,
                    FileModificationTime = ushort,
                    FileModificationDate = ushort,
                    CRC32 = uint,
                    BinarySize = uint,
                    FileSize   = uint,
                    FileInfo = BinaryFormat.Choice(
                    BinaryFormat.Record(
                        [
                            Len = ushort,
                            FieldsLen = ushort,
                            FileCommentLength = ushort,
                            Disk = ushort,
                            InternalFileAttr = ushort,
                            ExternalAttr = uint,
                            PosOfFileHeader = uint
                        ]),
                    (fileInfo) => BinaryFormat.Record(
                        [
                            FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
                            Fields = BinaryFormat.Binary(fileInfo[FieldsLen]),
                            FileComment = BinaryFormat.Text(fileInfo[FileCommentLength], TextEncoding.Ascii),
                            Disk = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
                            InternalFileAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
                            ExternalAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[InternalFileAttr]),
                            PosOfFileHeader = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[PosOfFileHeader])
                        ])
                    )
                ]), 
                EDOC[CDirectoryRecordCount]
            )
        ]), 
        EDOC[CentralDirectoryOffset] + EDOC[SizeOfCentralDirectory]),  
    Contents = List.Transform(
        CentralDirectory(ZIPFile)[Items],
            (cdEntry) => 
                let
                    ZipEntry = BinaryFormat.Record(
                    [
                        PreviousData = BinaryFormat.Binary(cdEntry[FileInfo][PosOfFileHeader]), 
                        Magic = uint,
                        ZipVersion = ushort,
                        ZipFlags = ushort,
                        CompressionMethod = ushort,
                        FileModificationTime = ushort,
                        FileModificationDate = ushort,
                        CRC32 = uint, 
                        BinarySize = uint,
                        FileSize   = uint,
                        FileName = BinaryFormat.Choice(
                            BinaryFormat.Record(
                                [
                                    Len = ushort,
                                    FieldsLen = ushort
                                ]),
                            (fileInfo) => BinaryFormat.Record(
                                [
                                    FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
                                    Fields = BinaryFormat.Binary(fileInfo[FieldsLen])
                                ]) 
                        ),
                        FileContent = BinaryFormat.Transform(
                            BinaryFormat.Binary(cdEntry[BinarySize]), 
                            each Binary.Decompress(_, Compression.Deflate)
                        )
                    ])(ZIPFile)
                in
                    [FileName=ZipEntry[FileName][FileName], Content=ZipEntry[FileContent]]
    )
in
    Contents

 


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
LinkedIn

This one worked like a charm! Thanks!

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.