Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
tangcy123
Helper I
Helper I

How can I connect to zip file and read csv or txt file in this zip file directly

Hello,

 

Every time I have to extract the zip file by manually, get csv or txt file,  then let power bi report get data from csv or txt file. it is very crazy. 

How to connect to zip file and read the csv or txt files in this zip file directly? any good idea?

 

Thanks,

Linda

2 ACCEPTED SOLUTIONS

Hi @tangcy123 ,

 

You need add a cutome column or invoke column to get the data:

let
    Source =  File.Contents("c\abc\abc.zip"),
    //Function Start - Credit - https://sql10.blogspot.com/2016/06/reading-zip-files-in-powerquery-m.html
    UnzipContents=(ZIPFile) =>
    let
        Header = BinaryFormat.Record([
            MiscHeader = BinaryFormat.Binary(14),
            BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
        ]),
        HeaderChoice = BinaryFormat.Choice(
            BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
                then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
                else BinaryFormat.Choice(
                        BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                        each BinaryFormat.Record([
                            IsValid  = true,
                            Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                            Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                            Content  = BinaryFormat.Transform(
                                BinaryFormat.Binary(Header(_)[BinarySize]),
                                (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                            )
                            ]),
                            type binary                   // enable streaming
                    )
        ),
        ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
        Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
        )
    in
        Table.FromRecords(Entries),
        //Function End
    Files = UnzipContents(Source),

    #"Added Custom" = Table.AddColumn(Files, "Custom", each Excel.Workbook([Content])),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Data"}, {"Custom.Data"}),
    #"Expanded Custom.Data" = Table.ExpandTableColumn(#"Expanded Custom", "Custom.Data", {"Column1", "Column2"}, {"Custom.Data.Column1", "Custom.Data.Column2"})
    
in
    #"Expanded Custom.Data"

 

vchenwuzmsft_0-1654159403673.png

Best Regards

Community Support Team _ chenwu zhu

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Thank you! I separated the functions and changed to Open CSV file.

 

  • UnzipContents:
let
//Function Start - Credit - https://sql10.blogspot.com/2016/06/reading-zip-files-in-powerquery-m.html
    Source = (ZIPFile) =>
    let
        Header = BinaryFormat.Record([
            MiscHeader = BinaryFormat.Binary(14),
            BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
        ]),
        HeaderChoice = BinaryFormat.Choice(
            BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
                then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
                else BinaryFormat.Choice(
                        BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                        each BinaryFormat.Record([
                            IsValid  = true,
                            Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                            Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                            Content  = BinaryFormat.Transform(
                                BinaryFormat.Binary(Header(_)[BinarySize]),
                                (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                            )
                            ]),
                            type binary                   // enable streaming
                    )
        ),
        ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
        Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
        )
    in
        Table.FromRecords(Entries)
        //Function End
in
    Source

 

  • OpenCSVonZIPfile:
let
    Source = (
        ZipFilePath as text,
        CsvDelimiter as text,
        CsvColumns as number
    ) =>
    let
        Source =  File.Contents(ZipFilePath),
        Files = UnzipContents(Source),
        Added_Custom = Table.AddColumn(Files, "Custom", each Csv.Document([Content],[Delimiter=CsvDelimiter, Columns=CsvColumns, Encoding=1252, QuoteStyle=QuoteStyle.None])),
        Removed_Column_Content = Table.RemoveColumns(Added_Custom,{"Content"}),
        columnsList = ListColumnsNames(CsvColumns),
        Expanded_Custom = Table.ExpandTableColumn(Removed_Column_Content, "Custom", columnsList, columnsList),
        Final = Expanded_Custom
    in
        Final
in
    Source

 

  • ListColumnsNames:
let
    Source = (
        cols as number
    ) =>
    let
        Source = if(cols > 0) then List.Transform({1 .. cols}, each "Column" & Text.From(_)) else {}
    in
        Source
in
    Source

 

View solution in original post

7 REPLIES 7
Vijay_A_Verma
Super User
Super User

Use the code below (Source credit is mentioned in the code itself from where it was taken)

let
    Source = File.Contents("C:\Users\VV\abcd.zip"),
    //Function Start - Credit - https://sql10.blogspot.com/2016/06/reading-zip-files-in-powerquery-m.html
    UnzipContents=(ZIPFile) =>
    let
        Header = BinaryFormat.Record([
            MiscHeader = BinaryFormat.Binary(14),
            BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
        ]),
        HeaderChoice = BinaryFormat.Choice(
            BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
                then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
                else BinaryFormat.Choice(
                        BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                        each BinaryFormat.Record([
                            IsValid  = true,
                            Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                            Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                            Content  = BinaryFormat.Transform(
                                BinaryFormat.Binary(Header(_)[BinarySize]),
                                (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                            )
                            ]),
                            type binary                   // enable streaming
                    )
        ),
        ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
        Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
        )
    in
        Table.FromRecords(Entries),
        //Function End
    Files = UnzipContents(Source)
in
    Files

 

tangcy123_0-1653895329840.png

I use the code you replied, get the data as above screens, seems it didn't get the detail data, I am not able to see the detail data in zip text file.

 

I changed the path of zip file and zip name, not able to get detail data from zip file  -> text file.

Source = File.Contents("C:\Users\VV\abcd.zip")

 

This path will have to be replaced with path of your zip file. Let's say your zip file name is tangcy.zip and path where this is C:\xyz\tangcy.zip then this statement would become

Source = File.Contents("C:\xyz\tangcy.zip") 

I updated the path of my zip file as below, then how to get the detail data into test table??

tangcy123_0-1653988675536.png

 

tangcy123_1-1653988755049.png

 

tangcy123_2-1653988792946.png

 

 

Hi @tangcy123 ,

 

You need add a cutome column or invoke column to get the data:

let
    Source =  File.Contents("c\abc\abc.zip"),
    //Function Start - Credit - https://sql10.blogspot.com/2016/06/reading-zip-files-in-powerquery-m.html
    UnzipContents=(ZIPFile) =>
    let
        Header = BinaryFormat.Record([
            MiscHeader = BinaryFormat.Binary(14),
            BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
        ]),
        HeaderChoice = BinaryFormat.Choice(
            BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
                then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
                else BinaryFormat.Choice(
                        BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                        each BinaryFormat.Record([
                            IsValid  = true,
                            Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                            Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                            Content  = BinaryFormat.Transform(
                                BinaryFormat.Binary(Header(_)[BinarySize]),
                                (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                            )
                            ]),
                            type binary                   // enable streaming
                    )
        ),
        ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
        Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
        )
    in
        Table.FromRecords(Entries),
        //Function End
    Files = UnzipContents(Source),

    #"Added Custom" = Table.AddColumn(Files, "Custom", each Excel.Workbook([Content])),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Data"}, {"Custom.Data"}),
    #"Expanded Custom.Data" = Table.ExpandTableColumn(#"Expanded Custom", "Custom.Data", {"Column1", "Column2"}, {"Custom.Data.Column1", "Custom.Data.Column2"})
    
in
    #"Expanded Custom.Data"

 

vchenwuzmsft_0-1654159403673.png

Best Regards

Community Support Team _ chenwu zhu

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thank you! I separated the functions and changed to Open CSV file.

 

  • UnzipContents:
let
//Function Start - Credit - https://sql10.blogspot.com/2016/06/reading-zip-files-in-powerquery-m.html
    Source = (ZIPFile) =>
    let
        Header = BinaryFormat.Record([
            MiscHeader = BinaryFormat.Binary(14),
            BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
        ]),
        HeaderChoice = BinaryFormat.Choice(
            BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
                then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
                else BinaryFormat.Choice(
                        BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                        each BinaryFormat.Record([
                            IsValid  = true,
                            Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                            Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                            Content  = BinaryFormat.Transform(
                                BinaryFormat.Binary(Header(_)[BinarySize]),
                                (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                            )
                            ]),
                            type binary                   // enable streaming
                    )
        ),
        ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
        Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
        )
    in
        Table.FromRecords(Entries)
        //Function End
in
    Source

 

  • OpenCSVonZIPfile:
let
    Source = (
        ZipFilePath as text,
        CsvDelimiter as text,
        CsvColumns as number
    ) =>
    let
        Source =  File.Contents(ZipFilePath),
        Files = UnzipContents(Source),
        Added_Custom = Table.AddColumn(Files, "Custom", each Csv.Document([Content],[Delimiter=CsvDelimiter, Columns=CsvColumns, Encoding=1252, QuoteStyle=QuoteStyle.None])),
        Removed_Column_Content = Table.RemoveColumns(Added_Custom,{"Content"}),
        columnsList = ListColumnsNames(CsvColumns),
        Expanded_Custom = Table.ExpandTableColumn(Removed_Column_Content, "Custom", columnsList, columnsList),
        Final = Expanded_Custom
    in
        Final
in
    Source

 

  • ListColumnsNames:
let
    Source = (
        cols as number
    ) =>
    let
        Source = if(cols > 0) then List.Transform({1 .. cols}, each "Column" & Text.From(_)) else {}
    in
        Source
in
    Source

 

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.