Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
anjbauer1993
Frequent Visitor

Data loaded in from Azure Blob Storage double the size of the actual files

I'm pulling in a large set of PDF files from a blob container in Azure, 4,735 files totalling 476 MB. The main issue I'm having is that when refreshing, or even just adding a step in Power Query, over 900 MB of data is loaded. It takes a very long time to refresh and apply any new steps in Power Query. I've attached the relavent Power Query code below. This report is just intended to identify any corrupted PDFs or PDFs that are in the wrong format so that they can be fixed for use in another report so this query is relatively simple.

// simpleLoad
let
    Source = (#"Sample File Parameter1" as binary) => let
        Source = Pdf.Tables(#"Sample File Parameter1"),
        Table002 = Source{[Id="Table002"]}[Data]

    in
        Table002
in
    Source
// timesheets (2)
let
    Source = AzureStorage.Blobs(acct),
    timesheets1 = Source{[Name="timesheets"]}[Data],
    #"buffer" = Table.Buffer(timesheets1),
    #"Invoke Custom Function1" = Table.AddColumn(#"buffer", "simpleLoad", each #"simpleLoad"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "simpleLoad"}),
    #"Kept Errors" = Table.SelectRowsWithErrors(#"Removed Other Columns1", {"simpleLoad"}),
    #"Added Custom" = Table.AddColumn(#"Kept Errors", "Error", each try [simpleLoad]),
    #"Expanded Error" = Table.ExpandRecordColumn(#"Added Custom", "Error", {"Error"}, {"Error.1"}),
    #"Expanded Error.1" = Table.ExpandRecordColumn(#"Expanded Error", "Error.1", {"Reason", "Message"}, {"Reason", "Message"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Error.1",{"simpleLoad"})
in
    #"Removed Columns"

 

 

I found another post that deals with the same issue but with gzip files, it was recommended there that you disable auto generated tables for date hierarchies, which did not change anything in my report. I added the buffer step in to hopefully make adding steps go quicker in Power Query but that also didn't seem to help.

3 REPLIES 3
Anonymous
Not applicable

Hello @anjbauer1993 ,
Were you able to figure out why your data load was doubled in PBI, I am facing similar issue in my load from Azure blob storage. I am reading a csv file which is around 4 GB but PBI load more than 8 GB for it. 
Can anyone help on it?

v-yuta-msft
Community Support
Community Support

@anjbauer1993 ,

 

From the power query you have posted, it seems like you have expend the Error column in several steps. I'm not sure but this could be the cause of this issue, so for troubleshooting you may remove these steps and check if the data size is normal.

let
    Source = AzureStorage.Blobs(acct),
    timesheets1 = Source{[Name="timesheets"]}[Data],
    #"buffer" = Table.Buffer(timesheets1),
    #"Invoke Custom Function1" = Table.AddColumn(#"buffer", "simpleLoad", each #"simpleLoad"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "simpleLoad"}),
    #"Kept Errors" = Table.SelectRowsWithErrors(#"Removed Other Columns1", {"simpleLoad"}),
    #"Added Custom" = Table.AddColumn(#"Kept Errors", "Error", each try [simpleLoad]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom", {"simpleLoad"})
in
    #"Removed Columns"

 

Community Support Team _ Jimmy Tao

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@v-yuta-msft

Unfortunately that did not do the trick. I've created a sample dataset in order to move through things quicker, now only 69 MB. I also created a new query that only pulls the files in and does no transformation, but it's still downloading 132 MB when refreshing. The Power Query code is below

 

// timesheets-test
let
    Source = AzureStorage.Blobs(acct),
    #"timesheets-test1" = Source{[Name="timesheets-test"]}[Data]
in
    #"timesheets-test1"

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.