I have some data stored in Azure blob storage, as gzip'd CSV files.
I'm then pulling the data into Power BI Desktop and using a function with Binary.Decompress to decompress the files.
When I refresh the data, the refresh dialog reports downloading far more than is actually in storage: a blob container that should hold around 500-600 MB of files results in a reported download of well over 1 GB. The queries are as follows:
Unzip Function:
(gZipFile) =>
let
    #"Unzip" = Binary.Decompress(gZipFile, Compression.GZip),
    #"CSV" = Csv.Document(#"Unzip"),
    #"Headers" = Table.PromoteHeaders(#"CSV", [PromoteAllScalars=true])
in
    #"Headers"
Blob Retrieval:
let
    Source = AzureStorage.Blobs("apdigitalproducts"),
    #"blobcontainer" = Source{[Name="googleanalyticsdata"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(#"blobcontainer",{"Content", "Name"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns", "Data", each fnDecompress([Content])),
    #"Removed Columns1" = Table.RemoveColumns(#"Invoked Custom Function",{"Content"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns1", "Data",
        {"ga:visitorType", "ga:sourceMedium", "ga:country", "ga:landingPagePath", "ga:date", "ga:deviceCategory", "ga:fullReferrer", "ga:newUsers", "ga:sessions", "ga:pageviews", "ga:avgSessionDuration", "ga:avgTimeOnpage", "ga:users", "ga:pageviewsPerSession", "ga:sessionDuration", "ga:timeOnPage"},
        {"ga:visitorType", "ga:sourceMedium", "ga:country", "ga:landingPagePath", "ga:date", "ga:deviceCategory", "ga:fullReferrer", "ga:newUsers", "ga:sessions", "ga:pageviews", "ga:avgSessionDuration", "ga:avgTimeOnpage", "ga:users", "ga:pageviewsPerSession", "ga:sessionDuration", "ga:timeOnPage"})
in
    #"Expanded Data"
Any ideas? Are the blobs being decompressed at the server side or something? I cannot work out at all what is going on here.
@dpws88 Random thought: Power BI Desktop automatically creates hidden date-hierarchy tables for every date column in the model. Chris Webb wrote a blog post that describes this behaviour and how to disable it in the options.
It might be the issue.
@dpws88 Same answer as this post, try it out and let one of them know. It is appreciated if you don't double post the same thing.
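One way to narrow this down might be to compare each blob's size on storage with its size after decompression, and see which total the refresh dialog's figure is closer to. The query below is only a sketch: it reuses the account and container names from the question, the step names (Container, WithCompressed, WithDecompressed, Result) and the two added column names are made up for illustration, and it has not been run against this storage account. It also assumes that the dialog's number relates to one of these two totals, which may not be the case.

```powerquery
let
    // Same source and container as in the question
    Source = AzureStorage.Blobs("apdigitalproducts"),
    Container = Source{[Name="googleanalyticsdata"]}[Data],

    // Size of each gzip'd blob as stored (bytes)
    WithCompressed = Table.AddColumn(
        Table.SelectColumns(Container, {"Name", "Content"}),
        "CompressedBytes",
        each Binary.Length([Content]),
        Int64.Type),

    // Size of each blob after gzip decompression (bytes)
    WithDecompressed = Table.AddColumn(
        WithCompressed,
        "DecompressedBytes",
        each Binary.Length(Binary.Decompress([Content], Compression.GZip)),
        Int64.Type),

    // Drop the binary column so only names and sizes are loaded
    Result = Table.RemoveColumns(WithDecompressed, {"Content"})
in
    Result
```

Summing the two size columns gives the totals to compare with the refresh dialog. If the reported download is close to the DecompressedBytes total, the dialog is probably counting rows or bytes after decompression rather than bytes pulled from storage; if it is larger than both, the blobs may be getting read more than once during the refresh.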