Hello,
I need help figuring out why a small .csv file is coming into Power Query and reading as 6GB. I have a few merged queries in my workbook so I am wondering if that is whats causuing the slow down. Can anyone help?
Below is the source query
let
Source = Table.Buffer(Csv.Document(File.Contents("C:\Users\eddy.gramajo\Desktop\Invoice Upload\Billing Report (Post-Upload).csv"),[Delimiter=",", Columns=22, Encoding=1252, QuoteStyle=QuoteStyle.None])),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"OPPORTUNITY_ID", type text}, {"Arrangement Id", type text}, {"Customer Name", type text}, {"Bill To Customer Name", type text}, {"Item Number", type text}, {"Item Description", type text}, {"Sales Order", Int64.Type}, {"Sales Order Line", Int64.Type}, {"Sales Order Id", Int64.Type}, {"Sales Order Line Id", Int64.Type}, {"Qty Ordered", type text}, {"So Book Date", type text}, {"Qty Shipped", type text}, {"Invoice Number", type text}, {"Invoice Line Id", Int64.Type}, {"Invoice Line", Int64.Type}, {"Qty Invoiced", type number}, {"Invoice Date", type date}, {"Ext Sell Price", type number}, {"Rec Amt", type number}, {"Def Amt", type number}, {"Cancelled Flag", type text}})
in
#"Changed Type"
Here is an example of a merged query
let
Source = Table.Buffer(Csv.Document(File.Contents("C:\Users\eddy.gramajo\Desktop\Invoice Upload\INV_API_01Nov2017_30Nov2017_3cded995-0889-4798-97cb-86c7ff08e2d3-2017.12.05 (1).csv"),[Delimiter=",", Columns=73, Encoding=1252, QuoteStyle=QuoteStyle.None])),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"TRAN_TYPE", type text}, {"OPPORTUNITY_ID", type text}, {"OPPORTUNITY_NAME", type text}, {"PRODUCT_LINE", type text}, {"PRODUCT_FAMILY", type text}, {"PRODUCT_CLASS", type text}, {"PRODUCT_CATEGORY", type text}, {"BILL_TO_CUSTOMER_NUMBER", Int64.Type}, {"BILL_TO_CUSTOMER_NAME", type text}, {"BILL_TO_STATE", type text}, {"BILL_TO_COUNTRY", type text}, {"SHIP_TO_CUSTOMER_NUMBER", Int64.Type}, {"SHIP_TO_CUSTOMER_NAME", type text}, {"SHIP_TO_STATE_OR_PROVINCE", type text}, {"SHIP_TO_COUNTRY", type text}, {"PO_NUMBER", type text}, {"SO_BOOK_DATE", type date}, {"SALES_ORDER", Int64.Type}, {"SALES_ORDER_ID", Int64.Type}, {"SALES_ORDER_LINE_ID", Int64.Type}, {"SALES_ORDER_LINE", Int64.Type}, {"ORDER_TYPE", type text}, {"INVOICE_DATE", type date}, {"INVOICE_NUMBER", type text}, {"INVOICE_ID", Int64.Type}, {"INVOICE_LINE", Int64.Type}, {"INVOICE_LINE_ID", Int64.Type}, {"ORIG_INV_LINE_ID", type text}, {"ITEM_ID", Int64.Type}, {"ITEM_NUMBER", type text}, {"ITEM_DESCRIPTION", type text}, {"QUANTITY_ORDERED", type text}, {"QUANTITY_INVOICED", type number}, {"QUANTITY_TIER", type text}, {"UNIT_OF_MEASURE", type text}, {"UNIT_LIST_PRICE", type number}, {"UNIT_SELL_PRICE", type number}, {"EXT_LIST_PRICE", type number}, {"EXT_SELL_PRICE", type number}, {"RULE_START_DATE", type date}, {"RULE_END_DATE", type date}, {"DURATION_IN_MONTHS", Int64.Type}, {"TRANS_CURR_CODE", type text}, {"BASE_CURR_CODE", type text}, {"EX_RATE", Int64.Type}, {"RCURR_EX_RATE", Int64.Type}, {"DEF_ACCTG_SEG1", Int64.Type}, {"REV_ACCTG_SEG1", Int64.Type}, {"DEF_ACCTG_SEG2", type text}, {"REV_ACCTG_SEG2", type text}, {"DEFERRED_REVENUE_FLAG", type text}, {"UNBILLED_ACCOUNTING_FLAG", type text}, {"UNDELIVERED_FLAG", type text}, {"ELIGIBLE_FOR_FV", type text}, {"ELIGIBLE_FOR_CV", type text}, {"STANDALONE_FLAG", type text}, {"FLAG_97_2", type text}, {"PCS_FLAG", type text}, {"RETURN_FLAG", type text}, {"CANCELLED_FLAG", type text}, {"ORG_ID", Int64.Type}, {"BUSINESS_UNIT", type text}, {"REGION", type text}, {"DEPARTMENT", type text}, {"SALES_TEAM", type text}, {"DISTRIBUTION_TYPE", type text}, {"SOB_ID", Int64.Type}, {"CUSTOMER_CLASS", type text}, {"INDUSTRY", type text}, {"METRO", type text}, {"GLOBAL_EMP_COUNT", type text}, {"OPEN_JOB_COUNT", type text}, {"SEC_ATTR_VALUE", Int64.Type}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"QUANTITY_TIER", "GLOBAL_EMP_COUNT", "OPEN_JOB_COUNT"}),
#"Merged Queries" = Table.NestedJoin(#"Removed Columns",{"SALES_ORDER_LINE_ID"},#"All Arrangement Report",{"Sales Order Line Id"},"All Arrangement Report",JoinKind.LeftOuter),
#"Expanded All Arrangement Report" = Table.ExpandTableColumn(#"Merged Queries", "All Arrangement Report", {"Ext Sell Price"}, {"All Arrangement Report.Ext Sell Price"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded All Arrangement Report", each ([All Arrangement Report.Ext Sell Price] <> null)),
#"Merged Queries1" = Table.NestedJoin(#"Filtered Rows",{"INVOICE_LINE_ID"},#"Billing Report (Pre-Upload)",{"Invoice Line Id"},"Billing Report (Pre-Upload)",JoinKind.LeftOuter),
#"Expanded Billing Report (Pre-Upload)" = Table.ExpandTableColumn(#"Merged Queries1", "Billing Report (Pre-Upload)", {"Invoice Line Id"}, {"Billing Report (Pre-Upload).Invoice Line Id"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded Billing Report (Pre-Upload)", each ([#"Billing Report (Pre-Upload).Invoice Line Id"] = null)),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"Billing Report (Pre-Upload).Invoice Line Id"}),
#"Filtered Rows2" = Table.SelectRows(#"Removed Columns1", each [DEPARTMENT] = "Sales and Marketing : B2B Sales and Marketing : 110 Sales - Employer"),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows2",{{"All Arrangement Report.Ext Sell Price", "SO Amount"}}),
#"Merged Queries2" = Table.NestedJoin(#"Renamed Columns",{"SALES_ORDER_LINE_ID"},#"Billing Report (Pre-Upload)",{"Sales Order Line Id"},"Billing Report (Pre-Upload)",JoinKind.LeftOuter),
#"Aggregated Billing Report (Pre-Upload)" = Table.AggregateTableColumn(#"Merged Queries2", "Billing Report (Pre-Upload)", {{"Ext Sell Price", List.Sum, "Sum of Billing Report (Pre-Upload).Ext Sell Price"}}),
#"Replaced Value" = Table.ReplaceValue(#"Aggregated Billing Report (Pre-Upload)",null,0,Replacer.ReplaceValue,{"Sum of Billing Report (Pre-Upload).Ext Sell Price"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Total Billings to Date", each [#"Sum of Billing Report (Pre-Upload).Ext Sell Price"] +[EXT_SELL_PRICE]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Sum of Billing Report (Pre-Upload).Ext Sell Price", type number}, {"Total Billings to Date", type number}}),
#"Rounded Off" = Table.TransformColumns(#"Changed Type1",{{"Total Billings to Date", each Number.Round(_, 2), type number}}),
#"Added Custom1" = Table.AddColumn(#"Rounded Off", "Rebill?", each if [Total Billings to Date] > [SO Amount] then "Yes" else "No" ),
#"Filtered Rows3" = Table.SelectRows(#"Added Custom1", each ([#"Rebill?"] = "No")),
#"Renamed Columns1" = Table.RenameColumns(#"Filtered Rows3",{{"Sum of Billing Report (Pre-Upload).Ext Sell Price", "Sum of Billing Report Ext Sell Price"}})
in
#"Renamed Columns1"
Hi @v-shex-msft @newtoPowerBI,
I am also experianceing unusual performance degredataion with small files being read as very large ones
I have a 2,500 KB csv that shows the same behaviour described above.
let Source = Csv.Document(File.Contents("C:\input.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.Csv]), #"Use First Row as Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]), #"Trimmed Text" = Table.TransformColumns(#"Use First Row as Headers",{{"ADD_CONFIG", Text.Trim, type text}}), #"Cleaned Text" = Table.TransformColumns(#"Trimmed Text",{{"ADD_CONFIG", Text.Clean, type text}}), #"Filtered Rows" = Table.SelectRows(#"Cleaned Text", each [ADD_CONFIG] <> null and [ADD_CONFIG] <> ""), #"Parsed XML" = Table.TransformColumns(#"Filtered Rows",{{"ADD_CONFIG", Xml.Tables}}), #"Expanded ADD_CONFIG" = Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", List.Union(List.Transform(#"Parsed XML"[ADD_CONFIG], each Table.ColumnNames(_)))) in #"Expanded ADD_CONFIG"
Using some trial and errors, i figured the performance degradation is directly related to the following expression:
#"Expanded ADD_CONFIG" = Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", List.Union(List.Transform(#"Parsed XML"[ADD_CONFIG], each Table.ColumnNames(_))))
If i replace it with an actual comma seperated list (generated using the same List.Union(List.Transform)), the expansion is immediate:
= Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", {"foobar0","foobar1","foobar10","foobar100","foobar101","foobar102","foobar103","foobar104","foobar107","foobar108","foobar109","foobar11","foobar110","foobar112","foobar12","foobar13","foobar14","foobar15","foobar16","foobar17","foobar18","foobar19","foobar2","foobar20","foobar21","foobar22","foobar23","foobar24","foobar25","foobar27","foobar28","foobar29","foobar3","foobar30","foobar31","foobar32","foobar33","foobar34","foobar35","foobar36","foobar37","foobar38","foobar39","foobar4","foobar40","foobar41","foobar42","foobar43","foobar44","foobar45","foobar46","foobar47","foobar48","foobar49","foobar5","foobar50","foobar51","foobar52","foobar53","foobar54","foobar55","foobar56","foobar57","foobar58","foobar59","foobar6","foobar60","foobar61","foobar62","foobar63","foobar64","foobar65","foobar67","foobar68","foobar69","foobar7","foobar70","foobar71","foobar72","foobar73","foobar74","foobar75","foobar76","foobar77","foobar78","foobar79","foobar8","foobar81","foobar82","foobar83","foobar84","foobar85","foobar86","foobar87","foobar88","foobar89","foobar9","foobar91","foobar92","foobar93","foobar94","foobar95","foobar96","foobar97","foobar98","foobar99"})
Note that this expansion does not create additional rows, only additional columns.
Hi @newtoPowerBI,
Can you please share pbix file and the csv file for test ? It is hard to find out the issue from your query formula.
Notice: please do mask on sensitive data before sharing.
Regards,
Xiaoxin Sheng
HI @v-shex-msft
It looks like when I merge and aggregate data from another table it increases the size significantly. In one case I merged a query and aggregated a sum of a column and the file read as 10GB. Does anyone know a way to not have this happen?
Thank you!
Eddy
Hi @newtoPowerBI,
Have you try to expand summary columns in your merge table?(column which stored list, records, tables )
When you expand merge table, it will generate lots duplicate rows.
Regards,
Xiaoxin Sheng
HI @v-shex-msft,
Here is a link to the data where most of the query is derived from.
https://www.dropbox.com/sh/5kzn1vm38mkis87/AABSRMdhGpGDi6TsOSkpZ5u5a?dl=0
Thank you!