The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hi,
I am pulling a csv file data into power query to do cleaning and transformations to the data. There are around 300K `Attrb` rows, of which i am only filtering Attrb1,Attrb3,Attrb4,Attr5,Attr6. These `Attrbs` i am pivoting from rows to columns after sorting them in descending order.
Initially, the data loads fast till the point i come to `adding custom calculated columns`. When i add a new calculated Column to my table in Power Query on the previous step, the time to calculate and refresh the table in the Power Query editor seems to increase exponentially (from 48mb it increases to 2+ gb) per added column in my actual workbook. This is not evident on this sample workbook that i have attached as i have removed most transformations and merges ; but in my original workbook which contains groupings,mergings and pivotings. Due to this, it takes time for the calculated columns to load each time.
Also, i am not able to figure out a way to add PERCENTILE calculations to the custom columns. See the last 3 columns in FORMULAS sheet. The formula calculations are shown in this sheet. Here is a sample data that i have attached in SampleBook.
Solved! Go to Solution.
Hi @Anonymous ,
For calculating PERCENTILE in Power Query, you can refer to this post:
How do I calculate Percentiles in PowerQuery based on grouping variables?
Best regards
Icey
If this post helps,then consider Accepting it as the solution to help other members find it faster.
Hello @Anonymous
I think the problem is here that you are referencing a column of a previous step on each row. Try this approach
let
Source = Excel.CurrentWorkbook(){[Name="Table13"]}[Content],
ChangedType = Table.TransformColumnTypes(Source,{{"INNO", type text}, {"INNO_SORT", Int64.Type}, {"Attrb6", Int64.Type}, {"Attrb5", Int64.Type}, {"Attrb4", Int64.Type}, {"Attrb3", Int64.Type}, {"Attrb1", Int64.Type}}),
InsertedTotal = Table.AddColumn(ChangedType, "Total", each [Attrb6] + [Attrb4], Int64.Type),
AverageAttrb6 = List.Average(InsertedTotal[Attrb6]),
InsertedAttrb6AvgPercent = Table.AddColumn(InsertedTotal, "Attrb6 Avg%", each [Attrb6]/AverarageAttrb6,Percentage.Type),
AverageAttrb4 = List.Average(InsertedAttrb6AvgPercent[Attrb4]),
InsertedAttrb4AvgPercent = Table.AddColumn(InsertedAttrb6AvgPercent, "Attrb4 Avg%", each [Attrb4]/AverageAttrb4,Percentage.Type),
AverageTotal = List.Average(InsertedAttrb4AvgPercent[Total]),
InsertedTotalAvgPercent = Table.AddColumn(InsertedAttrb4AvgPercent, "Total%", each [Total]/AverageTotal,Percentage.Type),
FilteredInnoRows = if drpInno =null then InsertedTotalAvgPercent else Table.SelectRows(InsertedTotalAvgPercent, each Text.Contains([INNO], drpInno,Comparer.OrdinalIgnoreCase))
in
FilteredInnoRows
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Hi @Jimmy801 ,
I copied your code to the Advanced editor, but i am getting the following error :
Formula.Firewall: Query 'tblOriginal' (step 'FilteredInnoRows') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
Any idea why this is happening?
My original code was :
let
Source = Excel.CurrentWorkbook(){[Name="Table13"]}[Content],
ChangedType = Table.TransformColumnTypes(Source,{{"INNO", type text}, {"INNO_SORT", Int64.Type}, {"Attrb6", Int64.Type}, {"Attrb5", Int64.Type}, {"Attrb4", Int64.Type}, {"Attrb3", Int64.Type}, {"Attrb1", Int64.Type}}),
InsertedTotal = Table.AddColumn(ChangedType, "Total", each [Attrb6] + [Attrb4], Int64.Type),
InsertedAttrb6AvgPercent = Table.AddColumn(InsertedTotal, "Attrb6 Avg%", each [Attrb6]/List.Average(InsertedTotal[Attrb6]),Percentage.Type),
InsertedAttrb4AvgPercent = Table.AddColumn(InsertedAttrb6AvgPercent, "Attrb4 Avg%", each [Attrb4]/List.Average(InsertedAttrb6AvgPercent[Attrb4]),Percentage.Type),
InsertedTotalAvgPercent = Table.AddColumn(InsertedAttrb4AvgPercent, "Total%", each [Total]/List.Average(InsertedAttrb4AvgPercent[Total]),Percentage.Type),
FilteredInnoRows = if drpInno =null then InsertedTotalAvgPercent else Table.SelectRows(InsertedTotalAvgPercent, each Text.Contains([INNO], drpInno,Comparer.OrdinalIgnoreCase))
in
FilteredInnoRows
...which is refering previous steps. But the only difference between your code and mine is that you are calculating the averages upfront (which i had tried earlier but still same loading time) but they are also still referencing previous steps.
What would be the difference to change the loading time? Is it because the data is referencing a 300K rows CSV file that gets filtered on 5-6 Attrb codes and then these are pivoted as columns?
Hello @Anonymous
this is because of the safety settings. Or you are ignoring them in the settings, or you are using this code instead
let
Source = Excel.CurrentWorkbook(){[Name="Table13"]}[Content],
ChangedType = Table.TransformColumnTypes(Source,{{"INNO", type text}, {"INNO_SORT", Int64.Type}, {"Attrb6", Int64.Type}, {"Attrb5", Int64.Type}, {"Attrb4", Int64.Type}, {"Attrb3", Int64.Type}, {"Attrb1", Int64.Type}}),
InsertedTotal = Table.AddColumn(ChangedType, "Total", each [Attrb6] + [Attrb4], Int64.Type),
AverageAttrb6 = List.Average(InsertedTotal[Attrb6]),
InsertedAttrb6AvgPercent = Table.AddColumn(InsertedTotal, "Attrb6 Avg%", each [Attrb6]/AverageAttrb6,Percentage.Type),
AverageAttrb4 = List.Average(InsertedAttrb6AvgPercent[Attrb4]),
InsertedAttrb4AvgPercent = Table.AddColumn(InsertedAttrb6AvgPercent, "Attrb4 Avg%", each [Attrb4]/AverageAttrb4,Percentage.Type),
AverageTotal = List.Average(InsertedAttrb4AvgPercent[Total]),
InsertedTotalAvgPercent = Table.AddColumn(InsertedAttrb4AvgPercent, "Total%", each [Total]/AverageTotal,Percentage.Type),
GetFilter =
let
Source = Excel.CurrentWorkbook(){[Name="drpInno"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"INNO", type any}})[INNO]{0}
in
#"Changed Type",
FilteredInnoRows = if drpInno =null then InsertedTotalAvgPercent else Table.SelectRows(InsertedTotalAvgPercent, each Text.Contains([INNO], GetFilter,Comparer.OrdinalIgnoreCase))
in
FilteredInnoRows
no, it's the referecing that makes the difference. in a new column, you are refercing a list and it has to be evaluated each time.
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
The error was because of a typo `AverarageAttrb6` in this line:
InsertedAttrb6AvgPercent = Table.AddColumn(InsertedTotal, "Attrb6 Avg%", each [Attrb6]/AverarageAttrb6,Percentage.Type),
After changing that, it is working.
Which list are you referring that is causing the loading issue? How do i resolve it correctly?
Any Idea on how to calculate PERCENTILE in Power Query for the last 3 columns?
Hi @Anonymous ,
For calculating PERCENTILE in Power Query, you can refer to this post:
How do I calculate Percentiles in PowerQuery based on grouping variables?
Best regards
Icey
If this post helps,then consider Accepting it as the solution to help other members find it faster.