Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
Context:
I have log files of mutations with values and the modification time (dateTime). I want to create a overview of which rows changed on a certain date and how the individual values changed in comparison to their previous records. The report is out of scope here as I already have that running and tested it with the couple hundred rows I am able to load in. Sadly I cannot provide the dataset nor the original column names or the file because the data is confidential.
What I have (PowerQuery):
I have a dataset, this dataset is filtered. I need a ranking per serial number based on the rows with the highest modification date values per day (so grouped by serial number and then the max of a day). So per SERIAL a bunch of rows are selected and given a ranking based on their MODIFICATION_DATE, if more than one record exists for a SERIAL on a single day the record with the highest MODIFICATION_DATE is the only one taken into account.
This part seems to work and is quite performant (only is run once so quite fast) but put here to make debugging easier:
= let
// LOGTABLE is the original table that is loaded in here.
InitialTable = Table.SelectColumns(LOGTABLE, {"SERIAL", "B", "C","D","E","F","MODIFICATION_DATE"}),
// Removerows can be added here for testing as it the second query is a lot less efficient with row calculations. Example:
// Table.RemoveRows(Table.SelectRows(InitialTable, each [MODIFICATION_DATE] <> null),100,500000-500),
FilteredTable = Table.SelectRows(InitialTable, each [MODIFICATION_DATE] <> null),
AddedDateColumn = Table.AddColumn(FilteredTable, "Date", each Date.From([MODIFICATION_DATE]), type date),
// Groups by SERIALNUMBER/Date (date is without timestamp)
GroupedTable = Table.Group(
AddedDateColumn,
{"SERIAL", "Date"},
{
{"Max_MODIFICATION_DATE_Per_Day", each List.Max([MODIFICATION_DATE]), type datetime},
{"OtherColumns", each _, type table}
}
),
// Sort table and add index per SERIAL
SortedTable = Table.Sort(GroupedTable,{{"SERIAL", Order.Ascending}, {"Date", Order.Ascending}}),
AddIndexColumn = Table.AddIndexColumn(SortedTable, "Rank", 1, 1, Int64.Type),
ResetIndexPerGroup =
Table.Group(AddIndexColumn, {"SERIAL"}, {{"Index", each Table.AddIndexColumn(_, "Rank_Per_SERIAL", 1, 1, Int64.Type), type table}}),
// Expanding columns and data
ExpandedTable = Table.ExpandTableColumn(ResetIndexPerGroup, "Index",
{"Rank_Per_SERIAL", "Max_MODIFICATION_DATE_Per_Day","OtherColumns"},
{"Rank_Per_SERIAL", "Max_MODIFICATION_DATE_Per_Day","OtherColumns"}),
ExpandedTableOtherColumns =
Table.ExpandTableColumn(ExpandedTable, "OtherColumns",
List.Buffer(List.RemoveItems(Table.ColumnNames(ExpandedTable{0}[OtherColumns]), {"SERIAL","Date"}))),
FinalTableColumns = Table.SelectRows(ExpandedTableOtherColumns, each [MODIFICATION_DATE] = [Max_MODIFICATION_DATE_Per_Day]),
FinalTable = Table.RemoveColumns(FinalTableColumns,"Max_MODIFICATION_DATE_Per_Day")
in
FinalTable
And here the problem query.
= Table.AddColumn(
// Source refers the the result of the previous query.
Source,
"ChangeIndicators",
each (
let
// Function used in calculating the change indication value.
CalculateChangeType = (CurrentValue, PreviousValue, RowRankWithinSERIAL as number) as number =>
if RowRankWithinSERIAL > 1 then
if CurrentValue = PreviousValue then 0
else if (CurrentValue = null or CurrentValue = "") and
PreviousValue <> null then 3
else 2
else 1,
// *THIS VARIABLE DROPS PERFORMANCE AS IT LOOKS OVER THE ENTIRE SOURCE TABLE*
PreviousRow = Table.SelectRows(Source, (q) => q[#"SERIAL"] = [#"SERIAL"] and q[Rank_Per_SERIAL] = [Rank_Per_SERIAL]-1),
B_changeIndicator =
if [Rank_Per_SERIAL] = 1 then 1
else CalculateChangeType([#"B"], try List.First(PreviousRow[#"B"]) otherwise "", [Rank_Per_SERIAL]),
C_changeIndicator =
if [Rank_Per_SERIAL] = 1 then 1
else CalculateChangeType([#"C"], try List.First(PreviousRow[#"C"]) otherwise "", [Rank_Per_SERIAL]),
D_changeIndicator =
if [Rank_Per_SERIAL] = 1 then 1
else CalculateChangeType([#"D"], try List.First(PreviousRow[#"D"]) otherwise "", [Rank_Per_SERIAL]),
E_changeIndicator =
if [Rank_Per_SERIAL] = 1 then 1
else CalculateChangeType([#"E"], try List.First(PreviousRow[#"E"]) otherwise "", [Rank_Per_SERIAL]),
F_changeIndicator =
if [Rank_Per_SERIAL] = 1 then 1
else CalculateChangeType([#"F"], try List.First(PreviousRow[#"F"]) otherwise "", [Rank_Per_SERIAL]),
resultRecord = [
B_changeIndicator = B_changeIndicator,
C_changeIndicator = C_changeIndicator,
D_changeIndicator = D_changeIndicator,
E_changeIndicator = E_changeIndicator,
F_changeIndicator = F_changeIndicator],
resultTable = Table.FromRecords({resultRecord})
in resultTable
)
)
The final queries are of less importance but for ease of debugging I will provide them:
= Table.ExpandTableColumn(#"Added Custom", "ChangeIndicators", {"B_changeIndicator", "C_changeIndicator", "D_changeIndicator", "E_changeIndicator", "F_changeIndicator"}, {"ChangeIndicators.B_changeIndicator", "ChangeIndicators.C_changeIndicator", "ChangeIndicators.D_changeIndicator", "ChangeIndicators.E_changeIndicator", "ChangeIndicators.F_changeIndicator"})
I would really appreciate it if someone would be able to help me signifcinatly improve performance as the current dataset is large enough that it's not loading with the current implementation (I had everything in DAX first, had the same issue so moved to powerquery and managed to optimize the ranking but not the change calculations).
Power Query runs on disk, so is inherently slow when you force it to seek to the end of the data. Either use copious amounts of Table.Buffer, or do your computations in DAX (which runs in memory).
if you like more help, provide sample data that fully covers your issue.
Please show the expected outcome based on the sample data you provided.
User | Count |
---|---|
84 | |
76 | |
74 | |
48 | |
39 |
User | Count |
---|---|
114 | |
56 | |
51 | |
42 | |
42 |