For a new project I'm trying to visualise overallocations using Power BI.
Basically, an overallocation occurs whenever a resource is used within a timeframe on two different activities.
What I want to calculate is the OverallocationCount shown below.
OverallocationCount = number of tasks with the same resource whose timeframe overlaps that of the current activity:
| TASKID | RSRCID | START | END | OverallocationCount |
|---|---|---|---|---|
| 1 | 1 | 01/09/2022 12:00 | 15/09/2022 18:00 | 4 |
| 2 | 1 | 12/09/2022 12:00 | 20/09/2022 17:00 | 3 |
| 3 | 1 | 10/09/2022 12:00 | 12/09/2022 12:00 | 2 |
| 4 | 1 | 12/09/2022 17:00 | 13/09/2022 18:00 | 3 |
| 5 | 1 | 21/09/2022 08:00 | 25/09/2022 17:00 | 1 |
| 6 | 2 | 01/09/2022 12:00 | 15/09/2022 18:00 | 1 |
| 7 | 2 | 15/09/2022 18:00 | 20/09/2022 17:00 | 1 |
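To make the intended count concrete, here is a small Python sketch (not Power Query, just an illustration of the overlap rule start_a < end_b and end_a > start_b, where each task counts itself) that reproduces the OverallocationCount column from the table above:

```python
from datetime import datetime

# Sample tasks: (task_id, rsrc_id, start, end), matching the table above
tasks = [
    (1, 1, datetime(2022, 9, 1, 12), datetime(2022, 9, 15, 18)),
    (2, 1, datetime(2022, 9, 12, 12), datetime(2022, 9, 20, 17)),
    (3, 1, datetime(2022, 9, 10, 12), datetime(2022, 9, 12, 12)),
    (4, 1, datetime(2022, 9, 12, 17), datetime(2022, 9, 13, 18)),
    (5, 1, datetime(2022, 9, 21, 8), datetime(2022, 9, 25, 17)),
    (6, 2, datetime(2022, 9, 1, 12), datetime(2022, 9, 15, 18)),
    (7, 2, datetime(2022, 9, 15, 18), datetime(2022, 9, 20, 17)),
]

def overallocation_counts(tasks):
    """For each task, count tasks on the same resource whose timeframe
    overlaps it (strict inequalities; a task always overlaps itself)."""
    counts = {}
    for tid, rid, start, end in tasks:
        counts[tid] = sum(
            1 for _, r2, s2, e2 in tasks
            if r2 == rid and s2 < end and e2 > start
        )
    return counts

print(overallocation_counts(tasks))
# {1: 4, 2: 3, 3: 2, 4: 3, 5: 1, 6: 1, 7: 1}
```

Note that tasks that merely touch (one ends exactly when the other starts, like tasks 6 and 7) do not count as overlapping because the inequalities are strict.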
What I tried so far:
- Merge the table back onto itself based on resource ID (RSRCID), with Table.NestedJoin so each row gets a nested table of all tasks using the same resource
- Create a function that takes a table (InputTable) and a TaskId and counts overallocations
(InputTable, TaskId) =>
let
    // Look up the current task's start and end dates
    StartDate = InputTable{[TASKID = TaskId]}[START],
    StopDate = InputTable{[TASKID = TaskId]}[END],
    // Keep only the tasks whose timeframe overlaps the current one
    #"Filtered Rows" = Table.SelectRows(InputTable, each [START] < StopDate and [END] > StartDate),
    #"Counted Rows" = Table.RowCount(#"Filtered Rows")
in
    #"Counted Rows"
I can then apply this function to every row of the original table:
= Table.AddColumn(#"Merged Queries", "Calculate Overallocations", each #"Calculate Overallocations"([RSRCTable], [TASKID]))
So this works, and I don't get any errors in the Power Query preview, but it takes ages to refresh my data.
So the obvious question: what would be the best way to speed up the refresh? I think my current calculation is O(N²) (with 150k rows), which isn't ideal, but I don't think any algorithm can do better than O(N²).
Would Table.Buffer help? Would Table.Join instead of Table.InnerJoin help? Would it be better to create a List.Aggregate function? Should I not try to (pre)calculate this in Power Query at all and use a measure instead?
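On the complexity question: this particular count can in fact be done with fewer than pairwise comparisons. Within one resource, the number of tasks overlapping task i equals (tasks whose START is before END_i) minus (tasks whose END is at or before START_i), which two sorted lists and binary search answer in O(N log N) per resource. A Python sketch of the idea (just to illustrate; it would need translating to M, or to be computed outside Power Query):

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

def fast_overlap_counts(intervals):
    """intervals: list of (start, end) for ONE resource.
    Returns, per interval, how many intervals overlap it (including
    itself), using: overlaps_i = #{start < end_i} - #{end <= start_i}."""
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    return [
        # bisect_left counts starts strictly before e;
        # bisect_right counts ends at or before s
        bisect_left(starts, e) - bisect_right(ends, s)
        for s, e in intervals
    ]

# The resource-1 tasks from the sample table above
intervals = [
    (datetime(2022, 9, 1, 12), datetime(2022, 9, 15, 18)),
    (datetime(2022, 9, 12, 12), datetime(2022, 9, 20, 17)),
    (datetime(2022, 9, 10, 12), datetime(2022, 9, 12, 12)),
    (datetime(2022, 9, 12, 17), datetime(2022, 9, 13, 18)),
    (datetime(2022, 9, 21, 8), datetime(2022, 9, 25, 17)),
]
print(fast_overlap_counts(intervals))
# [4, 3, 2, 3, 1]
```

The identity works because an interval fails to overlap task i only if it ends at or before START_i or starts at or after END_i, and those two failure sets are disjoint.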
I think the StartDate and StopDate lookups can be eliminated by passing the current row (as a record) into the transformation, instead of a table plus an ID.
Here's what I mean, without creating a separate transformation query:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("lY9BCsAgDAS/UjwX3Ky12n5F/P83SkpEqLn0kMvusExaCxL29yARVyTITXgDGucZVY363gKNFy48MaMy+DR4rPvfCeUPZ78YnxyfbDynP6r5ZMfn1OLHv8X4pfT+7Q8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TASKID = _t, RSRCID = _t, START = _t, END = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"TASKID", Int64.Type}, {"RSRCID", Int64.Type}, {"START", type datetime}, {"END", type datetime}}, "en-IN"),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"RSRCID"}, #"Changed Type", {"RSRCID"}, "RSRCTable", JoinKind.LeftOuter),
#"Added Custom" = Table.AddColumn(#"Merged Queries", "Overallocation", (row) => Table.RowCount(Table.SelectRows(row[RSRCTable], each [START] < row[END] and [END] > row[START])), Int64.Type),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"RSRCTable"})
in
#"Removed Columns"
This has similar computational complexity in theory to what you're doing, but it might be a bit more efficient in practice (though I've not tried it on a large example).
Table.Buffer might help, but it isn't obvious to me how best to use it in this case.
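As a rough illustration of why the nested-join-plus-row-record approach helps: the resource groups are materialised once and each row only scans its own group, rather than the whole table. A hypothetical Python analogue of that shape (column names taken from the question; this is a sketch of the logic, not the M engine's behaviour):

```python
from collections import defaultdict
from datetime import datetime

def add_overallocation_column(rows):
    """rows: list of dicts with TASKID, RSRCID, START, END.
    Group once per resource (analogous to the nested RSRCTable),
    then count overlapping tasks within each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["RSRCID"]].append(r)  # built once, reused per row
    for r in rows:
        r["Overallocation"] = sum(
            1 for o in groups[r["RSRCID"]]
            if o["START"] < r["END"] and o["END"] > r["START"]
        )
    return rows

# First three resource-1 tasks from the sample table
rows = [
    {"TASKID": 1, "RSRCID": 1, "START": datetime(2022, 9, 1, 12), "END": datetime(2022, 9, 15, 18)},
    {"TASKID": 2, "RSRCID": 1, "START": datetime(2022, 9, 12, 12), "END": datetime(2022, 9, 20, 17)},
    {"TASKID": 3, "RSRCID": 1, "START": datetime(2022, 9, 10, 12), "END": datetime(2022, 9, 12, 12)},
]
add_overallocation_column(rows)
print([r["Overallocation"] for r in rows])
# [3, 2, 2]
```

This is still quadratic within each resource group, but the constant factor is much smaller when groups are small relative to the whole table.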