Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more
Hello everybody
I am working on a recursive function to build a hierarchy from Active Directory. Execution does not halt. I am not able to understand the mistake that I am committing. Please help
let
Source = ActiveDirectory.Domains("go.johnsoncontrols.com"),
Go.JohnsonControls.com = Source{[Domain="Go.JohnsonControls.com"]}[#"Object Categories"],
user1 = Go.JohnsonControls.com{[Category="user"]}[Objects],
finaltable= Table.FromRecords({[top.cn="",top.directReports.cn=""]}),
#"Removed Columns" = Table.RemoveColumns(user1,{"user", "organizationalPerson", "person", "jciUser", "msExchOmaUser", "msExchBaseClass", "msExchIMRecipient", "msExchCertificateInformation", "msExchMultiMediaUser", "msExchMailStorage", "msExchCustomAttributes", "shadowAccount", "posixAccount", "msDS-CloudExtensions", "securityPrincipal", "mailRecipient", "distinguishedName"}),
#"Expanded top" = Table.ExpandRecordColumn(#"Removed Columns", "top", {"cn", "directReports"}, {"top.cn", "top.directReports"}),
somestep=Table.SelectRows(#"Expanded top",each [top.directReports]<>null),
getreportees=(adtable as table, idlist as list, n as number ) =>
let
#"Filtered Rows1" = Table.SelectRows(adtable, each List.Contains(idlist,[top.cn]) ),
#"Expanded top.directReports" = Table.ExpandListColumn(#"Filtered Rows1", "top.directReports"),
#"Expanded top.directReports1" = Table.ExpandRecordColumn(#"Expanded top.directReports", "top.directReports", {"cn", "ProvStatus"}, {"top.directReports.cn","top.directReports.ProvStatus"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded top.directReports1", each [top.directReports.ProvStatus] = "active" or [top.directReports.ProvStatus] = "Active"),
finaltable =
if Table.IsEmpty(#"Filtered Rows") then finaltable
else Table.Combine({finaltable, Table.RemoveColumns(#"Filtered Rows",{"top.directReports.ProvStatus","displayName"})}),
y=
if Table.IsEmpty(#"Filtered Rows") and n = 0
then #"Filtered Rows"
else if Table.IsEmpty(#"Filtered Rows") and n > 0 then finaltable
else @getreportees(somestep,#"Filtered Rows"[top.directReports.cn],n+1)
in
y, luck=getreportees(somestep,{"cparabn"},0)
in
luck
Are you sure this hierarchy does not have any weird issues such as a circular relationship?
Could 1 report to 2, 2 report to 3, and 3 report to 1? Build that as a condition to end your loop just in case. If you find any person that is already on the list, end your loop. You could also just stop it at n equal to a little over the max number of levels you think you should have and see if there may be some issues.
See example.
let
Source = Binary.FromText("W3siRW1wbG95ZWUgSUQiOjEsIk1hbmFnZXIgSUQiOjJ9LHsiRW1wbG95ZWUgSUQiOjIsIk1hbmFnZXIgSUQiOjN9LHsiRW1wbG95ZWUgSUQiOjMsIk1hbmFnZXIgSUQiOjR9LHsiRW1wbG95ZWUgSUQiOjQsIk1hbmFnZXIgSUQiOjV9LHsiRW1wbG95ZWUgSUQiOjUsIk1hbmFnZXIgSUQiOjF9LHsiRW1wbG95ZWUgSUQiOjYsIk1hbmFnZXIgSUQiOjEwfSx7IkVtcGxveWVlIElEIjo3LCJNYW5hZ2VyIElEIjoxMH0seyJFbXBsb3llZSBJRCI6OCwiTWFuYWdlciBJRCI6MTB9LHsiRW1wbG95ZWUgSUQiOjksIk1hbmFnZXIgSUQiOjEwfSx7IkVtcGxveWVlIElEIjoxMCwiTWFuYWdlciBJRCI6bnVsbH1d"),
#"Imported JSON" = Json.Document(Source,1252),
#"Converted to Table" = Table.FromList(#"Imported JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
Expand = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"Employee ID", "Manager ID"}, {"Employee ID", "Manager ID"}),
priorStep = Table.AddColumn(Expand, "tree", each {[Employee ID]}),
build_tree = (tbl as table) =>
let
update_tree =
Table.TransformColumns(
tbl,
{
"tree",
each _ & Table.SelectRows(tbl, (T) => T[Employee ID] = List.Last(_))[Manager ID]}),
loop =
if update_tree = tbl or List.Transform(update_tree[tree], each List.Distinct(_)) = tbl[tree]
then update_tree
else @build_tree(update_tree)
in
loop,
final = build_tree(priorStep),
#"Extracted Values" = Table.TransformColumns(final, {"tree", each Text.Combine(List.Transform(_, Text.From), ","), type text})
in
#"Extracted Values"
Hello spinfuzer
I used your code(modified the source to AD as input). Your code works fine. Unfortunately the AD implementation in my org is convoluted and the response is unacceptable. I was able to part about 2000 records in 2 hours. Hence I will not be able to use your code.
Thank you for the code though. I learnt something new!
Table.Join with the LeftHash algorithm seems to work a lot faster.
Actually NestedJoin works faster, had to correct the query.
(c_list as list, p_list as list) =>
let
Source =
Table.AddIndexColumn(
Table.TransformColumnTypes(
Table.FromRows(
List.Zip({c_list,p_list}),
{"c", "p"}
),
{
{"c", type text},
{"p", type text}
}
)
, "path_index_column"
),
parent_table = Table.AddKey(Table.Buffer(Table.SelectRows(Source, each [p] <> null)), {"c"},true),
add_path_col = Table.RemoveColumns(Table.AddColumn(Source,"path",each {[c]}), "p"),
fnPath =
(tbl as table) =>
let
//join = Table.Join(tbl,"c",Table.PrefixColumns(parent_table,"n"),"n.c", JoinKind.LeftOuter, 3),
//remove_c = Table.RemoveColumns(join, {"c","n.c", "n.path_index_column"}),
//before_update_path = Table.RenameColumns(remove_c, {{"n.p", "c"}, {"path","old_path"}}),
join = Table.NestedJoin(tbl,"c",parent_table,"c", "merged", JoinKind.LeftOuter),
remove_c = Table.RemoveColumns(join, {"c"}),
before_update_path = Table.RenameColumns(Table.ExpandTableColumn(remove_c, "merged", {"p"}, {"c"}),{"path","old_path"}),
update_path = Table.Buffer(Table.AddColumn(before_update_path, "path", each if [c] <> null then [old_path] & {[c]} else [old_path])),
//update_path = Table.ReplaceValue(before_update_path, each [path], each if [c] <> null then [path] & {[c]} else [path], Replacer.ReplaceValue, {"path"}),
new_child = update_path[c],
old_child = join[c],
final =
if List.NonNullCount(new_child) < List.NonNullCount(old_child)
then @ fnPath(Table.RemoveColumns(update_path, "old_path"))
else if List.NonNullCount(new_child) = 0
then
Table.SelectColumns(
Table.TransformColumns(
Table.AddColumn(
Table.SelectColumns(update_path,{"path_index_column","path"}),
"child",
each [path]{0}
),
{"path", each Text.Combine(_, ",")}
),
{"path_index_column","child","path"}
)
else if
List.Transform(
Table.SelectRows(update_path, each [c] <> null)[path],
List.Distinct
)
<>
List.Transform(
Table.SelectRows(update_path, each [c] <> null)[old_path],
List.Distinct
)
// the condition above checks for circular relationships
then @ fnPath(Table.RemoveColumns(update_path, "old_path"))
else
Table.SelectColumns(
Table.TransformColumns(
Table.AddColumn(
Table.SelectColumns(update_path,{"path_index_column","path"}),
"child",
each [path]{0}
),
{"path", each Text.Combine(_, ",")}
),
{"path_index_column","child","path"}
)
in
final,
output = fnPath(add_path_col)
// Table.FromRows(
// List.Zip({c_list,fnPath(add_path_col)}),
// {"child_column","path_column"}
// )
in
output
cannot confirm. Left Join performs worse than Nested Join
I also tried Chris Webb's suggestion on the inner join with sort merge (mode 2) but that is underwhelming as well
Testing with 123K+ rows with up to 5 levels on a customer account hierarchy. I am joining up to 5 times maximum instead of doing selectrows 123x5 times.
challenge accepted 🙂 I am pretty sure our hierarchy is more ragged than yours (plus a couple more levels)
Testing on a dummy set right now.
The dataset probably looks something like this:
let
rows = 100000,
gap = 10000,
Source = Table.FromColumns({{1 .. rows},{gap .. (rows-gap)}&List.Repeat({null},gap)},{"EN","MN"}),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"EN", type text}, {"MN", type text}})
in
#"Changed Type"
Here is an approach that seems to work rather well (56 seconds for 100K records with up to 9 levels). The join type doesn't seem to matter as it goes against the buffer anyway.
let
Source = Table.AddKey(
Table.Buffer(
Sql.Database(
"server",
"database",
[
Query = "select employeenumber [EN],manageremployeenumber [MN] from table",
CreateNavigationProperties = false
]
)
),
{"EN"},
true
),
// Group by manager
Collapse = (tbl) =>
Table.ExpandTableColumn(
Table.NestedJoin(
Table.RenameColumns(
Table.Group(
tbl,
{"MN"},
{{"Rows", each _, type table [EN = nullable text, Rows = table, MN = nullable text]}}
),
{{"MN", "EN"}}
),
{"EN"},
Source,
{"EN"},
"Join",
JoinKind.LeftOuter
),
"Join",
{"MN"}
),
// Group until no more row difference
CollapseAll = (tbl) =>
let
s = Collapse(tbl)
in
if Table.RowCount(tbl) = Table.RowCount(s) then s else @CollapseAll(s),
Step1 = CollapseAll(Source),
#"Added Custom" = Table.AddColumn(Step1, "Path", each ""),
// construct reverse path
Expand = (tbl) =>
Table.ReplaceValue(
Table.ExpandTableColumn(
Table.SelectColumns(tbl, {"Rows", "Path"}),
"Rows",
{"EN", "Rows", "MN"}
),
each [Path],
each
if [MN] = [EN] then
""
else if [Path] = "" then
[MN] & "|" & [EN]
else
[Path] & "|" & [EN],
Replacer.ReplaceValue,
{"Path"}
),
// Expand until no more nested tables
ExpandAll = (tbl) =>
let
e = Expand(tbl)
in
if e[Rows]{0} = null then e else @ExpandAll(e),
Step2 = ExpandAll(#"Added Custom")
in
Step2
It goes up and down the hierarchy. On the way up it groups by manager (to avoid duplication of effort) and on the way down it assembles the Path variable.
It looks like I wrote my query wrong with my NestedJoin (was refering an old table which put me in an infinite loop). NestedJoin does perform better when I made the correction.
Interesting approach with the group by.
I am thinking about a different approach. Instead of recalculating the hierarchy for every single user it might be sufficient to create filtered intermediate tables. So let's say you have 100 employees with 10 managers and 2 VPs you should be able to reduce the processing by first calculate the lineage for the 10 managers etc rather than the 100 employees.
looks like your
#"Filtered Rows"
table is never empty?
Thank you Ibendlin. I checked. If I pass a user who does not any employee reporting to him/her, the table #"Filtered Rows" is empty. Which means after a few passes, the query should halt.
will you be able to provide some sample data including ragged branches and orphans?
Alternatively you could do this in DAX with the built-in PATH function.
Hi Ibendlin
I will not be able to share the data unfortunately. I have been mulling over the recursive function over last couple of days. I have narrowed down the problem to this line
else @getreportees(somestep,#"Filtered Rows"[top.directReports.cn],n+1)
I am passing the list of all managers who have reporting. I am doing this everytime the recursive function is called. I dont know the details of M. when I pass "somestep" does the parser invoke the connection to AD and execute all the steps prior to somestep?
This might be of interest to you Replicating PATH function of DAX in Power Query / M – Power BI & Excel are better together (excelnac...
It's a sound theoretical approach but hits some severe performance issues with real world data.
M Language Operators - PowerQuery M | Microsoft Learn
A cyclic value has an infinite expansion when applying structural recursion. The semantics of M makes no special accommodations for such infinite expansions—an attempt to compare cyclic values for equality, for instance, will typically run out of resources and terminate exceptionally.
I used the PATH function as per your suggestion. However the query executes grinds to a halt. It parsed 8000 rows over 8 hours. So this wont work for me 😞
I did some tests with real world data (100K employees, up to 9 hierarchy layers, as ragged as it gets). Both the recursive function and the version based on List.Generate perform abysmally .
Recursive:
let
Source = Table.AddKey(
Table.Buffer(
Sql.Database(
"server",
"db",
[
Query = "select employeenumber [EN],manageremployeenumber [MN] from table",
CreateNavigationProperties = false
]
)
),
{"EN"},
true
),
Path = (c, pt) =>
let
p = Table.SelectRows(Source, each [EN] = c)[MN]{0}
in
if p = c then pt else @ Path(p, p & "|" & pt),
#"Added Custom" = Table.AddColumn(Source, "Path", each Path([MN], [MN] & "|" & [EN]))
in
#"Added Custom"
List.Generate:
let
Source = Table.AddKey(
Table.Buffer(
Sql.Database(
"server",
"db",
[
Query = "select employeenumber [EN],manageremployeenumber [MN] from table",
CreateNavigationProperties = false
]
)
),
{"EN"},
true
),
#"Added Custom" = Table.AddColumn(Source,"Path",
each Text.Combine(
List.Reverse({[EN]} & List.Generate(
() => [c = [EN], p = [MN]],
each [c] <> [p],
each [c = [p], p = Table.SelectRows(Source, (k) => k[EN] = [p])[MN]{0}],
each [p]
)
),"|"),type text)
in
#"Added Custom"
No idea where else to apply a buffer to make it at least a little faster.
Still thinking through the idea to store and reuse prior results.
Thank you Ibendlin. I agree with you that this seems to be a power query issue. Earlier we had a DotNet code connecting to active directory and the program used to return results in seconds.
The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!
| User | Count |
|---|---|
| 17 | |
| 9 | |
| 9 | |
| 7 | |
| 7 |