Advance your Data & AI career with 50 days of live learning, dataviz contests, hands-on challenges, study groups & certifications and more!
Get registeredGet Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now
In the last few months I have been answering dozens of users’ questions. And there were at least three or four of you with different use cases but the same nominator: I have a table with two columns From and To. And for every distinct value of the column From I want to see all nodes which I can reach by traversing the edges From --> To. This problem has a name: transitive closure.
What is a transitive closure?
The omniscient Wikipedia says to transitive closure: “In mathematics, the transitive closure of a binary relation R on a set X is the smallest relation on X that contains R and is transitive.
For example, if X is a set of airports and xRy means "there is a direct flight from airport x to airport y" (for x and y in X), then the transitive closure of R on X is the relation R+ such that x R+ y means "it is possible to fly from x to y in one or more flights". Informally, the transitive closure gives you the set of all places you can get to from any starting place.”
A picture is worth a thousand words:
What we want to achieve is shown on the following image:
The code
Let’s start with our data sample from the previous illustration. It is a table with two columns From and To. Every row is a directed edge from From to To.
It will be recursion, again, I know, but I don’t know any other way to solve this problem in PowerQuery. But before we come to a recursive function, let us create key-value pairs where key is a node and value is a list of all nodes, which you can reach directly from the key. The result won’t be a table but a record instead. (Why do I do that? It will be much faster – you can read more about this performance boost in my recent post Lookup Table in Power Query.)
// get all direct descendants as a record (key-value pairs From ==> To)
DirectDescendantsRecord = List.Accumulate(
FromDistinctList,
[],
(state, current) =>
state &
Expression.Evaluate(
"[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]",
[Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
)
),
This fast access to all direct descendants of a node will be very helpful in the next step, in the recursive function fnTransitiveRelationList. This function gets a list of all nodes which we should visit and the newly created record with key-value pairs. It goes recursively through all the nodes until the list is empty. As a result, you get all nodes which you can visit from the current node.
// get recursively list of all descendants
fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record) as list =>
let
result =
if List.IsEmpty(toBeDoneList) then
{}
else
let
newToBeDoneList = List.RemoveItems(
// combine all lists together
List.Combine(
// get a list of direct descendants for every value
List.Transform(
toBeDoneList,
each Record.FieldOrDefault(directDescendantsRecord, _, {})
)
),
toBeDoneList
)
in
List.Union({toBeDoneList, newToBeDoneList, @fnTransitiveRelationList(newToBeDoneList, directDescendantsRecord)})
in
result,
And that’s all. The last part is only a call of the recursive function.
// get recursive all descendants of current value
TransitiveClosure = Table.AddColumn(
TableFromList,
"AllDescendantRelations",
(_) =>
let
allRelations = fnTransitiveRelationList({[From]}, DirectDescendantsRecord),
removeItself = List.RemoveItems(allRelations, {[From]}),
toText = Text.Combine(removeItself, ";")
in
toText,
type list
)
And the whole code in one piece.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTJSitWBsIzBLCMgywTMMgayLMAsUyDLDMwyA7LM4SygbCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [From = _t, To = _t]),
ChangedType = Table.TransformColumnTypes(Source,{{"From", type text}, {"To", type text}}),
// buffer the source table
BufferedTable = Table.Buffer(ChangedType),
// create a distinct sorted list of From values
FromDistinctList = List.Sort(List.Distinct(Table.Column(BufferedTable, "From"))),
// get all direct descendants as a record (key-value pairs From ==> To)
DirectDescendantsRecord = List.Accumulate(
FromDistinctList,
[],
(state, current) =>
state &
Expression.Evaluate(
"[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]",
[Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
)
),
// create a table of distinct From values
TableFromList = Table.FromList(FromDistinctList, Splitter.SplitByNothing(), {"From"}),
// get recursively list of all descendants
fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record) as list =>
let
result =
if List.IsEmpty(toBeDoneList) then
{}
else
let
newToBeDoneList = List.RemoveItems(
// combine all lists together
List.Combine(
// get a list of direct descendants for every value
List.Transform(
toBeDoneList,
each Record.FieldOrDefault(directDescendantsRecord, _, {})
)
),
toBeDoneList
)
in
List.Union({toBeDoneList, newToBeDoneList, @fnTransitiveRelationList(newToBeDoneList, directDescendantsRecord)})
in
result,
// get recursive all descendants of current value
TransitiveClosure = Table.AddColumn(
TableFromList,
"AllDescendantRelations",
(_) =>
let
allRelations = fnTransitiveRelationList({[From]}, DirectDescendantsRecord),
removeItself = List.RemoveItems(allRelations, {[From]}),
toText = Text.Combine(removeItself, ";")
in
toText,
type list
)
in
TransitiveClosure
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.