Hello,
For each row, I want to check if the id appears anywhere in the previous_id column.
| id | previous_id | is_parent |
|----|-------------|-----------|
| 1  | null        | True      |
| 2  | null        | True      |
| 3  | 1           | True      |
| 4  | 3           | True      |
| 5  | 2           | False     |
| 6  | 4           | False     |
I have done this using List.Contains and List.Distinct, but this step seems slower than it should be for such a simple operation:
= Table.AddColumn(#"foo_step", "is_parent", each List.Contains(List.Distinct(#"this_step"[previous_id]), [id]), type logical)
Is there a better way to do this in Power Query?
Thanks,
Herbie
1. Using List.Buffer will speed up the operation:

List.Contains(List.Buffer(List.Distinct(#"this_step"[previous_id])), [id])

In my view, List.Distinct adds nothing here; you can omit it.
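To avoid the buffer being re-created on every row, the usual pattern is to hoist List.Buffer into its own step and reference it from the added column. A minimal sketch, using the step names from the question plus two illustrative step names (BufferedPrevious, AddedIsParent) that are not from the original:

```
// Buffer the previous_id column once, in a separate step.
BufferedPrevious = List.Buffer(#"this_step"[previous_id]),
// Reference the buffered list inside the per-row function.
AddedIsParent = Table.AddColumn(
    #"this_step",
    "is_parent",
    each List.Contains(BufferedPrevious, [id]),
    type logical
)
```

Keeping the buffer in a prior step guarantees the column is materialized once, rather than relying on the engine to hoist an expression written inside `each`.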
2. Another way is a self-merge, which is also quite fast. Test the code here:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUVKK1YlWMoIxjIEMQzDLBMgyBrNMgSwjMMsMyDJRio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, previous_id = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"previous_id", Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"id"}, #"Changed Type", {"previous_id"}, "Changed Type", JoinKind.LeftOuter),
    #"Expanded Changed Type" = Table.ExpandTableColumn(#"Merged Queries", "Changed Type", {"previous_id"}, {"Found"})
in
    #"Expanded Changed Type"
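After the expand, the Found column holds the matched previous_id (or null when the id never appears). If you want a logical is_parent flag as in the original question, one more step (an illustrative step name, not from the original) could convert it:

```
// Hypothetical follow-up step: non-null Found means the id is a parent.
#"Added is_parent" = Table.AddColumn(
    #"Expanded Changed Type",
    "is_parent",
    each [Found] <> null,
    type logical
)
```

One caveat: a left outer join duplicates a row whenever its id matches more than one previous_id, so on real data you may need Table.Distinct on the key columns afterwards.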
Ultimately, which approach is faster depends on your dataset and the volume of data; you will have to test each on your own data.
3. Alternatively, the approach below, using List.Buffer with List.Generate, can be very fast:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUVKK1YlWMoIxjIEMQzDLBMgyBrNMgSwjMMsMyDJRio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, previous_id = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"previous_id", Int64.Type}}),
    Listid = List.Buffer(#"Changed Type"[id]),
    Listpid = List.Buffer(#"Changed Type"[previous_id]),
    Cnt = List.Count(Listid),
    Output = List.Generate(()=>[x=Listid{0},i=0], each [i]<Cnt, each [i=[i]+1, x=Listid{i}], each List.Contains(Listpid,[x])),
    Result = Table.FromColumns(Table.ToColumns(#"Changed Type") & {Output},Table.ColumnNames(#"Changed Type") & {"Result"})
in
    Result
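For readers unfamiliar with List.Generate, the state record [x, i] walks the buffered id list by index, and the selector runs the membership test against the buffered previous_id list. A minimal standalone sketch of the same pattern, using the sample values from the question:

```
// Hypothetical minimal example: flag which members of ids occur in pids.
let
    ids   = List.Buffer({1, 2, 3, 4, 5, 6}),
    pids  = List.Buffer({null, null, 1, 3, 2, 4}),
    cnt   = List.Count(ids),
    flags = List.Generate(
        () => [x = ids{0}, i = 0],        // initial state: first id, index 0
        each [i] < cnt,                   // continue while index is in range
        each [i = [i] + 1, x = ids{i}],   // next state: advance to the next id
        each List.Contains(pids, [x])     // selector: membership test per id
    )
in
    flags  // {true, true, true, true, false, false}
```

Because both lists are buffered once up front, each List.Contains call scans an in-memory list rather than re-evaluating the column reference per row.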
Why do you consider solution 3 faster than 1? I imagine* that 1 does algorithmically the same thing as 3.
Did you simply test it, or do you have some algorithmic reasoning?
*The documentation is neither detailed, verbose, nor comprehensive.
Thanks Vijay_A_Verma, all 3 options were faster, and you were right: List.Distinct was unnecessary.