Hello,
For each row, I want to check if the id appears anywhere in the previous_id column.
| id | previous_id | is_parent |
|----|-------------|-----------|
| 1  | null        | True      |
| 2  | null        | True      |
| 3  | 1           | True      |
| 4  | 3           | True      |
| 5  | 2           | False     |
| 6  | 4           | False     |
I have done this using List.Contains and List.Distinct, but this step seems slower than it should be for such a simple operation:
= Table.AddColumn(#"foo_step", "is_parent", each List.Contains(List.Distinct(#"this_step"[previous_id]), [id]), type logical)
Is there a better way to do this in Power Query?
Thanks,
Herbie
1. Using List.Buffer will speed up the operation:

List.Contains(List.Buffer(List.Distinct(#"this_step"[previous_id])), [id])

In my view, List.Distinct adds nothing here; you can omit it.
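To avoid the buffer being re-created on every row, the usual pattern is to hoist List.Buffer into its own step and reference it from the added column. A minimal sketch, using the step names from the question plus two illustrative step names (BufferedPrevious, AddedIsParent) that are not from the original:

```
// Buffer the previous_id column once, in a separate step.
BufferedPrevious = List.Buffer(#"this_step"[previous_id]),
// Reference the buffered list inside the per-row function.
AddedIsParent = Table.AddColumn(
    #"this_step",
    "is_parent",
    each List.Contains(BufferedPrevious, [id]),
    type logical
)
```

Keeping the buffer in a prior step guarantees the column is materialized once, rather than relying on the engine to hoist an expression written inside `each`.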
2. Another way is a self-merge, which is also quite fast. Test the code here:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUVKK1YlWMoIxjIEMQzDLBMgyBrNMgSwjMMsMyDJRio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, previous_id = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"previous_id", Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"id"}, #"Changed Type", {"previous_id"}, "Changed Type", JoinKind.LeftOuter),
    #"Expanded Changed Type" = Table.ExpandTableColumn(#"Merged Queries", "Changed Type", {"previous_id"}, {"Found"})
in
    #"Expanded Changed Type"
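After the expand, the Found column holds the matched previous_id (or null when the id never appears). If you want a logical is_parent flag as in the original question, one more step (an illustrative step name, not from the original) could convert it:

```
// Hypothetical follow-up step: non-null Found means the id is a parent.
#"Added is_parent" = Table.AddColumn(
    #"Expanded Changed Type",
    "is_parent",
    each [Found] <> null,
    type logical
)
```

One caveat: a left outer join duplicates a row whenever its id matches more than one previous_id, so on real data you may need Table.Distinct on the key columns afterwards.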
Ultimately, which approach is faster depends on your dataset and the volume of data; you will have to test each on your own data.
3. Alternatively, the approach below, using List.Buffer with List.Generate, can be very fast:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUVKK1YlWMoIxjIEMQzDLBMgyBrNMgSwjMMsMyDJRio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, previous_id = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"previous_id", Int64.Type}}),
    Listid = List.Buffer(#"Changed Type"[id]),
    Listpid = List.Buffer(#"Changed Type"[previous_id]),
    Cnt = List.Count(Listid),
    Output = List.Generate(()=>[x=Listid{0},i=0], each [i]<Cnt, each [i=[i]+1, x=Listid{i}], each List.Contains(Listpid,[x])),
    Result = Table.FromColumns(Table.ToColumns(#"Changed Type") & {Output},Table.ColumnNames(#"Changed Type") & {"Result"})
in
    Result
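For readers unfamiliar with List.Generate, the state record [x, i] walks the buffered id list by index, and the selector runs the membership test against the buffered previous_id list. A minimal standalone sketch of the same pattern, using the sample values from the question:

```
// Hypothetical minimal example: flag which members of ids occur in pids.
let
    ids   = List.Buffer({1, 2, 3, 4, 5, 6}),
    pids  = List.Buffer({null, null, 1, 3, 2, 4}),
    cnt   = List.Count(ids),
    flags = List.Generate(
        () => [x = ids{0}, i = 0],        // initial state: first id, index 0
        each [i] < cnt,                   // continue while index is in range
        each [i = [i] + 1, x = ids{i}],   // next state: advance to the next id
        each List.Contains(pids, [x])     // selector: membership test per id
    )
in
    flags  // {true, true, true, true, false, false}
```

Because both lists are buffered once up front, each List.Contains call scans an in-memory list rather than re-evaluating the column reference per row.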
Why do you consider solution 3 faster than 1? I imagine* that 1 does algorithmically the same thing as 3.
Did you simply test it, or do you have some algorithmic reasoning?
*The documentation is neither detailed, verbose, nor comprehensive.
Thanks Vijay_A_Verma, all 3 options were faster, and you were right: List.Distinct was unnecessary.