Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Score big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount

vojtechsima

Why Is Power Query List.Contains Slow? Faster Lookup Alternatives

Does List Size Matter?

First of all, I won’t be talking about five-item lists that don’t really cause any major struggle (or do they?). I’m talking about thousands of items you need to scan through to find your value. I’ll share my testing numbers, but your Excel with 10 lookup values is safe. However, if you want to squeeze every bit of performance or build a scalable solution, the list size does matter.

Why is my lookup slow?

If you have a bigger lookup table or list and you apply the lookup for each row, maybe hundreds of thousands or even millions, you’re essentially scanning the whole list every single time. If your lookup comes from an external source and it’s not a parameter or static list, you might even be re-evaluating and then scanning the whole thing on each iteration. So first of all, you’ll want to buffer that thing.

 

If List.Buffer does the magic for you, great. But it still adds extra time, doesn’t it? List.Contains, List.PositionOf or whatever you saw in a tutorial that works fine for a couple of values in your beautiful Excel or Power BI will still spam the same scan over and over. If you get lucky, your value will be at the start of the list and it might be quick, but with real-life luck, it’s going to be last, isn’t it? So yeah, this shouldn’t be a hardcore problem, but why allow extra evaluation time when we have a better solution?

 

Example: 100k lookup values in a buffered list. Then 30k rows using List.Contains to find matches. The first table was populated with values within the first 5% of the list. The second table used values within the last 5%. The run-time difference was 1 second vs 60 seconds. So yeah, with List.Contains the position in the list plays a major role.

 

Test: The speedier alternatives

I played with a bunch of approaches. The goal was simple: add a single column that says if a value exists in a lookup list. You can do this ten different ways. Some are solid. Some are… creative.

 

Testing environment

  • Fact table: a few columns, 1 million rows
  • Lookup table: single column, 20k unique rows
  • Type: text for both
  • Files: CSV on local machine
  • Lookup table: not loaded or included in refresh
  • Timing: measured by how long it takes to refresh the fact table
  • Partial load: I manually stopped the refresh at 2 minutes 25 seconds (2:25) and checked what size of the expected 46 MB had loaded
  • Measurement noise: differences of 1 to 2 seconds are irrelevant and likely just the inaccuracy of manual clicking, UI lag, or background processes

 

The results in a table (I explain below):

vojtechsima_0-1756033445297.png

 

 

* means estimated total time. I stopped it at 2:25, and the time shown reflects how long it would likely take to finish.

 

Explanations:

  1. Initial load of the fact table without any modifications or lookups.
  2. Lookup via List.Contains where I buffer the table, then turn it into a list by referencing the table’s column inside the lookup expression.
  3. Turn the table into a list and buffer it, then have List.Contains reference that list directly.
  4. Get the position of the row’s value in the list. If found, return true, otherwise false.
  5. Use List.Select and then List.IsEmpty for true or false. No List.Contains involved.
  6. Expression like this: table{[lookupField = rowContent]}.
  7. Use Table.Join as LeftOuter, add the lookup value, then use Table.Transform to change nulls to false and found values to true.
  8. Same as 07, only the buffering differs.
  9. Add a join column using a LeftOuter merge, expand the column, then transform the same way as 07 or 08.
  10. Same as 09, only the buffering differs.
  11. Turn the list into a hash map or dictionary as a record. Use Record.Field to find a match. If it errors, return false.
  12. Same hash map as 11, but use Record.FieldOrDefault with a fallback value of false.
  13. Same as 12, only the buffering differs.

 

You can see the List functions got obliterated, and simple merges via the GUI or hand-written M perform way better even though they look like extra overhead. Lastly, the best performance comes from turning the table into a list, converting that list into a record as a hash map or dictionary, and then using Record.FieldOrDefault.

 

Record.FieldOrDefault

What’s this magical specimen? Unlike List.Contains, it doesn’t scan in the background. If you set up your list correctly and turn it into a single record, there’s no scanning, just key mapping. That means when you want to look up or find a particular value, you know exactly where to go immediately.

 

Think of it like this: you know Shrek’s movies, right? If not, imagine your favorite movie. When someone asks, “Hey, which movie has the quote about onions having layers like some creatures?” you instantly reply, “That’s Shrek, when he explains ogres to Donkey.” Why did it come to mind immediately? Because it’s stored in your memory. Records work the same way, the data is stored in memory so the lookup is instant.

 

Now, if you used List.Contains in the same scenario, you would have to go through every single fairytale movie until you finally hit Shrek. That can be painfully slow, and the speed depends on whether Shrek is at the start or buried somewhere at the end.

 

Please note, if you only have a handful of values in a simple list, you can safely use List.Contains since the performance difference will be negligible.

 

Cautions and limitations

Simply replacing List.Contains with Record.FieldOrDefault won’t always speed up your query; it can even make it slower. Records behave a lot like buffering. If you want to turn thousands of rows from a streamed source into a list, you first need to pull everything and materialize it in a record. That breaks streaming, folding, and everything cool about the M engine. So not every scenario makes sense for this technique.

 

If you have a simple source where you’ll just load the data and use it as a lookup table, go for it. If you’re doing heavy transformations with streaming data like REST APIs, be careful. Sometimes a simple join with a few steps is the better call. As always, try to fold everything first, filter down rows, and use every possible optimization before turning it into a record. But as you could see from my simple examples, with the right usage, you can benefit a lot.

 

Turn the list into records

If you want to try this yourself, you need to know how to turn a plain list into a record.

 

The idea is simple: we’ll convert a list into a record where the field names are the list values, and each field’s value is just logical true. This way we can mimic how you’d normally use List.Contains to check if something exists.

 

This is our original list. The main function to flip a list into a record is Record.FromList. Straightforward enough. The tricky part is actually filling in the parameters the right way.

 

Record.FromList(list as list, fields as any) as record

/*

Returns a record given a list of field values and a set of fields. The fields can be specified either by a list of text values, or a record type. An error is thrown if the fields are not unique.

*/

So how do we set the value to TRUE and the field names to the list values?

 

If you try this:

= Record.FromList(list, list)

You’ll get the fields named correctly from your list, but the values won’t be TRUE; they’ll just be the same as the field names.

 

What about this:

= Record.FromList({true}, list)

Nice try, but nope. You need to provide a list of the same length as your fields. If your list has only one value, it works. With more than one, it fails because the first parameter has to mirror the number of fields. In our case, that’s 4.

 

If you don’t wanna hard-code TRUE over and over, use List.Repeat. It repeats whatever list of values you give it as many times as you tell it. So for 4 items, you get this:

= List.Repeat( {true}, 4)

Now let’s make it dynamic. Count how many items are in your original list via List.Count and use that as the repeat count. Stick that into the first parameter of Record.FromList, and the second parameter is just your original list.

= Record.FromList( List.Repeat( {true}, List.Count(list) ), list)

 

Filter with Record.FieldOrDefault instead of List.Contains

How do you ditch List.Contains and switch to Record.FieldOrDefault? Super simple. If you want to filter items that appear in a list, and you’d normally use List.Contains, here’s the quick alternative.

 

Filter with List.Contains:

= Table.SelectRows(changeTypes, each List.Contains( list, [letters] ) )

Filter with Record.FieldOrDefault:

= Table.SelectRows( changeTypes, each Record.FieldOrDefault(rec, [letters], false ) )

Where rec is the constructed record from:

= Record.FromList( List.Repeat( {true}, List.Count(list) ), list)

We then search for the field name based on the value in the current row of the column letters. If that letter isn’t found, the fallback value is false. This gives us a clean, error-free replacement for List.Contains with the same behaviour.