Inexact Lookup

rsaprano · ‎01-26-2018

One of the challenges I had working with a client recently was around how to work with ranges/bands of numbers in Power BI. I thought I’d share the approach I took. I was looking to deal with this in the data preparation layer using Edit Queries as this is the most appropriate place to get my data in shape before it is loaded into the data model.

Specifically, there are two key scenarios I will be covering:

Doing an ‘inexact’ lookup: This is like a VLOOKUP in Excel with a TRUE parameter at the end. It looks up a number against a set of bands/ranges and returns the appropriate band which the number sits in. A simple illustration of this is to consider grades for test results (e.g. as shown in the table below): A score below 30 is graded as a U, a score between 30-39 is graded as a F, 40-49 as an E and so on. So, the challenge is to come up with a function that can return the grade for any given score. This scenario also extends to many other cases like working out the appropriate tax rate to apply for a particular amount, or even looking at which date ranges a particular date falls in. I will look at this scenario and also an extension of this, where the bands to be used vary according to a category (e.g. in this exam case, the grade bandings may vary according to subject).

Score	Grade
0	U
30	F
40	E
50	D
60	C
70	B
80	A
90	A+

Finding overlapping ranges: This is the scenario where we have two ranges/bands of numbers, rather than having one number and one set of bands, and we want to see the areas of overlap between the ranges. An example of this might be where we have multiple ranges of dates, and we want to see which/how many days are common between the two sets.

This part deals with the first scenario (inexact lookups) and the second part deals with the second scenario

Inexact Lookup

I was sure someone had tackled this before and sure enough a quick search on the internet revealed that two of the Global experts in M/Power Query – Ken Puls and Miguel Escobar - who wrote the book M is for Data Monkey and do a number of courses on Power Query – had written a function (copied below; link to article here) which I could just copy and paste straight into the Query Editor of Edit Queries (By going to New Source -> Blank Query and pasting it in).

The function code is:

(lookup_value as any, table_array as table, col_index_number as number, optional approximate_match as logical ) as any =>

let

/*Provide optional match if user didn't */

matchtype =

if approximate_match = null

then true

else approximate_match,

/*Get name of return column */

Cols = Table.ColumnNames(table_array),

ColTable = Table.FromList(Cols, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

ColName_match = Record.Field(ColTable{0},"Column1"),

ColName_return = Record.Field(ColTable{col_index_number - 1},"Column1"),

/*Find closest match */

SortData = Table.Sort(table_array,{{ColName_match, Order.Descending}}),

RenameLookupCol = Table.RenameColumns(SortData,{{ColName_match, "Lookup"}}),

RemoveExcess = Table.SelectRows(RenameLookupCol, each [Lookup] <= lookup_value),

ClosestMatch=

if Table.IsEmpty(RemoveExcess)=true

then "#N/A"

else Record.Field(RemoveExcess{0},"Lookup"),

/*What should be returned in case of approximate match? */

ClosestReturn=

if Table.IsEmpty(RemoveExcess)=true

then "#N/A"

else Record.Field(RemoveExcess{0},ColName_return),

/*Modify result if we need an exact match */

Return =

if matchtype=true

then ClosestReturn

else

if lookup_value = ClosestMatch

then ClosestReturn

else "#N/A"

in Return

Exactly like the VLOOKUP function in Excel, this function looks up a number against a table of bands with the minimum threshold for each band as the first column of the table and returns the highest band where the minimum threshold value is less than or equal to the value being looked up. In the example above, the lowest score for each grade has to be the first column in the lookup table – this is why it starts with 0. So, if the lookup value (score) is 35, the highest band which has a minimum threshold value less than or equal to 35 is the row corresponding to grade F (min threshold value of 30). This lookup table has to be sorted by minimum threshold value, so the function sorts the table accordingly, and then returns the value in the Nth column of the dataset where N is a column number passed in as an argument.

So, this function takes as input:

The value to lookup (e.g. the score)
The table array (the table to lookup against; i.e. with each grade and the corresponding minimum threshold value in the first column)
The column number to return
Whether to use an exact or inexact (approximate match) – for this use case where we are looking up against a set of bands/ranges we need to pass in TRUE for this argument, exactly as we would in Excel [N.B: The scenario for an exact match vlookup is better achieved using the merge queries function in Query Editor instead]

To use this function, we go to the table containing the lookup values (the scores), go to ‘Add Column’ in the ribbon and select ‘Invoke Custom Function’. This comes up with the dialog box below:

We give the new column a name and specify the function we want to invoke. For the lookup value, we ensure the column option is selected (the table icon – i.e. telling Power Query to take a value from the current row in the table) and choose the score column. For the table_array we select the table that has scores and grades; for the col_index_number we specify 2 (because the second column in the grades table is the one that contains the grades) and we specify TRUE for requiring an approximate match.,

This then returns the corresponding grade for each score as the new column.

For my actual scenario, the bands depended on a category value – analogous in this example to having different gradings for different subjects. So, for example my scores table now looks like this:

Subject	Score
Math	15
Math	25
Science	67
Science	30
Science	43
Science	85
Science	70
Science	85
Science	90
Science	52
History	15

And my grades table (With subject and minimum threshold value) looks like this:

Lower Score Threshold	Subject	Grade
0	Math	U
30	Math	F
40	Math	E
50	Math	D
60	Math	C
70	Math	B
80	Math	A
90	Math	A+
0	Science	U
20	Science	F
35	Science	E
45	Science	D
60	Science	C
75	Science	B
85	Science	A
95	Science	A+
0	History	U
25	History	F
45	History	E
55	History	D
60	History	C
70	History	B
85	History	A
90	History	A+

The grade thresholds are different for each subject. Note that the minimum score threshold still needs to be in the first column; if this isn’t the case in the source data you can create a new lookup table by referencing the original one and then re-arranging the columns.

You then need to tweak the function to take a category (subject as a column) as an input, and to filter the lookup table for the category value passed in.

The changes I made to the function code are shown to take into account the category are highlighted in red:

(category_col_name as text, category_value as text, lookup_value as any, table_array as table, col_index_number as number, optional approximate_match as logical ) as any =>

let

/*Provide optional match if user didn't */

matchtype =

if approximate_match = null

then true

else approximate_match,

/*Get name of return column */

Cols = Table.ColumnNames(table_array),

ColTable = Table.FromList(Cols, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

ColName_match = Record.Field(ColTable{0},"Column1"),

ColName_return = Record.Field(ColTable{col_index_number - 1},"Column1"),

/*Find closest match */

FilterData = Table.SelectRows(table_array, each Record.Field(_, category_col_name) = category_value),

SortData = Table.Sort(FilterData ,{{ColName_match, Order.Descending}}),

RenameLookupCol = Table.RenameColumns(SortData,{{ColName_match, "Lookup"}}),

RemoveExcess = Table.SelectRows(RenameLookupCol, each [Lookup] <= lookup_value),

ClosestMatch=

if Table.IsEmpty(RemoveExcess)=true

then "#N/A"

else Record.Field(RemoveExcess{0},"Lookup"),

/*What should be returned in case of approximate match? */

ClosestReturn=

if Table.IsEmpty(RemoveExcess)=true

then "#N/A"

else Record.Field(RemoveExcess{0},ColName_return),

/*Modify result if we need an exact match */

Return =

if matchtype=true

then ClosestReturn

else

if lookup_value = ClosestMatch

then ClosestReturn

else "#N/A"

in Return

So, the only elements I added were to pass in the category column/category value, and to filter the lookup table by the category first.

This can now be called as follows:

This now passes in the name of the category column (in this case the text 'Subject’ – type text), and takes the category value from the table with the scores (type column – select the Subject column from the drop-down list).

This function can be used in the same way – copying the code into the Advanced Editor in a blank query and invoking the function as a new column in the table with scores/subjects.

In Part 2 of this blog series we look at finding overlapping ranges: This is the scenario where we have two ranges/bands of numbers (rather than having one number and one set of bands) and we want to see the areas of overlap between the ranges.

Working with ranges (Numbers/Dates) in Power Query/Power BI - Part 1

Inexact Lookup

#PBI10 Dataviz Contest

What Power BI Developers Often Miss and Why Does I...

How to Organize Measures Effectively in Power BI

Re: Coming soon | #PBI10 Dataviz Contest

Microsoft Fabric vs Power BI Licensing: What You N...

Join the #PBI10 DataViz contest

Working with ranges (Numbers/Dates) in Power Query/Power BI - Part 1

Inexact Lookup

#PBI10 Dataviz Contest

What Power BI Developers Often Miss and Why Does I...

How to Organize Measures Effectively in Power BI

Re: Coming soon | #PBI10 Dataviz Contest

Microsoft Fabric vs Power BI Licensing: What You N...