Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
MBZA
Helper I
Helper I

Efficiently iterate over and modify a list

Hi, I am trying to calculate the Levenshtein distance between two text values. There's no built in way to do this that I know of, so I rolled my own function but it is very slow. It requires iterating through a list (really a matrix) and changing the elements one at a time while referencing previous elements at each step.

 

Is there a way I can make this more efficient? The code I'm using is below, with both List.Generate and List.Accumulate versions (the latter is the version commented out).

 

let
levendist = (a as text, b as text) as number =>
let
lena = Text.Length(a),
lenb = Text.Length(b),
olist = List.Buffer(List.Generate(
() => [i=0,j=0,v=0],
each ((lena+1)*[j] + [i]) < ((lena+1) * (lenb+1)),
each [ i = if [i]+1 = (lena+1) then 0 else [i]+1,
j = if [i]+1 = (lena+1) then [j]+1 else [j],
v = if [j] = 0 and [i]+1 <= lena then [i]+1
else if [i] = lena then [j]+1
else 0 ])),
ulist = List.Generate(
() => [i=0, l=olist],
each [i] <= List.Count(olist),
(x) =>
let
curlist = x[l],
current = curlist{x[i]},
next = List.ReplaceRange(
curlist,
x[i],
1,
{
if current[i] = 0 or current[j] = 0 then current[v]
else List.Min({
curlist{current[i] - 1 + (current[j] * (lena+1))} + 1,
curlist{current[i] + ((current[j] - 1) * (lena+1))} + 1,
if Text.Upper(Text.At(a,current[i] - 1)) = Text.Upper(Text.At(b,current[j] - 1))
then curlist{current[i] - 1 + ((current[j] - 1) * (lena+1))}
else curlist{current[i] - 1 + ((current[j] - 1) * (lena+1))} + 1
})
}
)
in
[i=x[i]+1, l=next],
each [l]
)
/*ulist = List.Accumulate(olist, {},
(state, current) =>
List.Combine({
state,
{
if current[i] = 0 or current[j] = 0 then current[v]
else List.Min({
state{current[i] - 1 + (current[j] * (lena+1))} + 1,
state{current[i] + ((current[j] - 1) * (lena+1))} + 1,
if Text.Upper(Text.At(a,current[i] - 1)) = Text.Upper(Text.At(b,current[j] - 1))
then state{current[i] - 1 + ((current[j] - 1) * (lena+1))}
else state{current[i] - 1 + ((current[j] - 1) * (lena+1))} + 1
})
}
})
)*/
in
List.Last(List.Last(ulist))
// List.Last(ulist)
in
levendist

(Is there a way to paste code here without the alignment having to be manually fixed?)

 

Edited to add: here are some results of the function

Text AText BExpected Result
KittenSitting3
SaturdaySunday3
MoopMeep2
HelloGoodbye7
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8s4sKUnNU9JRCgYyMvPSgSxjpVidaKXgxJLSopTESpBUaR6EAZHxzc8vAHJ8U1NBlBFYzCM1JycfyHPPz09JqkwFssyVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Text A" = _t, #"Text B" = _t, #"Expected Result" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Text A", type text}, {"Text B", type text}, {"Expected Result", Int64.Type}})
in
#"Changed Type"
1 ACCEPTED SOLUTION

I've found this answer here:

 

"I would strongly recommend you to not use PowerQuery or VBA for this. There are much much much better libraries in both R and Python for implementing this methodology."


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

View solution in original post

9 REPLIES 9
dufoq3
Super User
Super User

Hi @MBZA, could you provide sample data and expected result please?


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Text AText BResult
KittenSitting3
SaturdaySunday3
MoopMeep2
HelloGoodbye7

 

Results of the current version are fine - it's just speed that I'm not happy with.

What about sample data? 🙂


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

I'm a little confused - the table I provided is the sample data (plus the result of Table.AddColumn(Source, "Result", each LevenshteinDist([Text A],[Text B])) ).

 

Edit:

If you prefer, there's this:

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8s4sKUnNU9JRCgYyMvPSlWJ1opWCE0tKi1ISK0HCpXkgBkjUNz+/ACjim5paAOZ7pObk5AMF3PPzU5IqU5ViYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Text A" = _t, #"Text B" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Text A", type text}, {"Text B", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each LevenshteinDist([Text A],[Text B]))
in
#"Added Custom"

OK, I see, but you should explain what exactly you want to achieve (and how). I see the result but maybe there is another way to achieve same result more efficiently.


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

My data consists of rows by company, reference, date and group. Some of the rows are missing the group, and I'm trying to populate this by finding earlier rows (by date) with a similar reference within the same company.

 

To do this I first do a nested self join on company, then I filter each nested table to remove dates after the current row's date, and rows missing the group. Then, for each row missing the group I calculate the Levenshtein distance between its reference and the reference of each row of the associated nested table. Finally I decide which row has the lowest distance, and keep that row (taking the first in case of ties).

 

I can't do this using a fuzzy merge because of the condition of matching by company and removing later dates. I also found clustering didn't help because the similarity algorithm that it uses doesn't seem to be granular enough (same would go for merging I guess).

 

I am blocked from uploading data to file sharing sites, but the following script covers the entire process and includes sample data. LevenshteinDist is the custom function from the original post.

 

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jVNNc4IwEP0rGc/WfKAix7anTnvRHnpwPGTiFlMlMBC09td3IYRGcZwe2EmWfW/f2yTr9ehxNB4tl3E04aSAssoNsWAqnRvMC8ZnlEU04s2mC5uxA63ehZhEJKsrrUgmjUyh7DFTGjHc8NgFj3l75YxPYqLN50FabEJUnmXaWgAHjSmbUyFwM41d8NArjTrdWQdJKIudwkt9PSAvDnBhSnDKuBN4z5SWXbmgTFCx+JcfCzJzqKixwjpdnZEnXOqsKPMjZGAssflpMGceueDrV1DU1pEX8tzAglG1JpL28/UvAb/S9txX/00pYA+qK5QTIBa963t6+jOIenYn5bnrywV5ICe9TcHiLdn7OxJTLhxiNnMhAC3YvAdVuKqLrbSwJUd5qKFX17m/hW+bVuhM7bGtKutqByWmDJwIZre1sp7Gy4iFCwOaD6e9OVjcSZPbhkvtpEnhmmSgJLmtBHOY+gLVuNIGHwH4Qd7RkwxmGcymxFhdkwxnIxjjgaJmut/nH9n9a9/F7e7Ie2kl95jAfIPY/AI=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Company = _t, Reference = _t, Date = _t, Group = _t, #"Correct group" = _t]),
#"Replaced Value" = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Group"}),
#"Changed Type" = Table.TransformColumnTypes(#"Replaced Value",{{"Company", type text}, {"Reference", type text}, {"Date", type date}, {"Group", type nullable text}, {"Correct group", type nullable text}}),
#"Filter for rows missing group" = Table.SelectRows(#"Changed Type", each ([Group] = null)),
#"Self join" = Table.NestedJoin(#"Filter for rows missing group", "Company", #"Changed Type", "Company", "AllData"),
#"Filter for usable rows" = Table.AddColumn(#"Self join", "AllData filtered", each Table.SelectRows(_[AllData], (x) => (x[Date] < _[Date]) and (x[Group] <> null))),
#"Push reference to nested table" = Table.AddColumn(#"Filter for usable rows", "AllData with ref", each Table.AddColumn(_[AllData filtered], "Outer ref", (x) => _[Reference])),
#"Get Levenshtein Distance" = Table.TransformColumns(#"Push reference to nested table", {{"AllData with ref", each Table.AddColumn(_, "LD", (x) => LevenshteinDist(x[Reference], x[Outer ref]))}}),
#"Get best group" = Table.TransformColumns(#"Get Levenshtein Distance", {{"AllData with ref", each
let
m = List.Min(_[LD]),
filter = Table.SelectRows(_, (x) => x[LD] = m),
v = filter{0}[Group]
in
v}})

in
#"Get best group"

 

I've found this answer here:

 

"I would strongly recommend you to not use PowerQuery or VBA for this. There are much much much better libraries in both R and Python for implementing this methodology."


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

The sample data is a CSV or .XLSX that is called upon as the Source in your work in progress .PBIX


Please provide your work-in-progress Power BI Desktop file (with sensitive information removed) that covers your issue or question completely in a usable format (not as a screenshot). You can upload the PBIX file to a cloud storage service such as OneDrive, Google Drive, Dropbox, or to a Github repository, and then share a file’s URL.

https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...

Please show the expected outcome based on the sample data you provided.

https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...

This allows members of the Forum to assess the state of the model, report layer, relationships, and any DAX applied.

This data covers it:

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8s4sKUnNU9JRCgYyMvPSgSxjpVidaKXgxJLSopTESpBUaR6EAZHxzc8vAHJ8U1NBlBFYzCM1JycfyHPPz09JqkwFssyVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Text A" = _t, #"Text B" = _t, #"Expected Result" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Text A", type text}, {"Text B", type text}, {"Expected Result", Int64.Type}})
in
#"Changed Type"

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.