Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
Mark-JZ-Yeap
Helper I
Helper I

Fuzzy matching logic

I have already removed the extensions using this M code in Power Query.

 

let
// Load your Excel file/table
Source = Excel.Workbook(File.Contents("C:\Users\mark.jz.yeap\Documents\ATCi - Mark\MatchingDatasets - 10 Apr 2025\NRF 2024 delegate list.xlsx"), null, true),
Table2_Table = Source{[Item="Table2",Kind="Table"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(Table2_Table, {{"Company Name", type text}}),

// Define the list of suffixes to remove
Suffixes = {
" intl pte ltd", " (Private) Limited", " Pte Lte"," (Pte) Ltd"," PTE LTD", "PTE. LTD.", " Pvt. Ltd.", " PTE. LTD", " Pte Ltd", " Pte. Ltd.", " Private Limited", " Public Company Limited"," COMPANY LIMITED", " Co., LTD.","Co; Ltd", "co.,ltd.", " Co.,LTD"," Co.,Ltd.", " Co.,Ltd"," Co., Ltd.", " Co., Ltd", " CO., LTD.", " Company Limited", " LTD.", " LTDA", " LTD",
" Ltd.", " Ltda", " Ltd", ", Inc.", ", Inc", ",INC."," INC.", " Inc.", " Inc", " INC", " Corporation", " Corporate", " Corp.", " Corp", " Sdn Bhd", " Bhd", " PLC", " Pte", " Limited", " Pty", " limited", " LIMITED",
"PTe", " pte ltd", " CO.", " S.A.", " Plc", "CORPORATION", " LLC", " B.V.", ".com", " G.K.", " GmbH", " AE"
},

// Clean suffixes from each company name
Cleaned = Table.AddColumn(#"Changed Type", "RawAccount", each
Text.Trim(
List.Accumulate(
Suffixes,
[Company Name],
(state, current) => Text.Replace(state, current, "")
)
), type text
)
in
Cleaned

 

After removing those extensions, I need to match using this logic:

When the raw account only has 1 letter in it, then it should be exact match - i.e threshold should be 1

Same for 2 letters

Same for 3 letters

Same for 4 letters

When the raw account name has 5 letters - the threshold should be 0.9

and when it exceeds 5 letters - the threshold should be 0.8


May not need to merge anything.

 

Please assist.

1 ACCEPTED SOLUTION
v-pnaroju-msft
Community Support
Community Support

Thankyou, @lbendlin, for your response.

Hi Mark-JZ-Yeap,

We appreciate your inquiry through the Microsoft Fabric Community Forum.

Based on my understanding, you are seeking to match the RawAccount column against a reference company list using fuzzy logic, with varying similarity thresholds depending on the length of each cleaned name. As Power BI’s fuzzy matching in Power Query does not directly support dynamic thresholds for each row, please follow the steps outlined below, which may assist in resolving the issue:

  1. Continue using your suffix-removal M script to cleanse both the RawAccount and reference list.
  2. Add two columns: one for the string length and another to assign a similarity threshold based on that length.
  3. Split your dataset into three categories: Length ≤ 4 → exact match (threshold = 1.0); Length = 5 → fuzzy match (threshold = 0.9); and Length > 5 → fuzzy match (threshold = 0.8).
  4. Perform merges accordingly using Power Query’s fuzzy matching feature.
  5. Recombine the results using Append Queries.

This method effectively replicates your logic without any manual matching requirements and will scale well.

Additionally, please refer to the following links for more information:
How fuzzy matching works in Power Query - Power Query | Microsoft Learn
Merge queries overview - Power Query | Microsoft Learn

If you find our response helpful, kindly mark it as the accepted solution and provide kudos. This will assist other community members facing similar queries.

Thank you.

View solution in original post

5 REPLIES 5
v-pnaroju-msft
Community Support
Community Support

Thankyou, @lbendlin, for your response.

Hi Mark-JZ-Yeap,

We appreciate your inquiry through the Microsoft Fabric Community Forum.

Based on my understanding, you are seeking to match the RawAccount column against a reference company list using fuzzy logic, with varying similarity thresholds depending on the length of each cleaned name. As Power BI’s fuzzy matching in Power Query does not directly support dynamic thresholds for each row, please follow the steps outlined below, which may assist in resolving the issue:

  1. Continue using your suffix-removal M script to cleanse both the RawAccount and reference list.
  2. Add two columns: one for the string length and another to assign a similarity threshold based on that length.
  3. Split your dataset into three categories: Length ≤ 4 → exact match (threshold = 1.0); Length = 5 → fuzzy match (threshold = 0.9); and Length > 5 → fuzzy match (threshold = 0.8).
  4. Perform merges accordingly using Power Query’s fuzzy matching feature.
  5. Recombine the results using Append Queries.

This method effectively replicates your logic without any manual matching requirements and will scale well.

Additionally, please refer to the following links for more information:
How fuzzy matching works in Power Query - Power Query | Microsoft Learn
Merge queries overview - Power Query | Microsoft Learn

If you find our response helpful, kindly mark it as the accepted solution and provide kudos. This will assist other community members facing similar queries.

Thank you.

lbendlin
Super User
Super User

 I need to match using this logic

What do you need to match?

For example:

RawAccount

3m inc

7-evelen, inc.

Rolex PLC

Avanaid pte ltd

Deloite pty ltd

Here is one, slightly brute force, implementation.

 

lbendlin_0-1745255646871.png

let
  Source = Table.FromRows(
    Json.Document(
      Binary.Decompress(
        Binary.FromText(
          "i45WMs5VyMxLVorViVYy100tS81JzdMBieiBhYLyc1IrFAJ8nME8x7LEvMTMFIWCklSFnJIUsJhLak5+JpBfUFIJEYsFAA==", 
          BinaryEncoding.Base64
        ), 
        Compression.Deflate
      )
    ), 
    let
      _t = ((type nullable text) meta [Serialized.Text = true])
    in
      type table [RawAccount = _t]
  ), 
  Suffixes = List.Buffer(
    {
      " intl pte ltd", 
      " (Private) Limited", 
      " Pte Lte", 
      " (Pte) Ltd", 
      " PTE LTD", 
      "PTE. LTD.", 
      " Pvt. Ltd.", 
      " PTE. LTD", 
      " Pte Ltd", 
      " Pte. Ltd.", 
      " Private Limited", 
      " Public Company Limited", 
      " COMPANY LIMITED", 
      " Co., LTD.", 
      "Co; Ltd", 
      "co.,ltd.", 
      " Co.,LTD", 
      " Co.,Ltd.", 
      " Co.,Ltd", 
      " Co., Ltd.", 
      " Co., Ltd", 
      " CO., LTD.", 
      " Company Limited", 
      " LTD.", 
      " LTDA", 
      " LTD", 
      " Ltd.", 
      " Ltda", 
      " Ltd", 
      ", Inc.", 
      ", Inc", 
      ",INC.", 
      " INC.", 
      " Inc.", 
      " Inc", 
      " INC", 
      " Corporation", 
      " Corporate", 
      " Corp.", 
      " Corp", 
      " Sdn Bhd", 
      " Bhd", 
      " PLC", 
      " Pte", 
      " Limited", 
      " Pty", 
      " limited", 
      " LIMITED", 
      "PTe", 
      " pte ltd", 
      " CO.", 
      " S.A.", 
      " Plc", 
      "CORPORATION", 
      " LLC", 
      " B.V.", 
      ".com", 
      " G.K.", 
      " GmbH", 
      " AE"
    }
  ), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "Cleaned", 
    each List.Accumulate(
      {0 .. List.Count(Suffixes) - 1}, 
      [RawAccount], 
      (s, c) =>
        if Text.Lower(Text.End(s, Text.Length(Suffixes{c}))) = Text.Lower(Suffixes{c}) then
          Text.RemoveRange(s, Text.Length(s) - Text.Length(Suffixes{c}), Text.Length(Suffixes{c}))
        else
          s
    ), 
    type text
  )
in
  #"Added Custom"

How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done". Once you examined the code, replace the entire Source step with your own source.

 

Thank you.

 

What if you can remove those extensions without using hardcore?

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.