Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Calling all Data Engineers! Fabric Data Engineer (Exam DP-700) live sessions are back! Starting October 16th. Sign up.

Reply
Mark-JZ-Yeap
Helper I
Helper I

Fuzzy matching logic

I have already removed the extensions using this M code in Power Query.

 

let
// Load your Excel file/table
Source = Excel.Workbook(File.Contents("C:\Users\mark.jz.yeap\Documents\ATCi - Mark\MatchingDatasets - 10 Apr 2025\NRF 2024 delegate list.xlsx"), null, true),
Table2_Table = Source{[Item="Table2",Kind="Table"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(Table2_Table, {{"Company Name", type text}}),

// Define the list of suffixes to remove
Suffixes = {
" intl pte ltd", " (Private) Limited", " Pte Lte"," (Pte) Ltd"," PTE LTD", "PTE. LTD.", " Pvt. Ltd.", " PTE. LTD", " Pte Ltd", " Pte. Ltd.", " Private Limited", " Public Company Limited"," COMPANY LIMITED", " Co., LTD.","Co; Ltd", "co.,ltd.", " Co.,LTD"," Co.,Ltd.", " Co.,Ltd"," Co., Ltd.", " Co., Ltd", " CO., LTD.", " Company Limited", " LTD.", " LTDA", " LTD",
" Ltd.", " Ltda", " Ltd", ", Inc.", ", Inc", ",INC."," INC.", " Inc.", " Inc", " INC", " Corporation", " Corporate", " Corp.", " Corp", " Sdn Bhd", " Bhd", " PLC", " Pte", " Limited", " Pty", " limited", " LIMITED",
"PTe", " pte ltd", " CO.", " S.A.", " Plc", "CORPORATION", " LLC", " B.V.", ".com", " G.K.", " GmbH", " AE"
},

// Clean suffixes from each company name
Cleaned = Table.AddColumn(#"Changed Type", "RawAccount", each
Text.Trim(
List.Accumulate(
Suffixes,
[Company Name],
(state, current) => Text.Replace(state, current, "")
)
), type text
)
in
Cleaned

 

After removing those extensions, I need to match using this logic:

When the raw account only has 1 letter in it, then it should be exact match - i.e threshold should be 1

Same for 2 letters

Same for 3 letters

Same for 4 letters

When the raw account name has 5 letters - the threshold should be 0.9

and when it exceeds 5 letters - the threshold should be 0.8


May not need to merge anything.

 

Please assist.

1 ACCEPTED SOLUTION
v-pnaroju-msft
Community Support
Community Support

Thankyou, @lbendlin, for your response.

Hi Mark-JZ-Yeap,

We appreciate your inquiry through the Microsoft Fabric Community Forum.

Based on my understanding, you are seeking to match the RawAccount column against a reference company list using fuzzy logic, with varying similarity thresholds depending on the length of each cleaned name. As Power BI’s fuzzy matching in Power Query does not directly support dynamic thresholds for each row, please follow the steps outlined below, which may assist in resolving the issue:

  1. Continue using your suffix-removal M script to cleanse both the RawAccount and reference list.
  2. Add two columns: one for the string length and another to assign a similarity threshold based on that length.
  3. Split your dataset into three categories: Length ≤ 4 → exact match (threshold = 1.0); Length = 5 → fuzzy match (threshold = 0.9); and Length > 5 → fuzzy match (threshold = 0.8).
  4. Perform merges accordingly using Power Query’s fuzzy matching feature.
  5. Recombine the results using Append Queries.

This method effectively replicates your logic without any manual matching requirements and will scale well.

Additionally, please refer to the following links for more information:
How fuzzy matching works in Power Query - Power Query | Microsoft Learn
Merge queries overview - Power Query | Microsoft Learn

If you find our response helpful, kindly mark it as the accepted solution and provide kudos. This will assist other community members facing similar queries.

Thank you.

View solution in original post

5 REPLIES 5
v-pnaroju-msft
Community Support
Community Support

Thankyou, @lbendlin, for your response.

Hi Mark-JZ-Yeap,

We appreciate your inquiry through the Microsoft Fabric Community Forum.

Based on my understanding, you are seeking to match the RawAccount column against a reference company list using fuzzy logic, with varying similarity thresholds depending on the length of each cleaned name. As Power BI’s fuzzy matching in Power Query does not directly support dynamic thresholds for each row, please follow the steps outlined below, which may assist in resolving the issue:

  1. Continue using your suffix-removal M script to cleanse both the RawAccount and reference list.
  2. Add two columns: one for the string length and another to assign a similarity threshold based on that length.
  3. Split your dataset into three categories: Length ≤ 4 → exact match (threshold = 1.0); Length = 5 → fuzzy match (threshold = 0.9); and Length > 5 → fuzzy match (threshold = 0.8).
  4. Perform merges accordingly using Power Query’s fuzzy matching feature.
  5. Recombine the results using Append Queries.

This method effectively replicates your logic without any manual matching requirements and will scale well.

Additionally, please refer to the following links for more information:
How fuzzy matching works in Power Query - Power Query | Microsoft Learn
Merge queries overview - Power Query | Microsoft Learn

If you find our response helpful, kindly mark it as the accepted solution and provide kudos. This will assist other community members facing similar queries.

Thank you.

lbendlin
Super User
Super User

 I need to match using this logic

What do you need to match?

For example:

RawAccount

3m inc

7-evelen, inc.

Rolex PLC

Avanaid pte ltd

Deloite pty ltd

Here is one, slightly brute force, implementation.

 

lbendlin_0-1745255646871.png

let
  Source = Table.FromRows(
    Json.Document(
      Binary.Decompress(
        Binary.FromText(
          "i45WMs5VyMxLVorViVYy100tS81JzdMBieiBhYLyc1IrFAJ8nME8x7LEvMTMFIWCklSFnJIUsJhLak5+JpBfUFIJEYsFAA==", 
          BinaryEncoding.Base64
        ), 
        Compression.Deflate
      )
    ), 
    let
      _t = ((type nullable text) meta [Serialized.Text = true])
    in
      type table [RawAccount = _t]
  ), 
  Suffixes = List.Buffer(
    {
      " intl pte ltd", 
      " (Private) Limited", 
      " Pte Lte", 
      " (Pte) Ltd", 
      " PTE LTD", 
      "PTE. LTD.", 
      " Pvt. Ltd.", 
      " PTE. LTD", 
      " Pte Ltd", 
      " Pte. Ltd.", 
      " Private Limited", 
      " Public Company Limited", 
      " COMPANY LIMITED", 
      " Co., LTD.", 
      "Co; Ltd", 
      "co.,ltd.", 
      " Co.,LTD", 
      " Co.,Ltd.", 
      " Co.,Ltd", 
      " Co., Ltd.", 
      " Co., Ltd", 
      " CO., LTD.", 
      " Company Limited", 
      " LTD.", 
      " LTDA", 
      " LTD", 
      " Ltd.", 
      " Ltda", 
      " Ltd", 
      ", Inc.", 
      ", Inc", 
      ",INC.", 
      " INC.", 
      " Inc.", 
      " Inc", 
      " INC", 
      " Corporation", 
      " Corporate", 
      " Corp.", 
      " Corp", 
      " Sdn Bhd", 
      " Bhd", 
      " PLC", 
      " Pte", 
      " Limited", 
      " Pty", 
      " limited", 
      " LIMITED", 
      "PTe", 
      " pte ltd", 
      " CO.", 
      " S.A.", 
      " Plc", 
      "CORPORATION", 
      " LLC", 
      " B.V.", 
      ".com", 
      " G.K.", 
      " GmbH", 
      " AE"
    }
  ), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "Cleaned", 
    each List.Accumulate(
      {0 .. List.Count(Suffixes) - 1}, 
      [RawAccount], 
      (s, c) =>
        if Text.Lower(Text.End(s, Text.Length(Suffixes{c}))) = Text.Lower(Suffixes{c}) then
          Text.RemoveRange(s, Text.Length(s) - Text.Length(Suffixes{c}), Text.Length(Suffixes{c}))
        else
          s
    ), 
    type text
  )
in
  #"Added Custom"

How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done". Once you examined the code, replace the entire Source step with your own source.

 

Thank you.

 

What if you can remove those extensions without using hardcore?

Helpful resources

Announcements
FabCon Global Hackathon Carousel

FabCon Global Hackathon

Join the Fabric FabCon Global Hackathon—running virtually through Nov 3. Open to all skill levels. $10,000 in prizes!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.