Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.

Reply
adbm
Helper I
Helper I

Pattern detection in textual data

Hi

 

I have a dataset of the purchase register of a company. I have over 120k rows of various purchases made by a company over a 6 month period. Each row is a purchase from a specific vendor with distinct vendor invoice number, amount, material purchased etc. There are multiple rows of the same vendor with different invoice numbers for different purchases made during that period. Each vendor has their own system of numbering their invoices so the vendor invoice column is a multi-character column with different alpha-numeric and special character (like - or / ) combinations. 

 

My question is, can power bi, detect by vendor, a pattern of the alpha-number/special character string and throw up any deviation from the same, by vendor? If yes, can someone help me go about it. Currently i've sorted my data in power query and then looking at it in the matrix visual but its taking ages to manually go through the matrix for each vendor and pick out outliers from the pattern if any.

 

Thank you!

 

@EnterpriseDNA - Thank you for your videos on scatter plot, they were very useful for my other analysis re numbers. If you have any thoughts on the above, would love to hear them too.

3 REPLIES 3
Anonymous
Not applicable

Hi @adbm ,

 

Thanks Kedar_Pande  for the quick reply and solution. I have some other ideas to add:

(1) This is my test data.

vtangjiemsft_0-1735104542174.png

(2) In Power Query, click “Advanced Editor” to copy and paste the following code, you can check the steps in the right step column.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdCxCsIwEAbgVymZG/InUXBt1YKDjiVaOhTsIBSFUvr83hUMJtTWKRyX78/lqkqU7fP+6pNMpOJ0KSWglYGxVGoAdJyboe0fTZc4KpxzdAUK3KvTGW4834b8+hcHrA8wYcBtISCnHjNJKQp6qkLNT7CSFKGWsOEqmj1fw0zHUWGaexfi/W/Mrex75zaa+sB/nv3xh/p922jm4yq1G2+jVReRrd8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Vendor = _t, #"Invoice Number" = _t, Amount = _t, #"Material Purchased" = _t, #"Invoice Format" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Vendor", type text}, {"Invoice Number", type text}, {"Amount", Int64.Type}, {"Material Purchased", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "position-", each Text.PositionOf([Invoice Number], "-")),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "position/", each Text.PositionOf([Invoice Number], "/")),
    #"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Added Custom1", {{"position-", type text}, {"position/", type text}}, "en-US"),{"position-", "position/"},Combiner.CombineTextByDelimiter("&", QuoteStyle.None),"-&/"),
    #"Added Custom2" = Table.AddColumn(#"Merged Columns", "Custom", each Text.PositionOf([Invoice Format], "-")),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Custom.1", each Text.PositionOf([Invoice Format], "/")),
    #"Merged Columns1" = Table.CombineColumns(Table.TransformColumnTypes(#"Added Custom3", {{"Custom", type text}, {"Custom.1", type text}}, "en-US"),{"Custom", "Custom.1"},Combiner.CombineTextByDelimiter("&", QuoteStyle.None),"Format-&/"),
    #"Added Conditional Column" = Table.AddColumn(#"Merged Columns1", "Flag", each if [#"-&/"] = [#"Format-&/"] then "Correct" else "Error")
in
    #"Added Conditional Column"

 

vtangjiemsft_1-1735104841775.png

(3) Create slicers in order to filter the correct invoices.

vtangjiemsft_2-1735104880958.png

If I have misunderstood you, please provide example input data and expected output.

 

Best Regards,

Neeko Tang

If this post  helps, then please consider Accept it as the solution  to help the other members find it more quickly. 

 

Kedar_Pande
Super User
Super User

@adbm 

Create a column to extract patterns in Power Query:

Pattern = Text.Length([Vendor Invoice]) & "-" & Text.Select([Vendor Invoice], {"A".."Z"}) & "-" & Text.Select([Vendor Invoice], {"0".."9"})

Group data by Vendor and summarize unique patterns.

Add calculated columns:

CommonPattern = 
CALCULATE(MAX('Table'[Pattern]), FILTER('Table', 'Table'[Vendor] = EARLIER('Table'[Vendor])))
IsDeviation = 
IF('Table'[Pattern] <> 'Table'[CommonPattern], "Yes", "No")

For complex patterns, consider Python/R in Power BI or external anomaly detection

 

💌 If this helped, a Kudos 👍 or Solution mark would be great! 🎉
Cheers,
Kedar
Connect on LinkedIn

 

Thanks Kedar, just a follow up question -

the column where you're extracting the pattern, where in the formula are we building special characters to be recognised?

 

Thanks,

Natali

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.