jscivias
Helper I

Comparing two databases and removing rows of identical (& similar) entries from one database

Hello, suppose I have two different Excel files:

 

File 1:
Company    Type
AAA        A
BBB        A
CCC        A

File 2:
Company    Type
AAA Inc.   B
BBB        B
DDD        B

 

I have appended these two files into one, but I would like Power BI to remove rows from File 1 when the same company is found in File 2. For example, since BBB appears in both files, keep only BBB Type B and delete BBB Type A from the combined table. How would I go about doing this (using either Power Query or DAX)?

 

Secondly, is there a way to apply fuzzy matching so that Power BI also removes AAA Type A, since AAA Inc. Type B exists? (Suppose they are the same entity, but data entry issues led to the discrepancy.)

 

I hope I'm not being too unclear. Thank you for your help.

1 ACCEPTED SOLUTION
v-yueyunzh-msft
Community Support

Hi, @rsbin 

You want to remove fuzzy (near-duplicate) matches as well as exact duplicates, dropping the rows from the first table. Is that right?

Here are steps you can follow in the Power Query Editor:

(1) This is my test data:

[Two screenshots of the test tables]

 


(2) For the fuzzy data, first normalize the company names. This M expression keeps only the first word of each name, so "AAA Inc." becomes "AAA":

= Table.TransformColumns(test, {"Company", (x) => Text.Split(x, " "){0}})

[Screenshot of the table after normalization]
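Keeping only the first word works for the sample data, but it would also shorten a multi-word name to its first word. As a rough alternative, here is a sketch that strips a known suffix instead. The Suffixes list and the StripSuffix helper are hypothetical names introduced for illustration, and the inline table stands in for File 2 from the question:

```powerquery
let
    // Sample data mirroring File 2 from the question
    File2 = #table(
        type table [Company = text, Type = text],
        {{"AAA Inc.", "B"}, {"BBB", "B"}, {"DDD", "B"}}
    ),
    // Hypothetical list of suffixes to strip; extend it to suit your data
    Suffixes = {" Inc.", " Inc", " Ltd.", " Ltd", " LLC"},
    // Remove the first matching suffix from the end of a name, then trim
    StripSuffix = (name as text) as text =>
        let
            Hit = List.First(
                List.Select(Suffixes, (s) => Text.EndsWith(name, s, Comparer.OrdinalIgnoreCase)),
                null
            )
        in
            if Hit = null then Text.Trim(name)
            else Text.Trim(Text.Start(name, Text.Length(name) - Text.Length(Hit))),
    // Apply the helper to the Company column
    Normalized = Table.TransformColumns(File2, {{"Company", StripSuffix, type text}})
in
    Normalized
```

With this sample data, "AAA Inc." is reduced to "AAA" while "BBB" and "DDD" are left unchanged.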

(3) Then remove from the first table (Sheet2) every company that also appears in the second table (Sheet3), and append the second table. Create a blank query and enter:

= Table.SelectRows(Sheet2, (x) => not List.ContainsAny({x[Company]}, List.Intersect({Sheet2[Company], Sheet3[Company]}))) & Sheet3
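The same step can also be written as a left anti join, which some people find easier to read. This is only a sketch of an equivalent approach, not the method used above. File1 and File2 are assumed names with inline sample data, and the company names are assumed to be normalized already:

```powerquery
let
    // Sample data; File 2 names are assumed to be normalized already
    File1 = #table(
        type table [Company = text, Type = text],
        {{"AAA", "A"}, {"BBB", "A"}, {"CCC", "A"}}
    ),
    File2 = #table(
        type table [Company = text, Type = text],
        {{"AAA", "B"}, {"BBB", "B"}, {"DDD", "B"}}
    ),
    // Left anti join: keep File1 rows that have no matching Company in File2
    OnlyInFile1 = Table.NestedJoin(File1, {"Company"}, File2, {"Company"}, "Match", JoinKind.LeftAnti),
    // The nested "Match" column is empty for every remaining row, so drop it
    Cleaned = Table.RemoveColumns(OnlyInFile1, {"Match"}),
    // Append all of File2 to what is left of File1
    Combined = Table.Combine({Cleaned, File2})
in
    Combined
```

For this sample, only CCC survives from File 1, followed by the three File 2 rows.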

(4) The result is as follows:

[Screenshot of the combined result]

 

If this method does not meet your needs, please share your own sample data and the desired output in table form so that we can help further.

 

Best Regards,

Aniya Zhang

If this post helps, please consider accepting it as the solution so that other members can find it more quickly.

2 REPLIES
rsbin
Super User

@jscivias ,

I believe I have a partial solution, at least to the first part of your question. In Power Query, it uses Group By together with an index column:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnR0VNJRclSK1YlWcnJygrOdnZ3hbKAaBc+8ZD2ggBOSQgjbxcUFwo4FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Company " = _t, #"Type " = _t]),
    // Set both columns to text
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Company ", type text}, {"Type ", type text}}),
    // Sort Type descending so the "B" rows come before the "A" rows
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Type ", Order.Descending}}),
    // Collect each company's rows into a nested table
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Company "}, {{"Count", each _, type table [#"Company "=nullable text, #"Type "=nullable text]}}),
    // Number the rows within each company, starting at 0
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Count],"Index",0)),
    // Keep only the indexed nested tables
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Custom", type any}}),
    // Expand the nested tables back into regular columns
    #"Expanded Custom" = Table.ExpandTableColumn(#"Changed Type1", "Custom", {"Company ", "Type ", "Index"}, {"Custom.Company ", "Custom.Type ", "Custom.Index"})
in
    #"Expanded Custom"

This was my source for the technique:

https://radacad.com/create-row-number-for-each-group-in-power-bi-using-power-query

Result of the query:

Custom.Company   Custom.Type   Custom.Index
AAA Inc.         B             0
BBB              B             0
BBB              A             1
DDD              B             0
AAA              A             0
CCC              A             0

 

The important first step is to sort the Type column in descending order so that the "B" rows get index 0.

This then enables you to filter out the 1's from the Index Column.
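That filter is not part of the query above, so here is a minimal sketch of what it could look like. The inline Expanded table is a stand-in for the output shown earlier; in your own query you would apply the same Table.SelectRows step to #"Expanded Custom" instead:

```powerquery
let
    // Stand-in for the expanded output of the query above
    Expanded = #table(
        type table [#"Custom.Company " = text, #"Custom.Type " = text, #"Custom.Index" = Int64.Type],
        {
            {"AAA Inc.", "B", 0}, {"BBB", "B", 0}, {"BBB", "A", 1},
            {"DDD", "B", 0}, {"AAA", "A", 0}, {"CCC", "A", 0}
        }
    ),
    // Keep only the first row of each company group (index 0); this drops BBB / A
    Kept = Table.SelectRows(Expanded, each [#"Custom.Index"] = 0),
    // The index column has done its job, so remove it
    Result = Table.RemoveColumns(Kept, {"Custom.Index"})
in
    Result
```

Note that AAA / A is still present at this point, because "AAA" and "AAA Inc." are treated as different companies.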

Please try to see if this works for you.  Then we can attack the "fuzzy logic" part of your question.
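For the fuzzy part, one possible starting point is Power Query's built-in fuzzy merge, Table.FuzzyNestedJoin. The sketch below is untested against your real data: File1 and File2 are assumed names with inline sample rows, and the Threshold of 0.5 is an assumed value that you would need to tune.

```powerquery
let
    // Sample data from the question
    File1 = #table(
        type table [Company = text, Type = text],
        {{"AAA", "A"}, {"BBB", "A"}, {"CCC", "A"}}
    ),
    File2 = #table(
        type table [Company = text, Type = text],
        {{"AAA Inc.", "B"}, {"BBB", "B"}, {"DDD", "B"}}
    ),
    // Fuzzy left outer join: attach to each File1 row the File2 rows with a similar Company
    Joined = Table.FuzzyNestedJoin(
        File1, {"Company"}, File2, {"Company"}, "Match",
        JoinKind.LeftOuter,
        [IgnoreCase = true, IgnoreSpace = true, Threshold = 0.5]  // assumed threshold; tune for your data
    ),
    // Keep only the File1 rows that found no fuzzy match in File2
    Unmatched = Table.SelectRows(Joined, each Table.IsEmpty([Match])),
    Cleaned = Table.RemoveColumns(Unmatched, {"Match"}),
    // Append all of File2 to the unmatched File1 rows
    Combined = Table.Combine({Cleaned, File2})
in
    Combined
```

Whether "AAA" and "AAA Inc." count as a match depends on the threshold, so check the Joined step before trusting the final result. A lower threshold matches more aggressively and can also remove rows that are not true duplicates.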

Regards,
