Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

[String/Text] Comparing two columns with repeated value but unequal index

Hi guys, 

I dont even know if the subject describles my problem well or not. 

 

I have two table as below:

1. ID in Table 1 and Table 2 are the link. 

 

2. Each ID in Table 1, there are N items while the same ID in Table 2 might have N or more than N items. Items can be repeated. 

For example, when you buy 5 donuts, and there are two lines in the receipt. 

 

3. Itemquant in table 1 and 2 can be approximately match, due to bad OCR.

 

My goal is, to compare item in the two table with below logic: 

- Line 1: Compere Apple4 is partly match with apple4 => then they are same line => compare Apple and apple to check if exact      match. 

- Line 2: Compare Donut 2 with Donut 3 => not match => not same line. 

- Line 3: Comprare Donut 2 with Donut 2 => match => same line => compre items (Donut = Donut)

- Line 4: Compare Banana 17 with Banana 17 => same

If item is repeated, its normal, just need to find the next occurence in the compare list.

Example.png

I've been trying different method but not work. Do you guys get my point? 

Any help/hint is highly appreciated.

 

Many many thanks

5 REPLIES 5
Jimmy801
Community Champion
Community Champion

Hello

 

to get you an idea.

If you have a logic or a alghorhytm, than mostly everthing is possible. But this logic has to be perfekt and waterproof. Otherwise the application of it will fail.

So when I checking your logic, it's difficult to find the logic.

- you are talking about LIne 4. so i would I assume you are meanging table 2, but this can't be the case as you are stating also comparing donat2 twice.

- why shouldnt this not be a match "nut 2 with Donut 3 => not match => not same line. " because of "not the same line" (not finding this parameter anyway) and on the other hand Banana, that aren't at the same line either, but this is a match.

 

What I did understand is that you need to merge 2 tables, using 2 columns whereas the second column has to be mergerd with a fuzzy search (matching only partly - by the way.. do you need a solution in Power BI or Excel?).

 

Another remark I want to make is that every time when I see such a little extract of the problem... maybe it would be good to understand the whole process - why exactly you need this? and what is the real goal? This would enable us to help you on a complete other level.

 

All the best

Jimmy

Anonymous
Not applicable

Hallo Jimmy801,

 

Danke schoen fuer deine Antwort. Yeah, the question itself is really a big mess. I struggle trying to explain it while giving as little context as posisble. Such irony and bit stupid haha. 😅
Thanks a lot for your response and the patience to even ask. Here is a clearer picture.


We have a bunch of people typing invoices into SAP system every month. The implicit and explicit cost we put into this is pretty crazy => Trying to implement a system with OCR that can read the invoice and extract it.


Our provider did a demo, it works but I would like to have closer view into how good is that OCR.

The Typed data by our staffs is the base for comparison. It is full and correct => Table 1
The OCR, sometimes cannot read the whole receive, it returns, for example 4 rows.


My logic is

Example.png

1. Compare Typed 1 with OCR 1 => if partly match => then they are same line in the invoice.
                                                   => if not match
 => then compare Typed 1 with OCR2 => if match => then compare Typed 2 with OCR 3 and continue. 

 

2. After each match, we eliminate/filter out that matched pair and all the entries above them (in the table). \

 

Note:

- It only party match because sometime the OCR can read, but do not get all letter correct. 

- Typed  1 and 2 could be the same value because we bought the same items => but still have to get them separately in the dataset if they are two separate lines in the receipt. 

 

BTW, I use Power BI.

 

Danke, 

K

 

Hello @Anonymous 

 

guten Morgen 😉

so now I have a much clearer picture.. and by the way, I'm an SAP-expert too.. 🙂

 

What if you would have something like this

image.png

 

And applying a fullouter fuzzynestedjoin with ignore space true, ignore case true and threshold to 0.1 you would have something like this and you coud do some analysis. Be aware that in such a case for red apple the software proposes ApPle and Red ApP e... so you have doubled values in the column Typed. You could also include the ID-column to the merge.

In case.. this function is not available in Excel yet. you would have to read the xlsx-file and then do some export or visualizing the result in Power BI 🙂

image.png

 

Hope you have now a clearer picture about it

 

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

Anonymous
Not applicable

Hi Jimmy, sorry for late response. 

Thanks for giving a hint. However, it did not work.

I am not able to describe the problem, data structure and requirements more clear due to some regulations.

But I am looking into scripting to solve it. 

I will get back once this is done. 

 

Thank you. Nice weekend. 

Hello @Anonymous ,

 

I've asked you if my data presented would that what you are searching for. If yes, I can sent you the code or help you applying it.

Have a nice weekend

 

Jimmy

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors