How to get data from inaccurate OCR (inconsistent line feeds and numbers mistaken for letters)?

For scanned text from a credit card statement, how can I use Power BI to extract the transaction data without pre-processing?

The scanner output (a text file) has the following attributes:

  1. The OCR is only partially accurate
    1. the same line can contain more than one transaction
    2. some digits are misread as letters
  2. Each transaction consists of
    1. a start date (DD MMM)
    2. a description
    3. a location
    4. a reference ID
    5. the amount in foreign currency, if applicable
    6. the amount converted to my currency
    7. optionally, text suffixed to the amount (such as "credit")
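The amount fields described above (digits misread as letters, an optional trailing word such as "credit") could be normalized with a small cleanup step. The following is a minimal Python sketch, not a tested solution for real statements: the letter-to-digit map, the suffix spellings, and the function name `parse_amount` are all assumptions made for illustration, and the sample values are invented.

```python
import re

# Hypothetical map of letters that OCR commonly confuses with digits.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

# Optional trailing word after the number (assumed spellings: "credit" or "cr").
SUFFIX = re.compile(r"\s*(credit|cr)\s*$", re.IGNORECASE)


def parse_amount(token):
    """Return (value, is_credit) for an amount token such as '1,2O5.5O credit'."""
    is_credit = bool(SUFFIX.search(token))
    # Remove the suffix first so its letters are not turned into digits.
    number = SUFFIX.sub("", token).strip()
    # Repair misread digits, then drop thousands separators.
    number = number.translate(OCR_DIGIT_FIXES).replace(",", "")
    return float(number), is_credit


print(parse_amount("1,2O5.5O credit"))
print(parse_amount("45.l0"))
```

The map should only be applied to fields that are expected to be numeric. Applying it to the description or location would corrupt legitimate letters.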

Do I have to pre-process before using Power BI?

 

Kind Regards

P.S.

Microsoft Power Automate does not yet have a template for credit card statements, and customizing the AI model built for the invoice template requires admin authorisation.

1 REPLY
Anonymous
Not applicable

Hi @SunnySchindler ,

 

In Power Query, you can use the Split Column feature to extract your values.

Extract Parts of a Text Value in Power BI using a Delimiter: Power Query Transformation - RADACAD
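When one OCR line holds several transactions, the splitting idea can be illustrated by cutting the line at every position where a "DD MMM" date begins. Below is a minimal Python sketch of that logic, assuming English three-letter month abbreviations and correctly recognized dates. The names `DATE_START` and `split_transactions` and the sample line are hypothetical. Power BI Desktop can run a Python script as a data source, so a step like this could live inside the report instead of a separate tool, though that is one option among several and not the method the linked article describes.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Zero-width lookahead: matches the position just before a "DD MMM" date,
# so the date itself stays attached to the transaction that follows it.
DATE_START = re.compile(rf"(?=\b\d{{2}} (?:{MONTHS})\b)", re.IGNORECASE)


def split_transactions(line):
    """Split one OCR line into one string per transaction."""
    parts = DATE_START.split(line)
    # Drop the empty piece produced when the line itself starts with a date.
    return [p.strip() for p in parts if p.strip()]


# Invented sample line containing two transactions.
sample = (
    "03 Jan COFFEE SHOP LONDON REF123 4.50 "
    "05 Jan BOOKSTORE PARIS REF456 EUR 12.00 10.40 credit"
)
for transaction in split_transactions(sample):
    print(transaction)
```

This sketch would split incorrectly if a description happens to contain text that looks like a date, and it would miss a date whose digits were misread as letters, so the results still need a manual check.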

 

Best Regards,

Stephen Tao

 

If this post helps, please consider accepting it as the solution to help other members find it more quickly.
