tamasv
New Member

Tesseract install

I want to use pytesseract in a notebook. I have added pytesseract to my environment. I can import it.

However, it does not work. I get an error when trying to run:

pytesseract.image_to_string()

Error:

"tesseract is not installed or it's not in your PATH. See README file for more information."

 

On my local machine I had to install Tesseract using the Windows installer (exe) from UB Mannheim, and I had to provide the path to the installed executable.

I am guessing Fabric is missing this Tesseract at UB Mannheim installation.
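One quick way to check that guess from inside the notebook is to look for a `tesseract` executable on the PATH. This is a minimal sketch using only the standard library; the helper name `find_tesseract` is made up for illustration.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


path = find_tesseract()
if path is None:
    print("Tesseract binary not found on PATH; pytesseract cannot call it yet.")
else:
    print(f"Tesseract binary found at: {path}")
```

If the binary exists somewhere that is not on the PATH, pytesseract can be pointed at it by assigning the full path to `pytesseract.pytesseract.tesseract_cmd`, which is the same step described above for the Windows install.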

 

Has anyone figured out how to install Tesseract in Fabric?

1 ACCEPTED SOLUTION
cmaneu
Microsoft Employee

Hello @tamasv,
@nilendraFabric is right. PyTesseract relies on the Tesseract binary and its libraries. They are not part of the Fabric environment, and there is no easy way to download them. Depending on your use case, you have several options.

Use EasyOCR

@nilendraFabric's suggestion was good. This library works within Fabric. Here is some sample code.

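# Install EasyOCR into the notebook session (notebook magic command)
%pip install easyocr

import easyocr

# Load the English recognition model
reader = easyocr.Reader(['en'])

# Each detection is a (bounding box, text, confidence) item
result = reader.readtext("/lakehouse/default/Files/screenshot.png")
for detection in result:
    print(detection[1])  # print only the recognized text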

In my (limited) experience, Tesseract gives better results than EasyOCR, so please check it against your own use case.
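With default settings, each item returned by `readtext` is a (bounding box, text, confidence) triple, which is why the sample prints `detection[1]`. When comparing EasyOCR with Tesseract on your own images, it may help to drop low-confidence detections first. A minimal sketch; the helper name `confident_text` and the 0.5 threshold are illustrative assumptions, not part of EasyOCR.

```python
def confident_text(detections, min_confidence=0.5):
    """Join the text of detections whose confidence is at least min_confidence.

    `detections` is expected in the shape returned by easyocr's readtext():
    a list of (bounding_box, text, confidence) items.
    """
    kept = [text for _bbox, text, confidence in detections
            if confidence >= min_confidence]
    return "\n".join(kept)


# Hand-written sample data in the same shape (not real OCR output).
sample = [
    ([[0, 0], [50, 0], [50, 10], [0, 10]], "Invoice", 0.93),
    ([[0, 20], [50, 20], [50, 30], [0, 30]], "l1|I", 0.12),
    ([[0, 40], [50, 40], [50, 50], [0, 50]], "Total: 42", 0.81),
]
print(confident_text(sample))
```

In the notebook this would be called as `confident_text(reader.readtext("/lakehouse/default/Files/screenshot.png"))`.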


Use Azure Services

Azure AI Vision provides several ML models to extract both printed and handwritten text. With Document Intelligence, you can even extract structured information, for example parsing an image of an invoice and automatically getting each line item.
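For the printed and handwritten text case, a call to the Read feature might look roughly like the sketch below. It assumes the `azure-ai-vision-imageanalysis` package is installed and that you have an Azure AI Vision resource. The function names, endpoint, and key are placeholders, and this has not been verified inside a Fabric notebook.

```python
def collect_lines(blocks):
    """Flatten OCR blocks into a list of line strings.

    Works on any objects exposing `.lines`, each with a `.text` attribute,
    which is the shape of the Read result in the Image Analysis SDK.
    """
    return [line.text for block in blocks for line in block.lines]


def read_image_text(image_path, endpoint, key):
    """Send an image to Azure AI Vision (Read feature) and return its text lines."""
    # Imported lazily so collect_lines() can be used without the SDK installed.
    from azure.ai.vision.imageanalysis import ImageAnalysisClient
    from azure.ai.vision.imageanalysis.models import VisualFeatures
    from azure.core.credentials import AzureKeyCredential

    client = ImageAnalysisClient(endpoint=endpoint,
                                 credential=AzureKeyCredential(key))
    with open(image_path, "rb") as image_file:
        result = client.analyze(image_data=image_file.read(),
                                visual_features=[VisualFeatures.READ])
    # result.read is None when no text analysis was returned.
    return collect_lines(result.read.blocks) if result.read is not None else []
```

A call would then look like `read_image_text("/lakehouse/default/Files/screenshot.png", "<your endpoint>", "<your key>")`, with the endpoint and key taken from your own Azure resource.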



Tesseract - manually build and reference
If you really want to use Tesseract, you could technically hand-install the Tesseract packages. This would involve manually downloading the (deb) packages and extracting them, which would be quite time-consuming given the whole chain of dependencies. You could also compile it yourself to get a single executable with all the dependencies linked in (someone on the Internet may have done that already).
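If you do go down that route, pytesseract still has to be told where the hand-installed pieces live. The sketch below assumes a hypothetical directory layout (`bin/tesseract`, `lib/`, and `share/tessdata/` under one root) and hypothetical helper names. It only prepares the settings and does not download or build anything.

```python
import os


def tesseract_settings(install_root):
    """Build the paths needed for a manual Tesseract install.

    The layout under install_root (bin/, lib/, share/tessdata/) is an assumption.
    """
    return {
        "tesseract_cmd": os.path.join(install_root, "bin", "tesseract"),
        "LD_LIBRARY_PATH": os.path.join(install_root, "lib"),
        # Assumed to be the tessdata directory itself (Tesseract 4+ convention).
        "TESSDATA_PREFIX": os.path.join(install_root, "share", "tessdata"),
    }


def apply_settings(settings, environ=os.environ):
    """Export library and language-data locations so child processes see them."""
    existing = environ.get("LD_LIBRARY_PATH", "")
    parts = [settings["LD_LIBRARY_PATH"]] + ([existing] if existing else [])
    environ["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    environ["TESSDATA_PREFIX"] = settings["TESSDATA_PREFIX"]


settings = tesseract_settings("/tmp/tesseract")  # hypothetical location
apply_settings(settings, environ={})  # dry run on a plain dict, not the real environment
print(settings["tesseract_cmd"])
```

In the notebook you would call `apply_settings(settings)` on the real environment and then set `pytesseract.pytesseract.tesseract_cmd = settings["tesseract_cmd"]` before calling `pytesseract.image_to_string(...)`.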

Hope this helps! 

4 REPLIES
v-prasare
Community Support

@tamasv, as we haven't heard back from you, we wanted to follow up and check whether the solution provided by our super users worked for your issue. Please let us know if you need any further assistance.

 

@cmaneu @nilendraFabric, thanks for your prompt response here.

 

Thanks,

Prashanth Are

MS Fabric community support

 

If this post helps, please consider accepting it as the solution to help other members find it more quickly, and give Kudos if it helped you resolve your query.

Hello!

 

Thanks for the info provided. EasyOCR does work, but the project is not built around that library, and our experience shows that EasyOCR is a bit worse for what we need. Manual builds are not my expertise, but maybe we will look into it. In the long run, Azure Vision could also be a potential candidate.

 

I accept this solution as it opened up some options for doing the project in the Azure ecosystem.

Thanks

nilendraFabric
Community Champion

Hello @tamasv 

 

You are right.

 

Fabric does not natively support the Tesseract OCR engine or its Linux-based binaries, which are required for pytesseract to function.

 

Have you tried using EasyOCR? It does not require external binaries.

 

Hope this is helpful.
Please accept the answer and give kudos if this helps.

 
