Extracting data from images is a field with a wide range of applications. From detecting landmarks and labeling images to building search engines that find similar images based on their content, there are many possibilities for creating useful services for customers. This post discusses how to convert an image to text using Python.
What is OCR?
Optical character recognition, or OCR for short, is the process of converting an image of text into machine-encoded text. It is a widely used application of machine learning, and it is not just for machine learning enthusiasts: it has a ton of practical applications. One very useful application is processing images of text for use in ebooks and other digital media. In this blog post, we will show you how to use Tesseract and EasyOCR to do just that.
How to extract text from images using Tesseract?
Tesseract is a powerful, accurate, and efficient Optical Character Recognition open-source engine for various operating systems. It supports a wide variety of languages. Tesseract can also be used as a system library to develop new image analysis applications.
To install Tesseract on Debian- or Ubuntu-based Linux, run the following command in the terminal:
sudo apt install tesseract-ocr
To install Tesseract on Windows, go to the Tesseract GitHub page. There you will find the link to download the installer; download it and run it.
Note: please check and remember the path where Tesseract is being installed. Because Tesseract itself is not a Python library, we also need to install a few Python modules. You can install them using the following commands:
pip3 install pytesseract
pip3 install opencv-python
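Before moving on, it can help to confirm that the modules are importable from the interpreter you plan to use. The following is a minimal sketch using only the standard library; the function name check_modules is made up for this example, and "cv2" is the import name of the opencv-python package.

```python
import importlib.util


def check_modules(modules=("pytesseract", "cv2")):
    """Return a dict mapping each module name to True if it can be imported."""
    # find_spec returns None when a top-level module is not installed
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If a module is reported as missing, the pip3 command above was most likely run for a different Python installation than the one running the script.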
Step 1: Importing the packages
To start, we’ve made a Python file and imported all of the necessary modules at the top.
# text recognition
import cv2
import pytesseract
Step 2: Reading the image
If the test image is stored in the same folder as the script, you can use the imread() function to load it.
# read image
img = cv2.imread('photo.jpg')
Step 3: Configuration
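Here, we need to configure the options that are passed to Tesseract. The -l option sets the language (eng for English), --oem 1 selects the LSTM neural network engine, and --psm 3 selects fully automatic page segmentation.
# configurations
config = '-l eng --oem 1 --psm 3'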
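If you want to experiment with different settings, a small helper can assemble the same option string. This is only a sketch: build_tesseract_config is a name invented for this example, not part of pytesseract. The allowed ranges reflect the values Tesseract lists for its engine modes (0 to 3) and page segmentation modes (0 to 13).

```python
def build_tesseract_config(lang="eng", oem=1, psm=3):
    """Build the option string passed to pytesseract via the config argument."""
    if oem not in range(4):
        raise ValueError("oem must be between 0 and 3")
    if psm not in range(14):
        raise ValueError("psm must be between 0 and 13")
    return f"-l {lang} --oem {oem} --psm {psm}"


# Same configuration as in Step 3
config = build_tesseract_config()
print(config)
```

For example, build_tesseract_config(psm=6) would ask Tesseract to treat the image as a single uniform block of text, which sometimes suits cropped paragraphs better than the automatic mode.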
Step 4: Setting path
If the Tesseract executable is not on your system PATH, tell pytesseract where to find it. The path below is the default install location on Windows; if you changed the install path, use that path instead.
# pytesseract path
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
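Hard-coding the path works on one machine but can break on another. As a rough sketch, the executable can be looked up on PATH first, falling back to the default Windows location used above only when that lookup fails. The helper name find_tesseract and the constant DEFAULT_WINDOWS_PATH are assumptions made for this example.

```python
import os
import shutil

# Default install location used in Step 4 (an assumption; adjust if yours differs)
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=DEFAULT_WINDOWS_PATH):
    """Return the path to the tesseract executable, or None if it is not found."""
    on_path = shutil.which("tesseract")  # works on Linux, macOS and Windows
    if on_path:
        return on_path
    if os.path.isfile(fallback):
        return fallback
    return None


# Usage with pytesseract (only set the path when something was found):
#   path = find_tesseract()
#   if path:
#       pytesseract.pytesseract.tesseract_cmd = path
```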
Step 5: Converting the image to Text
Then, we need to convert the image into a single string using the method: image_to_string().
text = pytesseract.image_to_string(img, config=config)
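Two details are easy to miss in the steps so far. cv2.imread() returns None instead of raising an error when the file cannot be read, and OpenCV loads images in BGR channel order while pytesseract's documentation suggests passing RGB. The sketch below wraps Steps 2 to 5 in one function with both points handled; ocr_image is a hypothetical helper name, and the imports are placed inside the function only so the path check runs first.

```python
import os


def ocr_image(path, config="-l eng --oem 1 --psm 3"):
    """Read an image with OpenCV and return the text Tesseract finds in it."""
    # cv2.imread returns None for a missing file, so check the path explicitly
    if not os.path.isfile(path):
        raise FileNotFoundError(path)

    # Imported here so the check above works even without the OCR packages
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        raise ValueError(f"could not decode image: {path}")

    # Convert from OpenCV's BGR order to the RGB order pytesseract expects
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return pytesseract.image_to_string(rgb, config=config)
```

Calling ocr_image('photo.jpg') should then give the same kind of string as Step 5.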
Step 6: Printing the results
Finally, we split the extracted text into lines and print it.
# print results
text = text.split('\n')
print(text)
This is the image I used for the process
After running the code, you can see the extracted text in the output, and the results are pretty accurate.
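The list printed in Step 6 usually contains empty strings for blank lines, and Tesseract output often ends with a form-feed character. A small clean-up step, sketched below with a made-up sample string and a hypothetical helper name clean_lines, keeps only the lines that contain text.

```python
def clean_lines(text):
    """Split OCR output into lines, dropping blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Made-up sample that imitates raw OCR output
sample = "Hello World\n\n  second line \n\x0c"
print(clean_lines(sample))  # ['Hello World', 'second line']
```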
How to extract text from images using EasyOCR
EasyOCR is a Python package that extracts text from images and scanned documents using optical character recognition. It supports over 70 languages, and more are being added. EasyOCR is developed by Jaided AI.
It’s important to install the PyTorch library before installing EasyOCR. PyTorch is an open-source machine learning library for Python, originally developed by Facebook's AI research lab. It is based on the Torch library, not on TensorFlow, which is a separate open-source machine learning library. Like TensorFlow, PyTorch can run on GPUs. It is a library for fast, flexible deep learning.
To install PyTorch using conda use the following command:
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
To install PyTorch using pip on Linux, use the following command:
pip3 install torch torchvision torchaudio
To install PyTorch using pip on Windows, use the following command:
pip3 install torch==1.10.2+cu102 torchvision==0.11.3+cu102 torchaudio===0.10.2+cu102 -f https://download.pytorch.org/whl/cu102/torch_stable.html
Note: it's important to install PyTorch before installing EasyOCR; otherwise EasyOCR will not work.
To install EasyOCR, use the following command:
pip install easyocr
We will also need matplotlib to display the results. To install it, use the following command:
pip install matplotlib
Jupyter Notebook is an open-source, web-based interactive computing platform that can run Python code. It lets you walk through an analysis step by step by arranging code, images, text, and output in one document. You can install Jupyter Notebook using the following command in your terminal:
pip install jupyter notebook
To open the notebook, use this command:
jupyter notebook
Step 1: Importing dependencies
# importing all the necessary modules to run the code
import matplotlib.pyplot as plt
import cv2
import easyocr
from pylab import rcParams
from IPython.display import Image

rcParams['figure.figsize'] = 8, 16
Step 2: Setting up locale/language
# here you can use any other language you want
reader = easyocr.Reader(['en'])
The first time this line runs, it downloads the model files for the chosen language to your system, so it will take some time.
Step 3: Attaching the image
Here is the image that we are using for text extraction:
Step 4: Printing the output
# use the readtext function to generate the text from the image
output = reader.readtext('photo.jpg')
print(output)
In the output, you will notice that the code returns one entry for each piece of text it detected in the image. Each entry contains three parts: first the coordinates of the four corners of the bounding box, then the text that was recognized inside that box, and finally a confidence score that estimates how certain the model is about the recognized text.
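To make that structure concrete, here is a small sketch that unpacks each entry and keeps only the text above a confidence threshold. The sample_output values are invented for illustration and only imitate the shape of what readtext() returns; filter_results and the 0.5 threshold are likewise assumptions for this example.

```python
def filter_results(results, min_confidence=0.5):
    """Keep (text, confidence) pairs whose confidence meets the threshold."""
    kept = []
    for bbox, text, confidence in results:
        if confidence >= min_confidence:
            kept.append((text, round(float(confidence), 2)))
    return kept


# Invented sample in the same shape as EasyOCR output:
# (four corner points, recognized text, confidence score)
sample_output = [
    ([[10, 12], [180, 12], [180, 48], [10, 48]], "Hello World", 0.97),
    ([[12, 60], [90, 60], [90, 88], [12, 88]], "f00", 0.31),
]

print(filter_results(sample_output))  # [('Hello World', 0.97)]
```

With real data, you would pass the output variable from Step 4 instead of sample_output.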
Step 5: Showing the bounding box on the image
# here we use only the last detected text in the image to show the bounding box
cord = output[-1][0]
x_min, y_min = [int(min(idx)) for idx in zip(*cord)]
x_max, y_max = [int(max(idx)) for idx in zip(*cord)]

# draw the bounding box rectangle around the text and show the image with matplotlib
image = cv2.imread('photo.jpg')
cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 0, 255), 2)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
As you can see in the image there is a red bounding box that is generated around the text.
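Step 5 draws a box around a single detection. To draw one for every detection, the corner-to-rectangle calculation can be pulled into a helper and applied in a loop. The sketch below shows only the pure-Python part; to_rect is a hypothetical name, and the corner values are invented for the example.

```python
def to_rect(corners):
    """Convert four (x, y) corner points into top-left and bottom-right points."""
    xs, ys = zip(*corners)
    return (int(min(xs)), int(min(ys))), (int(max(xs)), int(max(ys)))


# Invented corner points in the same shape as an EasyOCR bounding box
print(to_rect([[10, 12], [180, 12], [180, 48], [10, 48]]))  # ((10, 12), (180, 48))

# Usage with the variables from Step 4 and Step 5:
#   for bbox, text, confidence in output:
#       top_left, bottom_right = to_rect(bbox)
#       cv2.rectangle(image, top_left, bottom_right, (0, 0, 255), 2)
```

Because the helper takes the minimum and maximum of each coordinate, it also gives a sensible enclosing rectangle when the detected box is slightly rotated.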
In this article, we looked at two ways to extract text from an image: using Tesseract and using EasyOCR. The main difference between the two libraries is that Tesseract is the older project, while EasyOCR is newer, supports many languages, and returns bounding-box coordinates that are easy to draw on the image. Both are very capable libraries, and both are a great way to quickly convert scanned documents, faxes, photos, and other images into editable text for use in your documents and spreadsheets. Please let us know if you have any questions or comments about the process or tips for other users. We’d love to hear from you!