Object detection is the process by which a computer program identifies both the location and the class of objects in an image. It is very useful in robotics, especially for autonomous vehicles. There are many Python libraries and frameworks for object detection. In this tutorial, we are going to perform object detection on both photos and videos using the OpenCV library in Python.

Prerequisites

Anaconda:

The Anaconda distribution is a collection of packages that includes Python, R, and over 120 of the most popular open-source packages for science and data processing. Because everything ships in a single distribution that installs on one machine, it is immensely convenient for data analytics work. You can download and install Anaconda from the following link.

https://www.anaconda.com/products/individual

After that, you'll have access to the Anaconda terminal (Anaconda Prompt), which you'll use to install the rest of the libraries.

Jupyter notebook:

Jupyter Notebook is a free and open-source web application. It's commonly used by data scientists and other technical users who want to share their work, but anyone can use the platform for knowledge sharing. You can install Jupyter Notebook using the following command in your Conda terminal.

pip install notebook

To launch the notebook, run this command.

jupyter notebook

OpenCV:

OpenCV provides tools that specialize in image analysis, enabling developers to solve many of the common problems that come up when building computer vision applications. Because it is open source, developers can use it for free and get their projects optimized and running quickly.

The library can be installed using the following command in your prompt/terminal:

pip3 install opencv-python
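To confirm the installation worked, you can import the library and print its version; the exact version number will vary depending on what pip installed.

# quick sanity check that OpenCV installed correctly
import cv2

print(cv2.__version__)   # e.g. 4.x.x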

Matplotlib:

Matplotlib is an essential plotting library in Python. Its interface was modeled on MATLAB, a numerical computing environment created by mathematicians and engineers and widely used in science and engineering.

The library can be installed using the following command in your prompt/terminal:

pip3 install matplotlib

Pre-trained model:

In this blog post, we are not going to create and train our own model, since this is a basic introductory tutorial. Instead, we will use a pre-trained model that is already available on the OpenCV GitHub wiki. Go to the following link to download the model, extract the downloaded archive, and copy the frozen_inference_graph.pb file into the same folder where your Jupyter notebook is running.

https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API
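If you prefer to fetch the model from Python instead of your browser, a minimal sketch like the one below downloads and extracts the archive. The download URL is an assumption taken from the TensorFlow model zoo entry linked on that wiki page, so double-check it against the page before running.

# sketch: download and extract the pre-trained model
# NOTE: the URL below is an assumption; verify it against the wiki page
import tarfile
import urllib.request

MODEL_URL = 'http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_large_coco_2020_01_14.tar.gz'
urllib.request.urlretrieve(MODEL_URL, 'model.tar.gz')

# pull only frozen_inference_graph.pb into the current folder
with tarfile.open('model.tar.gz') as tar:
    for member in tar.getmembers():
        if member.name.endswith('frozen_inference_graph.pb'):
            member.name = 'frozen_inference_graph.pb'  # drop the folder prefix
            tar.extract(member)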

We also need one more file, which you can download from the following link. Extract the .zip file and copy the ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt file into the same folder where your Jupyter notebook is running.

https://gist.github.com/dkurt/54a8e8b51beb3bd3f770b79e56927bd7

COCO labels:

To detect objects in photos or videos we also need the names/labels of the classes; the model currently supports a total of 80 objects. Go to the following link, copy all 80 names into a blank .txt file (for example, Labels.txt), and save that file in the folder where your Jupyter notebook is running.

https://github.com/pjreddie/darknet/blob/master/data/coco.names
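If you would rather download the labels programmatically, here is a small sketch; the raw-file URL is assumed from GitHub's usual raw.githubusercontent.com pattern for that repository.

# sketch: save the 80 COCO class names to Labels.txt
# NOTE: the raw URL below is assumed from the linked GitHub page
import urllib.request

LABELS_URL = 'https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names'
urllib.request.urlretrieve(LABELS_URL, 'Labels.txt')

with open('Labels.txt') as f:
    print(len(f.read().rstrip('\n').split('\n')))   # should print 80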

Code:

Detecting objects in the image:

# import the dependencies
import cv2
import matplotlib.pyplot as plt

# load the files
config_file = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
frozen_model = 'frozen_inference_graph.pb'

#  load the model
model = cv2.dnn_DetectionModel(frozen_model, config_file)

# empty list to hold the class labels
classLabels = []
# load the labels from the text file
file_name = 'Labels.txt'
with open(file_name, 'rt') as fpt:
    classLabels = fpt.read().rstrip('\n').split('\n')

# checking if the labels are loaded correctly or not
print(classLabels)
print(len(classLabels))

Running these print statements confirms that the labels loaded correctly; you should see the list of class names and the number 80.

Output:

# we have to define the input size as 320x320 because the model was trained with this input shape
model.setInputSize(320, 320)
# scale pixel values and subtract the mean so inputs are normalized to roughly [-1, 1]
model.setInputScale(1.0/127.5)
model.setInputMean((127.5, 127.5, 127.5))
# OpenCV loads images as BGR, so swap the R and B channels for the model
model.setInputSwapRB(True)

# load the image
img = cv2.imread('men.jpg')

# show the image; OpenCV stores it in BGR format, so the colors will look wrong in matplotlib
plt.imshow(img)

# converting the image into RGB format
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

# set the confidence threshold; here we only keep detections with at least 60% confidence
ClassIndex, confidence, bbox = model.detect(img,confThreshold=0.6)

print(ClassIndex)

At this point we can already detect the objects in the image, but if you run the code the output will look something like this:

It gives us just a number that indicates the position of the detected object in the labels file. If you check the labels file, you will see that number 3 is a car, which means the code is running correctly and accurately. Let's visualize the result by drawing a box around the detected object and showing its label.

font_scale = 2
font = cv2.FONT_HERSHEY_PLAIN
for ClassInd, conf, boxes in zip(ClassIndex.flatten(), confidence.flatten(), bbox):
    # draw the box around the detected object and define the color and width of the box
    cv2.rectangle(img, boxes, (255, 0, 0), 1)
    # add the label inside the box and define the color, style and size of the font
    cv2.putText(img, classLabels[ClassInd-1], (boxes[0]+10, boxes[1]+40), font, fontScale=font_scale, color=(255, 255, 0), thickness=2)

plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

Note: If there are multiple objects in the photo, the loop above will detect them too and draw a bounding box around each one.
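If you also want a plain-text summary of everything the model found, a short sketch like this prints each detected label together with its confidence score, reusing the ClassIndex and confidence arrays returned by model.detect above:

# sketch: list every detection with its confidence score
for ClassInd, conf in zip(ClassIndex.flatten(), confidence.flatten()):
    print(f"{classLabels[ClassInd - 1]}: {conf:.2f}")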

Output:

Detecting objects in the video:

# import the dependencies
import cv2
import matplotlib.pyplot as plt

# load the files
config_file = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
frozen_model = 'frozen_inference_graph.pb'

#  load the model
model = cv2.dnn_DetectionModel(frozen_model, config_file)

# empty list to hold the class labels
classLabels = []
# load the labels from the text file
file_name = 'Labels.txt'
with open(file_name, 'rt') as fpt:
    classLabels = fpt.read().rstrip('\n').split('\n')

# we have to define the input size as 320x320 because the model was trained with this input shape
model.setInputSize(320, 320)
model.setInputScale(1.0/127.5)
model.setInputMean((127.5, 127.5, 127.5))
model.setInputSwapRB(True)
cap = cv2.VideoCapture("sample.mp4")  # pass 0 instead of a filename if you want to use your webcam for object detection

# check that the video source opened correctly; fall back to the webcam if it did not
if not cap.isOpened():
    cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise IOError("Cannot open video")

font_scale = 3
font = cv2.FONT_HERSHEY_PLAIN

while True:
    ret, frame = cap.read()
    # stop when the video ends or a frame cannot be read
    if not ret:
        break
    ClassIndex, confidence, bbox = model.detect(frame, confThreshold=0.55)

    print(ClassIndex)
    if len(ClassIndex) != 0:
        for ClassInd, conf, boxes in zip(ClassIndex.flatten(), confidence.flatten(), bbox):
            if ClassInd <= 80:
                cv2.rectangle(frame, boxes, (255, 0, 0), 2)
                cv2.putText(frame, classLabels[ClassInd-1], (boxes[0]+10, boxes[1]+40), font, fontScale=font_scale, color=(0, 255, 0), thickness=3)
    cv2.imshow('Object Detection Tutorial', frame)
    if cv2.waitKey(2) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

Output:
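If you want to keep the annotated video rather than only display it, you can write each processed frame to a file with OpenCV's VideoWriter. The snippet below is a minimal sketch of the idea, not part of the original loop; the output filename, codec, and frame rate are assumptions you can adjust.

# sketch: save the annotated video with cv2.VideoWriter (filename, codec and FPS are assumptions)
# 1) before the while loop, create the writer:
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter('output.mp4', fourcc, 30, (width, height))
# 2) inside the loop, after drawing the boxes and labels:
#        writer.write(frame)
# 3) after the loop, next to cap.release():
#        writer.release()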

Final Words

In this blog, we introduced the concept of object detection and showed how to implement it in Python. Our goal was to give developers who are new to the topic sample code for detecting objects in photographs and videos, as well as for running the same functionality in real time through a webcam. Overall, we hope you enjoyed the post, and please let us know if you have any questions!
