Object detection and tracking is an active research topic in computer vision that aims to detect, recognize, and track objects through a series of frames. Object detection is the backbone of many computer vision systems, such as autonomous self-driving cars, surveillance, and face recognition. In this post we will build our own object detection program.
How does object detection work?
How do we identify objects in real life? With the help of the color, shape, size, texture, and other properties of the object, of course! These properties are what we call features in computer vision. Computers also use these features to detect objects. However, the features that the human brain extracts and stores to identify objects are very complex and invariant (with respect to size, orientation, illumination, perspective, etc.) in nature. Deep Neural Networks (DNNs) can extract and learn more complex features of an object to make object detection somewhat invariant, but, again, they are nowhere near the human visual system. In this post we will stick to a simple yet important feature for object detection: COLOR.
Let’s get started!!
Throughout this post we are going to use our webcam as the input feed using OpenCV. To install OpenCV, please follow this link. OpenCV provides a very simple interface to the camera. To capture video from a camera, we need to create a VideoCapture object like this:
cap = cv2.VideoCapture(0)
Here the argument is the device index. ‘0’ is the default camera (the webcam in my case). Alternatively, you can pass the path to a video file to capture frames from video, or other integers (1, 2, …) for secondary camera devices. After that, we can capture video frame by frame:
import cv2 as cv

# Create video capture object
cap = cv.VideoCapture(0)

while True:
    # read frame-by-frame
    ret, frame = cap.read()
    # This line automatically opens a new window named Image Frame
    cv.imshow('Image Frame', frame)
    # wait 30 milliseconds for a key to be pressed; if 'q' is pressed, break the loop
    if cv.waitKey(30) & 0xFF == ord('q'):
        break

# When everything is done, release the capture and destroy all windows
cap.release()
cv.destroyAllWindows()
After this simple setup, let’s dive into the code.
import cv2 as cv
import numpy as np

# Draw a tail through the recent centers of the object.
def follow_center(centers, frame, length):
    if len(centers) > length:
        centers = centers[-length:]
    # Convert the list of centers into a NumPy array of points.
    np_centers = np.array(centers, dtype=np.int32)
    cv.polylines(frame, [np_centers], isClosed=False, color=(0, 0, 255), thickness=3)

cap = cv.VideoCapture(0)  # Creating video capture object
centers = []  # Variable to hold the centers of the object.
upper_limit = ()  # Upper HSV range in order (H, S, V)
lower_limit = ()  # Lower HSV range in order (H, S, V)

while True:
    ret, frame = cap.read()
    if frame is None:
        break
    blurred_frame = cv.GaussianBlur(frame, (11, 11), 0)
    frame_HSV = cv.cvtColor(blurred_frame, cv.COLOR_BGR2HSV)
Line No 1-3: we import the modules that we will need.
In Line 12 we create the video capture object, and the variable centers is defined in Line 13 to store the centers of the object.
Lines 14 and 15 define the upper and lower limits of the HSV values for the object we are trying to track. These limits should be set carefully because they determine which colored object is detected. If you don’t know how to find a good HSV range, follow this link.
Line 17 starts a loop that will continue until
- we press the q key, indicating that we want to end the program.
- or, our video file reaches its end and runs out of frames.
In Line 18 we read a frame from the video source. If None is returned, our video file has reached its end, which in turn breaks the loop.
Lines 21 and 22 perform a little image pre-processing, which consists of:
- Blurring the image to minimize noise.
- And converting the image to the HSV color space. (You can find more on this topic in this link.)
    threshold_frame = cv.inRange(frame_HSV, lower_limit, upper_limit)
    # Some morphological operations to minimize noise.
    threshold_frame = cv.erode(threshold_frame, None, iterations=2)
    threshold_frame = cv.dilate(threshold_frame, None, iterations=2)
    # Find the white areas in the binary image.
    contours, hierarchy = cv.findContours(threshold_frame, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    # Iterate through each such white area.
    for contour in contours:
        center, radius = cv.minEnclosingCircle(contour)
        # White areas with radius less than 10 pixels are treated as noise.
        if radius > 10:
            # Convert center's float co-ordinates to integers.
            center = (int(center[0]), int(center[1]))
            centers.append(center)
            radius = int(radius)
            # Draw a minimum enclosing circle.
            cv.circle(frame, center, radius, (0, 0, 255), 3)
            follow_center(centers, frame, 10)

    cv.imshow('Object Detection', frame)
    cv.imshow('Binary Image', threshold_frame)
    key = cv.waitKey(30)
    if key == ord('q') or key == 27:
        break

cap.release()
cv.destroyAllWindows()
Line No 23 performs the threshold operation and returns a binary image. If we set a good HSV range, we get output like this:
Salt (white dots) and pepper (black dots) noise is filtered using the morphological erode and dilate operations, respectively, in Lines 25 and 26.
The next challenge is finding the white portions of the image, i.e. the closed areas that represent objects in our case. OpenCV has the findContours function for this (Line 28). This function finds the boundary of an area, or contour, that has the same color or intensity, and returns its co-ordinates. If there are multiple such areas, it returns multiple contours. More on contours can be found on this page.
Line 31 finds the smallest circle that encloses a contour using minEnclosingCircle. This function returns the radius and center of that circle. Small noise contours are then filtered out by comparing the radius of the enclosing circle in Line 33.
At this point we have done all the tasks related to finding the object in the frame. Now it is time to point out the object visually: we will draw a circle around the object in each frame, so wherever the object moves, it will have a bounding circle.
Line 39 draws a red circle around the object with a line thickness of 3 pixels.
Line 42 shows the final output in a window named Object Detection.
Where did the tail come from? Well, the function follow_center, called in Line 40, creates this tail. This function takes three arguments:
- centers: list containing the current and previous centers of the object in the frame.
- frame: the frame to draw the line on.
- length: an integer that determines how many centers to use while drawing the tail.
Lines 6 and 7 limit the number of center points used while drawing the tail; it always keeps at most 10 in this case. And Line 10 draws the tail on the frame.
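The trimming step can be sketched in isolation with plain lists (the 25 fake centers below are just placeholder data):

```python
# Keep only the most recent `length` center points before drawing the tail.
centers = [(x, x) for x in range(25)]   # 25 placeholder center points
length = 10

if len(centers) > length:
    centers = centers[-length:]

print(len(centers))    # 10
print(centers[0])      # (15, 15), the oldest point that is kept
```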
Bonus Point: even though the number of points stays constant, why does the tail’s length increase when the object is moving fast?