The Artistry and Innovation of Object Detection
In this article, we will delve into the intricacies of object detection, exploring how well it works across different domains and why it matters. In a world where pixels converge into meaning and algorithms decode reality, object detection stands as a sentinel of innovation. Like a digital oracle, it sifts through streams of data, unveiling hidden patterns and transforming pixels into profound insights. From autonomous vehicles navigating bustling streets to healthcare systems diagnosing ailments with precision, the impact of object detection reverberates across industries, shaping a future where machines see and understand as humans do.
Table of Contents
- Performing object detection
- Visualizing sample files
Performing Object Detection
First, let’s use the Haar cascade algorithms from the
face_object_detection module. We use the CascadeObjectDetector class to
extract the frontal face, profile face, eyes, and smile. The class
initializes a specialized cascade classifier for each of these face
attributes using the corresponding imported resource file. It then calls
the detectMultiScale method of OpenCV’s CascadeClassifier
to detect objects in images. For each attribute, we use a different shape
and color to mark the extracted object:
- Frontal Face: Green rectangle
- Eye: Red circle
- Smile: Red rectangle
- Profile Face: Blue rectangle
Note that due to the large number of false positives, we deactivated the smile
detector.
We apply the function for face detection to a selection of images from the
train sample videos. This code block performs this operation.
The preceding code run will yield three image captures for three different
videos. Each image is decorated with the highlighted objects extracted. The
figures show the three image captures with the extracted objects.
In Figure 1.1, we see both the frontal and profile faces detected and one
eye detected. Figure 1.2 shows both the frontal and profile faces detected
and two eyes detected. Figure 1.3 shows both the frontal and profile faces
detected, the two eyes correctly detected, and one false positive: one of the
nostrils is detected as an eye. Smile detection is not activated in this
case because it yields too many false positives.
Running these algorithms on other images, we can see that they are
not very robust, frequently yielding false positives as well as incomplete
results. In Figure 1.4, we show two examples of such incomplete
detections. In Figure 1.4a, only the face was detected. In Figure 1.4b,
only one face profile was detected, although two people are present in the
scene.
In the preceding image, there is also a strange detection: the fire sprinkler in
the ceiling is detected as an eye, and so is the candle fixture on the far left.
This type of false detection (false positives) is quite frequent with these
filters. One common problem is that objects like eyes, noses, or lips are
detected in areas where there is no face. Since the search is done
independently for each of these objects, the likelihood of such false
positives is quite high.
With the alternative solution we implemented in face_detection_mtcnn, a
single unified framework detects both the face bounding box
and the positions of face elements such as the eyes, nose, and lips. Let’s compare the
results obtained with the Haar cascade algorithm, as shown in Figures 1.3
and 1.4, with the results for the same images obtained with the MTCNN
algorithm.
In Figure 1.5, we show an image of the person dressed in yellow; this
time, face detection is performed with our MTCNNFaceDetector.
Two face objects are detected: one is correct, and the second is an artifact.
The corresponding detection JSONs make the difference clear.
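The actual JSONs are not reproduced in this excerpt. As a rough illustration of their shape, the mtcnn package returns a list of dicts with a bounding box, a confidence score, and five facial keypoints; the coordinate values below are invented, only the confidence values (near 1 for the real face, 0.87 for the artifact) follow the text.

```python
# Illustrative structure of the two MTCNN detections (coordinates invented):
example_detections = [
    {
        "box": [118, 42, 84, 106],          # x, y, width, height of the face
        "confidence": 0.9998,               # a real face: very close to 1
        "keypoints": {
            "left_eye": (142, 80), "right_eye": (178, 78),
            "nose": (160, 102),
            "mouth_left": (146, 128), "mouth_right": (174, 126),
        },
    },
    {
        "box": [300, 210, 40, 52],
        "confidence": 0.87,                 # the artifact: noticeably lower
        "keypoints": {
            "left_eye": (310, 224), "right_eye": (328, 223),
            "nose": (319, 235),
            "mouth_left": (312, 248), "mouth_right": (326, 247),
        },
    },
]
```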
From the experiments we conducted with a considerable number of samples,
we concluded that real faces will have a confidence factor very close to 1. Because the second detected “face” has a confidence of 0.87, we can
easily dismiss it. Only faces with a confidence factor above 0.99 should
actually be trusted.
Let’s see another example. In Figure 1.6, we compare the results for the
same images from Figure 1.4. In both images, all the faces of the people
present in the scene are correctly identified. In all cases, the confidence
score is above 0.999. No artifacts are incorrectly extracted as human faces.
The algorithm appears to be more robust than the alternative
implementations using Haar cascades.
For the next example, we selected a case where two people are
present in the video from which we capture the image; their faces are correctly
identified, and the confidence scores are high. In the same image, however, an artifact is
also identified as a human face.
Besides the two real people, for which the confidence factors are 0.9995 and
0.9999 (rounded to 1), respectively, the face of the Dead Alive character on
the T-shirt of the first person in the scene is also detected as a face. The
bounding box is correctly detected and all the face elements are also
detected correctly. The only indication that this is a false positive is the
lower confidence factor, which in this case is 0.9075. Such examples can
help us to correctly calibrate our face detection approach. Only faces
detected with a confidence above 0.95 or even 0.99 should be considered.
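Such a calibration amounts to a simple filter over the detections; the helper name is our own, and the 0.99 default follows the threshold discussed above.

```python
def keep_confident_faces(detections, threshold=0.99):
    """Discard detections below the confidence threshold.

    `detections` is a list of MTCNN-style dicts with a "confidence" key.
    """
    return [d for d in detections if d["confidence"] >= threshold]

# Example: the artifact at 0.9075 is dropped, the two real faces are kept.
```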
Visualizing Sample Files
This code block selects a few video files from the set of fake
videos and then visualizes an image capture from them, using the
display_image_from_video function from the utility script video_utils.
The preceding code will plot one image capture for each of the three videos.
In Figure 1.8, we only show one of these image captures, for the first video.
The next code block selects a sample of three real videos and then, for each
selected video, creates and plots an image capture.
In Figure 1.9, we show one of the images captured from the first real video.
We would also like to inspect videos that are all derived from the same
original video. We will pick six videos from the same original video and
show one image capture from each video. This code block
performs this operation.
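Grouping deepfakes that share an original can be done from the dataset’s metadata. The sketch below assumes a metadata.json in the style of the Deepfake Detection Challenge data, mapping each filename to its label and, for fakes, its original; the helper name and paths are ours.

```python
import json
from collections import defaultdict

def fakes_by_original(metadata_path):
    """Group fake video filenames by the original video they derive from."""
    with open(metadata_path) as f:
        meta = json.load(f)
    groups = defaultdict(list)
    for name, info in meta.items():
        if info.get("label") == "FAKE" and "original" in info:
            groups[info["original"]].append(name)
    return dict(groups)

# Hypothetical usage: pick six fakes sharing one original, then capture one
# frame from each (frame-capture helper not shown here).
# groups = fakes_by_original("train_sample_videos/metadata.json")
# same_source = next(v for v in groups.values() if len(v) >= 6)[:6]
```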
In Figure 2.0, we show two of these image captures, taken from different
deepfake videos that were all generated from the same original file.
We performed similar checks with videos from the test set. Of course, in the
case of the test set, we do not know in advance which videos are
real and which are fake. This code selects image captures from two sample
videos from the data.
Figure 2.1 displays these selected images.
Let’s now start to use the algorithms for face detection introduced in the
Face and body detection utils section.
Conclusion
In this article, we took a practical tour of face and body detection, trying out the algorithms on different video data samples, with the OpenCV package as our main tool. In the ever-expanding universe of artificial intelligence and computer vision, object detection emerges as a cornerstone, enabling machines to perceive and interact with the world in ways previously unimaginable. As algorithms grow more sophisticated and datasets richer, the applications of object detection span realms from autonomous vehicles navigating complex environments to medical diagnostics enhancing precision in healthcare. Yet beyond its technological prowess lies a profound promise: the democratization of efficiency and safety, empowering industries and societies alike. As we venture deeper into this era of innovation, let us embrace the transformative potential of object detection with both curiosity and responsibility, ensuring it enriches lives, preserves dignity, and propels us toward a future where the boundaries of possibility are continually redefined.