Mastering Identity

The Future of Face and Body Detection

A.I Hub · Jul 21, 2024

In this article, we will walk you through the fascinating journey of learning face and body detection. In the ever-evolving landscape of technology, the quest to understand and interact with human features has reached new heights. From unlocking smartphones to enhancing security systems, face and body detection technology has revolutionized how we perceive and engage with the world around us. As our capabilities in artificial intelligence and computer vision advance, so too does our ability to recognize, analyze, and harness the complexities of human identity and movement. Join us on a journey into the cutting-edge realm where innovation meets humanity, reshaping industries and shaping the future of interaction as we know it.

Table of Contents

  • Face and Body Detection Utils
  • Metadata Exploration
  • Video Data Exploration

Face and Body Detection Utils


In the detection of deepfake videos, video features such as desynchronization between sound and lip movement, or unnatural motions of parts of the faces of the people appearing in the video, were, at the time of this competition, valuable elements to train models to recognize deepfake videos. Therefore, we include here a utility script specialized in detecting bodies and faces.

The first module for face detection uses the Haar cascade algorithm. Haar cascade is a lightweight machine learning algorithm for object detection, usually trained to identify specific objects. The algorithm uses Haar-like features and the AdaBoost classifier to create a strong classifier. It operates on a sliding window, applying a cascade of weak classifiers that rejects regions of the image less likely to contain the object of interest. In our case, we want to use the algorithm to identify details in video images that are usually altered in the case of a deepfake, such as the facial expression, the gaze, and the mouth shape.

This module includes two classes. We start with the first of them. CascadeObjectDetector is a generic class for the detection of objects using the Haar cascade algorithm. The class has an init function, where we initialize the object with the specific Haar cascade object that stores the trained model, and a detect function.

In the init function, we initialize the cascade object. The detect function returns the rectangle coordinates of the objects detected in the image. A sketch of the CascadeObjectDetector class follows the description of its parameters below.

The init function receives a path to one of the object detection models included in the dataset. The detect function receives, as parameters, the image to process for object extraction and a few parameters that can be used to adjust the detection: the scale factor, the minimum number of neighbors used in detection, and the minimum size of the bounding box used for object detection. Inside the detect function, we call the detectMultiScale function of the Haar cascade model.
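The original code listing is not reproduced in this article, so here is a minimal sketch of what the class could look like; the cv2 calls are standard OpenCV, while the exact structure and default parameter values are assumptions based on the description above.

import cv2

class CascadeObjectDetector:
    # Generic object detector built around an OpenCV Haar cascade model
    def __init__(self, cascade_path):
        # Load the pre-trained Haar cascade stored at the given path
        self.cascade = cv2.CascadeClassifier(cascade_path)

    def detect(self, image, scale_factor=1.3, min_neighbors=5, min_size=(50, 50)):
        # The image is expected in grayscale; detectMultiScale returns the
        # (x, y, w, h) bounding boxes of the detected objects
        return self.cascade.detectMultiScale(image,
                                             scaleFactor=scale_factor,
                                             minNeighbors=min_neighbors,
                                             minSize=min_size)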

The next class defined in the utility script is FaceObjectDetector. This class initializes four CascadeObjectDetector objects, for face, face profile, eyes, and smile detection. In its init function, for each face element (the frontal view of a person, the profile view of a person, the eye view, and the smile view), we first initialize a dedicated variable with the path to the Haar cascade resource. Then, for each of the resources, we initialize a CascadeObjectDetector object. The objects are stored as the member variables face_detector, eyes_detector, profile_detector, and smile_detector.
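A minimal sketch of this init function could look as follows; the cascade file names are the standard OpenCV ones, while the folder path is an assumption that depends on where the Haar cascade resources are stored.

import os

class FaceObjectDetector:
    def __init__(self, cascades_path='../input/haarcascades'):
        # Paths to the Haar cascade resources for each face element
        frontal_face_path = os.path.join(cascades_path, 'haarcascade_frontalface_default.xml')
        profile_face_path = os.path.join(cascades_path, 'haarcascade_profileface.xml')
        eyes_path = os.path.join(cascades_path, 'haarcascade_eye.xml')
        smile_path = os.path.join(cascades_path, 'haarcascade_smile.xml')
        # One CascadeObjectDetector per face element
        self.face_detector = CascadeObjectDetector(frontal_face_path)
        self.profile_detector = CascadeObjectDetector(profile_face_path)
        self.eyes_detector = CascadeObjectDetector(eyes_path)
        self.smile_detector = CascadeObjectDetector(smile_path)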

Next comes the detect_objects function, where we call, for each of the CascadeObjectDetector objects defined in the init function, the detect function. To make it easier to follow, we can describe the logic in three parts. In the first part, the detect_objects function of the FaceObjectDetector class calls the detect function of the CascadeObjectDetector object initialized with the eyes Haar cascade model. Then, we use the OpenCV circle function to mark on the initial image, with a circle, the position of the eyes detected in the image.

Next, we apply the same approach to the smile objects in the image. We first detect the smiles and, if detected, we display them using rectangles drawn with the OpenCV rectangle function over the bounding boxes of the detected objects. Because this detector tends to give a lot of false positives, this functionality is deactivated by default, using a flag set to False.

Finally, we extract the profile and face objects using the specialized Haar cascade models. If detected, we draw rectangles to mark the bounding boxes of the detected objects.

In summary, for each of the four specialized object detectors (face, face profile, eyes, and smile), we call the detect function, the result being a list of rectangles with the bounding boxes of the detected objects, and then we draw, in the context of the initial image, either circles (for the eyes) or rectangles (for the smile, face, and face profile) around the detected objects. Finally, the function displays the image, with the superposed layers marking the bounding boxes of the detected objects. Because the smile model gives many false positives, we add an additional parameter, a flag, to decide whether we show the extracted bounding boxes with smiles.
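As a rough sketch of this logic, written as a method of the FaceObjectDetector class sketched above; the colors, line widths, and figure size are assumptions.

import cv2
import matplotlib.pyplot as plt

# Method of the FaceObjectDetector class (shown standalone for brevity)
def detect_objects(self, image, show_smile=False):
    # Haar cascades work on grayscale images
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Eyes: mark each detection with a circle centered on its bounding box
    for (x, y, w, h) in self.eyes_detector.detect(gray):
        cv2.circle(image, (x + w // 2, y + h // 2), w // 2, (0, 255, 0), 2)
    # Smiles: deactivated by default because of frequent false positives
    if show_smile:
        for (x, y, w, h) in self.smile_detector.detect(gray):
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)
    # Frontal faces and face profiles: mark with rectangles
    for (x, y, w, h) in self.face_detector.detect(gray):
        cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)
    for (x, y, w, h) in self.profile_detector.detect(gray):
        cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 0), 2)
    # Display the image with the superposed markers
    plt.figure(figsize=(8, 8))
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.axis('off')
    plt.show()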

Next, the class has a function to extract image objects from a video. The function receives a video path, captures an image from the video, and applies the detect_objects function to the captured image to detect the face and the face details (the eyes, smile, and so on) from that image.
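A possible sketch of this extraction function; the name extract_image_objects and the single-frame capture are assumptions based on the description above.

import cv2

# Method of the FaceObjectDetector class (shown standalone for brevity)
def extract_image_objects(self, video_path):
    # Capture a single frame from the video file
    capture = cv2.VideoCapture(video_path)
    ret, frame = capture.read()
    capture.release()
    if not ret:
        return
    # Run the Haar cascade detectors on the captured frame
    self.detect_objects(frame)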

We introduced a module for face detection using Haar cascade algorithms. Next, we will review an alternative approach, where we use the MTCNN model for face detection. We want to test multiple approaches to decide which one works better for face detection. MTCNN stands for Multi-Task Cascaded Convolutional Networks and is based on a concept first developed in the paper Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. In another article, titled Face Detection using MTCNN, the authors propose a cascaded multi-task framework using different features of sub-models.

The implementation of face element extraction using the MTCNN approach is done in the utility script face_detection_mtcnn. In this module, we define the class MTCNNFaceDetector, with an init function and a detect function.

The init function receives, as a parameter, an instance of the MTCNN model, imported and instantiated in the calling application from the mtcnn library. The class member variable detector is initialized with this object. The rest of the class variables are used for the visualization of the detected objects. The class also has a detect function.
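Here is a minimal sketch of what this class could look like; the detect_faces call and the structure of its result come from the mtcnn library, while the visualization details (colors, marker sizes, figure size) are assumptions.

import cv2
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

class MTCNNFaceDetector:
    def __init__(self, detector, color='red', keypoint_size=2, line_width=2):
        # MTCNN instance created in the calling application, e.g. MTCNN() from the mtcnn library
        self.detector = detector
        # The remaining members only control how detections are visualized
        self.color = color
        self.keypoint_size = keypoint_size
        self.line_width = line_width

    def detect(self, video_path):
        # Capture one frame and convert it from OpenCV's BGR order to RGB
        capture = cv2.VideoCapture(video_path)
        _, frame = capture.read()
        capture.release()
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Run MTCNN; the result is a list of dictionaries, one per detected face
        detections = self.detector.detect_faces(image)
        fig, ax = plt.subplots(figsize=(8, 8))
        ax.imshow(image)
        for detection in detections:
            x, y, width, height = detection['box']
            # Face bounding box, with the confidence factor written above it
            ax.add_patch(Rectangle((x, y), width, height, fill=False,
                                   color=self.color, lw=self.line_width))
            ax.text(x, y - 5, str(round(detection['confidence'], 4)), color=self.color)
            # Small rectangles marking the five facial keypoints
            for (kx, ky) in detection['keypoints'].values():
                ax.add_patch(Rectangle((kx, ky), self.keypoint_size, self.keypoint_size,
                                       color=self.color))
        plt.axis('off')
        plt.show()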

The detect function receives, as a parameter, the path to the video file. After capturing an image from the video file, we read it and transform it from the BGR format to the RGB format. The transformation is needed because we want to use library functions that expect the RGB color order. After we apply the detect_faces function of the MTCNN model to the transformed image, the detector returns a list of extracted JSONs. Each extraction JSON has the following format.
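The field names below match the mtcnn library's documented output; the numeric values shown are only illustrative:

{
    'box': [277, 90, 48, 63],          # x, y, width, height of the face bounding box
    'confidence': 0.9985,              # confidence factor of the detection
    'keypoints': {
        'left_eye': (291, 117),
        'right_eye': (314, 114),
        'nose': (303, 131),
        'mouth_left': (296, 143),
        'mouth_right': (313, 141)
    }
}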

In the 'box' field is the bounding box of the detected face area. In the 'keypoints' field are the keys and coordinates of the five objects detected: the left eye, right eye, nose, left-most mouth limit, and right-most mouth limit. There is an additional field, 'confidence', which gives the confidence factor of the model.

For real faces, the confidence factor is above 0.99 (the maximum is 1). If the model detects an artifact or something like a poster with a face image, this factor can be as high as 0.9. Confidence factors under 0.9 are most likely associated with artifact detections or false positives.

In our implementation (see the preceding code sketch), we parse the list of detection JSONs and add a rectangle for each face and a point, or a very small rectangle, for each of the five face features. On top of the face bounding box rectangle, we write the confidence factor rounded to four decimals.

Besides the utility scripts for image capture from video, for playing videos, and for object detection from video data, we will also reuse the utility scripts for data quality and plotting that we started using in section 4.

In the next section, we start with a few preparatory activities and continue with a metadata exploration of the competition data. We will cover, in this section, importing the libraries, a few checks of the data files, as well as a statistical analysis of the metadata files.

Metadata Exploration


We start by importing the utility functions and classes from the utility scripts for data quality, plot utilities, video utilities, and face object detection.
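The exact import list is not reproduced in this article; a minimal sketch, based on the functions and classes used in the rest of the analysis (the script names other than face_detection_mtcnn and data_quality_stats are assumptions), could be:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Utility scripts prepared earlier in the project
from data_quality_stats import missing_data, unique_values, most_frequent_values
from face_detection import CascadeObjectDetector, FaceObjectDetector
from face_detection_mtcnn import MTCNNFaceDetector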

After we load the data files (the train and test samples), we are ready to start our analysis. The first check looks at the types of files in TRAIN_SAMPLE_FOLDER.
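A simple way to perform this check, assuming TRAIN_SAMPLE_FOLDER already holds the path to the sample train folder, is to count the file extensions:

import os
from collections import Counter

train_files = os.listdir(TRAIN_SAMPLE_FOLDER)
# Count how many files there are for each extension
extensions = Counter(file.split('.')[-1] for file in train_files)
print(extensions)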

The result shows that there are two types of files: JSON files and MP4 files. Next, we check the content of the JSON file present in TRAIN_SAMPLE_FOLDER and sample the first five records it contains.
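A sketch of this check; the metadata file name metadata.json follows the competition's data layout, and the DataFrame is transposed so that each row corresponds to one video file:

import os
import pandas as pd

meta_train_df = pd.read_json(os.path.join(TRAIN_SAMPLE_FOLDER, 'metadata.json')).T
# Show the first five records
meta_train_df.head()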

In Figure 1.1, we show the data sample obtained when we created the DataFrame meta_train_df from the JSON file. The index is the name of the file. label is either FAKE for deepfake videos or REAL for real videos. The split field gives the set to which the video belongs (train). original is the name of the initial video from which the deepfake was created.

Figure 1.1 - Sample of files in the train sample folder

We also check a few stats about the metadata, using the missing_data, unique_values, and most_frequent_values functions from the utility script data_quality_stats. These functions were introduced in section 3.
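Their usage could look like this, assuming each helper receives the metadata DataFrame as input:

# Summaries of missing, unique, and most frequent values in the metadata
missing_data(meta_train_df)
unique_values(meta_train_df)
most_frequent_values(meta_train_df)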

Figure 1.2 shows the missing values in meta_train_df. As you can see, 19.25% of the values in the original column are missing.

Figure 1.2 - Missing values in the sample train data

In Figure 1.3, we show the unique values in meta_train_df. There are 323 original values, with 209 unique ones. The other two fields, label and split, have 400 values each, with 2 unique values for label (FAKE and REAL) and 1 for split (train).

Figure 1.3 - Unique values in sample train data

Figure 1.4 displays the most frequent values in meta_train_df. From the total of 400 labels, 323 (or 80.75%) are FAKE. The most frequent original value is atvmxvwyns.mp4, with a frequency of 6; that is, it was used to create 6 FAKE videos. All the values in the split column are train.

Figure 1.4 - Most frequent values in the sample train data

In this analysis, we will use a custom color scheme, with tones of blue and gray, for the plots. The generation of the custom color map is sketched below.
In Figure 1.5, we show the color map.

Figure 1.5 - The custom color map

Figure 1.6 shows the label distribution in the sample train dataset. There are 323 records with the FAKE label, and the rest have the REAL label.

Figure 1.6 - Label distribution in the sample train data
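A plot like the one in Figure 1.6 can be produced with a simple count plot; the use of seaborn and the chosen colors are assumptions:

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))
# Count the FAKE and REAL labels in the sample train metadata
sns.countplot(x='label', data=meta_train_df, palette=['#4682b4', '#d3d3d3'])
plt.title('Label distribution in the sample train data')
plt.show()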

In the next section, we will start analyzing the video data.

Video Data Exploration


In this section, we will visualize a few sample files and then begin performing object detection, to try to capture the features of the images that might show anomalies when they are processed to create deepfakes. These are mostly the eyes, mouths, and figures. We will start by visualizing sample files, both genuine and deepfake videos. We will then apply the first algorithm introduced previously for face, eye, and mouth detection, the one based on Haar cascade, and follow with the alternative algorithm based on MTCNN.

Conclusion

In this article, we unpacked the concept of face and body detection and, along the way, saw how to visualize detections using OpenCV and Python. In a world increasingly shaped by technology, the evolution of face and body detection stands as a testament to the fusion of innovation and practicality. From enhancing security protocols to revolutionizing entertainment experiences, these technologies transcend mere utility to redefine human interaction with machines.

As we stride forward into a future where precision meets possibility, the horizon for face and body detection holds limitless promise, pointing to a world where recognition is not just about sight but about understanding and empowerment.
