Facial Recognition Using AI

Histogram of Oriented Gradients (HOG)


First, we convert the images to black and white (grayscale), since colour is not needed to detect faces.

Looking at each pixel and comparing it with its neighbours, we determine in which direction the image gets darker, drawing an arrow from lighter to darker; in other words, we compute the gradient of the image. By describing this basic flow of lightness and darkness at a higher level, we can capture the basic pattern of the image: sliding a 16x16 square over the image, we compute an arrow for each pixel and replace the whole square with the direction that occurs most strongly. Using HOG, we thus turn the image into a simple representation. By repeating the process for many images of one person, we can extract a more detailed and accurate structure of the face.
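As a minimal sketch of this step (assuming OpenCV and scikit-image are installed; the filename face.jpg is a hypothetical example image), the grayscale conversion and HOG computation could look like this:

    import cv2
    from skimage.feature import hog

    # Load the image and convert it to grayscale, since colour is
    # not needed to detect faces.
    image = cv2.imread("face.jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Compute HOG features over 16x16 cells: each cell is summarized
    # by a histogram of its dominant gradient directions.
    features, hog_image = hog(gray,
                              orientations=9,
                              pixels_per_cell=(16, 16),
                              cells_per_block=(1, 1),
                              visualize=True)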

Figure 1 - Histogram of Oriented Gradients

Figure 2 - HOG Face Pattern of Many Face Images

Figure 3 - Face Image

Figure 4 - HOG of the Face Image

Posing and Projecting Faces

Next, we transform each picture so that the lips and eyes are always in the same place.

For face landmark estimation, we consider 68 specific points (landmarks) on each face and use a machine learning model to locate them.

Figure 5 - 68 Landmark Points

Figure 6 - Face Landmark Detection and Transformation
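A minimal sketch of landmark detection (assuming dlib is installed and its pre-trained shape_predictor_68_face_landmarks.dat model has been downloaded; the image filename is hypothetical):

    import cv2
    import dlib

    # dlib's frontal face detector plus its pre-trained 68-point
    # shape predictor.
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    image = cv2.imread("face.jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    for rect in detector(gray):        # each detected face
        shape = predictor(gray, rect)  # the 68 landmark points
        points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]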

Encoding Faces

A simple way of recognizing a face is to compare its features with those of the tagged pictures we already have. The main problem with this approach is that it cannot scale to massive data sets, since comparing against every picture takes too long!

One way to solve this issue is to categorize faces by measurements such as eye colour, the distance between the eyes, ear size, and so on. In practice, however, measurements hand-picked by humans turned out not to be accurate enough.

Finally, researchers let the computer decide which measurements to collect itself, generating 128 measurements for each face!

Deep Metric Learning

Typically, in deep learning approaches for face recognition, we have two steps:

    • Accept a single input image

    • Output a classification/label for that image


In deep metric learning, instead of trying to output a single label (or even the coordinates/bounding box of objects in an image), we output a real-valued feature vector, called an embedding.

For example, in the dlib facial recognition network, the embedding is 128-d (i.e., a list of 128 real-valued numbers), and it is used to quantify the face.
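As a small illustration (assuming face_recognition is installed; face.jpg is a hypothetical example image), each detected face yields a 128-d vector:

    import face_recognition

    image = face_recognition.load_image_file("face.jpg")  # loaded as RGB
    encodings = face_recognition.face_encodings(image)    # one per detected face

    if encodings:
        print(len(encodings[0]))  # 128 real-valued numbers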


For training a neural network using triplets, we provide three images to the network:

      • Two of these images are examples of faces of the same person.

      • The third image is a random face from our data set and is not the same person as the other two images.


We then tweak the weights of the neural network so that the 128-d measurements of the two images of the same person move closer to each other and farther away from those of the third image.
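A hedged NumPy sketch of this triplet objective (the margin value and the function name are illustrative, not taken from dlib's actual training code):

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # Penalize the network when the anchor-positive distance is
        # not at least `margin` smaller than the anchor-negative one.
        pos_dist = np.sum((anchor - positive) ** 2)
        neg_dist = np.sum((anchor - negative) ** 2)
        return max(0.0, pos_dist - neg_dist + margin)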

Required Libraries

In order to perform face recognition with Python and OpenCV, we need to install two additional libraries (both installable with pip):

Dlib:

The dlib library contains the implementation of “deep metric learning” used to construct the face embeddings for the actual recognition process.

face_recognition:

The face_recognition library wraps around dlib’s facial recognition functionality, making it easier to work with.


Dataset

The dataset consists of images of five politicians:

      • Boris Johnson (39 images)

      • Justin Trudeau (42 images)

      • Emmanuel Macron (37 images)

      • Angela Merkel (32 images)

      • Donald Trump (30 images)

Based on this data set, we create an embedding (a 128-d real-valued feature vector) for each face and then use these embeddings to recognize faces.

Coding

Deep Metric Learning:

  • Loop over the images in the data set

  • Convert each image from OpenCV's BGR channel ordering to the RGB ordering that dlib expects

  • Detect the face(s) in each image with the HOG detection method, via the face_locations function of the face_recognition library

  • Then, compute the real-valued feature vector with face_encodings; this returns one 128-d embedding per detected face in the image (NOTE: in our dataset, we only have one face per image)

  • Append each encoding, together with the person's name, to the list of encodings

  • Finally, write the embeddings to disk, as in the sketch below
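A minimal sketch of these steps (the dataset layout, with one folder per person, and the output filename encodings.pickle are assumptions):

    import os
    import pickle
    import cv2
    import face_recognition

    known_encodings, known_names = [], []

    for name in os.listdir("dataset"):       # one folder per person
        folder = os.path.join("dataset", name)
        for filename in os.listdir(folder):
            image = cv2.imread(os.path.join(folder, filename))
            rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # BGR -> RGB for dlib

            # Detect faces with HOG, then compute a 128-d embedding
            # for each detected face.
            boxes = face_recognition.face_locations(rgb, model="hog")
            for encoding in face_recognition.face_encodings(rgb, boxes):
                known_encodings.append(encoding)
                known_names.append(name)

    # Write the embeddings to disk.
    with open("encodings.pickle", "wb") as f:
        pickle.dump({"encodings": known_encodings, "names": known_names}, f)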

Facial Recognition:

    • The first step is the same as in deep metric learning: we detect the faces in the input image and compute their encoding values.

    • Then, we loop over the stored encodings to see whether any of them match.

    • For this purpose, we use the compare_faces function of the face_recognition library. This function returns a vector of True/False values, one per item in its first argument, which is the embeddings vector from the previous step.

    • Then, we count the number of matches per person, since the person with the most matches represents the best-matching face.

    • Finally, we look up the name corresponding to the most-matched face and store it, as in the sketch below.
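A minimal sketch of the recognition step (the filenames encodings.pickle and test.jpg are assumptions carried over from the previous sketch):

    import pickle
    import cv2
    import face_recognition

    with open("encodings.pickle", "rb") as f:
        data = pickle.load(f)

    image = cv2.imread("test.jpg")
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    boxes = face_recognition.face_locations(rgb, model="hog")
    for encoding in face_recognition.face_encodings(rgb, boxes):
        # One True/False per stored embedding.
        matches = face_recognition.compare_faces(data["encodings"], encoding)

        name = "Unknown"
        if True in matches:
            # Count the votes per person; the most-matched name wins.
            counts = {}
            for i, matched in enumerate(matches):
                if matched:
                    counts[data["names"][i]] = counts.get(data["names"][i], 0) + 1
            name = max(counts, key=counts.get)
        print(name)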

Results

Acknowledgment

I completed this project, the final project of the Image Processing course at UNB, with help from the contents provided in the following links. I would like to thank the authors of these websites for providing such precious educational materials.

  1. Modern Face Recognition with Deep Learning. [Link]

  2. Face Recognition with OpenCV, Python and deep learning. [Link]

  3. How does face recognition work? [Link]

  4. Cognitive Service by Microsoft. [Link]

  5. Facial Recognition System. [Link]


Date: March 31st, 2020