face-recognition

Source code for this article on Github

Face recognition identifies people in face images or video frames. In a nutshell, a face recognition system extracts features from an input face image and compares them to the features of labeled faces in a database. The comparison is based on a feature similarity metric, and the label of the most similar database entry is used to label the input image. If the similarity value is below a certain threshold, the input image is labeled as unknown. Comparing two face images to determine whether they show the same person is known as face verification.
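As a minimal sketch of this pipeline (embed stands in for the feature extractor introduced below, and database is a plain dict of labeled embedding vectors; both names are illustrative), recognition reduces to a nearest-neighbor search with a distance threshold:

import numpy as np

def recognize(face_img, database, embed, threshold):
    # database: dict mapping label -> embedding vector
    query = embed(face_img)
    # Find the most similar entry; with distances, "most similar" = smallest
    label, dist = min(((l, np.linalg.norm(query - emb)) for l, emb in database.items()),
                      key=lambda t: t[1])
    # Too far from every known face -> unknown
    return label if dist <= threshold else "unknown"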

This article uses a deep convolutional neural network (CNN) to extract features from faces in input images. Keras is used for the CNN implementation; OpenCV and Dlib are used for detecting and aligning faces in input images. Face recognition performance is evaluated on a small self-created dataset that you can replace with your own custom dataset, e.g. with images of your family and friends.

CNN architecture

The CNN architecture used here is a variant of the inception architecture. More precisely, it is a variant of the NN4 architecture and is identified as the nn4.small2 model in the OpenFace project. This article uses a Keras implementation of that model whose definition was taken from the Keras-OpenFace project. The architecture details aren’t too important here; it’s only useful to know that there is a fully connected layer with 128 hidden units followed by an L2 normalization layer on top of the convolutional base. These two top layers are referred to as the embedding layer, from which the 128-dimensional embedding vectors can be obtained. A Keras version of the nn4.small2 model can be created with create_model().

from model import create_model

nn4_small2 = create_model()
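Assuming the Keras-OpenFace definition of this model, it takes 96x96 RGB inputs and outputs 128-dimensional embeddings, which can be sanity-checked on random data:

import numpy as np

# One random 96x96 RGB image should map to a single 128-d embedding
dummy = np.random.rand(1, 96, 96, 3).astype(np.float32)
print(nn4_small2.predict(dummy).shape)  # expected: (1, 128)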

Dataset

The dataset consists of 20 images of 4 identities, with 5 images per person, and each identity is stored in a separate folder. You can see the dataset folder structure below.

├───image
│ ├───adam_levine
│ │ ├───1.jpg
│ │ ├───2.jpg
│ │ ├───3.jpg
│ │ ├───4.jpg
│ │ ├───5.jpg
│ ├───adele
│ │ ├───1.jpg
│ │ ├───2.jpg
│ │ ├───3.jpg
│ │ ├───4.jpg
│ │ ├───5.jpg
│ ├───ed_sheeran
│ │ ├───1.jpg
│ │ ├───2.jpg
│ │ ├───3.jpg
│ │ ├───4.jpg
│ │ ├───5.jpg
│ ├───taylor_swift
│ │ ├───1.jpg
│ │ ├───2.jpg
│ │ ├───3.jpg
│ │ ├───4.jpg
│ │ ├───5.jpg

After gathering images for the dataset, we crop the faces in those images to prepare for training by running:

python face_detect_and_save.py

The above script detects the face in each image, then crops it and replaces the original image in the dataset folder. Note that each image collected for the dataset should contain only one face.
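face_detect_and_save.py itself is not listed in this article; a minimal sketch of what such a script might do, assuming Dlib's HOG face detector and in-place replacement of each image, is:

import glob
import cv2
import dlib

detector = dlib.get_frontal_face_detector()

for path in glob.glob("image/*/*"):
    img = cv2.imread(path)
    rects = detector(img, 1)
    if len(rects) != 1:
        print("skipping {0}: expected exactly one face".format(path))
        continue
    r = rects[0]
    # Clamp the box to the image and overwrite the original file with the crop
    crop = img[max(r.top(), 0):r.bottom(), max(r.left(), 0):r.right()]
    cv2.imwrite(path, crop)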

To load the image files for training:

import glob
import pandas as pd

train_paths = glob.glob("image/*")
nb_classes = len(train_paths)
df_train = pd.DataFrame(columns=['image', 'label', 'name'])

for i, train_path in enumerate(train_paths):
    name = train_path.split("\\")[-1]  # folder name (Windows path separator) is the person's name
    images = glob.glob(train_path + "/*")
    for image in images:
        df_train.loc[len(df_train)] = [image, i, name]

We can see that each person has been assigned a numeric label:

print(df_train)

image label name
0 image\adam_levine\adam-levine.jpg 0 adam_levine
1 image\adam_levine\adam-levine_editedjpg.jpg 0 adam_levine
2 image\adam_levine\BBQakzy.jpg 0 adam_levine
3 image\adam_levine\MI0004052827.jpg 0 adam_levine
4 image\adam_levine\rs_634x951-171107072148-634.... 0 adam_levine
5 image\adele\adele-karriere-aus-abschied.jpg 1 adele
6 image\adele\adele-t.jpg 1 adele
7 image\adele\adele.jpg 1 adele
8 image\adele\MI0003568106.jpg 1 adele
9 image\adele\rs_1024x759-180124143107-1024-Adel... 1 adele
10 image\ed_sheeran\4e9fe179.jpg 2 ed_sheeran
11 image\ed_sheeran\asdvs23.jpg 2 ed_sheeran
12 image\ed_sheeran\ed-sheeran.jpg 2 ed_sheeran
13 image\ed_sheeran\ed-sheeran_glamour_16mar17_re... 2 ed_sheeran
14 image\ed_sheeran\GettyImages-800834188-920x584... 2 ed_sheeran
15 image\taylor_swift\0c2f93cb-4151-4c08-be2e-a85... 3 taylor_swift
16 image\taylor_swift\416x416.jpg 3 taylor_swift
17 image\taylor_swift\BBL3h40.jpg 3 taylor_swift
18 image\taylor_swift\csdaf3.jpg 3 taylor_swift
19 image\taylor_swift\taylor-swift-2016-crop-1523... 3 taylor_swift

Face alignment

The nn4.small2.v1 model was trained with aligned face images; therefore, the face images from the custom dataset must be aligned too. Here, we use Dlib for face detection and OpenCV for image transformation and cropping to produce aligned 96x96 RGB face images. Download the shape_predictor_68_face_landmarks.dat model and put it in the project folder. Using the AlignDlib utility from the OpenFace project, this is straightforward:

import dlib
from align import AlignDlib

alignment = AlignDlib('shape_predictor_68_face_landmarks.dat')

def align_face(face):
    (h, w, c) = face.shape
    # The input is already a cropped face, so use the whole image as the bounding box
    bb = dlib.rectangle(0, 0, w, h)
    return alignment.align(96, face, bb, landmarkIndices=AlignDlib.OUTER_EYES_AND_NOSE)
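For example, aligning one image from the dataset (the path is illustrative) should yield a 96x96x3 array:

import cv2

img = cv2.imread("image/adele/1.jpg")  # illustrative path
aligned = align_face(img)
print(aligned.shape)  # expected: (96, 96, 3)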

Training

The embedding vector of each face in the dataset is used as a reference for face comparison. The training step calculates those embedding vectors and saves them to train_embs.npy.

import cv2
import numpy as np
from tqdm import tqdm

def load_and_align_images(filepaths):
    aligned_images = []
    for filepath in filepaths:
        img = cv2.imread(filepath)
        aligned = align_face(img)
        aligned = (aligned / 255.).astype(np.float32)  # scale pixel values to [0, 1]
        aligned = np.expand_dims(aligned, axis=0)
        aligned_images.append(aligned)
    return np.array(aligned_images)

def calc_embs(filepaths, batch_size=64):
    embs = []
    for start in tqdm(range(0, len(filepaths), batch_size)):
        aligned_images = load_and_align_images(filepaths[start:start+batch_size])
        embs.append(nn4_small2.predict_on_batch(np.squeeze(aligned_images)))
    return np.concatenate(embs)

train_embs = calc_embs(df_train.image)
np.save("train_embs.npy", train_embs)
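With the dataset above, train_embs should end up with one 128-dimensional embedding per image; a quick sanity check:

print(train_embs.shape)  # expected: (20, 128) for 20 images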

Analysis (Choosing a threshold)

The similarity between two faces can be determined by the Euclidean distance between their embedding vectors: a small distance means the two faces are alike, and vice versa. Here we use distance.euclidean from the scipy library.

Within the dataset, we calculate the distances between faces in the same folder (match_distances) and between faces in different folders (unmatch_distances), then determine a suitable threshold to distinguish between matches and non-matches.

from scipy.spatial import distance

# Group dataframe row indices by identity label
label2idx = []
for i in tqdm(range(len(train_paths))):
    label2idx.append(np.asarray(df_train[df_train.label == i].index))

# Distances between all pairs of faces of the same identity
match_distances = []
for i in range(nb_classes):
    ids = label2idx[i]
    distances = []
    for j in range(len(ids) - 1):
        for k in range(j + 1, len(ids)):
            distances.append(distance.euclidean(train_embs[ids[j]].reshape(-1), train_embs[ids[k]].reshape(-1)))
    match_distances.extend(distances)

# Distances between faces of different identities (randomly sampled)
unmatch_distances = []
for i in range(nb_classes):
    ids = label2idx[i]
    distances = []
    for j in range(10):
        idx = np.random.randint(train_embs.shape[0])
        while idx in label2idx[i]:
            idx = np.random.randint(train_embs.shape[0])
        distances.append(distance.euclidean(train_embs[ids[np.random.randint(len(ids))]].reshape(-1), train_embs[idx].reshape(-1)))
    unmatch_distances.extend(distances)

import matplotlib.pyplot as plt

# Plot the two distance distributions on top of each other
_, _, _ = plt.hist(match_distances, bins=100)
_, _, _ = plt.hist(unmatch_distances, bins=100, fc=(1, 0, 0, 0.5))
plt.show()

Based on the histograms, we can choose threshold = 1.1.
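Reading the value off the histograms works for a dataset this small; a more systematic option (not part of the original code) is to sweep candidate thresholds and pick the one that maximizes the F1 score over all match/unmatch pairs:

import numpy as np
from sklearn.metrics import f1_score

distances = np.array(match_distances + unmatch_distances)
# 1 = same identity, 0 = different identities
identical = np.array([1] * len(match_distances) + [0] * len(unmatch_distances))

thresholds = np.arange(0.3, 2.0, 0.01)
f1_scores = [f1_score(identical, distances < t) for t in thresholds]
threshold = thresholds[np.argmax(f1_scores)]
print(threshold)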

Test

For each test image, we first have to find the faces in the image.

test_paths = glob.glob("test_image/*.jpg")
for path in test_paths:
    test_image = cv2.imread(path)
    show_image = test_image.copy()

    # Dlib's HOG-based frontal face detector
    hogFaceDetector = dlib.get_frontal_face_detector()
    faceRects = hogFaceDetector(test_image, 0)

    faces = []
    for faceRect in faceRects:
        x1 = faceRect.left()
        y1 = faceRect.top()
        x2 = faceRect.right()
        y2 = faceRect.bottom()
        face = test_image[y1:y2, x1:x2]
        faces.append(face)

Then we calculate the embedding vectors of the detected faces (the following snippets continue inside the loop over test_paths):

print("len(faces) = {0}".format(len(faces)))
if(len(faces)==0):
print("no face detected!")
continue
else:
test_embs = calc_emb_test(faces)

test_embs = np.concatenate(test_embs)
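calc_emb_test is not shown in the article; a plausible module-level implementation, mirroring calc_embs but taking in-memory face crops instead of file paths, could be:

def calc_emb_test(faces):
    embs = []
    for face in faces:
        aligned = align_face(face)
        aligned = (aligned / 255.).astype(np.float32)
        aligned = np.expand_dims(aligned, axis=0)
        embs.append(nn4_small2.predict_on_batch(aligned))
    # List of (1, 128) arrays; np.concatenate stacks them into (n, 128)
    return embs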

Then we calculate the Euclidean distances to the reference faces in train_embs. For each test face, the identity with the smallest distance is taken as the label, provided that distance is below the threshold; otherwise the face is labeled as unknown.

    people = []
    for i in range(test_embs.shape[0]):
        distances = []
        for j in range(len(train_paths)):
            # Smallest distance to any reference face of identity j
            distances.append(np.min([distance.euclidean(test_embs[i].reshape(-1), train_embs[k].reshape(-1)) for k in label2idx[j]]))
        if np.min(distances) > threshold:
            people.append("unknown")
        else:
            people.append(np.argmin(distances))
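The people list holds numeric labels (or the string "unknown"), while the drawing code below uses a names list; a helper to map labels back to folder names (not shown in the original) might look like:

    names = []
    for p in people:
        if p == "unknown":
            names.append("unknown")
        else:
            # Look up the folder name recorded for this label
            names.append(df_train[df_train.label == p].name.iloc[0])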

Results

Finally, show the results by drawing a bounding box and the predicted name for each face:

    for i, faceRect in enumerate(faceRects):
        x1 = faceRect.left()
        y1 = faceRect.top()
        x2 = faceRect.right()
        y2 = faceRect.bottom()
        cv2.rectangle(show_image, (x1, y1), (x2, y2), (255, 0, 0), 3)
        cv2.putText(show_image, names[i], (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 0), 3, cv2.LINE_AA)
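To actually see the annotated image, standard OpenCV display calls (or cv2.imwrite to save the result) can be appended inside the loop:

    cv2.imshow("result", show_image)
    cv2.waitKey(0)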

Reference

http://krasserm.github.io/2018/02/07/deep-face-recognition/