Face recognition identifies persons in face images or video frames. In a nutshell, a face recognition system extracts features from an input face image and compares them to the features of labeled faces in a database. The comparison is based on a feature similarity metric, and the label of the most similar database entry is used to label the input image. If the similarity value is below a certain threshold, the input image is labeled as unknown. Comparing two face images to determine whether they show the same person is known as face verification.
This article uses a deep convolutional neural network (CNN) to extract features from faces in input images. Keras is used for the CNN implementation, and OpenCV and Dlib are used for aligning faces in input images. Face recognition performance is evaluated on a small self-created dataset that you can replace with your own custom dataset, e.g. with images of your family and friends.
CNN architecture
The CNN architecture used here is a variant of the inception architecture. More precisely, it is a variant of the NN4 architecture, identified as the nn4.small2 model in the OpenFace project. This article uses a Keras implementation of that model whose definition was taken from the Keras-OpenFace project. The architecture details aren't too important here; it is enough to know that there is a fully connected layer with 128 hidden units, followed by an L2 normalization layer, on top of the convolutional base. These two top layers are referred to as the embedding layer, from which the 128-dimensional embedding vectors can be obtained. A Keras version of the nn4.small2 model can be created with create_model().
from model import create_model
nn4_small2 = create_model()
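To make the role of the L2 normalization layer concrete, here is a minimal sketch in plain Python of what such a layer does to an output vector: it rescales the vector to unit length. This is not the Keras layer itself; the function name l2_normalize and the eps guard are assumptions made for illustration.

```python
import math


def l2_normalize(vector, eps=1e-10):
    """Scale a vector to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    # eps avoids a division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in vector]


# A 2-dimensional toy vector stands in for a 128-dimensional embedding.
embedding = l2_normalize([3.0, 4.0])
length = math.sqrt(sum(x * x for x in embedding))
```

Because every embedding then has length 1, the distance between two embeddings depends only on the angle between them, which is what makes a single distance threshold meaningful across faces.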
Dataset
The dataset consists of 20 images of 4 identities, with 5 images per person. The images of each person are placed in a separate folder. You can see the dataset folder structure below.
After gathering the images for the dataset, we crop the faces in those images to prepare them for training by running:
python face_detect_and_save.py
The above script detects the faces in the images, crops them, and replaces the original images in the dataset folder with the crops. Note that each image collected for the dataset should contain only one face.
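The contents of face_detect_and_save.py are not shown here. As a hedged illustration of one detail a cropping step like this usually has to handle: face detectors can return rectangles that extend slightly past the image border, so the rectangle should be clipped before it is used to slice the image. The helper name clamp_box and its argument order are assumptions, not part of the script.

```python
def clamp_box(left, top, right, bottom, width, height):
    """Clip a face rectangle to the image bounds.

    Returns (left, top, right, bottom), or None if the rectangle lies
    entirely outside the image.
    """
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom


# A detection that starts 5 pixels left of a 100x100 image and ends below it.
box = clamp_box(-5, 10, 50, 120, width=100, height=100)
```

With an image stored as a NumPy array, the clipped box could then be used as image[top:bottom, left:right].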
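import glob
import os

# Record one row per image: [image path, numeric label, person name].
for i, train_path in enumerate(train_paths):
    # The folder name is the person's name; basename uses the host OS path separator.
    name = os.path.basename(os.path.normpath(train_path))
    images = glob.glob(train_path + "/*")
    for image in images:
        df_train.loc[len(df_train)] = [image, i, name]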
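The loop above relies on train_paths and df_train, which are defined elsewhere. As a self-contained sketch of the same indexing step, the function below walks a dataset root with one sub-folder per person and returns (image path, label, name) tuples. The function name index_dataset, the accepted file suffixes, and the use of plain tuples instead of a DataFrame are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image file types; adjust to the files actually collected.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Return (records, names) for a root folder with one sub-folder per person.

    records is a list of (image_path, label, name) tuples and names[label]
    is the folder name of that person.
    """
    person_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    names = [p.name for p in person_dirs]
    records = []
    for label, person_dir in enumerate(person_dirs):
        for image in sorted(person_dir.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                records.append((str(image), label, person_dir.name))
    return records, names
```

Sorting the folders keeps the numeric labels stable between runs, which matters when the embeddings are saved to disk and reloaded later.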
The nn4.small2.v1 model was trained with aligned face images; therefore, the face images from the custom dataset must be aligned too. Here, we use Dlib for face detection and OpenCV for image transformation and cropping to produce aligned 96x96 RGB face images. Download the shape_predictor_68_face_landmarks model and put it in the project folder. With the AlignDlib utility from the OpenFace project, this is straightforward:
from align import AlignDlib
alignment = AlignDlib('shape_predictor_68_face_landmarks.dat')
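Face alignment of this kind typically maps a few facial landmarks (for example the outer eye corners and the nose) onto fixed template positions with an affine transformation. The sketch below is not the AlignDlib code. It only illustrates, in plain Python, how a 2x3 affine matrix is determined by three point pairs. All coordinates and function names are made up for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, rhs):
    """Solve the 3x3 linear system a * x = rhs with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("source points are collinear")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    return solution


def affine_from_points(src, dst):
    """Return the 2x3 matrix that maps three src points onto three dst points."""
    a = [[x, y, 1.0] for x, y in src]
    return [solve3(a, [p[0] for p in dst]), solve3(a, [p[1] for p in dst])]


def apply_affine(m, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Hypothetical landmark positions in a source image: left eye, right eye, nose.
landmarks = [(30.0, 40.0), (70.0, 42.0), (50.0, 65.0)]
# Hypothetical target positions inside a 96x96 aligned crop.
template = [(19.0, 16.0), (77.0, 16.0), (48.0, 50.0)]
matrix = affine_from_points(landmarks, template)
```

Warping the whole image with such a matrix places the eyes and nose at the same pixel positions in every crop, so the network always sees faces in a consistent pose.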
The embedding vector of each face in the dataset is used as a reference for face comparison. The training step calculates those embedding vectors and saves them to train_embs.npy.
The similarity between two faces can be determined by the Euclidean distance between their embedding vectors. A small distance means the two faces are alike, and a large distance means they are not. Here we use distance.euclidean from scipy.spatial.
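As a small worked example of this distance, the sketch below uses math.dist from the Python standard library (Python 3.8 or later) on two unit-length toy vectors. It also checks a property that holds for L2-normalized vectors: the squared Euclidean distance equals 2 - 2 * cos, where cos is their cosine similarity. The vectors are invented and stand in for 128-dimensional embeddings.

```python
import math

# Two unit-length toy vectors standing in for face embeddings.
a = (1.0, 0.0)
b = (0.6, 0.8)

euclidean = math.dist(a, b)
cosine = sum(x * y for x, y in zip(a, b))  # dot product of unit vectors

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
squared_from_cosine = 2.0 - 2.0 * cosine
```

A consequence is that distances between normalized embeddings always fall between 0 (identical direction) and 2 (opposite direction), which bounds the range in which a threshold can lie.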
Within the dataset, we calculate the distances between faces in the same folder (match_distances) and between faces in different folders (unmatch_distances), then determine a suitable threshold to distinguish a match from a non-match.
# label2idx[i] holds the row indices of all images that belong to label i.
label2idx = []
for i in tqdm(range(len(train_paths))):
    label2idx.append(np.asarray(df_train[df_train.label == i].index))
# Distances between every pair of faces of the same person.
match_distances = []
for i in range(nb_classes):
    ids = label2idx[i]
    distances = []
    for j in range(len(ids) - 1):
        for k in range(j + 1, len(ids)):
            distances.append(distance.euclidean(train_embs[ids[j]].reshape(-1), train_embs[ids[k]].reshape(-1)))
    match_distances.extend(distances)

# Distances between 10 random pairs of faces of different people, per person.
unmatch_distances = []
for i in range(nb_classes):
    ids = label2idx[i]
    distances = []
    for j in range(10):
        idx = np.random.randint(train_embs.shape[0])
        # Resample until the random face belongs to another person.
        while idx in label2idx[i]:
            idx = np.random.randint(train_embs.shape[0])
        distances.append(distance.euclidean(train_embs[ids[np.random.randint(len(ids))]].reshape(-1), train_embs[idx].reshape(-1)))
    unmatch_distances.extend(distances)
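How the threshold is derived from the two distance lists is not spelled out here. One possible approach, shown as a sketch under that assumption, is to try every observed distance as a candidate threshold and keep the one that classifies the most pairs correctly, treating a distance at or below the threshold as a match. The function name best_threshold and the toy distances are hypothetical.

```python
def best_threshold(match_distances, unmatch_distances):
    """Return (threshold, accuracy) that best separates the two distance lists.

    A pair counts as a match when its distance is <= threshold. On ties,
    the smallest candidate threshold wins.
    """
    candidates = sorted(set(match_distances) | set(unmatch_distances))
    total = len(match_distances) + len(unmatch_distances)
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = (sum(d <= t for d in match_distances)
                   + sum(d > t for d in unmatch_distances))
        accuracy = correct / total
        if accuracy > best_acc:
            best_t, best_acc = t, accuracy
    return best_t, best_acc


# Invented distances; real values come from the dataset embeddings.
toy_match = [0.3, 0.4, 0.5, 0.9]
toy_unmatch = [0.8, 1.1, 1.2, 1.3]
threshold, accuracy = best_threshold(toy_match, toy_unmatch)
```

Plotting histograms of the two lists and picking a value between the two peaks by eye is a reasonable alternative for a dataset this small.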
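For a test image, compute the embedding vector of each detected face (test_embs), then calculate the Euclidean distances to the faces in train_embs. The identity with the smallest distance is the predicted label, provided that this distance does not exceed the threshold; otherwise the face is labeled as unknown.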
people = []
for i in range(test_embs.shape[0]):
    distances = []
    for j in range(len(train_paths)):
        distances.append(np.min([distance.euclidean(test_embs[i].reshape(-1), train_embs[k].reshape(-1)) for k in label2idx[j]]))
    if np.min(distances) > threshold:
        people.append("unknown")
    else:
        res = np.argsort(distances)[:1]
        people.append(res)
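The same nearest-neighbour rule can be written as a small, self-contained function that returns a name directly. This is a sketch rather than the article's code: the function name identify, the dictionary-based gallery, and the 2-dimensional toy embeddings are assumptions made for illustration.

```python
import math


def identify(query, gallery, threshold):
    """Return the name of the closest identity, or "unknown".

    gallery maps each person's name to a list of reference embeddings.
    The distance to a person is the minimum distance to any of their
    reference embeddings.
    """
    best_name, best_distance = "unknown", math.inf
    for name, embeddings in gallery.items():
        d = min(math.dist(query, e) for e in embeddings)
        if d < best_distance:
            best_name, best_distance = name, d
    return best_name if best_distance <= threshold else "unknown"


# Toy gallery with invented embeddings.
gallery = {
    "alice": [(1.0, 0.0), (0.9, 0.1)],
    "bob": [(0.0, 1.0)],
}
first = identify((0.95, 0.05), gallery, threshold=0.5)
second = identify((0.1, 0.9), gallery, threshold=0.5)
third = identify((-1.0, -1.0), gallery, threshold=0.5)
```

Here first is "alice", second is "bob", and third is "unknown", because the last query is farther than the threshold from every reference embedding.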
Results
To show the results, draw a bounding box and the predicted name for each detected face:
for i, faceRect in enumerate(faceRects):
    x1 = faceRect.left()
    y1 = faceRect.top()
    x2 = faceRect.right()
    y2 = faceRect.bottom()
    cv2.rectangle(show_image, (x1, y1), (x2, y2), (255, 0, 0), 3)
    cv2.putText(show_image, names[i], (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 0), 3, cv2.LINE_AA)