A thorough look at face recognition technology
Thirty years ago face recognition belonged to science fiction. Ten years ago it looked technically possible, yet it was almost unusable in practice because of both budget and engineering constraints. Today, as processing power grows and the price of individual cameras drops, face recognition is slowly but surely becoming a sensible solution for public places such as shopping malls and subway stations. That makes this the right time to look into the technology and get acquainted with its main uses.
To avoid any confusion let's start with getting our terms straight. Detection, recognition, verification and identification – these are all different operations.
Face detection – the process of finding objects that resemble a human face inside a shot. This function will be familiar to any social network user who has uploaded a photo: the algorithm draws frames around the faces and suggests tagging people from your friend list. The popular mobile apps that overlay cartoon elements on a person's real features rely on the same basic principle.
Face recognition – first the algorithm performs a number of "housekeeping chores": extracting the face from the initial shot, bringing it to a standard scale and crop, and standardizing the color scheme, usually to greyscale. It then creates a "landscape map" of the face and encodes the image as digital data that is ready for comparison with other database entries.
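The "housekeeping chores" can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the image is a nested list of RGB tuples, the face box is already known from a detection step, and the function names (crop, to_grayscale, resize_nearest, normalize_face) are made up for this example rather than taken from any library.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]
GrayImage = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: RGBImage, box: Box) -> RGBImage:
    """Cut the face box out of a row-major image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def to_grayscale(image: RGBImage) -> GrayImage:
    """Convert RGB pixels to 8-bit grey values using the BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def resize_nearest(image: GrayImage, width: int, height: int) -> GrayImage:
    """Rescale to width x height with nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def normalize_face(image: RGBImage, face_box: Box, size: int = 4) -> GrayImage:
    """Crop, convert to greyscale and bring the face to a standard size."""
    return resize_nearest(to_grayscale(crop(image, face_box)), size, size)
```

A production system would also align the face (for example, rotate it so that the eyes are level) and use better interpolation, but the order of the steps stays the same.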
Face verification – remember those classic spy action movies like Mission Impossible, where there is always a door with a multi-level access check: fingerprints, voice, numeric passcode? Usually one of those steps includes a scanner sweeping colorful rays over our heroes' faces. That is nothing more than a cool-looking visual for the facial verification process. The system compares that specific face with a single database record, 1 to 1, and returns its verdict on whether the two match.
Face identification – unlike verification, identification is needed when the face has to be matched against a database of many photos (1 to many). In that case the algorithm goes through the library looking for entries with high correlation. A good example would be a police system that checks every visitor to a public place against a list of wanted criminals.
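The difference between verification (1 to 1) and identification (1 to many) is easy to see in code. The sketch below assumes that every face has already been encoded as a list of numbers, and that a smaller Euclidean distance means a closer match. The threshold of 0.6, the function names and the sample vectors are arbitrary illustrations, not values from a real system.

```python
import math
from typing import Dict, List, Optional


def verify(probe: List[float], reference: List[float], threshold: float = 0.6) -> bool:
    """1 to 1: does the probe face match this single stored record?"""
    return math.dist(probe, reference) <= threshold


def identify(
    probe: List[float],
    gallery: Dict[str, List[float]],
    threshold: float = 0.6,
) -> Optional[str]:
    """1 to many: return the closest gallery entry, or None if nothing is close enough."""
    best_name, best_distance = None, float("inf")
    for name, reference in gallery.items():
        distance = math.dist(probe, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= threshold else None


# Hypothetical encoded faces, for illustration only.
gallery = {"alice": [0.1, 0.9, 0.3], "bob": [0.8, 0.2, 0.5]}
probe = [0.12, 0.88, 0.31]

print(verify(probe, gallery["alice"]))  # compared with one record
print(identify(probe, gallery))         # compared with the whole gallery
```

Verification costs a single comparison, while a naive identification like this one grows linearly with the size of the database, which is why large systems care so much about how cheap each comparison is.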
Now let's take a closer look at the recognition process itself. One of the earliest yet still popular approaches is pixel analysis, with algorithms like LBPH (Local Binary Patterns Histograms). The main thing to realize about computer recognition is that people are not standardized parts from an assembly line, so a 1 to 1 comparison at the level of individual pixels is impossible as such. The solution lies in the concept of regions: the shot is broken down into small standardized pixel neighborhoods (3x3 or larger). Within each neighborhood the algorithm compares the intensity of the central pixel with that of its neighbors, records the result as a binary pattern, and then turns the resulting map of patterns into a histogram. The advantage of histograms is that there are a number of mathematical measures (absolute difference, Euclidean distance, chi-square) for calculating how closely two of them match. This drastically speeds up going through a large database and finding matches among thousands of photos.
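To make the idea concrete, here is a minimal sketch of the basic 3x3 local binary pattern and a chi-square comparison of two histograms. It is a simplification under stated assumptions: the image is a nested list of grey values, and a single histogram is built for the whole image, whereas LBPH implementations normally split the face into a grid of cells and concatenate the per-cell histograms. The function names are chosen for this example.

```python
from typing import List

GrayImage = List[List[int]]

# The eight neighbours of a pixel, clockwise from the top-left corner.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: GrayImage, y: int, x: int) -> int:
    """Build an 8-bit code: one bit per neighbour that is at least as bright as the centre."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBOURS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: GrayImage) -> List[float]:
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    total = sum(hist) or 1  # avoid dividing by zero for images without interior pixels
    return [count / total for count in hist]


def chi_square(h1: List[float], h2: List[float]) -> float:
    """Chi-square distance between two histograms; 0 means they are identical."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
```

Because each code only records whether a neighbor is brighter or darker than the center, adding the same amount of brightness to every pixel leaves the codes unchanged.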
Recognition by features
Another standard recognition workflow operates at the level of facial features. Here the algorithm finds the key areas: eyes, nose, mouth. It then calculates their relative sizes and the distances between them, and searches the database for faces with similar proportions. This approach is considered less precise, but in practice it proves more widely applicable because it depends much less on situational factors like camera angle or lighting.
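A rough sketch of the proportion idea is given below. It assumes that the key points have already been located and are passed in as (x, y) pixel coordinates. The landmark names, the choice of three ratios and the function name are illustrative assumptions, not a standard. Dividing every distance by the distance between the eyes makes the result independent of how large the face appears in the shot.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def proportion_vector(landmarks: Dict[str, Point]) -> List[float]:
    """Describe a face by distances between key points, scaled by the eye-to-eye distance."""
    left_eye, right_eye = landmarks["left_eye"], landmarks["right_eye"]
    nose, mouth = landmarks["nose"], landmarks["mouth"]

    eye_span = math.dist(left_eye, right_eye)
    if eye_span == 0:
        raise ValueError("eye landmarks coincide; cannot normalize")

    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    return [
        math.dist(eye_mid, nose) / eye_span,   # eyes to nose
        math.dist(nose, mouth) / eye_span,     # nose to mouth
        math.dist(eye_mid, mouth) / eye_span,  # eyes to mouth
    ]


# Hypothetical landmark positions, for illustration only.
face = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60), "mouth": (50, 80)}
print(proportion_vector(face))
```

Two such vectors can then be compared with any ordinary distance measure to find faces with similar proportions.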
The Elastic Graph Matching method includes elements of both previous approaches. The base face is encoded as a data grid, which can be either a regular flat grid (2D) or one bent to follow the "terrain" of the face (3D). A similar grid is projected onto the face being recognized and is then deformed step by step until it matches the base one. The amount of deformation required becomes the variable used to calculate how well the recognized face matches the base face.
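The "deform and score" idea can be illustrated with a deliberately tiny toy. This is not Elastic Graph Matching proper: each grid node here carries a single grey value instead of a richer local descriptor, every node is moved independently within a small window, and the penalty is applied to the node's own shift rather than to the distortion of the grid edges. The function name, max_shift and stiffness are assumptions made for this sketch.

```python
from typing import List, Tuple

GrayImage = List[List[int]]
Node = Tuple[Tuple[int, int], int]  # ((row, column), expected grey value)


def deformation_cost(
    image: GrayImage,
    nodes: List[Node],
    max_shift: int = 1,
    stiffness: float = 0.5,
) -> float:
    """Let each node slide up to max_shift pixels; sum the cheapest mismatch plus shift penalty."""
    height, width = len(image), len(image[0])
    total = 0.0
    for (y, x), expected in nodes:
        best = float("inf")
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    mismatch = abs(image[ny][nx] - expected)
                    penalty = stiffness * (dy * dy + dx * dx)
                    best = min(best, mismatch + penalty)
        total += best
    return total
```

A low total means the grid fits after little or no deformation. A face whose features are slightly displaced pays only the small shift penalty, while a different face pays the full mismatch.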
Seeing how rapidly neural networks have taken over other areas of computing, it is no surprise that they found their way into recognition technologies as well. The advantages and disadvantages of this method (or rather, group of methods, since today there are many specific recognition algorithms based on neural networks) are both predictable and typical for neural networks in general. They require far more resources, especially at the training stage, but show better flexibility in dealing with problems like changes in lighting and camera angle. The most famous example of a face-recognizing neural network today is the DeepFace system, developed and integrated by Facebook.
Areas of application
The first and main sphere that will surely be revolutionized by face recognition technologies in the next few years is, of course, law enforcement. The results of police trial runs of new surveillance systems are already evident. For example, in Moscow's subway more than a hundred wanted criminals a year are caught with the help of automatically processed surveillance footage, and this is the result of a mere test program that includes little more than a thousand cameras. The city government considered it such a success that the plan for 2019 included connecting 105 thousand cameras to the system!
Going back to the spy genre and heist movies like Ocean's Eleven, another security application of the technology is restricting access to certain areas. Facial scans could work as a substitute for, or in combination with, passcodes, access control booths and other more traditional mechanisms. This method will come in handy not only in the military, but at all kinds of sensitive sites, including science facilities, industrial plants and even the private sector.
As someone once said, a person's first priority is "I don't want to die", while the next one is "I don't want to die poor". So it's no surprise that next in line for the new face recognition technology is business and commerce. At the moment this branch is in its infancy, yet the potential is immense. Once a camera at a mall entrance spots an established client, their whole journey through the mall can be customized. Their personal preferences subtly guide their way with the help of conveniently changing digital screens, ambient audio or even aromas spread around perfume boutiques and restaurants.
Law and ethics
Like any new industry going through its early growing pains, the world of facial recognition for now mainly exists as a Wild West style frontier. Specific legislation will surely be formulated, but as it stands today, even the most developed countries lack specific rules to regulate these interactions. Of course, people are trying to appeal to existing laws that protect private life and personal information. For example, a Russian petition on Change.org to deny police the use of camera footage without a court order already has more than 50,000 signatures. Its creators refer to Article 23 of the Constitution, stating that every person has the right to keep their personal data private. But the actual interpretation of existing laws, and of how they apply to the new reality, is for now left to whoever is in charge of each specific case. Especially in countries without case law, this leads to a wide variety of views and rulings.
For now it is not so much the legal as the ethical side of the story that is at the heart of public discussion. And it seems appropriate to note that here a technology of the future brings back a reality from the past. In urbanized megacities, society has for decades been going down the path of anonymization, if not dehumanization, of the person in the crowd. We have long been used to not recognizing and not being recognized. Will a robot noticing our face in the crowd play a role similar to that of a small-town neighbor asking about the health of our family on our way to work?
Chief Technology Officer, Co-Founder and IT Manager with vast experience in software development and managed IT services, with an emphasis on video processing and AdTech. Please write to me at firstname.lastname@example.org.