Computer vision is becoming popular in many areas: payment by face biometrics in retail and public transport, urban video surveillance, access control systems at critical and industrial facilities, driver-condition monitoring, detection of personal protective equipment in production, and banking and government services.
What is computer vision
Computer vision is a branch of applied mathematics. Its task is stated as follows: given a photo or video, answer the same questions a person would answer by looking at it. There are algorithms for problems such as finding objects in an image (for example, people's faces), classifying and recognizing them, tracking their movement across frames, and recognizing actions. But current algorithms, unlike humans, are not yet good at understanding context and identifying causal relationships.
To teach a computer to "see" and solve a particular problem, machine learning algorithms are used. For this, large data sets are collected in which objects, features, or combinations of them are labeled.
How it works
One special case of computer vision that we encounter more and more often is face recognition. Biometrics is used to pay in cafes, to confirm banking transactions, and to register on financial and government service portals.
Such a system works as follows. The program analyzes the image coming from the cameras for the presence of faces in the frame. When a face is detected, a tracking algorithm follows it through the video stream: the program determines in which frame the face is captured at the best quality and angle. In that frame, an algorithm finds the key points of the face, which determine its orientation in the image. The portrait, "rotated" to the standard position using those key points, is sent to the recognition service.
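The alignment step can be illustrated with just two key points, the eye centers: the angle of the line between them tells the system how far to rotate the portrait back to an upright position. A minimal sketch in Python (the coordinates and function names are made up for illustration; real systems use dozens of key points and full affine transforms):

```python
import math

def roll_angle(left_eye, right_eye):
    """In-plane tilt of the face (degrees), from the line joining the eyes."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_point(p, center, angle_deg):
    """Rotate point p around center by angle_deg (counter-clockwise)."""
    a = math.radians(angle_deg)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + x * math.cos(a) - y * math.sin(a),
            center[1] + x * math.sin(a) + y * math.cos(a))

# Hypothetical key points of a tilted face
left_eye, right_eye = (100.0, 120.0), (160.0, 140.0)
angle = roll_angle(left_eye, right_eye)        # how far the face is tilted
center = ((left_eye[0] + right_eye[0]) / 2,
          (left_eye[1] + right_eye[1]) / 2)

# Rotating by -angle around the midpoint levels the eyes
l2 = rotate_point(left_eye, center, -angle)
r2 = rotate_point(right_eye, center, -angle)
print(round(angle, 1))          # tilt of about 18.4 degrees
print(round(l2[1] - r2[1], 6))  # eye height difference after alignment: 0.0
```

The same rotation is applied to the whole image, after which the face sits in the standard pose expected by the recognition service.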
From the standard JPEG format, the image is converted into a descriptor: a compact set of face parameters used for subsequent comparison with other images. The program then compares two descriptors and answers whether the person in the frame is present in the database. The recognition algorithm must ensure that descriptors obtained from images of the same person are similar, while those of different people differ.
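Descriptor comparison is typically a similarity measure between two numeric vectors, for example cosine similarity, with a threshold deciding "same person" versus "different person". A toy sketch (the 4-dimensional descriptors and the threshold value are invented for illustration; real descriptors have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two face descriptors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy descriptors (real systems use 128-512 dimensions)
person_a_photo1 = [0.9, 0.1, 0.4, 0.3]
person_a_photo2 = [0.85, 0.15, 0.38, 0.33]  # same person, different photo
person_b        = [0.1, 0.8, 0.2, 0.9]      # a different person

THRESHOLD = 0.8  # hypothetical decision threshold
same = cosine_similarity(person_a_photo1, person_a_photo2)
diff = cosine_similarity(person_a_photo1, person_b)
print(same > THRESHOLD, diff > THRESHOLD)  # True False
```

The choice of threshold trades off the two error rates discussed later in the article: raising it rejects more impostors but also more genuine users.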
How technologies are created
Modern computer vision systems are based on machine learning algorithms, usually neural networks. To train them, you need a large number of images with labels of what objects are on them. During the learning process, the network itself determines the elements that it will look for in other images in order to recognize them with a minimum number of errors. Sometimes they coincide with the details we are used to (if the neural network decides that noses help it recognize dogs, it will remember them), but most often its choice is not interpretable.
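The training loop behind this can be sketched at its simplest with a single "neuron" trained by gradient descent: given labeled examples, the model adjusts its parameters to reduce its classification errors, with no human telling it which features matter. Everything here is synthetic toy data, not a real vision model:

```python
import math

def sigmoid(z):
    """Logistic activation, clamped to avoid overflow for extreme inputs."""
    if z < -60.0:
        return 0.0
    if z > 60.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic labeled data: feature i/10.0 in [0.0, 3.9], label 1 when > 2.0
data = [(i / 10.0, 1 if i > 20 else 0) for i in range(40)]

w, b, lr = 0.0, 0.0, 0.5
for epoch in range(2000):          # stochastic gradient descent on log-loss
    for x, y in data:
        p = sigmoid(w * x + b)     # current prediction
        grad = p - y               # gradient of log-loss w.r.t. the logit
        w -= lr * grad * x
        b -= lr * grad

accuracy = sum((sigmoid(w * x + b) > 0.5) == (y == 1)
               for x, y in data) / len(data)
print(accuracy)
```

A real network repeats this same idea across millions of parameters and many layers, which is why the features it ends up relying on are often not interpretable.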
Actual tasks of technologies
One of the tasks the pandemic has posed for biometric systems is recognizing faces in masks. This has become a major challenge for the market, since masks reduce the accuracy of standard algorithms by 20-50%. The main problem is a significant reduction in the amount of information from which a good representation of the face can be built.
According to DZOptics, the process of recognizing a face in a mask is no different from the usual one: the system finds the face, aligns it by key points, and generates a descriptor (biometric template). However, each of these stages becomes harder – with a mask, it is more difficult to find the face and its key points in the image, so the final descriptor is less accurate. Solving this problem requires substantial work on training data sets that account for various attributes of a person's appearance – glasses, hats, makeup, masks, etc. The result is a single algorithm that works equally well on faces with and without masks.
How to determine the best algorithm
To compare biometric technologies in terms of accuracy, it is customary to evaluate two indicators:
- errors of the first kind – the proportion of false-negative responses (FNR, False Negative Rate, also called FRR, False Rejection Rate) out of the total number of genuine, "positive" requests;
- errors of the second kind – the proportion of false-positive responses (FPR, False Positive Rate, also called FAR, False Acceptance Rate) out of the total number of impostor, "negative" requests.
For example, if we consider the process of unlocking a mobile phone using face recognition technology, then FNR is the probability that your phone does not recognize you, and FPR is the probability that an attacker can unlock your phone with his face.
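In code, both error rates are simple ratios over a labeled test set. A sketch with made-up verification outcomes (the counts are invented to give round numbers):

```python
def error_rates(results):
    """results: list of (is_same_person, system_said_match) pairs."""
    positives = [r for r in results if r[0]]      # genuine attempts
    negatives = [r for r in results if not r[0]]  # impostor attempts
    # FNR / FRR: genuine users the system rejected
    fnr = sum(1 for _, said in positives if not said) / len(positives)
    # FPR / FAR: impostors the system accepted
    fpr = sum(1 for _, said in negatives if said) / len(negatives)
    return fnr, fpr

# Hypothetical outcomes: (ground truth, system decision)
trials = [(True, True)] * 97 + [(True, False)] * 3 \
       + [(False, False)] * 995 + [(False, True)] * 5
fnr, fpr = error_rates(trials)
print(fnr, fpr)  # 0.03 0.005
```

In the phone-unlock example, `fnr` is the share of your own attempts that fail, and `fpr` is the share of strangers' attempts that succeed.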
In addition, it is important to consider the speed, cost, security, scalability, and usability of the solution.
The industry benchmark is the NIST Face Recognition Vendor Test (FRVT) Ongoing, conducted by the US National Institute of Standards and Technology on large closed datasets. Its goal is to identify the most versatile face recognition algorithms – those that work equally well across different application scenarios: Visa, Border, Visaborder, Mugshot. The scenarios differ in the type of images used for testing: document photos, pictures taken during passport control at the airport, portraits from a file cabinet. A typical accuracy level for modern face recognition systems is FNR = 0.003 at FMR = 0.000001 (FMR, False Match Rate, is a synonym for FPR).
Why technology has taken off now and what will happen next
The breakthrough growth of the technology depends directly on the amount of training data and the availability of computing power. Both conditions are now in place: the world has extremely powerful computers and all the resources needed to tackle very complex problems. Where a neural network 10 years ago might have had several thousand adjustable parameters, now it can have many millions.
The amount of data available for training neural networks has also grown by an order of magnitude. In addition, research on new computer vision algorithms is very active – every year, thousands of scientific publications describe new approaches that improve accuracy on particular problems.
One of the main problems with neural networks is that they require a large amount of labeled data for training. There is a trend toward algorithms that can be trained on unlabeled data, or with only a small labeled portion. Much attention is also paid to making computer vision algorithms efficient enough to run on devices with limited computing resources (for example, mobile phones, video cameras, and other smart devices). We can also expect more complex solutions that parse more object attributes in context.