AI and Computer Vision

Let's Talk Science

Learn about how computers see and learn to recognize objects and human faces.

Visual and Facial Recognition Technology

Throughout history, humans have developed machines to do work for us. More recently, this has included machines that imitate our senses, like our vision. Vision recognition technologies are technologies that can see and label things. These technologies let machines, robots, and apps see and understand the world as we see it.

Computer vision (CV) is a type of computer engineering. It involves teaching computers to "see" digital images such as photos and videos. Engineers who work in this field have a variety of tasks. One thing they do is find ways to use digital cameras with devices and computers. They also find ways to teach computers to recognize images and videos. This is done through coding or machine learning.

There are different types of computer vision. They depend on what the computer is trying to identify. The computer may look for text, images or faces. We will look at these three in detail.

Image showing the relationship between different Artificial Intelligence and different aspects of Computer Vision (©2021 Let’s Talk Science. Based on an image by deepomatic). 

Optical Character Recognition

Optical character recognition (OCR) is a technology used to look for text. The text may be handwritten or in typed documents.

Let’s see how it works for handwriting.

The first step in OCR is taking pictures of people's handwriting. These pictures are then scanned into a computer. Next, people match handwritten text with the characters on a computer. A character is any letter, number, space, punctuation mark, or symbol. This teaches the computer which handwriting goes with which character. Now the computer will be able to identify and match handwriting with text. This is an example of supervised machine learning. Supervised machine learning involves giving data labels. In OCR, machines learn to identify characters using many labelled images of handwriting. This provides the machine with patterns to look for.
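
To make the idea of learning from labelled examples more concrete, here is a minimal sketch in Python. It is not how a real OCR program works. The tiny 3x5 images, the labels, and the function names are all made up for illustration. The sketch assumes one very simple method: a new image gets the label of the labelled example it looks most like.

```python
# Each example is a tiny 3x5 image ("1" = ink, "0" = blank) paired with the
# label a person gave it. These images are made up for illustration.
LABELLED_EXAMPLES = [
    (("010", "110", "010", "010", "111"), "1"),
    (("010", "010", "010", "010", "010"), "1"),
    (("111", "101", "101", "101", "111"), "0"),
    (("111", "001", "010", "100", "100"), "7"),
]


def count_differences(image_a, image_b):
    """Count how many pixels differ between two images of the same size."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def predict_character(new_image, examples=LABELLED_EXAMPLES):
    """Return the label of the labelled example most similar to new_image."""
    closest_image, closest_label = min(
        examples, key=lambda example: count_differences(new_image, example[0])
    )
    return closest_label


# A new piece of "handwriting" the program has not seen before.
new_handwriting = ("010", "110", "010", "010", "010")
print(predict_character(new_handwriting))  # prints 1
```

The more labelled examples the program has, the more kinds of handwriting it can match. Real systems use thousands of much larger images.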

Let’s take the example of the numeral one (1). Rules can be set to look for the following patterns in how humans write the numeral 1.

Pattern rules:

  • Often found close to other known numerals.
  • A long straight vertical line, e.g. l
  • An optional short line that hangs from the top backward at 45 degrees, e.g. 1
  • An optional short horizontal line centred on the bottom, e.g. 1
Illustration of handwritten variations of numeral one (Source: Daranz via Wikimedia Commons).

Try this!

How would you describe a pattern for the numeral 3? Or the numeral 9?

These types of pattern rules are written as computer code. The code includes a step-by-step set of instructions, or algorithm. Once a computer has this code, an OCR program can translate handwriting into computer text.
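
As a rough illustration, the pattern rules for the numeral 1 could be sketched in Python like this. The sketch assumes the handwriting has already been turned into a small grid of ink and blank squares. The function names and the limit of 3 extra ink squares are assumptions chosen for this example, not part of any real OCR program.

```python
def has_long_vertical_line(image):
    """Rule: some column has ink ("1") in every row of the image."""
    width = len(image[0])
    return any(all(row[col] == "1" for row in image) for col in range(width))


def looks_like_numeral_one(image, max_extra_ink=3):
    """Check for a full-height vertical line plus only a little extra ink.

    The extra ink allows for the optional short line at the top and the
    optional short line at the bottom. max_extra_ink is an assumed limit.
    """
    if not has_long_vertical_line(image):
        return False
    total_ink = sum(row.count("1") for row in image)
    line_ink = len(image)  # the vertical line uses one square per row
    return total_ink - line_ink <= max_extra_ink


plain_one = ("010", "010", "010", "010", "010")
fancy_one = ("010", "110", "010", "010", "111")
zero = ("111", "101", "101", "101", "111")

print(looks_like_numeral_one(plain_one))  # True
print(looks_like_numeral_one(fancy_one))  # True
print(looks_like_numeral_one(zero))       # False
```

Notice that the zero also has a long vertical line. It is the second rule, about extra ink, that tells the two numerals apart. This is why several rules are usually needed.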

OCR technologies are now found in some smartphone apps. These apps take photos of your handwritten notes. They then convert them into electronic text. Being able to handwrite our notes and turn them into text is much easier than typing on a small device. Turning visual information, like your handwritten notes, into text data has many advantages. Text data can be searched, it can be put into categories, and it takes up a lot less memory on your phone or computer!

Image showing how OCR converts handwriting into typed text (Source: Piscine via iStockphoto).

 

Object and Visual Recognition

Many manufacturing processes involve machines and robotic systems that detect and recognize objects. Object detection can be as simple as a sensor that uses light to see if an item has passed by. Think of a labelling machine. It detects if a box moving along a conveyor belt is in the correct position. When the system ‘sees’ that the package is in the right location, it prints a label on it.
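
The labelling machine described above can be sketched as a few lines of Python. This is only a simulation. It assumes the light sensor reports a number between 0 (dark) and 1 (bright), and that a box blocking the light makes the number drop. The function names and the threshold of 0.2 are made up for this example.

```python
def box_in_position(light_level, blocked_threshold=0.2):
    """A box blocking the light beam makes the reading drop below the threshold."""
    return light_level < blocked_threshold


def count_labels_printed(sensor_readings):
    """Count one label each time a new box arrives in front of the sensor."""
    labels_printed = 0
    box_was_present = False
    for reading in sensor_readings:
        box_is_present = box_in_position(reading)
        # Only print when a box has just arrived, not while it is still there.
        if box_is_present and not box_was_present:
            labels_printed += 1
        box_was_present = box_is_present
    return labels_printed


# Made-up readings: bright, bright, box, same box, bright, another box, bright.
readings = [0.9, 0.9, 0.1, 0.1, 0.8, 0.05, 0.9]
print(count_labels_printed(readings))  # prints 2
```

The sensor never learns what a box looks like. It only notices that something is blocking the light, which is enough for this job.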

Today people are developing even more complex visual recognition systems for robots. These let the robots better identify and handle objects. It is important that these systems come close to matching human abilities. For example, a robot needs to recognize and adjust its grip one way for a paper cup and a different way for a glass cup.

Simple visual object detection systems detect where something is. This is like the back-up camera in a car. It uses object detection sensors and cameras to detect objects. But it doesn't tell the driver what the objects are.

Image recognition systems figure out what objects are. This is one of the most important systems in autonomous cars. Like other cars, autonomous cars need to be able to detect objects. But they also need to decide what to do depending on the object and situation. For example, if the car recognizes a stop sign, it needs to stop. But if a car detects a person, it needs to analyze where that person is and what that person is doing. Is the person safely on the sidewalk? Is the person crossing the street? You can imagine that this system needs to be really good at what it does!
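
The difference between detecting an object and deciding what to do about it can be sketched in Python. This is a toy example, not how a real autonomous car is programmed. The labels, the locations, and the actions are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str     # what the recognition system thinks the object is
    location: str  # assumed values: "road", "sidewalk" or "roadside"


def choose_action(detected):
    """Pick an action based on what the object is and where it is."""
    if detected.label == "stop sign":
        return "stop"
    if detected.label == "person":
        # A person on the road is treated differently from one on the sidewalk.
        if detected.location == "road":
            return "stop"
        return "continue with caution"
    if detected.location == "road":
        return "slow down"
    return "continue"


print(choose_action(DetectedObject("stop sign", "roadside")))  # stop
print(choose_action(DetectedObject("person", "sidewalk")))     # continue with caution
print(choose_action(DetectedObject("person", "road")))         # stop
```

A back-up camera would only supply the location. The label is what an image recognition system adds, and it is what makes the different decisions possible.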

Self-driving car and other vehicles on a highway
3D image of self-driving cars. The rectangles show where the car detects other vehicles (Source: 3alexd via iStockphoto).

Autonomous cars are not the only systems that use image recognition. The smartphone app PlantNet is another example. It lets people find out information about plants. Using your phone, you take a picture of the plant. The image recognition system compares your image to many other images of plants. It then suggests what your plant is. Leafsnap and Florist are similar apps that help people to identify trees and flowers from images or their camera.

AI powered flowers identification video (0:28) by Robert Poplawski (2018)

 

Facial Recognition Technologies

Facial Recognition Technology (FRT) is a technology that identifies human faces. The process it uses is like the way humans recognize each other. You see someone's face with your eyes. A smartphone takes an image of someone's face with its camera. Your brain takes the features of the face and stores them in your memory. This is what lets you remember people later. A computer does the same using algorithms.

Faces are unique. Like a fingerprint, we can measure and compare them. The term for measuring biological features is biometrics. Facial biometric software measures and maps parts of a face. This includes things like the shape and colour of eyes, noses, mouths and chins. We call these measurements nodal points. A geometric map of a person's face needs about 80 nodal points.

Facial recognition showing nodal points and measurements (Source: Grafissimo via iStockphoto).
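
Here is a small Python sketch of what measuring a face could look like. The landmark names and positions are made up, and real software measures far more than two things. Dividing by the distance between the eyes is an assumption added for this example, so that the numbers stay the same when the photo is larger or smaller.

```python
import math

# Made-up (x, y) pixel positions of a few points on a face.
landmarks = {
    "left_eye": (30, 40),
    "right_eye": (70, 40),
    "nose_tip": (50, 60),
    "chin": (50, 100),
}


def measure_face(points):
    """Return a few face measurements, scaled by the distance between the eyes."""
    eye_distance = math.dist(points["left_eye"], points["right_eye"])
    return {
        "nose_to_chin": math.dist(points["nose_tip"], points["chin"]) / eye_distance,
        "eye_to_nose": math.dist(points["left_eye"], points["nose_tip"]) / eye_distance,
    }


measurements = measure_face(landmarks)
print(measurements["nose_to_chin"])  # prints 1.0
```

In this made-up face, the distance from the nose to the chin is exactly the same as the distance between the eyes, so the first measurement is 1.0.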

 

The image and nodal points are then written as code. We call this code the faceprint or facial signature. Once a faceprint exists, it can be compared to other faceprint codes in a database of pictures. Faceprints are pretty unique, but they are not as unique as an iris scan or iris print. An iris scan is an image of a person's iris. The iris is the coloured part of your eye. Your iris is unique to you, like your fingerprint. This makes it a good means of identification.
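
Comparing a faceprint to a database can be sketched in Python as follows. The names, the numbers, and the cut-off of 0.05 are all made up for illustration. The sketch assumes a faceprint is simply a short list of measurements, and that two faceprints match when their numbers are very close together.

```python
import math

# Made-up faceprints. Each is a short list of measurements.
# A real faceprint would hold many more numbers.
DATABASE = {
    "Alex": [1.00, 0.71, 0.45],
    "Sam": [1.20, 0.65, 0.52],
    "Priya": [0.90, 0.80, 0.40],
}


def find_match(faceprint, database=DATABASE, max_distance=0.05):
    """Return the name with the closest faceprint, or None if none is close enough."""
    best_name = min(database, key=lambda name: math.dist(faceprint, database[name]))
    if math.dist(faceprint, database[best_name]) <= max_distance:
        return best_name
    return None


print(find_match([1.01, 0.70, 0.45]))  # prints Alex
print(find_match([2.00, 2.00, 2.00]))  # prints None
```

The cut-off matters. If it is too large, the program may match the wrong person. If it is too small, it may fail to recognize someone whose photo is slightly different from the one in the database.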

Did you know?

Iris scanners use around 240 nodal points.

Many areas now use FRTs. The main area is in security. Some smartphones and locks use faceprints or iris prints instead of passwords. The advantage of using yourself is that you don't need to remember your password!

Law enforcement can use FRT to identify criminals from surveillance video footage. Governments could use FRTs to confirm a person's identification. FRTs could also be used when issuing passports, or at borders and airport security. Unlike your face, your iris doesn't change over time. So it can be used to identify you throughout your life. But iris prints are not as easy to take as faceprints.

Facial recognition being used at the airport (Source: izusek via iStockphoto).

Concerns About FRT

FRT is pretty good, but it is not always accurate. One problem is that the pictures and videos we take may not be clear. Photos taken in poor lighting can affect the ability of FRT to make a positive match. Changes in glasses, jewelry, and facial hair can also affect FRT. In those situations, the matching results can be wrong. New software for both 2D and 3D images captured from video is improving FRT. Some systems even allow for changes in hair or things people use to disguise themselves. These improvements will help make FRT more accurate.

Another issue with FRT is the quality of the data given to computers. Algorithms used to analyze biometrics are given thousands of pictures of people. But sometimes computers are not fed enough data on certain groups of people, like minorities. This can lead to false identification. If FRT is used in law enforcement, false identification can have serious impacts on people's lives. This is why we need to be careful when using technologies such as FRT for identifying people.

Privacy is a big concern when it comes to FRT. What we look like is a big part of our identity. In some cases, we are okay with others having images of us. This includes groups like the government who provide us with photo identification. What we do not want is people using images of us without our knowledge or permission. For example, some cities in China use FRT for shaming people. The names and pictures of people who break the law are shown on big screens. However, in North America, some cities are already banning facial recognition.

Video about AI in China (2018) by the Washington Post (2:49 min.).

One place where you need to be careful of FRT is social media. Did you know that when you post a picture on social media, you are giving the social media company permission to use it for its own purposes? Probably not. FRT allows these companies to collect and match faces with names. What they do with the information is not always clear.

More and more object and vision recognition systems are coming into our lives. These technologies can provide us with security and let us do things we could not do before. But we need to be aware that these technologies could also affect our freedom and our privacy. It is up to you to control how much information you share about yourself. This includes your face.

There are some things you can do. You can be thoughtful about who takes pictures of you and where they are posted. And you should always read the privacy policy for any social media platform you use. You should also pay attention to the news about your country’s regulations on privacy. Being an informed citizen is always a smart choice! 

 

Learn More

Deep Learning for Robots: Learning from Large-Scale Interaction 

On this page with multiple videos, you can learn about a real project that uses machine learning and computer vision so that robotic arms learn to correctly recognize and adapt their grasp to different objects.

Computer Vision: Crash Course Computer Science #35 (2017)

This Crash Course video (11:09) from PBS explains what computer vision is and how it works.

What facial recognition steals from us

This video helps you understand how facial recognition works, along with its uses and dangers.

What’s Going On With Facial Recognition?

This video (7:31) by Untangled presents some concerns about facial recognition.
