From Big Data to Machine Learning
Data has a story to tell - if you know how to look for it! People have long looked for patterns and trends in data. When there is a lot of data, this can be difficult. But computers can speed up the process.
So how much data is out there? Short answer - a lot! People generate a lot of data because of the internet and other communication tools. This is often called “Big Data.” Scientists have had to invent bigger and bigger systems to process all this data. Cloud computing is a good example. It was developed because regular computers could not handle the amount of data they received.
All this data changed the relationship between humans and computers. In the past, humans used computers to help organize and represent data. But humans had to make sense of it. Now, machines are figuring out how to understand and explain vast amounts of data. We call this Machine Learning (ML).
The terms machine learning and artificial intelligence are often used together. But they do not mean the same thing. Machine learning is a type of artificial intelligence.
Where is machine learning used?
Machine learning can be used in any situation that involves large amounts of data. This means almost anywhere! Businesses were among the first to use machine learning because they had the money to invest in the technology, which was very expensive. Now, machine learning is less expensive and easier to access. Many machine learning programs are now shared online as open source files.
Did you know?
Open Source is the term for when people freely share their code online. People can change open source code to suit their needs or to help improve it.
Autonomous vehicles, medical researchers and marketers all use machine learning. But did you know that machine learning is also used in sports, restaurants and even in writing rap lyrics? The possibilities of ML are endless and we are just learning to use this powerful tool.
How does machine learning work?
There are many types of machine learning (ML). Each has its own strengths and weaknesses. Below is a table that summarizes the different types.
Type of ML | How it works | Mostly used when...
Supervised | Computers are provided with data that is labelled by people | We have a known task that is time-consuming
Unsupervised | Computers look for patterns in data that are difficult for people to label | We want to find unknown patterns in data
Reinforcement or self-supervised | The computer creates its own data and supervises its own learning | We have a goal but may not know the best way to achieve it
Supervised Machine Learning
The first type of machine learning is Supervised Machine Learning. As you might guess, in this type of ML people need to supervise, or train, the computer. Let's look at an example.
Say you were designing a self-driving car. You would want the car to know the difference between different types of road signs. The desired output would be for the car to tell a stop sign apart from other types of signs. The goal of the ML system would be to create an algorithm to do this. An algorithm is a series of steps necessary to solve a problem or achieve a particular goal.
To help the computer figure out the algorithm, people need to teach the computer what a stop sign looks like. The computer is first given pictures of stop signs and other road signs. Each picture is given a label, either “stop” or “non-stop”. In ML terms, each picture is the input data, and its label (“stop” or “non-stop”) is the output we want the computer to be able to identify on its own later. For stop signs, the algorithm may tell the computer to look for the shape of an octagon.
Image - Text Version
The supervised ML system is given a variety of labelled traffic signs as training images. A stop sign is the desired output. The rule is that stop signs have an octagonal shape. The system is then given new traffic signs as raw data and applies the algorithm. If the algorithm is correct, the system will identify octagonal signs as stop signs and all other signs as non-stop signs.
Once there is an algorithm, AI engineers test it using new data. The algorithm should be able to identify stop sign images that it has never seen before. If it’s not able to, then it needs more training. Sound familiar? It's a lot like learning new things yourself!
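To make the train-then-test idea concrete, here is a minimal Python sketch of one simple supervised technique, a 1-nearest-neighbour classifier. It is only an illustration, not how real self-driving cars work: each sign is reduced to two made-up features, its number of straight sides and the fraction of the sign that is red, which a real system would have to measure from the pixels. All training and test signs and their “stop”/“non-stop” labels are hypothetical.

```python
import math

# Hypothetical labelled training data: (number_of_sides, fraction_red) -> label.
# A round sign is given 0 straight sides. People supplied these labels.
training_data = [
    ((8, 0.95), "stop"),
    ((8, 0.85), "stop"),
    ((3, 0.50), "non-stop"),  # triangle with a red border
    ((4, 0.05), "non-stop"),  # yellow diamond
    ((4, 0.10), "non-stop"),  # rectangle
    ((0, 0.40), "non-stop"),  # round sign
]


def classify(features, examples):
    """1-nearest-neighbour: give a new sign the label of the most similar training sign."""
    _, label = min(examples, key=lambda example: math.dist(features, example[0]))
    return label


def accuracy(test_data, examples):
    """Fraction of test signs whose predicted label matches the true label."""
    correct = sum(classify(features, examples) == label for features, label in test_data)
    return correct / len(test_data)


# New labelled signs the classifier has never seen, used only for testing.
test_data = [
    ((8, 0.90), "stop"),
    ((3, 0.45), "non-stop"),
    ((4, 0.20), "non-stop"),
    ((0, 0.35), "non-stop"),
]

print(classify((8, 0.70), training_data))  # a faded stop sign
print(f"Test accuracy: {accuracy(test_data, training_data):.0%}")
```

Here the number of sides varies much more than the red fraction, so it dominates the distance. Real systems usually rescale their features so that no single one swamps the others. If the test accuracy came out low, the classifier would need more (or better) training examples.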
Below is a video showing a simple version of supervised machine learning. As you can see, the computer is first shown the required output (images of Waldo). It then analyzes images to match its output data with input from its camera system.
You may wonder how good ML is at doing this. We call the percentage of correct answers accuracy. For example, if the ML system identified stop signs correctly 98 times out of 100, its accuracy would be 98%. To get the most accurate results, the system needs to be given about the same amount of data for each type of object. Imagine a system that is given 98 images of cats and 2 images of dogs. It could reach 98% accuracy just by guessing “cat” every time, without ever learning to recognize a dog!
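The accuracy calculation, and the trap in the cat-and-dog example, can be checked with a few lines of Python. This sketch assumes the images have already been turned into predictions. The always-“cat” model is a deliberately useless stand-in, and measuring accuracy separately for each class is one common way (among others) to expose it.

```python
def accuracy(predictions, labels):
    """Share of correct answers, as a fraction between 0 and 1."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def class_accuracy(predictions, labels, target):
    """Accuracy measured only on the examples whose true label is `target`."""
    pairs = [(p, t) for p, t in zip(predictions, labels) if t == target]
    return sum(p == t for p, t in pairs) / len(pairs)


# The unbalanced data set from the text: 98 cat images and 2 dog images.
labels = ["cat"] * 98 + ["dog"] * 2
# A "model" that ignores the pictures and always answers "cat".
predictions = ["cat"] * len(labels)

print(f"Overall accuracy: {accuracy(predictions, labels):.0%}")  # 98%
for animal in ("cat", "dog"):
    print(f"Accuracy on {animal} images: {class_accuracy(predictions, labels, animal):.0%}")
```

The overall score is 98%, but the per-class check prints 100% for cats and 0% for dogs, which shows that the model has not learned anything about dogs.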
Identifying a stop sign may seem easy for you, but it's hard for a computer. Look at this set of pictures and try to describe them only using shapes and colors!
The previous examples involved using ML to classify things. Supervised machine learning can also be used to make predictions. For example, a company could use ML to predict how long people will stay with the company. The ML system could analyze information about past employees, such as their education and years of experience, along with how long each of them stayed. Once the system has created an algorithm, it can be used to make predictions about new employees when they are hired.
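Predicting a number, such as how many years someone will stay, is often done with regression. Below is a minimal sketch of ordinary least-squares linear regression in plain Python. The employee numbers are made up for illustration. To keep the arithmetic easy to follow, it uses only one input (years of experience) rather than both education and experience.

```python
def fit_line(xs, ys):
    """Ordinary least squares: find the slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past employees: years of experience when hired (input)
# and how many years they stayed (the label we want to predict).
experience = [1, 2, 3, 4, 5]
years_stayed = [2, 3, 3, 5, 6]

slope, intercept = fit_line(experience, years_stayed)

# Use the learned line to predict for a new hire with 6 years of experience.
new_hire = 6
prediction = slope * new_hire + intercept
print(f"Predicted stay: {prediction:.1f} years")
```

Five data points is far too few for a real prediction. The point is only that the line is learned from labelled examples (past employees) and then applied to someone new.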
The biggest disadvantage of supervised ML is that it needs good labelled data to train on. A study about data labelling found that people can spend up to 80% of their time making sure the labels are correct.
Unsupervised Machine Learning
Unsupervised ML is used to find patterns in data that is difficult to label. One example of this type of data is human speech. All people's speech sounds different. Because of this, it is hard to tell a computer exactly how a word should sound. Unsupervised machine learning can be used to analyze spoken words.
Another example is in medicine. To help treat or cure a disease, scientists may want to find out whether the disease involves specific genes. Genes carry the information that makes you who you are. Each of your cells contains about 20 000 to 25 000 genes. Researchers could use unsupervised ML to look for similarities in the genes of people who have the disease.
To see how unsupervised ML works, let's go back to the self-driving car example. In unsupervised ML, the system is not given labelled training images, and the output is unknown. The system takes the raw data and looks for patterns on its own. Once it has found a pattern, an algorithm is developed. The algorithm can then be used in a similar way to supervised ML.
Image - Text Version
An unsupervised ML system is not given any labelled training images or a desired output. The ML system is given a variety of traffic signs as raw data. The system looks for patterns in the data. Based on the patterns, an algorithm is developed. The algorithm outputs the road signs sorted into different groups.
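One widely used unsupervised technique for this kind of grouping is k-means clustering. The sketch below is a bare-bones version in plain Python, not necessarily what any particular system uses. The signs are described by two made-up features (number of straight sides, fraction red), and no labels are given: the code only decides which signs look alike.

```python
import math

# Hypothetical unlabelled signs: (number_of_sides, fraction_red). No "stop" labels here.
signs = [(8, 0.90), (3, 0.50), (8, 0.85), (4, 0.10), (0, 0.40), (8, 0.95), (4, 0.05)]


def kmeans(points, k, max_iterations=20):
    """Basic k-means: start with the first k points as group centres, then repeat
    (1) put each point in the group with the nearest centre and
    (2) move each centre to the average of its group, until the centres stop moving."""
    centres = list(points[:k])
    for _ in range(max_iterations):
        groups = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centres[i]))
            groups[nearest].append(point)
        # Keep an empty group's centre where it was, to avoid dividing by zero.
        new_centres = [
            tuple(sum(values) / len(group) for values in zip(*group)) if group else centres[i]
            for i, group in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return groups


for number, group in enumerate(kmeans(signs, k=2), start=1):
    print(f"Group {number}: {group}")
```

With this data, the three eight-sided signs end up in one group and everything else in the other. The algorithm never learns the word “stop”; a person still has to look at the groups and decide what they mean.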
Reinforcement or Self-supervised Machine Learning
The third type of ML is reinforcement or self-supervised machine learning. In this type of ML, the machine learns by trial and error: it tries an action, gets feedback on how well that action worked, and adjusts what it does next time. Unlike the other two types of ML, self-supervised ML systems can improve themselves without human supervision.
Below is an example of self-supervised ML in action. This video shows how a robotic arm uses computer vision to toss different objects. The goal is for the robot to correctly throw the object into the bin as quickly as possible.
Again, this might seem simple to us. But a robot needs to consider many things to complete this task. First, it needs to locate and pick up an object. It also needs to consider its gripping force, the force of the throw, and the weight and shape of the object. This requires applying many physics principles. It would be pretty tricky to write a program that takes all these things into account. This is what makes it a good task for self-supervised ML. Robots using self-supervised ML could be useful in places such as recycling plants, where robots sort materials.
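The trial-and-error idea can be shown with a toy version of the throwing task. The sketch below uses an epsilon-greedy strategy, one of the simplest reinforcement learning methods; it is not the method used by the robot in the video. Everything about the “world” is hypothetical: the robot picks one of eight throw strengths, each strength sends the object a fixed distance, and the reward is higher the closer the object lands to a bin 2.5 m away.

```python
import random

FORCES = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical throw strengths the robot can choose from
BIN_DISTANCE = 2.5                # hypothetical distance to the bin, in metres


def throw(force):
    """Toy stand-in for the real world: each unit of force sends the object 0.5 m.
    The reward is 0 for a perfect throw and more negative the further it misses."""
    landing_spot = 0.5 * force
    return -abs(landing_spot - BIN_DISTANCE)


def learn_to_throw(trials=200, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error: usually repeat the best throw so far,
    but sometimes (with probability epsilon) try a random one to keep exploring."""
    rng = random.Random(seed)
    total_reward = {force: 0.0 for force in FORCES}
    times_tried = {force: 0 for force in FORCES}

    def average_reward(force):
        return total_reward[force] / times_tried[force]

    for _ in range(trials):
        untried = [force for force in FORCES if times_tried[force] == 0]
        if untried:
            force = untried[0]                       # try every option at least once
        elif rng.random() < epsilon:
            force = rng.choice(FORCES)               # explore: a random throw
        else:
            force = max(FORCES, key=average_reward)  # exploit: the best throw so far
        total_reward[force] += throw(force)
        times_tried[force] += 1

    return max(FORCES, key=average_reward), times_tried


best_force, times_tried = learn_to_throw()
print(f"Best throw strength found: {best_force}")
print(f"Times each strength was tried: {times_tried}")
```

In this toy world each throw always gives the same reward, so one try of each strength would be enough. Real throws vary from one attempt to the next, which is why the code keeps a running average and still explores now and then.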
Another well-known example is when computers beat humans at games. Computers can use self-supervised ML to find the fastest way to win a game. Two computers can even play against each other using self-supervised ML. For example, in the video below you will see how machines discovered a flaw in a game.
There are many areas in which self-supervised ML is used to improve systems. One area is computer security or cybersecurity. Keeping data safe is very important when the data is confidential. This includes data used by banks and the government. To test data security systems, self-supervised ML can pretend to be hackers. This lets people find the flaws in a system before an actual hacker does!
So how could self-supervised ML be used in our self-driving car? Such a system could create virtual driving simulations to test whether the car stops when its camera sees a red octagon.
Is there a best type of machine learning?
Choosing the right type of machine learning depends on the problem to solve. Combining techniques can give even better results. Remember the example above about genes? We could use unsupervised ML to identify a gene that might be involved in the disease. We could then use this information to train a supervised ML system and test its accuracy using input data from people with and without the disease.
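The two-step gene idea can be sketched in Python. All gene names and activity levels below are made up, and both steps use deliberately simple stand-ins. Step 1 (unsupervised) just picks the gene whose activity varies the most across people, without looking at who has the disease. Step 2 (supervised) learns a cut-off for that gene from labelled people, then measures its accuracy on new labelled people. Real genetic studies use far more careful statistics.

```python
from statistics import mean, pvariance

# Step 1 data (unsupervised): hypothetical activity levels of three genes, no disease labels.
unlabelled_people = [
    {"gene_A": 1.0, "gene_B": 2.0, "gene_C": 5.0},
    {"gene_A": 1.2, "gene_B": 8.0, "gene_C": 5.5},
    {"gene_A": 0.9, "gene_B": 2.5, "gene_C": 4.8},
    {"gene_A": 1.1, "gene_B": 7.5, "gene_C": 5.2},
    {"gene_A": 1.0, "gene_B": 2.2, "gene_C": 5.1},
    {"gene_A": 1.1, "gene_B": 8.2, "gene_C": 4.9},
]

# Pick the gene whose activity varies most between people (a simple unsupervised heuristic).
genes = unlabelled_people[0].keys()
candidate = max(genes, key=lambda g: pvariance([person[g] for person in unlabelled_people]))

# Step 2 data (supervised): candidate-gene activity plus a label (True = has the disease).
training = [(2.1, False), (7.9, True), (2.4, False), (8.1, True)]
testing = [(2.3, False), (7.7, True), (6.0, True), (3.0, False), (4.8, True)]

# Learn a cut-off halfway between the average activity of the two labelled groups.
# Assumption: higher activity goes with the disease, as it does in this made-up data.
healthy_mean = mean(a for a, sick in training if not sick)
sick_mean = mean(a for a, sick in training if sick)
cutoff = (healthy_mean + sick_mean) / 2


def predict(activity):
    """Predict the disease when the candidate gene's activity is above the cut-off."""
    return activity > cutoff


accuracy = sum(predict(a) == sick for a, sick in testing) / len(testing)
print(f"Candidate gene: {candidate}, cut-off: {cutoff:.1f}, test accuracy: {accuracy:.0%}")
```

The rule gets one test person wrong (activity 4.8). This is a reminder that a single cut-off rarely separates real patients perfectly, and why testing on new people matters.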
Further Thoughts About Machine Learning
We know a lot about how our brains work, but some things are still a mystery. This is a lot like machine learning. It is great that the machines do what we want. But it is not enough. We also want to understand how they work. Without knowing how they make decisions, how can we know that the decisions they make are fair and ethical? This is especially true if people use ML with data about the general public. Being able to explain how ML works is called transparency or explainable artificial intelligence.
You may also be thinking, if machines can learn, will humans still be needed? The answer is yes! A machine learning algorithm is only as good as its data. This is why many experts need to make sure that data used by ML models is accurate and appropriate. We also need qualified people to make sure these technologies are used wisely and fairly. Many experts are currently working on this, but even more will be needed in the future.
Let’s end on a great note with Abu’s story…