How Machines Beat Humans at Everything

Watch our video submission for the IJCAI-21 AI Video Competition (adapted)

You have probably heard that machines now beat the world's best players at chess, Go, and even video games such as Dota 2. Recent progress in Artificial Intelligence allowed researchers to defeat the strongest human players in these games, thanks in large part to a technique called Reinforcement Learning. The same technique has also taught robots to walk, open doors, and even play soccer. But what exactly is this technique? This short article introduces the basics of Reinforcement Learning and gives an overview of how it works.

How Does It Work?

Reinforcement Learning is inspired by living beings: seeking positive rewards and avoiding negative ones.

Reinforcement Learning is a technology inspired by living beings. In general, living beings learn behaviors that earn them rewards or help them avoid punishment. If you eat something tasty, you will probably want to eat it again. If you touch a hot stove, you will most likely not want to do it again. Reinforcement Learning applies the same idea to machines: it teaches them to obtain positive rewards and avoid negative ones. We call these machines “agents.”

An agent acting in its environment (a soccer field).

These agents act in an environment. They observe this environment and take actions based on these observations. Depending on the outcome of their actions, they receive a reward, either positive or negative. At first, the agent behaves randomly, but it gets better and better through trial and error. In other words, the agent learns to maximize the total reward it collects over its lifetime.
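
To make this observe, act, and get rewarded loop concrete, here is a minimal Python sketch. The `LineWorld` environment, its `reset`/`step` methods, and `run_episode` are hypothetical names chosen for illustration, loosely following the reset/step style common in RL libraries. This agent only acts randomly; it does not learn anything yet.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..length-1 on a line.

    The left end (0) is the campfire (-1 reward); the right end is the cake (+1).
    """

    def __init__(self, length=5):
        self.length = length
        self.position = length // 2

    def reset(self):
        """Put the agent back in the middle and return its first observation."""
        self.position = self.length // 2
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (observation, reward, done)."""
        self.position += action
        if self.position == self.length - 1:
            return self.position, 1.0, True    # reached the cake
        if self.position == 0:
            return self.position, -1.0, True   # walked into the fire
        return self.position, 0.0, False       # nothing happened yet


def run_episode(env, policy, max_steps=1000):
    """Observe, act, collect the reward, repeat; return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)


def random_policy(observation):
    """An agent that has not learned anything yet: it ignores what it observes."""
    return rng.choice([-1, 1])


print(run_episode(LineWorld(), random_policy))  # 1.0 (cake) or -1.0 (fire)
```

Everything the agent "knows" flows through the two arguments of `run_episode`: the environment decides what happens and which reward is given, while the policy decides which action to take.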

A Simple Example…

Let’s look at a simple example. You are standing on an imaginary line, with a cake ready to eat at one end and a burning campfire at the other. What would you do? You would probably walk straight to the cake, since walking into the campfire would hurt. But how can a computer know this and learn the same decision-making process? Through trial and error!

As we discussed, the agent behaves randomly at first: half of the time it goes left, and the other half it goes right. Sooner or later, though, it reaches one of the two rewards, either positive or negative.
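
A short simulation can check the "half of the time" intuition. This sketch assumes a hypothetical line of five positions with the agent starting in the middle, the campfire at the left end, and the cake at the right end; the `random_walk` helper is made up for illustration. A purely random agent should end up at each end roughly equally often.

```python
import random


def random_walk(length=5, rng=None):
    """Move left or right with equal probability until reaching an end of the line.

    Returns ("cake", steps) for the right end or ("fire", steps) for the left end.
    """
    if rng is None:
        rng = random.Random()
    position = length // 2            # start in the middle
    steps = 0
    while 0 < position < length - 1:  # keep walking until an end is reached
        position += rng.choice([-1, 1])
        steps += 1
    outcome = "cake" if position == length - 1 else "fire"
    return outcome, steps


rng = random.Random(42)
outcomes = [random_walk(rng=rng)[0] for _ in range(10_000)]
print(outcomes.count("cake") / len(outcomes))  # roughly 0.5
```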

A simple example of reinforcement learning where the agent explores its environment to maximize rewards.

At this moment, the agent learns that going left, toward the campfire, hurts; or, if it was lucky, it learns how great the cake tastes. That’s it! Once it has learned about these rewards, it can behave optimally in this environment and walk straight to the cake every time.
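
The article does not name a specific learning algorithm, so the sketch below uses tabular Q-learning, one standard way to turn these trial-and-error experiences into a decision rule. The line layout, the rewards (+1 for the cake, -1 for the fire), the learning rate `alpha`, and the discount factor `gamma` are assumptions for illustration. The agent keeps acting randomly the whole time, yet the values it learns still point it toward the cake.

```python
import random

LENGTH = 5          # positions 0..4; fire at 0, cake at 4 (assumed layout)
ACTIONS = (-1, 1)   # move left, move right


def step(position, action):
    """Hypothetical line world: returns (next_position, reward, done)."""
    nxt = position + action
    if nxt == LENGTH - 1:
        return nxt, 1.0, True    # the cake
    if nxt == 0:
        return nxt, -1.0, True   # the campfire
    return nxt, 0.0, False


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the (discounted) total future reward of action a in state s.
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(LENGTH)}
    for _ in range(episodes):
        position, done = LENGTH // 2, False
        while not done:
            action = rng.choice(ACTIONS)   # behave randomly (pure exploration)
            nxt, reward, done = step(position, action)
            # Bootstrap from the best next action unless the episode has ended.
            target = reward if done else reward + gamma * max(q[nxt].values())
            q[position][action] += alpha * (target - q[position][action])
            position = nxt
    return q


q = q_learning()
# The learned behavior: in each non-terminal position, pick the highest-valued action.
greedy = {s: max(ACTIONS, key=lambda a: q[s][a]) for s in range(1, LENGTH - 1)}
print(greedy)  # {1: 1, 2: 1, 3: 1}, i.e. always move right toward the cake
```

The discount factor `gamma` makes a reward reached sooner worth more than the same reward reached later, which is why the learned values favor the shortest path to the cake.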

This is a simple example, as the agent can only go left or right. Usually, though, it has many more possible paths. In such a complex environment, even after finding a good reward, the agent needs to keep looking for better ones. In other words, maybe a bigger cake is waiting for us around the next corner, so from time to time, we need to take a chance and have a look.

We can compare this with the real world. If you’re like me, you regularly order the same pizza at the same pizzeria. But what if you tried a new one once in a while? You might like it even more and decide that it’s your new favorite. You would never have discovered it without trying something new, even though you already enjoyed the first one.
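
The pizza dilemma is a classic exploration-versus-exploitation trade-off. One common strategy (not necessarily the one the authors have in mind) is epsilon-greedy: most of the time, pick the option that looks best so far, but with a small probability `epsilon`, try something at random. The menu, the taste scores, and the `epsilon` values below are made up for illustration.

```python
import random

# Hypothetical menu: the average "tastiness" of each pizza (unknown to the agent).
TRUE_TASTE = {"margherita": 6.0, "diavola": 8.0, "quattro formaggi": 7.0}


def order(pizza, rng):
    """Tasting a pizza gives its average taste plus a little noise (always positive here)."""
    return TRUE_TASTE[pizza] + rng.uniform(-1.0, 1.0)


def epsilon_greedy(epsilon, orders=2000, seed=0):
    rng = random.Random(seed)
    pizzas = list(TRUE_TASTE)
    estimate = {p: 0.0 for p in pizzas}   # average taste observed so far
    count = {p: 0 for p in pizzas}        # how many times each pizza was ordered
    for _ in range(orders):
        if rng.random() < epsilon:
            pizza = rng.choice(pizzas)              # explore: try something at random
        else:
            pizza = max(pizzas, key=estimate.get)   # exploit: order the current favorite
        reward = order(pizza, rng)
        count[pizza] += 1
        # Incremental mean: move the estimate toward the new observation.
        estimate[pizza] += (reward - estimate[pizza]) / count[pizza]
    favorite = max(pizzas, key=estimate.get)
    return favorite, count


print(epsilon_greedy(epsilon=0.0)[0])  # margherita: it never tries anything else
print(epsilon_greedy(epsilon=0.1)[0])  # diavola: occasional exploration finds the better pizza
```

With `epsilon=0.0`, the agent orders the first pizza forever, because its positive taste score immediately beats the untried pizzas' starting estimate of zero. A small amount of exploration is enough to discover the better option.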

Conclusion

Of course, not every scenario is this simple, but every Reinforcement Learning problem can be framed this way. What changes from one challenge to the next is the environment the agent acts in. Whether it is a chessboard, a video game, or even the states of a robot’s motors as it learns to walk, the logic is the same: the agent tries things, sees how the environment reacts to its actions, and adapts to do better in the future. Loosely speaking, you can see reinforcement learning as machines learning in a Darwinian way: behaviors that lead to rewards are kept, and those that lead to punishment are dropped.

If this makes sense to you, congratulations! You now understand what reinforcement learning is and how it works. There are, of course, many more technical details, but this is the gist of this remarkably capable technique.

Article by

— Malrick Costantini, LinkedIn, website. If you like this article, you can also check Malrick's other work, including this website (DealTracker) that compares second-hand products!

— Elias Ilmari, LinkedIn, website

— Louis-François Bouchard, LinkedIn, website
