Reinforcement Learning: How To Train Your Machine

We learn a variety of abilities throughout our lives by imitation, trial and error, and intuition. Now, machines can "learn" similarly to us and Reinforcement Learning (RL) holds the key to teaching them.

RL may seem like a new topic, but it actually dates to the early days of cybernetics and work in statistics, psychology, neuroscience, and computer science. Due to its generality, RL is studied in many disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, and statistics.

In simple words, RL makes a model figure out how to complete a task on its own, learning from its past mistakes based on how much it is “rewarded”. It is akin to what we humans call trial and error, which evaluates our performance based on how well we did and determines the next action accordingly.

How does it work?

RL is composed of 4 main elements: the agent, the environment, the policy of the agent and the reward based on actions.

  • The agent (or learner) makes the decision of what action to take in a specific scenario. To make such a decision, the agent observes the environment and any internal rules it has. Typically, the current state is an exogenous variable provided by the environment, which is processed by the agent using a policy function.
  • The environment is the world context in which the agent operates. It is a model or simulation the agent interacts with by taking actions, receiving rewards or penalties, and moving to new states based on those actions. The environment provides the agent with its initial state and the current state after each action. It also determines the reward associated with each state-action pair.
  • The policyt is the exact strategy an agent implies to reach its goal. It dictates the actions the agent is supposed to execute as a function of the current state.
  • The reward is what motivates the agent, so it has an incentive and a form of evaluation when progressing or completing its goal. Without rewards, agents would have no way of evaluating their actions and would be unlikely to make any progress in completing their assigned task or solve a certain control problem. Rewards provide feedback to the agent, allowing it to learn and adapt its strategy.
Bringing together these 4 components, you are able to create a complete RL learning algorithm. The RL problem involves the agent exploring an unknown environment to achieve a goal. It is based on the hypothesis that all goals can be described by the maximization of expected cumulative reward. The agent must learn to sense and perturb the state of the environment using its actions to attain the maximal score.

Advantages of Reinforcement Learning

Compared to other types of Machine Learning, Reinforcement Learning has some clear advantages when it comes to the process of learning and the results that come with it.

For example, conversely to supervised learning, RL lacks the need of large labeled datasets. It's a massive advantage because as the amount of data in the world grows, it becomes more and more costly to label it for all required applications.

Another advantage is the innovative aspect. The fact it can not only imitate the data structure, but come up with a new way to solve a problem altogether puts it ahead of any other algorithm or method when it comes to actual “learning”.

A third advantage is that it's goal oriented. It is not just used to get an output from input data, but as means to obtain a sequence of optimal actions. This is apparent in self-driving cars, or bots in video games.

The biggest edge reinforcement learning has over machine is its ability to adapt. Being able to learn from past mistakes and deduce the best course of action according to your environment is single-handedly the greatest advantage that puts it over any other method or form of learning.

Real-Life Applications of Reinforcement Learning

Automation

Robots are programmable machines designed to perform tasks autonomously. Even if most robots we have don't look like they belong to the Star Wars universe, their capabilities are no less amazing and through RL they can improve their accuracy and speed up their completion of difficult tasks. They can also carry out tasks that humans might find perilous with significantly fewer repercussions. Because of these factors, aside from the fact that they need supervision and ongoing upkeep, robots are an affordable and productive substitute for manual labor.

Robots, for instance, are used by certain restaurants to bring meals to tables. Robots are being used by grocery retailers to detect low shelf areas and place additional goods orders. Thus far, automated robots have been employed in common environments for product assembly, defect inspection, inventory counting, tracking and management, and delivery of goods.

Language

Predictive text, text summarization, question answering, and machine translation are all examples of natural language processing that uses reinforcement learning. By studying typical language patterns, agents can mimic and predict how people speak to each other every day. This includes the actual language used, as well as syntax and the choice of words.

In 2016, researchers from Stanford University, Ohio State University, and Microsoft Research used this learning to generate dialogue, like what's used for chatbots. Using two virtual agents, they simulated conversations and used policy gradient methods to reward important attributes such as coherence, informativity, and ease of answering. This research was unique in that it didn't only focus on the question at hand, but also on how an answer could influence future outcomes. This approach to reinforcement learning in natural language processing is now widely adopted and used by customer service departments in many major organizations.

Image Processing

Have you ever taken a security test that asked you to identify objects in frames, such as “Click on the photos that have a street sign in them”? This is in fact a way to train Ais to become better at detecting objects. A method widely used is the “Captcha” you may see around the web.

When asked to process an image, agents will search for an entire image as their starting point, then identify objects sequentially until everything is registered. Artificial vision systems also use deep convolutional neural networks, made up of large, labeled datasets, to map images to human-generated scene descriptions from simulation engines.

Self-Driving Cars

Various papers have proposed Deep Reinforcement Learning for autonomous driving. In self-driving cars, there are various aspects to consider, such as speed limits at various places, drivable zones, avoiding collisions - just to mention a few.

Some of the autonomous driving tasks where reinforcement learning could be applied include trajectory optimization, motion planning, dynamic pathing, controller optimization, and scenario-based learning policies for highways.

For example, parking can be achieved by learning automatic parking policies. Lane changing can be achieved using Q-Learning while overtaking can be implemented by learning an overtaking policy while avoiding collision and maintaining a steady speed thereafter.

AWS DeepRacer is an autonomous racing car that has been designed to test out in a physical track. It uses cameras to visualize the runway and a reinforcement learning model to control the throttle and direction.

Conclusion

Reinforcement Learning provides interesting and refreshing application possibilities in a wide range of industries and opens up entirely new horizons through the use of neural networks and artificial intelligence. The outlook for the future is bright. As machine learning and artificial intelligence progresses, more potent RL-algorithms will emerge. The increasing scalability and effectiveness RL techniques will make them more widely applicable. Furthermore, advancements in the creation of hybrid approaches— such as Deep Reinforcement Learning (DRL) which merges RL with Deep Learning (DL)—are anticipated to tackle even more challenging issues. Due to all this, RL is set to become a crucial tool for tackling complicated problems.

Ledio Duda

BSc in Economics, Management and Computer Science at Bocconi University.