Reinforcement learning (RL) is an area of machine learning (ML). Its aim is to maximize reward in a particular situation: the software or machine must choose the best possible behaviour, or the best path to take, in the situation it faces.
RL differs from supervised learning, where the training data contains the answer key, i.e. the model is trained on the correct answers. In RL there is no training dataset; the agent is bound to learn from experience.
In reinforcement learning, the agent has to decide what to do in a particular situation to perform a given task. Hurdles lie between the agent and the reward, and the agent is supposed to find the best possible path to reach the reward.
RL is thus a feedback-based ML technique: an agent learns to behave in a particular environment by performing actions and observing their results.
Good actions earn positive feedback; bad actions earn negative feedback, or a penalty. The agent therefore learns automatically from feedback, without any labelled data, unlike supervised learning. In the absence of labelled data, the agent learns from experience alone.
RL solves a specific type of problem in which decision making is sequential and the goal is long term, e.g. game playing and robotics. The agent interacts with the environment and explores it on its own, aiming for the best performance by collecting the maximum possible reward. It learns by trial and error, building on experience. The agent is an intelligent agent, i.e. a computer program.
The core AI idea here is that the agent works on the concept of RL: there is no need to pre-program it, it learns by experience, and no human intervention is required.
A typical RL problem is a maze. A robot wants to reach the Kohinoor diamond while avoiding fire hurdles. It tries all possible paths and has to choose the one with the fewest hurdles. A right step earns it a reward; a wrong one subtracts from its reward.
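To make this concrete, here is a minimal sketch of such a maze as a grid world in Python. The layout, the reward values (+10 for the diamond, -10 for fire, -1 per move) and the function names are illustrative assumptions, not part of any standard benchmark.

```python
# A minimal grid-world maze: the robot starts at S, seeks the diamond D,
# and must avoid the fire cells F. All numbers are illustrative.
GRID = [
    "S..F",
    ".F..",
    "...F",
    "F..D",
]
REWARDS = {"D": 10, "F": -10, ".": -1, "S": -1}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Apply one move; return (next_state, reward, done)."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])):
        return state, -1, False                   # bumped into a wall: small penalty
    cell = GRID[nr][nc]
    return (nr, nc), REWARDS[cell], cell in "DF"  # episode ends at diamond or fire

print(step((3, 2), "right"))   # -> ((3, 3), 10, True): reached the diamond
```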
Terms used
An agent is an entity that can perceive and/or explore the environment.
The environment is the situation in which the agent is present, or by which it is surrounded. The environment in RL is assumed to be stochastic, i.e. random in nature.
Actions are the moves taken by an agent within the environment.
State is a situation returned by the environment after each action taken by an agent.
Reward is the feedback returned to the agent from the environment to evaluate the action of the agent.
Policy is the strategy the agent applies to decide the next action based on the current state.
Value is the expected long-term return from a state (with discounting), as opposed to the short-term reward. The Q-value is similar to the value, but takes the current action as an additional parameter. The sketch after this list shows how these terms fit together.
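The short sketch below maps each term onto code. The MazeEnv class, its dynamics and the random policy are hypothetical stand-ins, written in the spirit of the common reset/step interface rather than any particular library.

```python
import random

class MazeEnv:
    """Toy stand-in environment; the names and dynamics are illustrative."""
    def __init__(self):
        self.state, self.steps = 0, 0
    def reset(self):
        self.state, self.steps = 0, 0
        return self.state
    def step(self, action):
        self.steps += 1
        self.state = max(0, self.state + action)   # toy dynamics on a line
        reward = 10 if self.state == 3 else -1     # feedback from the environment
        done = self.state == 3 or self.steps >= 20
        return self.state, reward, done

env = MazeEnv()                              # environment: where the agent acts
state = env.reset()                          # state: reported by the environment
done, total_reward = False, 0
while not done:
    action = random.choice([-1, +1])         # action: chosen by a (random) policy
    state, reward, done = env.step(action)   # reward: evaluates that action
    total_reward += reward
print("episode return:", total_reward)
```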
Key Features
In RL, the agent is not instructed about the environment or about which actions to take; learning is based on trial and error. The agent takes an action, moves to the next state, and adjusts its behaviour in the light of the feedback it receives.
Salient Features
The input denotes the starting point of the model. The output can take more than one form, as there are many solutions to the problem. Training is based on the input: the model is rewarded or punished depending on the state it returns, and it continues to learn. The best solution is the one that yields the maximum reward.
Distinction between RL and supervised learning (SL)
In RL, decision making is sequential; in SL, a decision is based on the initial input, i.e. the input given at the start.
In RL, each decision depends on the previous ones; SL decisions are independent of one another.
A chess game illustrates RL; object identification illustrates SL.
Types of Reinforcement
Reinforcement is positive when an event that occurs due to a particular behaviour strengthens that behaviour. It is negative when a behaviour is strengthened because a negative condition is stopped or avoided.
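A minimal sketch of how the two kinds of reinforcement can appear in a reward function; the events and numbers here are assumptions for illustration only.

```python
# Illustrative reward shaping (event names and values are assumptions).
def shaped_reward(reached_goal, in_fire_zone):
    reward = 0.0
    if reached_goal:
        reward += 10.0   # positive reinforcement: a reward follows the behaviour
    if in_fire_zone:
        reward -= 1.0    # aversive condition; moving out stops this penalty,
                         # negatively reinforcing the escaping behaviour
    return reward

print(shaped_reward(reached_goal=False, in_fire_zone=True))   # -1.0
print(shaped_reward(reached_goal=True, in_fire_zone=False))   # 10.0
```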
Implementation of RL
Value-based: find the optimal value function, i.e. the maximum value attainable at a state under any policy. Here the agent expects the long-term return of any state under policy π.
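Tabular Q-learning is a standard example of the value-based approach. It moves Q(s, a) towards r + γ · max over a′ of Q(s′, a′). The sketch below shows that update; the hyper-parameter values and function names are illustrative choices.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative hyper-parameters
Q = defaultdict(float)                  # Q[(state, action)]: estimated long-term return

def choose_action(state, actions):
    """Epsilon-greedy: usually exploit the best known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """Move Q(s, a) towards reward + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```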
Policy-based: find the optimal policy for the maximum future reward. Here the agent tries to apply a policy such that the action performed at each step facilitates the maximum future reward.
There are two subcategories: deterministic, where the same action is produced by the policy (π) at any given state, and stochastic, where probability determines the produced action.
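A sketch of the two subcategories, assuming hypothetical preference scores in place of a learned policy: the deterministic policy always returns the highest-scoring action, while the stochastic one samples from a softmax distribution over the scores.

```python
import math
import random

prefs = {"left": 0.2, "right": 1.5, "up": -0.3}   # hypothetical preference scores

def deterministic_policy(prefs):
    """pi(s) -> a: always returns the single highest-preference action."""
    return max(prefs, key=prefs.get)

def stochastic_policy(prefs):
    """pi(a|s): samples an action from a softmax probability distribution."""
    total = sum(math.exp(v) for v in prefs.values())
    weights = [math.exp(v) / total for v in prefs.values()]
    return random.choices(list(prefs), weights=weights, k=1)[0]

print(deterministic_policy(prefs))   # always "right"
print(stochastic_policy(prefs))      # usually "right", sometimes another action
```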
Model-based: a virtual model of the environment is created, and the agent explores that environment to learn from it. There is no particular solution or algorithm for this approach, as the model representation differs for each environment.
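One simple way to realise this, sketched below under assumed names, is to count observed transitions and turn the counts into an empirical transition model that the agent can later use for planning instead of further real interaction.

```python
from collections import defaultdict

# Empirical transition model built from experience (all names illustrative).
counts = defaultdict(lambda: defaultdict(int))

def record(state, action, next_state):
    """Log one observed transition while exploring the real environment."""
    counts[(state, action)][next_state] += 1

def transition_probs(state, action):
    """Estimated P(next_state | state, action), usable for planning."""
    seen = counts[(state, action)]
    total = sum(seen.values())
    return {s: n / total for s, n in seen.items()} if total else {}

record("A", "go", "B"); record("A", "go", "B"); record("A", "go", "C")
print(transition_probs("A", "go"))   # {'B': 0.666..., 'C': 0.333...}
```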
Applications
RL is applied in robotics, industrial automation, ML and data processing, and customised training.