Bellman Equations in RL

Bellman equations build on the core RL concepts of state (a numerical representation of what the agent observes at a particular point in the environment), the action the agent supplies as input, and the reward it receives as feedback.

Bellman equations attempt to answer questions such as:

The agent is in state s. If it takes the best possible actions from here on, what long-term reward can it expect? In other words, what is the value of the state the agent is currently in?

In their simplest form, used for deterministic environments, the Bellman optimality equation for the state value is V(s) = max_a [ R(s, a) + γ·V(s') ], where R(s, a) is the immediate reward for taking action a in state s, γ is the discount factor, and s' is the state that action leads to.
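As a quick illustration, here is a minimal Python sketch of that backup on a tiny hypothetical deterministic environment (the `transition` and `reward` tables and the state/action names are made up for this example):

```python
GAMMA = 0.9  # discount factor

# Tiny hypothetical deterministic environment: 3 states, 2 actions.
# transition[s][a] -> next state s', reward[s][a] -> immediate reward R(s, a).
transition = {0: {"left": 0, "right": 1},
              1: {"left": 0, "right": 2},
              2: {"left": 1, "right": 2}}
reward = {0: {"left": 0.0, "right": 1.0},
          1: {"left": 0.0, "right": 5.0},
          2: {"left": 0.0, "right": 0.0}}

def bellman_backup(s, V):
    """One application of V(s) = max_a [ R(s, a) + gamma * V(s') ]."""
    return max(reward[s][a] + GAMMA * V[transition[s][a]]
               for a in transition[s])
```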

For realistic problems the state space is large, so the Bellman optimality equations are difficult to solve explicitly. Dynamic Programming is used instead: the problem is broken into simpler sub-problems, and a look-up table is built up to estimate the value of each state.

There are two main classes of Dynamic Programming algorithms: Value Iteration and Policy Iteration.
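As a rough sketch of Value Iteration, and assuming the toy environment and `bellman_backup` function from the previous snippet, the look-up table of state values can be built by sweeping over all states until the values stop changing:

```python
def value_iteration(states, tol=1e-6):
    """Repeated Bellman backups until the value table converges."""
    V = {s: 0.0 for s in states}          # initial value estimates
    while True:
        delta = 0.0
        for s in states:
            new_v = bellman_backup(s, V)  # V(s) = max_a [R(s, a) + gamma * V(s')]
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:                   # converged: values barely changed
            return V

V = value_iteration(states=[0, 1, 2])
print(V)  # approximate optimal value of each state
```

Policy Iteration instead alternates between evaluating the current policy and improving it greedily with respect to the evaluated values.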

Q-learning combines the ideas of the policy and value functions into a single function that tells us how useful it is to take a given action in a given state for obtaining future reward.

A quality value Q(s, a) is assigned to each state-action pair based on the expected future return. The agent learns this Q-function and then, in any state s, selects the action that yields the highest quality.
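A minimal tabular Q-learning sketch, again reusing the toy `transition` and `reward` tables from the first snippet (the hyperparameters and the `choose_action` helper are assumptions for this example); each update nudges Q(s, a) toward r + γ·max_a' Q(s', a'):

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # learning rate, discount, exploration rate
ACTIONS = ["left", "right"]
Q = defaultdict(float)                   # Q[(state, action)] -> quality estimate

def choose_action(s):
    """Epsilon-greedy: usually exploit the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

s = 0
for _ in range(10_000):
    a = choose_action(s)
    r, s_next = reward[s][a], transition[s][a]   # deterministic environment step
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    s = s_next

best_action = {st: max(ACTIONS, key=lambda a: Q[(st, a)]) for st in [0, 1, 2]}
print(best_action)  # the highest-quality action in each state
```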
