- Reinforcement Learning with TensorFlow
- Sayon Dutta
- 81字
- 2025-02-23 17:35:22
The policy model for optimality
Policy is defined as the model that guides the agent with action selection in different states. Policy is denoted as .
is basically the probability of a certain action given a particular state:
data:image/s3,"s3://crabby-images/308b0/308b08016fc4e4426a185889b244a89bf55c1d9d" alt=""
Thus, a policy map will provide the set of probabilities of different actions given a particular state. The policy along with the value function create a solution that helps in agent navigation as per the policy and the calculated value of the state.