The agent was trained based on the DQN exercises provided in the course and achieved an average score of 13 or higher. I ran the code in the workspace provided by Udacity, so the installation steps listed in the review requirements were skipped.
Environment:
- State space: 37 dimensions.
- Action space: 4 discrete actions (move forward, move backward, turn left, turn right).
- Reward: +1 for collecting a yellow banana, -1 for collecting a blue banana.
Using the Bellman operator, we compute target action values from previously collected experience. The action-value function is approximated by a neural network: the input is the state, and the output is a Q-value for each action. The network weights are learned by minimizing the error between the approximated Q-values and the target values. Double DQN is not applied.
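The update described above can be sketched without a deep-learning framework. The following is a minimal NumPy illustration, assuming a linear Q-function in place of the project's neural network and a randomly generated transition batch; all variable names (`W`, `W_target`, `q_values`, etc.) are hypothetical, not the project's actual code:

```python
import numpy as np

# Minimal sketch only: a linear Q-function stands in for the neural network,
# and the transition batch is random rather than drawn from real experience.
rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS = 37, 4      # Banana environment sizes from the text
GAMMA, LR, BATCH = 0.99, 1e-3, 8

W = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))  # online parameters
W_target = W.copy()                                     # frozen target parameters

def q_values(states, weights):
    """Q(s, a) for every action: rows are states, columns are actions."""
    return states @ weights

# A fake batch of transitions (s, a, r, s', done) standing in for replay memory.
s = rng.normal(size=(BATCH, STATE_DIM))
a = rng.integers(0, N_ACTIONS, size=BATCH)
r = rng.normal(size=BATCH)
s_next = rng.normal(size=(BATCH, STATE_DIM))
done = np.zeros(BATCH)

# Bellman (TD) target with the vanilla DQN max over the target network;
# Double DQN would instead select the argmax action with the online network.
y = r + GAMMA * (1.0 - done) * q_values(s_next, W_target).max(axis=1)

q_sa = q_values(s, W)[np.arange(BATCH), a]
td_error = q_sa - y
loss_before = 0.5 * np.mean(td_error ** 2)

# One gradient-descent step on the squared error; only the taken actions
# contribute to the gradient of their weight columns.
grad = np.zeros_like(W)
for i in range(BATCH):
    grad[:, a[i]] += td_error[i] * s[i] / BATCH
W -= LR * grad

q_sa_new = q_values(s, W)[np.arange(BATCH), a]
loss_after = 0.5 * np.mean((q_sa_new - y) ** 2)
print(loss_before, loss_after)  # the step should shrink the approximation error
```

In the actual project, the same idea is carried out with a neural network and an optimizer: the target network is updated only periodically, which keeps the targets stable while the online network is trained.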