REINFORCE is a simple RL algorithm (perhaps too simple) that uses a stochastic policy gradient to learn which action is most likely to yield the maximum reward in a particular state.
A policy gradient method is a reinforcement learning approach that learns a policy directly, generally by using a parameterized function (e.g., a neural network) as the policy and training it to increase the probability of actions based on the observed rewards.
REINFORCE is the simplest implementation of a policy gradient method; it essentially maximizes the probability of an action times the observed reward after taking that action, such that each action's probability (given a state) is adjusted according to the size of the observed reward. REINFORCE is an effective and very simple way of training a policy function, but it's a little too simple. For CartPole it works very well, since the state space is very small and there are only two actions. If we're dealing with an environment with many more possible actions, reinforcing all of them each episode and hoping that on average it will only reinforce the good actions becomes less and less reliable.
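A minimal sketch of that update for CartPole is below. It is not the repository's exact code: the layer sizes, learning rate, discount factor, and the use of the gymnasium API are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import gymnasium as gym  # assumption: gymnasium API (reset/step return the 5-tuple form)

# Illustrative two-layer policy network with softmax output over the two actions.
policy_net = nn.Sequential(
    nn.Linear(4, 128), nn.LeakyReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
gamma = 0.99  # discount factor (assumed value)

env = gym.make("CartPole-v1")
for episode in range(500):
    state, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy_net(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted return for each time step, normalized for stability.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE loss: -log pi(a|s) * return, summed over the episode.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```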
Here the pole keeps falling off fairly quickly while training is still in its early stages.
Now the cart is able to travel further without the pole dropping. This distance can be increased by making the network a bit deeper (I have used a very simple sequential network) and by raising the maximum number of steps from 200 to a larger value; to do this, edit the MAX_DUR variable.
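For example, the changes might look like the sketch below. MAX_DUR comes from the notebook; the specific values and layer sizes here are illustrative only.

```python
import torch.nn as nn

MAX_DUR = 500       # per-episode step cap, raised from 200 (value is illustrative)
MAX_EPISODES = 1000

# A slightly deeper sequential policy than a single hidden layer (layer sizes are assumptions).
policy_net = nn.Sequential(
    nn.Linear(4, 64), nn.LeakyReLU(),
    nn.Linear(64, 64), nn.LeakyReLU(),
    nn.Linear(64, 2), nn.Softmax(dim=-1),
)
```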
-
Evolutionary algorithms add another set of powerful tools to our toolkit. Modeled on biological evolution, we (see the sketch after this list):
- Produce individuals
- Select the best from the current generation
- Shuffle the genes around
- Mutate them to introduce some variation
- Mate them to create new generations for the next population
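A minimal genetic-algorithm sketch of those steps, run on a toy fitness function with NumPy; the objective, population size, and mutation rate are all illustrative choices, not the book's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(individual):
    # Toy objective: get as close as possible to a vector of ones.
    return -np.sum((individual - 1.0) ** 2)

pop_size, genome_len, generations = 50, 10, 100
mutation_rate = 0.1

# Produce the initial population of individuals.
population = rng.normal(size=(pop_size, genome_len))

for gen in range(generations):
    # Select the best half of the current generation.
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[-pop_size // 2:]]

    children = []
    while len(children) < pop_size:
        # Mate two parents: shuffle genes around via uniform crossover.
        mom, dad = parents[rng.integers(len(parents), size=2)]
        mask = rng.random(genome_len) < 0.5
        child = np.where(mask, mom, dad)
        # Mutate to introduce some variation.
        child = child + mutation_rate * rng.normal(size=genome_len)
        children.append(child)
    population = np.array(children)

print("best fitness:", max(fitness(ind) for ind in population))
```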
-
Evolutionary algorithms tend to be more data-hungry (i.e., less sample-efficient) than gradient-based approaches; in some circumstances this may be fine, notably if you have a simulator.
-
Evolutionary algorithms can optimize over nondifferentiable or even discrete functions, which gradient-based methods cannot do.
-
Evolutionary strategies (ES) are a subclass of evolutionary algorithms that do not involve biologically inspired mating and recombination; instead, they use copying with noise and weighted sums to create new individuals from a population.
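A minimal ES sketch in that spirit: perturb a single parameter vector with Gaussian noise and move it toward a fitness-weighted sum of the perturbations. The objective and every hyperparameter here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(params):
    # Toy objective: higher is better as params approach 0.5.
    return -np.sum((params - 0.5) ** 2)

dim, pop_size, sigma, lr, iterations = 10, 50, 0.1, 0.05, 200
params = rng.normal(size=dim)  # single "parent" parameter vector

for it in range(iterations):
    # Create the population by copying the parent and adding noise.
    noise = rng.normal(size=(pop_size, dim))
    scores = np.array([fitness(params + sigma * eps) for eps in noise])

    # Update the parent with a fitness-weighted sum of the noise
    # (no mating or recombination, just weighted averaging).
    weights = (scores - scores.mean()) / (scores.std() + 1e-8)
    params += lr / (pop_size * sigma) * noise.T @ weights

print("final fitness:", fitness(params))
```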