Run test.py, then plot.py, to plot the actions taken, the rewards, and the states.
Switch the environment from SineEnv (learning does not work) to the inverted pendulum (learning does work).
Goal: train an agent to represent a 1-D transfer function from input to output and follow a given trajectory (Out_is(t) = const * Input_in(t), Out_wanted(t) = sin(t)).
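
The tracking task described in the goal can be sketched as a tiny standalone environment. This is only an illustration and not the repository's SineEnv. The class name `SineTrackingSketch`, the gain value, the time step, the horizon, the choice of observation (the current target value), and the negative squared-error reward are all assumptions made for the example.

```python
import math

GAIN = 2.0  # assumed value for `const` in Out_is(t) = const * Input_in(t)
DT = 0.1    # assumed time step


class SineTrackingSketch:
    """Hypothetical stand-in for the tracking task (not the actual SineEnv)."""

    def __init__(self, gain=GAIN, dt=DT, horizon=100):
        self.gain = gain
        self.dt = dt
        self.horizon = horizon
        self.t = 0.0
        self.steps = 0

    def reset(self):
        self.t = 0.0
        self.steps = 0
        # Observation: the target value the output should match right now.
        return math.sin(self.t)

    def step(self, action):
        out_is = self.gain * action    # Out_is(t) = const * Input_in(t)
        out_wanted = math.sin(self.t)  # Out_wanted(t) = sin(t)
        # Assumed reward: negative squared tracking error.
        reward = -(out_is - out_wanted) ** 2
        self.t += self.dt
        self.steps += 1
        done = self.steps >= self.horizon
        return math.sin(self.t), reward, done


def run_episode(policy, env=None):
    """Roll out one episode and return the summed reward."""
    env = env or SineTrackingSketch()
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

Under these assumptions the ideal policy is simply `action = observation / GAIN`, which gives a total reward of about zero, so `run_episode(lambda obs: obs / GAIN)` is a useful baseline to compare a trained agent against.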