Is there an easy way to train PPO offline using the current API, i.e. pre-collecting state, action, reward tuples and training on those (as opposed to using a memory agent with observations)?
The Runner documentation on https://tensorforce.readthedocs.io/en/latest/runner.html says you should be able to call something like
agent.observe(state, action, reward, terminal_state)
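For concreteness, here is a minimal sketch of the kind of offline loop I have in mind. The agent config and the transition data are just placeholders, and the observe signature is taken from the Runner docs quote above; I am not sure this is actually how the current API expects pre-collected actions to be fed in, which is exactly my question.

```python
from tensorforce import Agent

# Placeholder PPO agent; the state/action specs would match the collected data.
agent = Agent.create(
    agent='ppo',
    states=dict(type='float', shape=(4,)),
    actions=dict(type='int', num_values=2),
    max_episode_timesteps=200,
    batch_size=10,
)

# Pre-collected (state, action, reward, terminal) tuples, illustrative values only.
transitions = [
    ([0.1, 0.0, -0.2, 0.3], 1, 0.5, False),
    ([0.2, 0.1, -0.1, 0.2], 0, 1.0, True),
]

for state, action, reward, terminal in transitions:
    # Signature as quoted from the Runner documentation; possibly not the
    # intended way to inject externally collected experience.
    agent.observe(state, action, reward, terminal)
```

If this is not supported, is there another recommended way to update a PPO agent from experience that was not generated by agent.act()?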