Objectives
==========

Objectives are pipes that take the reward signal generated by the environment and transform it into an objective for the agent to optimize toward. This enables more complex training behaviour, but is not always necessary. q2 comes with a ``Passthrough`` objective that passes the environment's reward to the agent unchanged. ``Passthrough`` is used by default unless you override it with your own objective.

Run::

    q2 generate objective my_objective

to start working on your own objective. You need to implement the following interface:

.. py:class:: q2.objectives.Objective

    An abstract base class (interface) that specifies what an objective
    must implement. You will fill in your own definitions for each of these
    methods.

    .. py:method:: reset()

        This is called by the training regimen before each episode. It gives
        you the opportunity to wipe any accumulated state.

    .. py:method:: step(state, action, reward) -> float

        Accepts a ``state`` and ``reward`` from the environment and an
        ``action`` from the agent, and returns the new objective value (a
        ``float``) for the agent to learn from.
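
As an illustration of the interface above, here is a minimal sketch of a stateful objective. So that it runs without q2 installed, it defines a local stand-in for ``q2.objectives.Objective`` whose shape is assumed from the method signatures documented here; in a real project you would presumably subclass the q2 class instead. ``TimePenalty`` and its ``penalty_per_step`` parameter are hypothetical names chosen for this example, not part of q2.

```python
from abc import ABC, abstractmethod


class Objective(ABC):
    """Local stand-in for q2.objectives.Objective (assumed shape)."""

    @abstractmethod
    def reset(self) -> None:
        """Called before each episode to wipe accumulated state."""

    @abstractmethod
    def step(self, state, action, reward) -> float:
        """Turn the environment's reward into the agent's objective."""


class TimePenalty(Objective):
    """Hypothetical objective: subtract a penalty that grows each step."""

    def __init__(self, penalty_per_step: float = 0.01):
        self.penalty_per_step = penalty_per_step
        self.steps = 0  # steps taken in the current episode

    def reset(self) -> None:
        # Start every episode with a fresh step counter.
        self.steps = 0

    def step(self, state, action, reward) -> float:
        # state and action are unused here, but remain available for
        # objectives that depend on them.
        self.steps += 1
        return float(reward) - self.penalty_per_step * self.steps
```

Because the step counter is per-episode state, ``reset()`` clears it. An objective with no state, such as one that returns ``reward`` unchanged, could leave ``reset()`` empty.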