In [1]:
from rich.pretty import pprint

## The Theory Benchmark
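Let's take a look at the theory benchmark. The task is to learn to control the (1+1)-RLS algorithm (randomized local search) on the LeadingOnes problem, for which the optimal control policy is known from theory. First, let's create an instance of the benchmark: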

In [2]:
from dacbench.benchmarks import TheoryBenchmark
bench = TheoryBenchmark()
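To get a feel for what is being controlled, here is a minimal, self-contained sketch. It assumes the controlled parameter is the radius, i.e. the number of bits RLS flips per iteration. The policy `optimal_radius`, r = ⌊n / (LO(x) + 1)⌋, is the fitness-dependent radius policy reported in the theory literature for RLS on LeadingOnes. Both function names are illustrative and are not part of DACBench.

```python
def leading_ones(x: list[int]) -> int:
    """Count the consecutive 1-bits at the start of x (the LeadingOnes value)."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def optimal_radius(x: list[int]) -> int:
    """Theory-derived radius policy for RLS on LeadingOnes: floor(n / (LO(x) + 1))."""
    n = len(x)
    return n // (leading_ones(x) + 1)


x = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # n = 10, LeadingOnes = 2
print(leading_ones(x))    # 2
print(optimal_radius(x))  # 10 // 3 = 3
```

Early in the run (small LO(x)) the policy flips many bits. It drops to a single bit as the solution approaches the optimum.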

Now let's take a look at the elements of the config in this benchmark:

In [3]:
pprint(list(bench.config.keys()))

The 'benchmark_info' entry already tells us a few things about this benchmark:

In [4]:
pprint(bench.config["benchmark_info"])

The 'config_space' specifies which actions can be taken, i.e. which hyperparameters are configured. We can see that we're configuring a single integer-valued hyperparameter between 0 and 5:

In [5]:
pprint(bench.config["config_space"])

The rewards in this task lie in the following range:

In [6]:
pprint(bench.config["reward_range"])

Finally, the cutoff is the maximum number of steps per episode, i.e. how many steps of the sequence the agent gets before the episode is cut off:

In [7]:
pprint(bench.config["cutoff"])
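The cutoff works as a step budget. The following sketch shows how truncation can be tracked with a step counter. It is hypothetical: `StepBudget` and the example cutoff of 3 are made up for illustration, and the sketch does not reproduce DACBench's internals. It also assumes the convention of truncating as soon as `cutoff` steps have been taken.

```python
class StepBudget:
    """Hypothetical helper: counts steps and reports truncation at the cutoff."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.steps = 0

    def tick(self) -> bool:
        """Record one step; return True if the episode should now be truncated."""
        self.steps += 1
        return self.steps >= self.cutoff


budget = StepBudget(cutoff=3)
flags = [budget.tick() for _ in range(3)]
print(flags)  # [False, False, True]
```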

The config also contains some standard keys, such as the seed, the instance set and the observation space configuration. The observation space usually does not need to be configured at all, while the seed should be varied between runs. The keys 'discrete_action' and 'action_choices' are specific to this benchmark: they determine whether the action space is discretized and which action choices are available.
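With a discretized action space, the agent picks an index, and that index is translated into one of the available action choices. The sketch below illustrates this lookup. It assumes the choices are candidate radius values, and the list `ACTION_CHOICES` is invented for illustration rather than read from the benchmark config.

```python
# Hypothetical action choices (candidate radius values); not taken from the real config.
ACTION_CHOICES = [1, 2, 4, 8, 16, 32]


def action_to_radius(action: int, choices: list[int] = ACTION_CHOICES) -> int:
    """Map a discrete action index to the radius value it stands for."""
    if not 0 <= action < len(choices):
        raise ValueError(f"action {action} outside [0, {len(choices) - 1}]")
    return choices[action]


print(action_to_radius(0))  # 1
print(action_to_radius(3))  # 8
```

Keeping the choices as an explicit list lets the agent work with a small discrete space even when the underlying parameter range is large.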

## Theory Instances
Now let's take a look at what a theory instance looks like. To do so, we first read the default instance set and then inspect its first element:

In [8]:
pprint(bench.config["instance_set_path"])
bench.read_instance_set()
pprint(bench.config.instance_set[0])
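To make the two fields more concrete, here is a hypothetical representation of an instance, together with one way an initial solution could be sampled from it. The sketch assumes the initialization is given as a target LeadingOnes value. The class `HypotheticalInstance` and its field names `size` and `init_leading_ones` are assumptions and need not match the keys printed above.

```python
import random
from dataclasses import dataclass


@dataclass
class HypotheticalInstance:
    size: int               # problem size n
    init_leading_ones: int  # assumed: LeadingOnes value of the initial solution


def sample_initial_solution(inst: HypotheticalInstance, rng: random.Random) -> list[int]:
    """Sample a bitstring of length n whose LeadingOnes value is exactly init_leading_ones."""
    n, k = inst.size, inst.init_leading_ones
    if not 0 <= k <= n:
        raise ValueError("init_leading_ones must lie in [0, size]")
    x = [1] * k
    if k < n:
        x.append(0)  # the first 0-bit fixes the LeadingOnes value at k
        x.extend(rng.randint(0, 1) for _ in range(n - k - 1))  # remaining bits are free
    return x


rng = random.Random(0)
x = sample_initial_solution(HypotheticalInstance(size=10, init_leading_ones=4), rng)
print(x[:5])  # [1, 1, 1, 1, 0]
```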

As you can see, the instance is very simple: it specifies the problem size and the initialization.

## Running Theory
Lastly, let's look at the theory benchmark in action. The default state contains the problem size and the last function value:

In [9]:
env = bench.get_environment()
state, info = env.reset()  # Gymnasium-style reset returns (observation, info)
pprint(state)
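The following sketch shows how such a state could be assembled from the current bitstring. It assumes the function value is the LeadingOnes value. `make_state` and its dictionary keys are hypothetical and do not mirror the exact observation format of DACBench.

```python
def leading_ones(x: list[int]) -> int:
    """Index of the first 0-bit, i.e. the LeadingOnes value (len(x) if there is none)."""
    return next((i for i, bit in enumerate(x) if bit == 0), len(x))


def make_state(x: list[int]) -> dict[str, int]:
    """Hypothetical observation: problem size and current function value."""
    return {"problem_size": len(x), "function_value": leading_ones(x)}


print(make_state([1, 0, 1, 1]))  # {'problem_size': 4, 'function_value': 1}
```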

Now let's take a step:

In [10]:
action = env.action_space.sample()
state, reward, terminated, truncated, info = env.step(action)
pprint(state)
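During a step, (1+1)-RLS runs with the parameter chosen by the action. Below is a minimal sketch of a single RLS iteration. It assumes the action has already been mapped to a radius `r`. The iteration flips exactly `r` distinct, uniformly chosen bits and keeps the offspring if it is at least as good as the parent. This illustrates the algorithm itself, not DACBench's implementation.

```python
import random


def leading_ones(x: list[int]) -> int:
    """Index of the first 0-bit, i.e. the LeadingOnes value (len(x) if there is none)."""
    return next((i for i, bit in enumerate(x) if bit == 0), len(x))


def rls_step(x: list[int], r: int, rng: random.Random) -> list[int]:
    """One (1+1)-RLS iteration with radius r: flip r distinct bits, accept if not worse."""
    y = list(x)
    for i in rng.sample(range(len(x)), r):  # r distinct positions, chosen uniformly
        y[i] = 1 - y[i]
    return y if leading_ones(y) >= leading_ones(x) else x  # elitist acceptance


rng = random.Random(42)
x = [0] * 8
for _ in range(50):
    x = rls_step(x, r=1, rng=rng)
print(leading_ones(x))
```

Because the acceptance is elitist, the function value never decreases. The radius only affects how quickly improvements are found.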

Besides the new state, we also get a reward as well as termination and truncation signals. `truncated` is set to `True` once the number of steps exceeds the cutoff.

In [11]:
pprint(f"Reward {reward}")
pprint(f"Terminated {terminated}")
pprint(f"Truncated {truncated}")
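In practice, you usually step until either signal is set. Below is a sketch of the standard Gymnasium-style rollout loop. To stay runnable without DACBench, it uses a tiny stand-in environment, `ToyEnv`. That environment is purely hypothetical, with a made-up reward and termination rule, and shares only the five-tuple `step` signature used above. With the real benchmark, you would pass `env` and your own policy instead.

```python
import random


class ToyEnv:
    """Hypothetical stand-in with the Gymnasium reset/step signature."""

    def __init__(self, cutoff: int, seed: int = 0) -> None:
        self.cutoff = cutoff
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0, {}

    def step(self, action: int):
        self.steps += 1
        reward = -1.0                          # toy per-step cost
        terminated = self.rng.random() < 0.1   # toy "optimum reached" event
        truncated = self.steps >= self.cutoff  # step budget exhausted
        return self.steps, reward, terminated, truncated, {}


def rollout(env, policy) -> tuple[float, int]:
    """Run one episode until it terminates or is truncated; return (total reward, steps)."""
    state, info = env.reset()
    total, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        state, reward, terminated, truncated, info = env.step(policy(state))
        total += reward
        steps += 1
    return total, steps


total, steps = rollout(ToyEnv(cutoff=20), policy=lambda s: 0)
print(total, steps)
```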