Implementation of Algorithm Distillation on Dark Room environments.
Paper: https://arxiv.org/abs/2210.14215
Evaluation goals: (4, 2), (5, 6), (6, 8), (7, 2), (3, 6), (0, 5), (5, 8), (5, 4)
Mean reward per environment: 17.062, 17.102, 14.094, 0.022, 16.100, 14.434, 6.820, 0.490
Overall mean reward: 10.7655
Standard deviation: 7.9616
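
The overall mean can be rechecked from the per-environment means listed above. The snippet below is a minimal sketch using only the Python standard library; the variable names are illustrative and not taken from the repository. It also computes the spread of the eight per-environment means, which is an assumption about one possible way to define the standard deviation, not necessarily how the reported figure was produced.

```python
import statistics

# Per-environment mean rewards, copied from the results above.
per_env_mean = [17.062, 17.102, 14.094, 0.022, 16.1, 14.434, 6.82, 0.49]

# Unweighted average over the eight evaluation environments.
overall_mean = statistics.fmean(per_env_mean)

# Population standard deviation of the eight per-environment means.
# Assumption: this is only one candidate definition of "std deviation".
spread_across_envs = statistics.pstdev(per_env_mean)

print(f"Overall mean reward: {overall_mean:.4f}")
print(f"Std across per-environment means: {spread_across_envs:.3f}")
```

The recomputed mean matches the reported 10.7655. The spread across the eight means comes out near 6.8 rather than the reported value of about 7.96, so the reported standard deviation was presumably computed over a different set of values (possibly individual episode returns, though the results do not say).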


