Algorithm_Distillation

An implementation of Algorithm Distillation (AD) on Dark Room environments.
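For readers unfamiliar with the setting, here is a minimal sketch of a Dark Room style environment. It follows the description in the original paper rather than this repository's code, so the class name `DarkRoom`, the action ordering, the 9x9 grid, the 20-step episode, and the centre start are all assumptions made for illustration. The agent observes only its own position and has to infer the goal from the reward.

```python
from dataclasses import dataclass, field
from typing import Tuple

# Action index -> (dx, dy). Index 0 is a no-op; this ordering is an assumption.
ACTIONS = [(0, 0), (0, 1), (0, -1), (-1, 0), (1, 0)]


@dataclass
class DarkRoom:
    """Hypothetical grid world: the goal is hidden, only (x, y) is observed."""

    goal: Tuple[int, int]
    size: int = 9      # assumed grid side length
    horizon: int = 20  # assumed episode length
    pos: Tuple[int, int] = field(init=False)
    t: int = field(init=False)

    def __post_init__(self):
        self.reset()

    def reset(self):
        # Assumed start: the centre of the room.
        self.pos = (self.size // 2, self.size // 2)
        self.t = 0
        return self.pos

    def step(self, action):
        dx, dy = ACTIONS[action]
        # Moves that would leave the room are clipped to the walls.
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        self.t += 1
        # Reward 1 on every step spent on the goal cell, 0 otherwise.
        reward = 1.0 if self.pos == self.goal else 0.0
        done = self.t >= self.horizon
        return self.pos, reward, done


# Walk straight to the goal at (4, 2), then stay there until the episode ends.
env = DarkRoom(goal=(4, 2))
total = 0.0
for action in [2, 2] + [0] * 18:
    _, reward, done = env.step(action)
    total += reward
print(total, done)
```

Under these assumptions an agent that already knows the goal collects at most one reward per step once it arrives, so the episode return is bounded by the horizon.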

Original Paper

In-context Reinforcement Learning with Algorithm Distillation (Laskin et al., 2022): https://arxiv.org/abs/2210.14215

Results (after 50,000 training timesteps)

Evaluation goals: (4, 2), (5, 6), (6, 8), (7, 2), (3, 6), (0, 5), (5, 8), (5, 4)
Mean reward per environment: 17.062, 17.102, 14.094, 0.022, 16.1, 14.434, 6.82, 0.49
Overall mean reward: 10.7655
Std deviation: 7.9616
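As a quick sanity check, the overall mean can be recomputed from the per-environment means listed above. This is a small sketch using only the Python standard library; the variable names are illustrative and not taken from the repository.

```python
from statistics import fmean, pstdev

# Per-environment mean rewards, copied from the results above.
per_env_mean = [17.062, 17.102, 14.094, 0.022, 16.1, 14.434, 6.82, 0.49]

# Averaging the eight per-environment means reproduces the overall mean.
overall_mean = fmean(per_env_mean)

# Population standard deviation of the eight means (not of individual episodes).
spread_of_means = pstdev(per_env_mean)

print(f"overall mean: {overall_mean:.4f}")
print(f"std of per-environment means: {spread_of_means:.4f}")
```

The recomputed mean matches the reported 10.7655. The spread of the eight means comes out at roughly 6.79, which is lower than the reported standard deviation, so the reported figure was presumably computed over a different set of values, such as individual episode returns. That is an assumption; the README does not say how it was calculated.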

Figures

Training Loss: [figure: training_loss]

Testing Loss: [figure: testing_loss]

Learning Rate Schedule: [figure: lr_schedule]
