This is the repository for our paper, "Offline Supervised Learning V.S. Online Direct Policy Optimization: A Comparative Study and A Unified Training Paradigm for Neural Network-Based Closed-Loop Optimal Control".
python==3.8
torch==1.8.1
scipy==1.7.3
numpy==1.21.4
tensorboardX==2.4.1
tqdm==4.62.3
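As a quick sanity check before running anything, the pins above can be compared against the active environment. The following is a minimal sketch and not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes that a local build suffix such as `+cu111` on an installed version should be ignored when comparing.

```python
import sys
from importlib import metadata  # in the standard library from Python 3.8 onward

# Pins copied from the requirements list above.
PINNED = {
    "torch": "1.8.1",
    "scipy": "1.7.3",
    "numpy": "1.21.4",
    "tensorboardX": "2.4.1",
    "tqdm": "4.62.3",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that deviates from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Assumption: ignore a local build suffix, e.g. "1.8.1+cu111" counts as "1.8.1".
        base = found.split("+", 1)[0] if found is not None else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print(f"python: expected 3.8, found {sys.version_info[0]}.{sys.version_info[1]}")
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

Running the file prints one line per deviation and nothing when the environment matches the pins.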
All scripts are in ./scripts.
- Generate data.
sh scripts/gen.sh
- The dataset for the satellite optimal attitude control problem is generated by HJB_NN.
- The adaptive dataset for the quadrotor optimal landing problem is generated by IVP Enhanced Sampling.
- You can speed up data generation with multiprocessing, e.g.,
--num_processors 24
- Train with supervised learning.
sh scripts/sl.sh
- Train with direct policy optimization.
sh scripts/direct.sh
- Note that the implementation uses torch_ACA.
- Fine-tune a pre-trained network.
sh scripts/finetune.sh
- Compare performances via closed-loop simulations.
sh scripts/test.sh
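To illustrate what a closed-loop comparison measures, here is a toy sketch that is independent of the repository's code. It assumes a scalar system dx/dt = u with running cost x^2 + u^2 and two hypothetical linear feedback policies; the satellite and quadrotor problems in the paper are far richer, and the function names below are made up for this example. Each policy is rolled out from the same initial state, with the control recomputed from the current state at every solver stage, and the accumulated cost is reported.

```python
def rk4_step(f, state, dt):
    """Advance `state` (a list of floats) by one classical Runge-Kutta step."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return [
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def closed_loop_cost(policy, x0, horizon=1.0, steps=100):
    """Simulate dx/dt = u with u = policy(x); return (final state, total cost).

    The running cost x^2 + u^2 is integrated as an extra state component.
    """
    def augmented_dynamics(state):
        x, _cost = state
        u = policy(x)  # feedback: the control depends on the current state
        return [u, x * x + u * u]

    state = [x0, 0.0]
    dt = horizon / steps
    for _ in range(steps):
        state = rk4_step(augmented_dynamics, state, dt)
    return state[0], state[1]


if __name__ == "__main__":
    # Hypothetical candidate controllers; trained networks would take their place.
    candidates = {"gain 0.5": lambda x: -0.5 * x, "gain 1.0": lambda x: -1.0 * x}
    for name, policy in candidates.items():
        x_final, cost = closed_loop_cost(policy, x0=1.0)
        print(f"{name}: final state {x_final:.4f}, cost {cost:.4f}")
```

Integrating the cost alongside the state keeps the cost estimate at the same accuracy as the trajectory, so differences between policies are not masked by quadrature error.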