No Prior Mask: Eliminate Redundant Action for Deep Reinforcement Learning

Requirements

We assume you have access to a GPU that can run CUDA 11.6. The simplest way to install all required dependencies is to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation finishes, you can activate the environment with:

source activate npm

Instructions

Phase 1: Training the N-value network

To train the similarity factor model (N-value network) on the GoToPositionBonus-v0 task from image-based observations, run:

python -m min_red.train \
    --f min_red/config/babyaiar \
    --algorithm_type PPO \
    --env_id GoToPositionBonus-v0 \
    --method Nill \
    --algorithm.learn.log_interval 10 \
    --total_timesteps 5000000

Running a grid_search_*.py script will execute multiple training commands at the same time (a minimal sketch of such a launcher is shown after the command):

python min_red/grid_search_babyai.py
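Such a launcher is typically a thin wrapper that spawns one training process per configuration. Below is a minimal sketch, assuming the same min_red.train arguments as above; the actual grid_search_babyai.py in this repository may sweep different settings.

import subprocess

# Illustrative grid: add more environment ids (or other argument combinations)
# to this list to launch additional runs in parallel.
env_ids = ["GoToPositionBonus-v0"]

processes = []
for env_id in env_ids:
    cmd = [
        "python", "-m", "min_red.train",
        "--f", "min_red/config/babyaiar",
        "--algorithm_type", "PPO",
        "--env_id", env_id,
        "--method", "Nill",
        "--algorithm.learn.log_interval", "10",
        "--total_timesteps", "5000000",
    ]
    # Popen returns immediately, so every configuration runs concurrently.
    processes.append(subprocess.Popen(cmd))

# Block until all runs have finished.
for p in processes:
    p.wait()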

This will produce a 'log' folder where all outputs are stored, including the N-value network (mfmodel). You can attach TensorBoard to monitor training by running:

tensorboard --logdir log

and opening TensorBoard in your browser.

Phase 2: Utilizing the N-value network

Add the path of the N-value network (mfmodel) saved in Phase 1 to Env_mask_dict in maskppo/train.py (see the hypothetical example below), then run:

python maskppo/grid_search_babyai.py

This will produce a 'log' folder where all the outputs are stored.
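For reference, Env_mask_dict is expected to map an environment id to the path of the mfmodel saved in Phase 1. A hypothetical entry is sketched below; replace the path with the one produced by your own Phase 1 run.

Env_mask_dict = {
    # environment id -> path of the N-value network (mfmodel) saved in Phase 1
    "GoToPositionBonus-v0": "log/<your_phase1_run>/mfmodel",
}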
