Highly Imitative Reinforcement Learning for UCAV

Result Charts

Models and Results

The best models trained with BC, TD3, SAC, E-SAC, and HIRL (our method) are stored in the ./results folder. Their validation results are listed below. The results with and without random initialization are obtained by running 50 episodes with 5 different random seeds, and '±' indicates the standard deviation. The numbers of hits and launches are obtained by running 10 episodes. The best results are highlighted in bold.
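The "mean ± standard deviation" figures can be computed from per-seed results along the following lines. This is a minimal sketch, not the repository's evaluation code: the helper name `summarize` is hypothetical, the per-seed values are made-up placeholders, and the use of the sample standard deviation is an assumption, since the README does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def summarize(per_seed_values):
    """Return (mean, sample standard deviation) over per-seed results."""
    return mean(per_seed_values), stdev(per_seed_values)


# Placeholder success rates (%) for five seeds; NOT taken from the experiments.
success_rates = [90.0, 92.0, 94.0, 92.0, 92.0]
avg, sd = summarize(success_rates)
print(f"{avg:.1f}% ± {sd:.1f}%")
```

Swapping `stdev` for `statistics.pstdev` gives the population form if that matches the reported numbers better.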

Validation Results without Random Initialization

| Methods | Shoot-down Success Rate | Hit Success Rate | Rewards |
|---|---|---|---|
| HIRL (adaptive) | **100.0% ± 0.0%** | **100.0% ± 0.0%** | **-680.8 ± 6.7** |
| HIRL (linear) | **100.0% ± 0.0%** | **100.0% ± 0.0%** | -953.9 ± 13.8 |
| TD3 | 0.0% ± 0.0% | 0.0% ± 0.0% | -4707.2 ± 0.0 |
| E-SAC | **100.0% ± 0.0%** | **100.0% ± 0.0%** | -1431.2 ± 0.2 |
| SAC | **100.0% ± 0.0%** | 0.0% ± 0.0% | -2985.7 ± 0.0 |
| BC | 62.8% ± 1.0% | 62.8% ± 1.0% | -12228.3 ± 880.2 |

Validation Results with Random Initialization

| Methods | Shoot-down Success Rate | Hit Success Rate | Rewards |
|---|---|---|---|
| HIRL (adaptive) | **98.0% ± 1.3%** | **98.0% ± 1.3%** | **-1436.0 ± 238.9** |
| HIRL (linear) | 86.0% ± 5.4% | 86.0% ± 5.4% | -5800.8 ± 1420.3 |
| TD3 | 0.0% ± 0.0% | 0.0% ± 0.0% | -5720.9 ± 715.8 |
| E-SAC | 90.0% ± 2.8% | 90.0% ± 2.8% | -3722.2 ± 395.5 |
| SAC | 44.0% ± 3.3% | 0.0% ± 0.0% | -8318.1 ± 822.8 |
| BC | 22.4% ± 3.2% | 22.4% ± 3.2% | -20504.7 ± 1156.3 |

Launch Efficiency Results

| Methods | Hits / Launches |
|---|---|
| HIRL (adaptive) | **100.0%** |
| HIRL (linear) | **100.0%** |
| E-SAC | 11.4% |
| BC | 92.3% |
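
The launch efficiency above is the ratio of missiles that hit to missiles launched. A minimal sketch of that calculation follows; the function name `launch_efficiency` and the counts are hypothetical placeholders, and returning 0.0 when nothing was launched is an assumed convention.

```python
def launch_efficiency(hits, launches):
    """Fraction of launched missiles that hit; 0.0 if nothing was launched."""
    if launches == 0:
        return 0.0
    return hits / launches


# Placeholder counts for illustration only; NOT taken from the experiments.
print(f"{launch_efficiency(9, 12):.1%}")
```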

Policy Display

Demonstrations of the policy trained by HIRL.

One Missile

Infinite Missiles

Getting Started

Installation Requirements

A computer running Windows is recommended (we tried Linux, but Harfang3D does not appear to be compatible with it).
Install the Harfang3D sandbox from the release or from source. Installing from source is recommended because it is more flexible, for example when customizing the network port of the environment.
Install the dependencies required by this code:
```
conda env create -f environment.yaml
```

Training

In the Harfang3D sandbox folder, run the following commands to start the Harfang3D sandbox. You can set the port number with network_port. Once the sandbox is open, you need to switch to network mode manually.
```
cd source
python main.py network_port 12345
```
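
Before starting a training run, it can help to confirm that the sandbox is actually listening on the chosen port. The sketch below is not part of the repository: the helper name `sandbox_reachable` is hypothetical, `127.0.0.1` is a placeholder for the IP you configure in train_all.py, and it assumes the sandbox accepts plain TCP connections once network mode is active.

```python
import socket


def sandbox_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address; the port matches the network_port used above.
    print(sandbox_reachable("127.0.0.1", 12345))
```

If this prints `False`, check that the sandbox is running, that network mode has been entered, and that the port matches.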

In the HIRL4UCAV folder, run one of the following commands to start training. Remember to set the IP address in train_all.py first; add --render to render the environment during training, and add --plot to draw visualization results.

# HIRL (adaptive)
python train_all.py --agent ROT --port 12345 --type soft --model_name srot

# HIRL (linear)
python train_all.py --agent ROT --port 12345 --type linear --bc_weight 1 --model_name lrot

# HIRL (fixed)
python train_all.py --agent ROT --port 12345 --type fixed --bc_weight 0.5 --model_name frot

# TD3
python train_all.py --agent TD3 --port 12345 --model_name td3

# BC
python train_all.py --agent BC --port 12345 --model_name bc

# SAC
python train_sac.py --type sac --port 12345 --model_name sac

# E-SAC
python train_sac.py --type esac --port 12345 --model_name esac
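
To launch these runs from a script rather than typing each command, the commands can be kept in a small table. This is a sketch only: `TRAINING_COMMANDS` and `training_command` are hypothetical names, while the command strings are copied from the list above with the port turned into a placeholder.

```python
import shlex

# Commands copied from the README; {port} is filled in at call time.
TRAINING_COMMANDS = {
    "HIRL (adaptive)": "python train_all.py --agent ROT --port {port} --type soft --model_name srot",
    "HIRL (linear)": "python train_all.py --agent ROT --port {port} --type linear --bc_weight 1 --model_name lrot",
    "HIRL (fixed)": "python train_all.py --agent ROT --port {port} --type fixed --bc_weight 0.5 --model_name frot",
    "TD3": "python train_all.py --agent TD3 --port {port} --model_name td3",
    "BC": "python train_all.py --agent BC --port {port} --model_name bc",
    "SAC": "python train_sac.py --type sac --port {port} --model_name sac",
    "E-SAC": "python train_sac.py --type esac --port {port} --model_name esac",
}


def training_command(method, port=12345, extra_flags=()):
    """Return the argument list for one training run."""
    args = shlex.split(TRAINING_COMMANDS[method].format(port=port))
    return args + list(extra_flags)


if __name__ == "__main__":
    for method in TRAINING_COMMANDS:
        print(f"# {method}")
        print(shlex.join(training_command(method)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the HIRL4UCAV folder while the sandbox is running in network mode.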

Validation

In the Harfang3D sandbox folder, run the following commands to start the Harfang3D sandbox. You can set the port number with network_port. Once the sandbox is open, you need to switch to network mode manually.
```
cd source
python main.py network_port 12345
```

To test the BC, TD3, and HIRL models, run the following commands in the HIRL4UCAV folder. Remember to set the IP address and the model name in train_all.py first (only the part of the name before 'xxx_Harfang_GYM' is needed); add --render to render the environment during testing.

# Success Rate Validation
# Append '--test --test_mode n' to the corresponding training command. '--test_mode 1' is the random initialization mode, '--test_mode 2' is the infinite missiles mode, and '--test_mode 3' is the original environment.
# For example:
python train_all.py --agent ROT --port 12345 --type soft --model_name srot --test --test_mode 1 --seed 1

# Reward Validation
# Append '--test --test_mode n' to the corresponding training command. '--test_mode 4' is the random initialization mode, and '--test_mode 5' is the original environment.
# For example:
python train_all.py --agent ROT --port 12345 --type soft --model_name srot --test --test_mode 4 --seed 1
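
The test modes and the seed sweep can be summarized in a short helper that generates one validation command per seed. This is a sketch under stated assumptions: `TEST_MODES` and `validation_commands` are hypothetical names, and seeds 1 to 5 are an assumed choice, since the README only says that 5 different random seeds were used.

```python
# Meaning of each --test_mode value, as described in the comments above.
TEST_MODES = {
    1: "success rate, random initialization",
    2: "success rate, infinite missiles",
    3: "success rate, original environment",
    4: "reward, random initialization",
    5: "reward, original environment",
}


def validation_commands(train_command, test_mode, seeds=range(1, 6)):
    """Append the test flags to a training command, once per seed."""
    if test_mode not in TEST_MODES:
        raise ValueError(f"unknown test mode: {test_mode}")
    return [
        f"{train_command} --test --test_mode {test_mode} --seed {seed}"
        for seed in seeds
    ]


if __name__ == "__main__":
    base = "python train_all.py --agent ROT --port 12345 --type soft --model_name srot"
    for command in validation_commands(base, test_mode=1):
        print(command)
```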

To test the SAC and E-SAC models, run the following command in the HIRL4UCAV folder (the test modes are the same as described above).
```
python validate_sac.py --test_mode 1 --port 12345 --seed 1
```

Citation

@misc{li2024imitative,
    title={An Imitative Reinforcement Learning Framework for Autonomous Dogfight}, 
    author={Siyuan Li and Rongchang Zuo and Peng Liu and Yingnan Zhao},
    year={2024},
    eprint={2406.11562},
    archivePrefix={arXiv}
}
