Youngmin Kim (김영민) | Minji Kwak (곽민지) | Dain Lee (이다인) | Yeongeun Kim (김영은) |
- STMC-Transformer : Paper Review(KYM), Paper Review(LDI), Paper Review(KYE)
- STMC : Paper Review(KYM), Paper Review(KYE)
- NSLT : Paper Review(LDI), Paper Review(KYE), Paper Review(KYM), Paper Review(KMJ)
- OS : Ubuntu 18.04.5 LTS (Docker) or Colab
- CUDA : 10.0
- GPU : Tesla V100-32GB
- Sample video download
$ sh download_sh/sample_data_dowonload.sh
$ pip install -r requirements.txt
$ python -m pip install cython
$ sudo apt-get install libyaml-dev
- Setup (AlphaPose)
$ git clone https://github.com/winston1214/Sign-Language-project.git && cd Sign-Language-project
$ python setup.py build develop
If you are not running in the Colab environment, or your CUDA version is 10.0, refer to this link.
- Download pretrained files (Please Download)
Running the following command downloads all weight files at once:
$ sh downlaod_sh/weight_download.sh
1. Split frame
$ python frame_split.py # You have to add the main code.
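The comment above notes that a main entry point still has to be added to `frame_split.py`. The script's own functions are not shown here, so the following is only a minimal stand-alone sketch of frame splitting with OpenCV. The names `split_video` and `frame_filename`, and the zero-padded `.jpg` naming scheme, are hypothetical choices and not the repository's code.

```python
import os


def frame_filename(index, ext=".jpg"):
    """Return a zero-padded file name so frames sort in temporal order."""
    return f"{index:06d}{ext}"


def split_video(video_path, out_dir):
    """Write every frame of video_path into out_dir and return the frame count.

    Hypothetical helper; assumes OpenCV (cv2) is installed.
    """
    import cv2  # imported lazily so frame_filename works without OpenCV

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or unreadable file
                break
            cv2.imwrite(os.path.join(out_dir, frame_filename(count)), frame)
            count += 1
    finally:
        cap.release()
    return count
```

The resulting folder of images is the kind of input that the `--indir` option in the next step expects.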
2. Extract keypoints (AlphaPose)
$ python scripts/demo_inference.py --cfg configs/halpe_136/resnet/256x192_res50_lr1e-3_2x-regression.yaml --checkpoint pretrained_models/halpe136_fast_res50_256x192.pt --indir ${img_folder_path} --outdir ${save_dir_path} --form boaz --vis_fast --sp
If you use multiple GPUs, you can omit the --sp option.
$ python train.py --X_path ${X_train.pickle path} --save_path ${model save directory} \
--pt_name ${save pt model name} --model ${LSTM or GRU} --batch ${BATCH SIZE}
## Example
$ python train.py --X_path /sign_data/ --save_path pt_file/ \
--pt_name model1.pt --model GRU --batch 128 --epochs 100 --dropout 0.5
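The flags used in the commands above can be mirrored with a small `argparse` parser. This is a sketch only: the flag names come from the commands, but the types, defaults, and help texts are assumptions, and the repository's `train.py` may define them differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the training commands."""
    parser = argparse.ArgumentParser(description="Train a keypoint-based translation model")
    parser.add_argument("--X_path", required=True, help="path to the pickled keypoint data")
    parser.add_argument("--save_path", required=True, help="directory for saved models")
    parser.add_argument("--pt_name", required=True, help="file name of the saved .pt model")
    parser.add_argument("--model", choices=["LSTM", "GRU"], required=True)
    # The defaults below are assumptions taken from the example command.
    parser.add_argument("--batch", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--dropout", type=float, default=0.5)
    return parser


# Parse the example command from above (arguments passed explicitly for illustration).
args = build_parser().parse_args([
    "--X_path", "/sign_data/",
    "--save_path", "pt_file/",
    "--pt_name", "model1.pt",
    "--model", "GRU",
    "--batch", "128",
    "--epochs", "100",
    "--dropout", "0.5",
])
```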
- X_train.pickle : For convenience, the keypoint values extracted from the videos are stored and loaded in pickle format.
- (shape : [video_len, max_frame_len, keypoint_len] = [7129, 376, 246])
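A fixed shape of [video_len, max_frame_len, keypoint_len] implies that videos of different lengths are brought to a common frame count before being pickled. The sketch below shows one way to do that with plain Python lists. Zero-padding at the end and the helper name `pad_sequences` are assumptions, since how the repository pads shorter videos is not stated here.

```python
import io
import pickle


def pad_sequences(videos, max_frame_len, pad_value=0.0):
    """Pad (or truncate) each video to exactly max_frame_len frames.

    videos: list of videos; each video is a list of frames;
    each frame is a list of keypoint_len floats.
    """
    keypoint_len = len(videos[0][0])
    padded = []
    for frames in videos:
        frames = [list(f) for f in frames[:max_frame_len]]
        missing = max_frame_len - len(frames)
        frames.extend([[pad_value] * keypoint_len for _ in range(missing)])
        padded.append(frames)
    return padded


# Two toy "videos" with 2 and 3 frames of 4 keypoint values each.
videos = [
    [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
    [[1.0, 1.1, 1.2, 1.3], [1.4, 1.5, 1.6, 1.7], [1.8, 1.9, 2.0, 2.1]],
]
X_train = pad_sequences(videos, max_frame_len=3)

# Round-trip through pickle, using an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(X_train, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

# (video_len, max_frame_len, keypoint_len) of the toy data
shape = (len(restored), len(restored[0]), len(restored[0][0]))
```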
$ python inference.py --video ${VIDEO_NAME} --outdir ${SAVE_PATH} --pt ${WEIGHT_PATH} --model ${MODEL NAME}
You can try the demo video in Colab.
Model | Hyperparameter | BLEU | Accuracy | Final Model |
---|---|---|---|---|
GRU-Attention | Adam, CrossEntropy | 93.4 | 93.5 | |
GRU-Attention | AdamW, Scheduler | 95.1 | 95.0 | |
LSTM | Adam, CrossEntropy | 49.6 | 50.0 | |
LSTM | AdamW, Scheduler | 51.5 | 51.5 | |
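For readers unfamiliar with the two metrics in the table, the sketch below computes a corpus-level BLEU score and an exact-match accuracy over tokenized sentences. It is illustrative only. The BLEU variant (up to 4-grams, uniform weights, one reference per sentence, no smoothing) and the exact-match definition of accuracy are assumptions, because this section does not state which definitions the reported numbers use. The example sentences are made up.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU: one reference per hypothesis, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often as the reference has it.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def sentence_accuracy(references, hypotheses):
    """Fraction of hypotheses identical to their reference (an assumed definition)."""
    correct = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return correct / len(references)


# Made-up example: the second hypothesis differs from its reference in the last token.
refs = [["i", "have", "a", "fever", "today"], ["where", "is", "the", "hospital", "please"]]
hyps = [["i", "have", "a", "fever", "today"], ["where", "is", "the", "hospital", "now"]]
bleu = corpus_bleu(refs, hyps)
acc = sentence_accuracy(refs, hyps)
```

Both functions return values in [0, 1], whereas the table appears to report scores on a 0 to 100 scale.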
As the final model, we selected the method that combines (HAND+BODY Keypoint) + (All Frame Random Augmentation) + (Frame Normalization).
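The exact augmentation and normalization procedures are not described in this section, so the following is only a rough sketch of what techniques with these names could look like. Both the order-preserving random frame sampling and the per-frame centering and scaling are assumptions for illustration, not the authors' method, and the names `random_frame_sample` and `normalize_frame` are hypothetical.

```python
import random


def random_frame_sample(frames, num_frames, rng=random):
    """Pick num_frames frames at random while preserving temporal order."""
    if num_frames >= len(frames):
        return list(frames)
    indices = sorted(rng.sample(range(len(frames)), num_frames))
    return [frames[i] for i in indices]


def normalize_frame(points):
    """Center (x, y) keypoints on their mean and scale by the largest absolute offset."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    # Fall back to 1.0 when all points coincide, to avoid dividing by zero.
    scale = max(max(abs(x - cx) for x in xs), max(abs(y - cy) for y in ys)) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in points]


# Toy usage: sample 4 of 10 frame indices, then normalize one square-shaped frame.
sampled = random_frame_sample(list(range(10)), 4, random.Random(0))
square = normalize_frame([(0, 0), (2, 0), (2, 2), (0, 2)])
```

Normalizing each frame independently makes the coordinates insensitive to where the signer stands in the image, which is one plausible motivation for a frame-level normalization step.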
More experimental results are shown here.
final_video.mp4
@misc{https://doi.org/10.48550/arxiv.2204.10511,
doi = {10.48550/ARXIV.2204.10511},
url = {https://arxiv.org/abs/2204.10511},
author = {Kim, Youngmin and Kwak, Minji and Lee, Dain and Kim, Yeongeun and Baek, Hyeongboo},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
title = {Keypoint based Sign Language Translation without Glosses},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}