
Sign-Language-project

Contributors


김영민 (Youngmin Kim)

곽민지 (Minji Kwak)

이다인 (Dain Lee)

김영은 (Yeongeun Kim)

Abstract

Sign Language Translation (SLT) has received relatively little attention compared to Sign Language Recognition (SLR). SLR, however, recognizes the unique grammar of sign language, which differs from spoken language and is therefore difficult for non-disabled people to interpret. We instead tackle the problem of translating sign language video directly into spoken language. To this end, we propose a new keypoint normalization method that performs translation based on the signer's skeleton points and normalizes these points robustly; normalization customized to each body part contributes to the performance improvement. In addition, we propose a stochastic frame selection method that provides frame augmentation and frame sampling at the same time. Finally, the sequence is translated into spoken language by an attention-based translation model. Because our method does not require glosses, it can be applied to a variety of datasets, and quantitative experimental evaluation demonstrates its effectiveness.
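The exact normalization procedure is described in the paper. As a rough illustration only, the sketch below shows one way body-part-wise keypoint normalization could look, assuming each part (body, left hand, right hand) is centered on its own mean and scaled by its own spread. The part slices, shapes, and function name are hypothetical and are not the authors' implementation.

```python
import numpy as np

# Hypothetical index ranges for illustration only; the real split of the
# keypoints into body / left-hand / right-hand parts depends on the
# AlphaPose (Halpe-136) layout used by the authors.
PARTS = {
    "body":       slice(0, 26),
    "left_hand":  slice(26, 47),
    "right_hand": slice(47, 68),
}

def normalize_keypoints(kps: np.ndarray) -> np.ndarray:
    """Center and scale each body part independently.

    kps: array of shape (num_points, 2) holding (x, y) for one frame.
    Returns an array of the same shape where each part is shifted to its
    own center and divided by its own standard deviation.
    """
    out = kps.astype(np.float32).copy()
    for part in PARTS.values():
        pts = out[part]
        center = pts.mean(axis=0, keepdims=True)
        scale = pts.std() + 1e-6          # avoid division by zero
        out[part] = (pts - center) / scale
    return out

# Toy usage with random points standing in for one frame of keypoints.
frame = np.random.rand(68, 2).astype(np.float32)
print(normalize_keypoints(frame).shape)   # (68, 2)
```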

Survey

Environment

  • OS : Ubuntu 18.04.5 LTS (Docker) or Colab
  • CUDA : 10.0
  • GPU : Tesla V100-32GB

Data

  • Sample video download: $ sh download_sh/sample_data_dowonload.sh

DataSet Download

Environment Setting

$ pip install -r requirements.txt
$ python -m pip install cython
$ sudo apt-get install libyaml-dev
  • Setting (AlphaPose)
$ git clone https://github.com/winston1214/Sign-Language-project.git && cd Sign-Language-project
$ python setup.py build develop

If you do not run in the Colab environment, or your CUDA version is 10.0, refer to this link.

  • Download pretrained files (please download)

Running the following command downloads all weight files at once:

$ sh downlaod_sh/weight_download.sh

PreProcessing

1. Split frame

$ python frame_split.py # You have to add the main code.
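frame_split.py is expected to turn each video into individual image frames; since the comment above notes that the main code still has to be added, here is a minimal OpenCV sketch of what this step could look like. The paths and the frame-naming scheme are assumptions, not the repository's actual script.

```python
import os
import cv2

def split_frames(video_path: str, out_dir: str) -> int:
    """Write every frame of `video_path` to `out_dir` as numbered JPEGs."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:05d}.jpg"), frame)
        idx += 1
    cap.release()
    return idx

if __name__ == "__main__":
    n = split_frames("sample.mp4", "frames/sample")  # hypothetical paths
    print(f"saved {n} frames")
```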

2. Extract KeyPoint(Alphapose)

python scripts/demo_inference.py --cfg configs/halpe_136/resnet/256x192_res50_lr1e-3_2x-regression.yaml --checkpoint pretrained_models/halpe136_fast_res50_256x192.pt --indir ${img_folder_path} --outdir ${save_dir_path} --form boaz --vis_fast --sp

If you use multiple GPUs, you do not need the --sp option.

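The command above writes AlphaPose results into the output directory; a standard AlphaPose run produces a JSON file listing per-frame keypoints as (x, y, score) triples, although the --form boaz option suggests a customized output format here. As a hedged illustration, the snippet below shows how such a result file could be read and flattened into per-frame feature vectors; the file name and the choice to keep only x/y coordinates are assumptions.

```python
import json
import numpy as np

def load_alphapose_keypoints(json_path: str) -> np.ndarray:
    """Read an AlphaPose-style result file into shape (num_frames, num_kps * 2)."""
    with open(json_path) as f:
        results = json.load(f)
    frames = []
    for det in results:                      # one entry per detected person/frame
        kps = np.array(det["keypoints"], dtype=np.float32).reshape(-1, 3)
        frames.append(kps[:, :2].flatten())  # keep x, y; drop the confidence score
    return np.stack(frames)

if __name__ == "__main__":
    feats = load_alphapose_keypoints("alphapose-results.json")  # assumed file name
    print(feats.shape)
```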

Train

$ python train.py --X_path ${X_train.pickle path} --save_path ${model save directory} \
--pt_name ${save pt model name} --model ${LSTM or GRU} --batch ${BATCH SIZE}

## Example

$ python train.py --X_path /sign_data/ --save_path pt_file/ \
--pt_name model1.pt --model GRU --batch 128 --epochs 100 --dropout 0.5
  • X_train.pickle : For convenience, the keypoint values extracted in the previous step are stored and loaded as a pickle file (see the sketch below).
    • Shape : [video_len, max_frame_len, keypoint_len], e.g. [7129, 376, 246]
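As a minimal sketch of how such a pickle could be loaded and padded to a fixed frame length, assuming zero-padding up to max_frame_len (the helper name, path, and padding scheme are illustrative, not the repository's actual loader):

```python
import pickle
import numpy as np

def load_padded_keypoints(path: str, max_frames: int = 376) -> np.ndarray:
    """Load pickled keypoint sequences and zero-pad each video to max_frames."""
    with open(path, "rb") as f:
        data = pickle.load(f)                # list of (num_frames, keypoint_len) arrays
    padded = []
    for seq in data:
        seq = np.asarray(seq, dtype=np.float32)
        pad = max_frames - len(seq)
        if pad > 0:
            seq = np.concatenate([seq, np.zeros((pad, seq.shape[1]), np.float32)])
        padded.append(seq[:max_frames])
    return np.stack(padded)                  # e.g. (7129, 376, 246)

if __name__ == "__main__":
    X_train = load_padded_keypoints("/sign_data/X_train.pickle")  # hypothetical path
    print(X_train.shape)
```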

Inference

$ python inference.py --video ${VIDEO_NAME} --outdir ${SAVE_PATH} --pt ${WEIGHT_PATH} --model ${MODEL NAME}
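For example (the video name, output directory, and weight file below are placeholders; substitute your own):

$ python inference.py --video sample.mp4 --outdir results/ --pt pt_file/model1.pt --model GRU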

You can also try a simple demo of the video in Colab (see the Open In Colab badge).

Result

| Model | Hyperparameters | BLEU | Accuracy | Final Model |
|---|---|---|---|---|
| GRU-Attention | Adam, CrossEntropy | 93.4 | 93.5 | |
| GRU-Attention | AdamW, Scheduler | 95.1 | 95.0 | ✓ |
| LSTM | Adam, CrossEntropy | 49.6 | 50.0 | |
| LSTM | AdamW, Scheduler | 51.5 | 51.5 | |

We selected the configuration that applies the (HAND + BODY keypoints) + (all-frame random augmentation) + (frame normalization) technique as the final model.
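The frame-level random augmentation mentioned above pairs with the stochastic frame selection described in the abstract. As a rough, hypothetical sketch (not the authors' implementation), random index selection can both subsample long videos and augment short ones to a fixed length:

```python
import numpy as np

def random_frame_selection(seq: np.ndarray, target_len: int, rng=None) -> np.ndarray:
    """Sample `target_len` frames from `seq` while keeping temporal order.

    Videos longer than target_len are randomly subsampled; shorter videos
    are augmented by repeating randomly chosen frames (sampling with
    replacement).
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = np.sort(rng.choice(len(seq), size=target_len, replace=len(seq) < target_len))
    return seq[idx]

# Toy usage: a 210-frame video of 246-dimensional keypoint vectors.
video = np.random.rand(210, 246).astype(np.float32)
clip = random_frame_selection(video, target_len=376)
print(clip.shape)                            # (376, 246)
```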

More experimental results are shown here.

Demo Video

YouTube link

final_video.mp4

Citation

@misc{https://doi.org/10.48550/arxiv.2204.10511,
  doi = {10.48550/ARXIV.2204.10511},
  url = {https://arxiv.org/abs/2204.10511},
  author = {Kim, Youngmin and Kwak, Minji and Lee, Dain and Kim, Yeongeun and Baek, Hyeongboo},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {Keypoint based Sign Language Translation without Glosses},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}