sahajBERT

Downstream evaluation

We have two downstream tasks: NER (named entity recognition) and NCC (news category classification).

The datasets used here are:

NER: wikiann, bn

NCC: indic_glue, sna.bn

To read more about the datasets, visit WikiANN and IndicGLUE.
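
As a minimal sketch, the two datasets listed above can be loaded with the Hugging Face `datasets` library. The dataset names and configurations come from the list above; the dictionary `DOWNSTREAM_DATASETS` and the helper `load_downstream_dataset` are hypothetical names introduced here for illustration and are not part of this repo. Depending on the installed `datasets` version, the identifiers may need a namespace prefix on the Hub.

```python
# Dataset identifiers taken from the list above: (dataset name, configuration).
DOWNSTREAM_DATASETS = {
    "ner": ("wikiann", "bn"),
    "ncc": ("indic_glue", "sna.bn"),
}


def load_downstream_dataset(task):
    """Load the dataset for a task ("ner" or "ncc") from the Hugging Face Hub.

    Hypothetical helper for illustration; it is not part of the sahajBERT repo.
    """
    # Imported lazily so the mapping can be inspected without `datasets` installed.
    from datasets import load_dataset

    name, config = DOWNSTREAM_DATASETS[task]
    # Downloads the data on first use, so this call needs network access.
    return load_dataset(name, config)
```

Calling `load_downstream_dataset("ner")` returns the dataset splits, which can be printed to inspect their columns and sizes.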

Model link - sahajBERT-xlarge

NER

1. Clone the sahajBERT repo and prepare the environment by installing the requirements.

git clone https://github.com/tanmoyio/sahajBERT.git
cd sahajBERT
pip install -r requirements.txt
pip install -q https://github.com/learning-at-home/hivemind/archive/sahaj2.zip
pip install seqeval

2. Run the following command

!python train_ner.py \
  --model_name_or_path Upload/sahajbert2 --output_dir sahajbert/ner \
  --learning_rate 3e-5 --max_seq_length 256 --num_train_epochs 20 \
  --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --gradient_accumulation_steps 8 \
  --early_stopping_patience 3 --early_stopping_threshold 0.01

This will prompt you for your Hugging Face username and password. We don't store your Hugging Face password; it is only used so that your score can be reflected on the leaderboard.

Leaderboard link - sahajBERT2-xlarge-ner

If you are using a GPU, or fine-tuning on a Colab GPU, you might want to adjust per_device_train_batch_size and per_device_eval_batch_size to fit the available memory.
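
When lowering the per-device batch size to fit GPU memory, one common practice (not something this README requires) is to raise `--gradient_accumulation_steps` so that the effective batch size stays the same. The sketch below assumes the flags follow the usual Hugging Face `TrainingArguments` semantics, where the effective batch size is the product of the per-device batch size, the accumulation steps, and the number of devices. Both function names are hypothetical.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def accumulation_steps_for(target_batch_size, per_device_batch_size, num_devices=1):
    """Accumulation steps needed to reach target_batch_size exactly."""
    per_step = per_device_batch_size * num_devices
    if target_batch_size % per_step != 0:
        raise ValueError("target batch size is not a multiple of the per-step batch size")
    return target_batch_size // per_step


# The NER command above uses batch size 8 with 8 accumulation steps.
target = effective_batch_size(8, 8)
# Halving the per-device batch size requires doubling the accumulation steps.
steps = accumulation_steps_for(target, 4)
print(target, steps)
```

With the values from the command above, the effective batch size on a single device is 64, so a per-device batch size of 4 would pair with 16 accumulation steps.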

NCC

!python train_ncc.py \
  --model_name_or_path Upload/sahajbert2 --output_dir sahajbert/ncc \
  --learning_rate 1e-5 --max_seq_length 128 --num_train_epochs 20 \
  --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --gradient_accumulation_steps 8 \
  --early_stopping_patience 3 --early_stopping_threshold 0.01
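
Both commands pass `--early_stopping_patience 3 --early_stopping_threshold 0.01`. The sketch below illustrates how patience and threshold usually interact, assuming a metric where higher is better (such as F1 or accuracy). It is a simplified illustration and not the code used by `train_ner.py` or `train_ncc.py`; the function name `should_stop_early` is hypothetical.

```python
def should_stop_early(scores, patience=3, threshold=0.01):
    """Return the index of the evaluation at which training would stop, or None.

    An evaluation counts as an improvement only if it beats the best score
    so far by more than `threshold`. Training stops after `patience`
    consecutive evaluations without such an improvement.
    """
    best = None
    evals_without_improvement = 0
    for index, score in enumerate(scores):
        if best is None or score - best > threshold:
            best = score
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
        if evals_without_improvement >= patience:
            return index
    return None


# Made-up evaluation scores, for illustration only.
print(should_stop_early([0.80, 0.85, 0.855, 0.852, 0.858]))
```

In this made-up sequence, the last three scores each improve on 0.85 by less than 0.01, so training would stop at the fifth evaluation (index 4), well before the 20 epochs requested.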
