## Cyrillic to Mongolian script converter

This demo uses deep learning to convert text written in the Mongolian Cyrillic script, which reflects the modern spoken language of Mongolia, into the traditional Mongolian script, whose spelling preserves pronunciation from several hundred years ago. To run the demo, click **Runtime → Run All** (a Google account is required).


For implementation details and more information, see the repository: https://github.com/tugstugi/mongolian-nlp/tree/master/bichig2cyrillic


## Install dependencies

In [0]:
from os.path import exists

# Clone fairseq, pin it to commit 9dd8724 and install its Python requirements
if not exists('fairseq'):
  ! git clone --quiet https://github.com/pytorch/fairseq.git && cd fairseq && git checkout 9dd8724 && pip install -q -r requirements.txt

## Download pretrained model

In [0]:
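def download_from_google_drive(file_id, file_name):
  # Download a file from Google Drive. For large files, Drive may first set a
  # confirmation cookie whose value must be sent back in the "confirm" parameter.
  !rm -f ./cookie
  !curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id={file_id}" > /dev/null
  # Extract the confirmation token (last field of the cookie line containing "download")
  confirm_text = !awk '/download/ {print $NF}' ./cookie
  confirm_text = confirm_text[0] if confirm_text else ''
  !curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm={confirm_text}&id={file_id}" -o {file_name}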
  
checkpoint_name = 'cyrillic2bichig-checkpoint'
if not exists(checkpoint_name):
  checkpoint_file_name = '%s.tar.gz' % checkpoint_name
  download_from_google_drive('1OwFyo6EV1_8Tz79YIXC57UV8OvM2I8gf', checkpoint_file_name)
  ! tar xvfz {checkpoint_file_name}
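The `awk '/download/ {print $NF}'` step prints the last whitespace-separated field of every line in curl's cookie file that contains the string `download`; in the Netscape cookie format that curl writes, this field is the cookie value, which the function then sends as the `confirm` parameter. A pure-Python sketch of the same extraction is shown below. `extract_confirm_token` is a hypothetical name, and the sample cookie line is made up for illustration.

```python
from typing import Optional


def extract_confirm_token(cookie_text: str) -> Optional[str]:
    """Mimic awk '/download/ {print $NF}' and keep only the first match."""
    for line in cookie_text.splitlines():
        if "download" in line:
            # awk's $NF is the last whitespace-separated field; in a Netscape
            # cookie file line that is the cookie value
            return line.split()[-1]
    # No line mentions "download": no confirmation token was set
    return None


# Usage with a made-up, tab-separated Netscape cookie line
sample = (
    "# Netscape HTTP Cookie File\n"
    ".drive.google.com\tTRUE\t/uc\tTRUE\t0\tdownload_warning_example\tAbC123\n"
)
print(extract_confirm_token(sample))  # AbC123
```

Returning `None` instead of indexing an empty list makes the "no confirmation needed" case explicit.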

In [0]:
if not exists('cyrillic2bichig.py'):
  ! wget -q -O cyrillic2bichig.py https://raw.githubusercontent.com/tugstugi/mongolian-nlp/master/bichig2cyrillic/cyrillic2bichig.py

## Convert

The converter processes its input line by line; each line should be at most 120 characters long.
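A small pre-flight check can catch over-long lines before they reach the model. This is a sketch: `find_long_lines` is a hypothetical helper, the 120-character limit is taken from the note above, and counting Unicode code points with `len()` is an assumption about how that limit is measured.

```python
from typing import List, Tuple

MAX_CHARS = 120  # limit stated in this notebook


def find_long_lines(text: str, max_chars: int = MAX_CHARS) -> List[Tuple[int, int]]:
    """Return (1-based line number, length) for every line longer than max_chars."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > max_chars
    ]


# Usage: check the newline-joined input before piping it into the converter
text = "\n".join(["Хэн хүнтэй үг ярина гэдэг", "а" * 130])
for number, length in find_long_lines(text):
    print(f"line {number} has {length} characters (max {MAX_CHARS})")
```

Long lines could then be split at word boundaries before conversion.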

In [4]:
sentences = [
    "Хэн хүнтэй үг ярина гэдэг",
    "Хэрэг дээрээ тулалдаан юм",
    "Халуун хүйтэн ямар ч зэвсгээс",
    "Хатуу зөөлөн үг хүчтэй",
    
    "Үгээр хүнийг захирч болно",
    "Үхүүлж ч болно сэхээж ч болно",
    "Шалдлуулж, инээлгэж, уйлуулж болно",
    "Шархлуулж бас эдгээж болно"
]
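# Join the sentences into one newline-separated string
sentences = "\n".join(sentences)

# Pipe the text into the conversion script, which loads the pretrained checkpoint
! echo "{sentences}" | python3 cyrillic2bichig.py --path {checkpoint_name}/checkpoint.pt --source-lang cyrillic --target-lang bichig {checkpoint_name}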

| [cyrillic] dictionary: 40 types
| [bichig] dictionary: 40 types
ᠬᠡᠨ ᠬᠦᠮᠦᠨ ᠲᠡᠢ ᠦᠭᠡ ᠶᠠᠷᠢᠨ᠎ᠠ ᠭᠡᠳᠡᠭ
ᠬᠡᠷᠡᠭ ᠳᠡᠭᠡᠷ᠎ᠡ ᠪᠡᠨ ᠲᠤᠯᠤᠯᠳᠤᠭᠠᠨ ᠶᠤᠮ
ᠬᠠᠯᠠᠭᠤᠨ ᠬᠦᠢᠲᠡᠨ ᠶᠠᠮᠠᠷ ᠴᠤ ᠵᠡᠪᠰᠡᠭ ᠡᠴᠡ
ᠬᠠᠲᠠᠭᠤ ᠵᠥᠭᠡᠯᠡᠨ ᠦᠭᠡ ᠬᠦᠴᠦᠲᠡᠢ
ᠦᠭᠡ ᠪᠡᠷ ᠬᠦᠮᠦᠨ ᠢ ᠵᠠᠬᠢᠷᠴᠢ ᠪᠣᠯᠤᠨ᠎ᠠ
ᠦᠬᠦᠭᠦᠯᠵᠦ ᠴᠤ ᠪᠣᠯᠤᠨ᠎ᠠ ᠰᠡᠬᠡᠭᠡᠵᠦ ᠴᠤ ᠪᠣᠯᠤᠨ᠎ᠠ
ᠱᠠᠯᠳᠠᠯᠠᠭᠤᠯᠵᠤ ᠢᠨᠢᠶᠡᠯᠭᠡᠵᠦ ᠤᠬᠢᠯᠠᠭᠤᠯᠵᠤ ᠪᠣᠯᠤᠨ᠎ᠠ
ᠰᠢᠷᠬᠠᠯᠠᠭᠤᠯᠵᠤ ᠪᠠᠰᠠ ᠡᠳᠡᠭᠡᠭᠡᠵᠦ ᠪᠣᠯᠤᠨ᠎ᠠ
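The `! echo "{sentences}" | ...` cell passes the text through the shell, so input containing double quotes, `$` or backticks could be mangled. The sketch below runs the same command with the same flags through `subprocess.run` and feeds the text via `input=`, which avoids shell quoting. It assumes, based on the output above, that the script reads sentences from stdin and prints log lines prefixed with `| ` alongside the converted lines. `run_converter` and `strip_log_lines` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List


def strip_log_lines(stdout: str) -> List[str]:
    """Keep only converted lines: drop blanks and log lines starting with '| '."""
    return [
        line for line in stdout.splitlines()
        if line.strip() and not line.startswith("| ")
    ]


def run_converter(sentences: List[str], checkpoint_dir: str,
                  script: str = "cyrillic2bichig.py") -> List[str]:
    """Hypothetical wrapper: run the conversion script with the notebook's flags."""
    cmd = [
        sys.executable, script,
        "--path", f"{checkpoint_dir}/checkpoint.pt",
        "--source-lang", "cyrillic",
        "--target-lang", "bichig",
        checkpoint_dir,
    ]
    # Sending the text on stdin avoids any shell quoting of the sentences
    result = subprocess.run(
        cmd,
        input="\n".join(sentences) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return strip_log_lines(result.stdout)
```

In the notebook this could be called as `run_converter(["Хэн хүнтэй үг ярина гэдэг"], checkpoint_name)`. If the script writes its log lines to stderr instead, `strip_log_lines` simply leaves stdout unchanged apart from blank lines.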
